x86_64 vs ARMv8

Things are getting interesting in the SPO600 course. It’s time to get familiar with modern processor architectures: x86_64, which powers most desktops, laptops and servers today, and the newer ARMv8, which is gaining traction mostly because of its energy efficiency. Also, for the first time, we will “forget” assembly and focus on the compiler.

So, what is the difference between x86_64 and ARMv8?

Making a processor is hard and expensive, so instead of designing a brand-new 64-bit architecture, AMD extended the existing 32-bit x86 to 64 bits: x86_64. Because 32-bit software kept running, that strategy popularized the 64-bit environment. On the other side, ARMv8 introduced AArch64, a 64-bit instruction set designed as 64-bit from the beginning, and its energy efficiency made it a natural fit for mobile devices. Who remembers the RISC vs CISC competition? The RISC concept (where ARM comes from) tells us to execute simple operations quickly. The CISC concept (where x86 comes from) is quite the opposite: one complex operation should perform better than a bunch of simple ones. Who won? Well, everybody won! Nowadays, RISC designs have picked up some CISC features and vice versa, and we all have faster CPUs. Lastly, the two architectures differ in registers, instructions, cache and much more.

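Luckily, for those who don’t want to write assembly (like me), the compiler knows how to produce the executable for each architecture. However, we might end up with a product that doesn’t perform as well as if we had written it in assembly. To help with that, there are compiler options (-static | -g | -fno-builtin | -O0 | -O3) that we can play around with. Not all of them are about speed: -static links the libraries into the executable, -g adds debugging information, and -fno-builtin stops the compiler from swapping in its built-in versions of standard functions, while -O0 and -O3 select the optimization level (none and the highest) and can change the overall performance.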

We know that compilation can be a painful process and take a lot of time. To address this, the “make” utility builds our package by following the rules in a Makefile. It is smart enough to know the proper execution sequence and to skip what doesn’t need to be recompiled, for example a file whose sources have not changed since the last build.

My next post will talk more about the compiler. See you.

Rodrigo
