Going Faster

Photo by Anders Jildén on Unsplash

Today’s topic is compiler optimizations. Besides translating our source code into a binary executable, the compiler can also, guided by optimization parameters, produce faster executables. Just by adding some parameters to the compile command, we can get better performance or a smaller executable, for example. There are hundreds of individual options, which we can turn on or off using the -f and -fno- prefixes. However, instead of setting them one by one, we can pick a whole set at once using the -O parameter, which ranges from 0 (no optimization, the default) to 3 (the most aggressive). Using those parameters has a cost: usually, the faster the code, the larger the executable.
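
To see the effect yourself, here is a minimal sketch, assuming GCC or Clang on a typical setup; the file name, iteration count and the exact outcome are only illustrative. Build the same toy program at -O0 and at -O3 and time both runs.

/* opt_demo.c -- hypothetical file name.
 * Build and time it twice, e.g.:
 *   cc -O0 opt_demo.c -o demo_O0 && time ./demo_O0
 *   cc -O3 opt_demo.c -o demo_O3 && time ./demo_O3
 */
#include <stdio.h>

int main(void) {
    long long sum = 0;
    /* A hot loop full of arithmetic that the optimizer can strength-reduce,
     * unroll, or possibly fold away entirely at higher -O levels. */
    for (long long i = 0; i < 200000000LL; i++) {
        sum += i * 6;
    }
    printf("%lld\n", sum);   /* print the result so it is not optimized out */
    return 0;
}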

How does the compiler make it faster if my code is perfect?

I’m going to put some of those methods here, but if you want, here is more detail. Also, bear in mind that most of the optimizations are performed on the intermediate representation of the program, not on the source itself; the examples below are rewritten in C just to illustrate what the transformations do.

Strength Reduction: replace slow operations with faster ones.

Before:

int x;
for (x=0; x < 10; x++) {
    printf("%d\n", x*6);
}

After:

int x;
for (x=0; x < 60; x+=6) {
    printf("%d\n", x);
}

Instead of multiplying on every iteration, it is faster to keep adding the factor to an accumulator.
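
Another common form of strength reduction is replacing a multiplication by a power of two with a bit shift. The sketch below does it by hand only to illustrate what the compiler does automatically; the function names are hypothetical.

/* Strength reduction written out by hand; scale_by_8 and
 * scale_by_8_reduced are illustrative names. */
int scale_by_8(int x) {
    return x * 8;        /* multiplication */
}

int scale_by_8_reduced(int x) {
    return x << 3;       /* same value for non-negative x, using a cheap shift */
}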

Hoisting: move operations outside the loop

Before:

int t, x;
double c;
t = readtemp();
for (x = 0; x < 200; x++) {
    c = (t-32)/1.8000 + 273.15;
    foo(x,c);
}

After:

int t, x;
double c;
t = readtemp();
c = (t-32)/1.8000 + 273.15;
for (x = 0; x < 200; x++) {
    foo(x,c);
}

Instead of performing the calculation “n” times inside the loop, it can be done once outside. This works because “c” is constant for the entire lifetime of the loop. The optimizer finds these loop-invariant values and moves them outside no matter where they appear (assignments, expressions, the loop condition, etc.).
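
A classic place where hoisting pays off is a loop-invariant call sitting in the loop condition. The hedged sketch below writes the transformation by hand just to show the idea; a compiler may do the same when it can prove the string is never modified inside the loop (count_upper is a hypothetical helper).

#include <ctype.h>
#include <string.h>

/* Before: strlen(s) may be evaluated on every iteration. */
int count_upper(const char *s) {
    int n = 0;
    for (size_t i = 0; i < strlen(s); i++) {
        if (isupper((unsigned char)s[i])) n++;
    }
    return n;
}

/* After hoisting: the invariant call is moved out of the loop. */
int count_upper_hoisted(const char *s) {
    int n = 0;
    size_t len = strlen(s);   /* computed once, since s never changes */
    for (size_t i = 0; i < len; i++) {
        if (isupper((unsigned char)s[i])) n++;
    }
    return n;
}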

Loop Unswitching: turn a loop around a condition into a condition around loops

Before:

void foo(float ctl) {
    int x;
    for (x=0; x < 10000; x++) {
        if (ctl == 0) {
            bar(x);
        } else {
            qux(x);
        }
    }
}

After:

void foo(float ctl) {
    int x;
    if (ctl == 0) {
        for (x=0; x < 10000; x++) {
            bar(x);
        }
    } else {
        for (x=0; x < 10000; x++) {
            qux(x);
        }
    }
}

Instead of evaluating the condition “n” times inside the loop, it is faster to evaluate it once and then run the appropriate loop. Note that this only works because the condition does not change during the lifetime of the loop, and that the executable might get bigger, since the loop body is duplicated.
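
To see why the size cost adds up, here is a hedged sketch of what unswitching two invariant flags at once would look like: the single original loop becomes four specialized copies. The names process_all, step_a and step_b are hypothetical.

/* step_a and step_b are hypothetical helpers defined elsewhere. */
void step_a(int x);
void step_b(int x);

/* Original: one loop, two loop-invariant flags checked every iteration. */
void process_all(int use_a, int use_b, int n) {
    for (int x = 0; x < n; x++) {
        if (use_a) step_a(x);
        if (use_b) step_b(x);
    }
}

/* Fully unswitched: the same work, but four specialized copies of the loop. */
void process_all_unswitched(int use_a, int use_b, int n) {
    if (use_a && use_b) {
        for (int x = 0; x < n; x++) { step_a(x); step_b(x); }
    } else if (use_a) {
        for (int x = 0; x < n; x++) step_a(x);
    } else if (use_b) {
        for (int x = 0; x < n; x++) step_b(x);
    }
    /* else: neither flag set, no loop at all */
}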

And there are more! The point here is to write code in the way we understand best and let the compiler rewrite and optimize it. Additionally, the compiler can use machine-specific instructions or rearrange code blocks to improve performance. If you want to see how those parameters affect performance, check this out. See you in the next post.
