
Project Stage 3

Photo by NASA on Unsplash

Hello! In this post, I’ll list the optimization opportunities that I identified in the AWK project, based on what I’ve learned in the SPO600 classes. There are two types of optimizations: portable and platform-specific.

Portable optimizations are the ones that work everywhere, like better algorithms and implementations, and also compiler build flags.
Platform-specific optimizations, on the other hand, work only on a targeted architecture, such as the SIMD instructions available only on AArch64 and the many others specific to x86_64. It is possible to “force” the use of such instructions according to the target hardware. We can do that at compile time and also at run time.
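To make the compile-time versus run-time distinction concrete, here is a minimal sketch, assuming GCC or Clang. It is not code from the awk project, and the function names (compiled_for, runtime_simd_path, describe_target) are hypothetical. The preprocessor branch is fixed when the binary is built, while the CPU query happens each time the program runs.

```c
#include <assert.h>
#include <stdio.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval */
#include <asm/hwcap.h>  /* HWCAP_* bits */
#endif

/* Compile-time selection: exactly one branch survives preprocessing. */
const char *compiled_for(void)
{
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__x86_64__)
    return "x86_64";
#else
    return "generic";
#endif
}

/* Run-time selection: ask the running CPU what it supports.
 * __builtin_cpu_supports is a GCC/Clang builtin for x86 targets. */
const char *runtime_simd_path(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    return "sse2";      /* SSE2 is part of the x86_64 baseline */
#elif defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
    if (getauxval(AT_HWCAP) & HWCAP_SVE)
        return "sve";
    return "asimd";     /* assumed present on typical AArch64 Linux systems */
#else
    return "scalar";
#endif
}

/* Hypothetical helper: write a summary such as "x86_64/avx2" into out. */
int describe_target(char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap, "%s/%s", compiled_for(), runtime_simd_path());
}
```

A real dispatcher would store a function pointer per path instead of a string, but the decision logic would look much the same.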

Now that we know our options, let’s dig in. According to my previous post, the functions nematch and readrec are the hotspots. Here is the command line used to run awk:

./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "count: " a;}' bmark-data.txt

The nematch function matches the regular expression given on the command line against the characters of each line of the file bmark-data.txt. We have an opportunity here: we could replace the regex algorithm with a better one. The same applies to the readrec function, which reads bmark-data.txt. If we could find faster algorithms, it would make a significant improvement.
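One direction a "better algorithm" could take: the field separator in the benchmark, `<|:|=`, only ever matches one of three single characters, so a 256-entry lookup table can do the same job as a general regex engine. The sketch below is my own illustration under that assumption, not how awk works today. The names sep_table_init and nth_field are hypothetical, and it ignores everything a real FS has to handle (multi-character patterns, the default whitespace rules, and so on).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mark every byte listed in `seps` as a separator. */
void sep_table_init(unsigned char table[256], const char *seps)
{
    assert(table != NULL && seps != NULL);
    memset(table, 0, 256);
    for (; *seps != '\0'; seps++)
        table[(unsigned char)*seps] = 1;
}

/* Return a pointer to field n (1-based, like $n in awk) inside `line`
 * and store its length in *len. Returns NULL when the line has fewer
 * than n fields. Adjacent separators yield empty fields. */
const char *nth_field(const unsigned char table[256], const char *line,
                      int n, size_t *len)
{
    const char *start = line;
    int field = 1;

    for (const char *p = line; ; p++) {
        if (*p == '\0' || table[(unsigned char)*p]) {
            if (field == n) {
                *len = (size_t)(p - start);
                return start;
            }
            if (*p == '\0')
                return NULL;
            field++;
            start = p + 1;
        }
    }
}
```

With the table built from "<:=", calling nth_field on the line "a<b:c=d" with n = 3 points at "c" with a length of 1. Each input byte costs one table lookup, with no automaton state to track.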

It seems that awk is not multithreaded. It would be great to have a flag or similar option to spawn multiple processes to deal with a heavy load. In my experience, there is always a gain from doing things in parallel. That is valid for nematch and readrec, and for their callers refldbld and getrec.
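Here is a rough sketch of what the multi-process idea could look like, assuming a POSIX system and a data file that is already in memory. Everything in it is hypothetical: awk has no such flag, parallel_count and count_lines_with are names I made up, and a plain substring search stands in for the real per-record work. The buffer is split at newline boundaries so no record is cut in half, each chunk is handled by a forked child, and the partial counts come back through a pipe.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for per-record work: count lines that contain `needle`. */
static long count_lines_with(const char *buf, size_t len, const char *needle)
{
    long hits = 0;
    size_t nlen = strlen(needle);
    const char *p = buf, *end = buf + len;

    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *eol = nl ? nl : end;
        for (const char *q = p; q + nlen <= eol; q++) {
            if (memcmp(q, needle, nlen) == 0) {
                hits++;
                break;              /* count each line at most once */
            }
        }
        p = nl ? nl + 1 : end;
    }
    return hits;
}

/* Split buf into up to nworkers newline-aligned chunks, one child each. */
long parallel_count(const char *buf, size_t len, const char *needle,
                    int nworkers)
{
    assert(buf != NULL && needle != NULL && nworkers > 0);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    long total = 0;
    int started = 0;
    size_t start = 0;

    for (int i = 0; i < nworkers && start < len; i++) {
        size_t share = (len - start) / (size_t)(nworkers - i);
        size_t end = start + (share > 0 ? share : 1);
        if (i == nworkers - 1)
            end = len;
        while (end < len && buf[end - 1] != '\n')
            end++;                  /* extend the chunk to a record boundary */

        pid_t pid = fork();
        if (pid == 0) {             /* child: count, report, exit */
            close(fds[0]);
            long hits = count_lines_with(buf + start, end - start, needle);
            ssize_t w = write(fds[1], &hits, sizeof hits);
            _exit(w == (ssize_t)sizeof hits ? 0 : 1);
        }
        if (pid > 0)
            started++;
        else                        /* fork failed: do this chunk ourselves */
            total += count_lines_with(buf + start, end - start, needle);
        start = end;
    }

    close(fds[1]);
    for (int i = 0; i < started; i++) {
        long part = 0;
        if (read(fds[0], &part, sizeof part) == (ssize_t)sizeof part)
            total += part;
        wait(NULL);                 /* reap one child per result */
    }
    close(fds[0]);
    return total;
}
```

This only works so simply because counting is order-independent. An awk program that relies on NR or on state carried from one record to the next would need much more care.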

All the suggestions above are portable. There are also some platform-specific ones that would help as well.

I don’t know if it is possible to implement, due to the uncertainty of the input (regex and data file). Still, I would try to implement SIMD as low-level parallelization. Instead of processing elements one by one, processing two or four (or more) at a time could improve the performance drastically. Here, I think that refldbld and getrec are good candidates for vectorization too.
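One piece that does not depend on the regex is scanning a buffer for a single known byte, such as a separator or a newline. The sketch below counts such bytes 16 at a time with SSE2 on x86_64 or Advanced SIMD (NEON) on AArch64, and falls back to a plain loop elsewhere and for the tail. It is an illustration only, assuming GCC or Clang. The function name count_sep is hypothetical and nothing like it exists in awk.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Count how many bytes in buf[0..len) are equal to sep. */
size_t count_sep(const char *buf, size_t len, char sep)
{
    assert(buf != NULL || len == 0);
    size_t i = 0, count = 0;

#if defined(__SSE2__)
    const __m128i needle = _mm_set1_epi8(sep);
    for (; i + 16 <= len; i += 16) {
        /* Unaligned load of 16 bytes, compared against sep in one step. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        count += (size_t)__builtin_popcount((unsigned)mask);
    }
#elif defined(__aarch64__)
    const uint8x16_t needle = vdupq_n_u8((uint8_t)sep);
    for (; i + 16 <= len; i += 16) {
        uint8x16_t chunk = vld1q_u8((const uint8_t *)(buf + i));
        uint8x16_t eq = vceqq_u8(chunk, needle);    /* 0xFF where equal */
        count += vaddvq_u8(vshrq_n_u8(eq, 7));      /* each match adds 1 */
    }
#endif

    /* Scalar tail, and the whole loop on other architectures. */
    for (; i < len; i++)
        count += (buf[i] == sep);
    return count;
}
```

Whether this pays off inside refldbld or getrec would have to be measured. The field splitting there is driven by a regex, so a fast path like this would only cover the simple cases.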

Also, we could search for applicable platform-specific optimizations that the GCC compiler is not applying, and then rewrite the code to make it easier for the compiler to apply them.

I didn’t find any hardware-specific optimization in the awk project. That might be a clue that there is room for such improvements, not only for AArch64 but also for x86_64. I recognize that this path is hard and requires a lot of knowledge of the hardware and its instructions. Hopefully, the community can help with that.

Finally, playing with the compiler optimization flags can produce excellent results, as demonstrated in Project Stage 1. It doesn’t require changing the code, but it definitely needs testing.

So, here is the list of my suggestions:

1 – Replace the regular expression algorithm in nematch;
2 – Replace the text file reading algorithm in readrec;
3 – Implement parallelism in nematch, readrec, refldbld and getrec;
4 – Restructure the code to facilitate SIMD and other hardware-specific optimizations;
5 – Since there are no platform-specific optimizations yet, look for opportunities everywhere, starting with nematch and readrec;
6 – Try out compiler optimization flags.

This is my last post. We’ve reached the end of the SPO600 course. It was a bumpy journey into assembly language, instructions, bits and bytes, but I liked it. It gave me more insight into the lowest level of software. Even as a higher-level developer (Java, C++, TypeScript, PL/SQL), I’ll store all the knowledge acquired in SPO600 in a special place in my toolbox. Thank you, Chris, all the best.
