Final Project

We are approaching the end of the course, and as part of the curriculum we have to pick one open source project to use as raw material for benchmarking, profiling, and possibly improving.

I decided to go with AWK, which is a great tool for manipulating text files. Early in my career, when I had to sort and filter huge text files from the billing system, awk was my favourite.

Check out the awk manual.

So, here is the link to the awk repo. If you are interested in my test results, you can find them here. See you.

Popular posts from this blog

Going Faster

Photo by Anders Jildén on Unsplash

Today's topic is compiler optimizations. A compiler translates our source code into a binary executable. Depending on its optimization options, it can also make that executable faster. Just by adding some options to the compiler command line, we can get better performance or a smaller executable, for example. There are hundreds of these options. We turn one on with the -f prefix or off with the -fno- prefix. Instead of setting them one by one, we can use the -O option, which enables a predefined group of them. It ranges from -O0 (no optimization, the default) to -O3 (the most aggressive), and GCC also offers -Os, which optimizes for size. Optimization has a cost: often, the faster the code, the larger the executable, and the longer the compilation.

How does the compiler make my code faster if it is already perfect? I'm going to list some methods here, but if you want more detail, here it is. Also, bear in mind that most of the optimizations are done on the intermediate representation of the program. So, the examples below are rewritten just to...
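The post's own examples are cut off in this preview, so I'll illustrate one common optimization myself: loop-invariant code motion. This is a minimal sketch, written in C at the source level, and the function names and parameters are hypothetical. An optimizing compiler normally performs this transformation on its own, on the intermediate representation. In GCC, for example, -fmove-loop-invariants is enabled from -O1. The two versions below should therefore look alike once optimized. The sketch only shows what the rewrite means.

```c
#include <stddef.h>

/* Before: `scale * offset` never changes inside the loop, but the source
   reads as if it were recomputed on every iteration. */
static void add_product_naive(int *out, const int *in, size_t n,
                              int scale, int offset) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + scale * offset;
}

/* After: the invariant product is computed once, before the loop.
   This is the source-level equivalent of loop-invariant code motion. */
static void add_product_hoisted(int *out, const int *in, size_t n,
                                int scale, int offset) {
    const int k = scale * offset;   /* hoisted loop invariant */
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + k;
}
```

To see whether the multiplication leaves the loop, compile the naive version twice, with `gcc -O0 -S` and with `gcc -O2 -S`, and compare the generated assembly.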

SIMD - Single Instruction Multiple Data

Photo by Vladimir Patkachakov on Unsplash

Hi! In today's lecture, we learned about SIMD (Single Instruction Multiple Data), a great tool for processing data in bulk. Instead of processing values one at a time, we can process 16, 8, 4, or 2 at a time, depending on the size of each value. When the compiler does this on its own, it is called auto-vectorization. It falls into the category of machine-instruction optimizations that I mentioned in my last post. If the machine supports SIMD, the compiler can use it when translating a summing loop, for example. With 128-bit registers and 8-bit numbers, one instruction handles 16 values, so in theory that loop can run up to 16 times faster. However, the compiler may decide that SIMD is not safe, because the data might overlap or might not be aligned. In fact, the compiler often will not apply SIMD at all, so we need to get our hands dirty and inject some assembly. I'll show you how to do it in a second.

Here are the lanes of the 128-bit AArch64 Advanced SIMD registers:

16 x 8 bits
8 x 16 bits
4 x 32 bits
2 x 64 bits
1 x ...
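The excerpt stops before its own example, so here is a minimal sketch under my own assumptions, not the post's code. The function names are hypothetical, and it uses C rather than the inline assembly the post goes on to use.

The first function is a plain loop whose `restrict` pointers promise the compiler that the buffers do not overlap, which is one of the conditions it checks before auto-vectorizing.

The second function uses the GCC/Clang vector extension (`vector_size(16)`), so each addition works on 16 x 8-bit lanes of one 128-bit vector. It loads and stores through `memcpy`, so it does not assume aligned data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1) Auto-vectorization candidate. `restrict` promises that out, a and b
   do not overlap; whether SIMD is actually emitted still depends on the
   compiler and flags (e.g. -O3). */
static void add_u8_restrict(uint8_t *restrict out, const uint8_t *restrict a,
                            const uint8_t *restrict b, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(a[i] + b[i]);
}

/* 2) Explicit 16 x 8-bit vector type (GCC/Clang extension). On AArch64 it
   typically lives in one 128-bit Advanced SIMD register. */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

static void add_u8_vec(uint8_t *out, const uint8_t *a, const uint8_t *b,
                       size_t n) {
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        u8x16 va, vb, vc;
        memcpy(&va, a + i, 16);   /* memcpy: no alignment assumption */
        memcpy(&vb, b + i, 16);
        vc = va + vb;             /* 16 lane-wise adds, each wrapping mod 256 */
        memcpy(out + i, &vc, 16);
    }
    for (; i < n; i++)            /* scalar tail for the last n % 16 bytes */
        out[i] = (uint8_t)(a[i] + b[i]);
}
```

Both functions should produce the same bytes for any input, including lengths that are not multiples of 16, because the second one finishes with a scalar tail loop. Compiling the first one with `gcc -O3 -fopt-info-vec` reports whether GCC vectorized it.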

Project Stage 3

Photo by NASA on Unsplash

Hello! In this post, I'll list the optimization opportunities I identified in the AWK project, based on what I've learned in the SPO600 classes.

There are two types of optimizations: portable and platform-specific. Portable optimizations work everywhere. They include better algorithms and implementations, and also compiler build flags. Platform-specific optimizations, on the other hand, work only on a targeted architecture. Examples are the Advanced SIMD instructions available only on AArch64 and the many instructions specific to x86_64. It is possible to “force” the use of such instructions for the targeted hardware, either at compile time or at run time.

Now that we know our options, let's dig in. According to my previous post, the functions nematch and readrec are the hotspots. Here is the command line used to run awk: ./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "cou...
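To make the portable versus platform-specific distinction concrete, here is a minimal sketch of compile-time selection in C. The helper `count_byte`, which counts a separator such as `<` in a buffer, is hypothetical and not taken from awk's source.

When the compiler predefines `__aarch64__` and `__ARM_NEON`, an Advanced SIMD path compares 16 bytes per iteration using `arm_neon.h` intrinsics. On every other target, only the portable loop is compiled.

Run-time selection is the other option mentioned above, for example with `getauxval(AT_HWCAP)` on Linux or GCC's `__builtin_cpu_supports` on x86. It is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper (not part of awk): count bytes equal to c in s[0..n). */
static size_t count_byte(const uint8_t *s, size_t n, uint8_t c) {
    size_t i = 0, total = 0;
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* Platform-specific path, chosen at compile time: 16 bytes per step. */
    const uint8x16_t needle = vdupq_n_u8(c);
    const uint8x16_t one = vdupq_n_u8(1);
    for (; i + 16 <= n; i += 16) {
        /* 0xFF in lanes where the byte matches, 0 elsewhere */
        uint8x16_t eq = vceqq_u8(vld1q_u8(s + i), needle);
        /* keep 1 per matching lane, then sum all 16 lanes (0..16) */
        total += vaddvq_u8(vandq_u8(eq, one));
    }
#endif
    /* Portable path: the whole buffer on other targets, the tail on AArch64. */
    for (; i < n; i++)
        total += (s[i] == c);
    return total;
}

/* Reports which path this binary was compiled with. */
static const char *count_byte_path(void) {
#if defined(__aarch64__) && defined(__ARM_NEON)
    return "aarch64-asimd";
#else
    return "portable";
#endif
}
```

On an x86_64 machine, `count_byte_path()` returns "portable". On an AArch64 build, where GCC normally enables Advanced SIMD by default, it returns "aarch64-asimd". Both paths must return the same counts.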

Rodrigo
