
Posts

Project Stage 3

Hello! In this post, I’ll list the optimization opportunities I identified in the AWK project, based on what I’ve learned in the SPO600 classes. There are two types of optimizations: portable and platform-specific. Portable optimizations work everywhere; they include better algorithms and implementations, as well as compiler build flags. Platform-specific optimizations, on the other hand, work only on a targeted architecture, like the SIMD instructions available on AArch64 and the many instructions specific to x86_64. It is possible to “force” the use of such instructions for the targeted hardware, either at compile time or at run time. Now that we know our options, let’s dig in. According to my previous post , the functions nematch and readrec are the hotspots. Here is the command line used to run awk: ./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "cou
Recent posts

Project Stage 2

Hey! Were you curious about the results of profiling AWK ? Me too! A quick recap: what is profiling, and how do we do it? Profiling is a technique for mapping the execution time of each part of an application. We can add instrumentation to the executable, or use interrupt-based sampling, to generate that map. Here, I’ll use both. Click here for more details on profiling . For instrumentation, we have to tell the compiler to add the tooling needed to collect execution data. So, I changed the CFLAGS variable in the makefile to “-g -Og -pg” and ran the make command. Then, I ran awk the same way I did to benchmark it. Here is the command line: ./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "count: " a;}' bmark-data.txt This instrumented awk build generates a file, gmon.out, which contains all the execution data. This is the raw material for creating a profile report using gp

SIMD - Single Instruction Multiple Data

Hi! In today’s lecture, we learned about SIMD - Single Instruction Multiple Data. This is a great tool for processing data in bulk: instead of working one element at a time, we can process 16, 8, 4, or 2 elements at a time, depending on the element size. When the compiler applies SIMD on its own, it is called auto-vectorization, and it falls into the category of machine-level optimization that I mentioned in my last post. If the machine is SIMD-enabled, the compiler can use it when translating a sum loop, for example. If we are summing 8-bit numbers, SIMD can process 16 of them at once. However, the compiler may decide that it is not safe to use SIMD due to overlapping or non-aligned data. In fact, the compiler will not apply SIMD in many cases, so we need to get our hands dirty and inject some assembly. I’ll show you how to do it in a second. Here are the lanes of the 128-bit AArch64 Advanced SIMD registers: 16 x 8 bits 8 x 16 bits 4 x 32 bits 2 x 64 bits 1 x 128 bits Rea

Profiling

Hi! Do you want to know which part of the code takes the most time to run? Profiling is the technique for collecting the runtime data that shows exactly that. We did it manually in the previous labs by recording the elapsed time of the function under analysis – this is called instrumentation. The other way is to interrupt the execution many times, taking snapshots along the way – this is called sampling. Sampling doesn’t change the binary, but it might miss some data: if a task starts and finishes between two snapshots, it won’t show up in the report. Instrumentation, on the other hand, captures everything, but it has to change the executable, so we are no longer testing the final binary. We have to keep that in mind to choose the right tool for the situation. Speaking of tools, here they are: gprof and perf. gprof does sampling and instrumentation, while perf only does sampling. To use gprof, we need to pas