
Profiling

Photo by Jack Millard on Unsplash

Hi! Do you want to know which part of your code takes the most time to run? Profiling is the technique of collecting runtime data that shows exactly that. We did it manually in the previous labs by adding code that measured the elapsed time of the function under analysis; this is called instrumentation. The other way is to interrupt the execution many times, taking a snapshot each time; this is called sampling.
Sampling doesn’t change the binary, but it might miss some data: if a task starts and finishes between two snapshots, it won’t appear in the report. Instrumentation, on the other hand, captures every call, but it has to change the executable, so we are no longer testing the final version. We have to keep that trade-off in mind to pick the right tool for the situation.
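
To make the trade-off concrete, here is a toy simulation, sketched in Python only because it is short. It is not a real profiler: the task names, the timeline, and the 10 ms sampling period are all made up for illustration.

```python
# Toy timeline: (task name, start ms, end ms). All values are hypothetical.
TIMELINE = [
    ("parse", 0, 14),
    ("flush", 14, 17),      # short task that fits between two snapshots
    ("compress", 17, 40),
]


def instrumented(timeline):
    """Instrumentation: every task reports its own exact elapsed time."""
    return {name: end - start for name, start, end in timeline}


def sampled(timeline, period_ms):
    """Sampling: take a snapshot every period_ms and charge that period
    to whichever task is running at that instant."""
    hits = {}
    finish = max(end for _, _, end in timeline)
    for t in range(0, finish, period_ms):
        for name, start, end in timeline:
            if start <= t < end:
                hits[name] = hits.get(name, 0) + 1
    return {name: count * period_ms for name, count in hits.items()}


print(instrumented(TIMELINE))           # all three tasks, exact times
print(sampled(TIMELINE, period_ms=10))  # "flush" never shows up
```

With a 10 ms period the snapshots land at 0, 10, 20 and 30 ms, so the 3 ms "flush" task is never seen and the other two get rounded estimates. Shrinking the period reduces the blind spot, but each snapshot interrupts the program, so it costs more.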

Speaking of tools, here are two of them: gprof and perf.

gprof combines both techniques: instrumentation to count function calls and sampling to estimate where the time goes. perf, as used here, relies only on sampling.
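
As a rough picture of how the two data sources can be combined into one report, here is a small Python sketch. It is an analogy under stated assumptions, not gprof's actual implementation: the function names, the call counts, the samples, and the 10 ms period are all hypothetical.

```python
from collections import Counter


def flat_profile(call_counts, samples, period_ms):
    """Merge exact call counts (from instrumentation) with
    time estimates (from samples) into rows sorted by self time."""
    hits = Counter(samples)
    rows = []
    for name, calls in call_counts.items():
        self_ms = hits[name] * period_ms
        rows.append({
            "name": name,
            "percent": 100.0 * hits[name] / len(samples),
            "self_ms": self_ms,
            "calls": calls,
            "ms_per_call": self_ms / calls if calls else 0.0,
        })
    return sorted(rows, key=lambda r: r["self_ms"], reverse=True)


# Hypothetical data: exact call counts, and the function seen in each snapshot.
CALLS = {"setup": 2, "helper": 40, "hot_loop": 500}
SAMPLES = ["hot_loop"] * 7 + ["helper"] * 2 + ["setup"] * 1

for row in flat_profile(CALLS, SAMPLES, period_ms=10):
    print(row)
```

The call counts are exact because every call is recorded, while the time columns are only as good as the number of samples behind them.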

To use gprof, we need to pass -pg to the compiler so that it adds the code needed to collect our runtime data. Below are the steps to build the instrumented executable, run it, and view the profiling report.

# Generate the binary with instrumentation
> ./configure CFLAGS="-g -Og -pg"
> make

# Run and produce the gmon.out
> ./gzip </tmp/services1000 >/dev/null

# Text report
> gprof ./gzip | less

# Graphical report
> gprof ./gzip | gprof2dot | dot -T x11
> gprof ./gzip | gprof2dot | dot -T png -o profile.png

To use perf, we don’t need the -pg parameter, so we can profile the original executable. Here are the steps to use it.

# Record the execution
> perf record ./gzip </tmp/services1000 >/dev/null

# Text report
> perf report | less

# Interactive mode
> perf report

Now we have the right tools to profile our final project, which, in my case, is awk. I was about to start studying the awk source code to add instrumentation by hand. These tools will save me days! Thanks for reading, and see you next time.
