
Performance Tuning Hero

Photo by Ayo Ogunseinde on Unsplash
Week 8! I’m Rodrigo, and this is my blog about my SPO 600 course. I’ve been posting since January and still haven’t told you what SPO stands for, right? It is Software Portability and Optimization. Today we will approach the Optimization part differently. Instead of squeezing the compiler, we will look at how the software itself works.

I have a good deal of experience with performance tuning in Oracle Database, PL/SQL and SQL. In my experience, the most significant gains in execution time come, by far, from how the software is designed. We are going to use the same steps in the course that I have used there.

First, do not touch the code without knowing how bad it is. Benchmarking is a must. Collected before, during and after the changes, these metrics will guide our work and justify the hours of analysis and development. It must be done properly, like a methodical scientist collecting vital data without rushing. The more detailed the information you gather, the easier the next steps will be.
Second, target the right piece of the software. Don’t waste time analyzing tasks that perform well. Instead, aim at the ones that take the most time or consume the most CPU or memory.
Third, experiment with changes and compare the execution time with the measurements from the first step. This trial-and-error loop will guide us.
Lastly, implement the solution and compare the results with those from the first step.

We can use simple tools to collect data, for example, a bash script or the time command. There are commercial tools out there that collect much more data, but too much data can be overwhelming. Try to keep it simple at first and collect more if needed.
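If you prefer to measure from inside the program, here is a minimal sketch in C of the "keep it simple" idea. It is only an illustration: `workload`, `cpu_seconds` and `best_of` are names I made up for this post, and `workload` is just a stand-in for whatever code you are tuning.

```c
#include <assert.h>
#include <time.h>

/* Keeps the result of the workload alive so the compiler cannot drop the loop. */
volatile unsigned long sink;

/* Hypothetical stand-in for the code being tuned. */
void workload(void) {
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sum += i % 7;
    sink = sum;
}

/* CPU seconds consumed by one call to fn, measured with the standard clock(). */
double cpu_seconds(void (*fn)(void)) {
    clock_t start = clock();
    fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Repeat the measurement and keep the fastest run, which is usually
   the one least disturbed by other processes on the machine. */
double best_of(int runs, void (*fn)(void)) {
    assert(runs > 0);
    double best = cpu_seconds(fn);
    for (int i = 1; i < runs; i++) {
        double t = cpu_seconds(fn);
        if (t < best)
            best = t;
    }
    return best;
}
```

Calling `best_of(10, workload)` before and after a change gives two numbers to compare, which is all the first step asks for.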

So, we found the bad guy. Now what? Try a completely different approach. I like the sound volume example explained by our professor. Changing the music’s volume is not as trivial as I thought. Digital audio takes the sound wave and translates it into tiny dots equally spaced along the wave. This process is called sampling. The position of a number in the list is the X coordinate (the moment in time), and the number itself is the Y coordinate (the height of the wave at that moment). We end up with a massive list of values, all of them recorded at volume 1.000 (the maximum). Our volume ranges from 0.000 to 1.000. If we need to turn it down so we don’t disturb the neighbours, we multiply every number by the volume factor. Easy, right? Is there no other way?
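The multiply-everything approach can be sketched in a few lines of C. This is my own illustration, not the professor's code: the function name `scale_naive` and the choice of signed 16-bit samples are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale every 16-bit sample by a volume factor between 0.0 and 1.0.
   Each sample is converted to float, multiplied, and truncated back
   to an integer. */
void scale_naive(const int16_t *in, int16_t *out, size_t n, float volume) {
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(in[i] * volume);
}
```

With a volume of 0.5, a sample of 1000 becomes 500 and a sample of -1000 becomes -500. The cost is one integer-to-float conversion, one multiplication and one conversion back for every single sample.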

I was amazed that our professor listed FOUR ways to do it. The second one is the lookup table. With 16-bit sampling, a sample can take only 65,536 distinct values. What if we create a table that holds, for a given volume, each original value and its calculated one? In this case, instead of doing math, we simply look up the value. This approach takes more memory, though.
The third method still does the math, but in integer (fixed-point) arithmetic using 32 bits. This avoids the floating-point conversion used in the first method.
The fourth takes the fixed-point math and runs it with SIMD instructions, so several samples are processed in parallel. By the way, parallelism is a valid tool for dealing with performance issues and is relatively easy to implement.
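Here is a rough sketch in C of the lookup-table and integer approaches. The names (`build_table`, `scale_lookup`, `scale_fixed`) and the choice of 8 fractional bits are my assumptions for illustration, not the code from the course. I left SIMD out because it depends on the target processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_VALUES 65536   /* number of distinct 16-bit samples */

/* Lookup table: precompute the scaled value of every possible sample
   for one volume. table must have room for SAMPLE_VALUES entries. */
void build_table(int16_t *table, float volume) {
    assert(volume >= 0.0f && volume <= 1.0f);
    for (int32_t s = INT16_MIN; s <= INT16_MAX; s++)
        table[(uint16_t)s] = (int16_t)(s * volume);
}

/* No math per sample: the sample's bit pattern is the index into the table. */
void scale_lookup(const int16_t *in, int16_t *out, size_t n,
                  const int16_t *table) {
    for (size_t i = 0; i < n; i++)
        out[i] = table[(uint16_t)in[i]];
}

/* Fixed-point: convert the volume once into an integer scaled by 256
   (8 fractional bits), then use only 32-bit integer math per sample.
   A right shift by 8 is the usual way to scale back; dividing by 256
   keeps the rounding of negative samples well-defined in C. */
void scale_fixed(const int16_t *in, int16_t *out, size_t n, float volume) {
    int32_t factor = (int32_t)(volume * 256.0f);
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)((in[i] * factor) / 256);
}
```

The table costs 65,536 entries of 2 bytes each (128 KiB) and has to be rebuilt whenever the volume changes. The fixed-point version needs no extra memory, but with only 8 fractional bits the volume is rounded down to the nearest 1/256. Which one wins is exactly the kind of question the benchmark should answer.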

To conclude, being creative and thinking outside the box is an excellent skill for tuning applications. Also, knowing the business and the program in depth tends to produce better results. I like the feeling of being the one who defeated the bad guy that slows down our lovely software. Don't you?

See you.
