
Benchmarking

Photo by Alex manlyx on Unsplash

Time to get our hands dirty and do some benchmarking. The goal of Lab6 is to run the different sound volume algorithms described in my last post on five different machines and compare them.
I talked about the algorithms in the last post, so now it's time to talk about the machines. Here they are:


| | AARCHIE | BBETTY | CCHARLIE | ISRAEL | XERXES |
|---|---|---|---|---|---|
| OS | Fedora 28 | Fedora 31 | Fedora 30 | Ubuntu 19.04 | Fedora 30 |
| Architecture | aarch64 | aarch64 | aarch64 | aarch64 | x86_64 |
| CPU(s) | 24 | 8 | 8 | 16 | 8 |
| Thread(s) per core | 1 | 1 | 1 | 1 | 2 |
| Model name | Cortex-A53 | Cortex-A57 | X-Gene | Cortex-A72 | Intel(R) Xeon(R) CPU E5-1630 v4 @ 3.70GHz |
| L1d cache | 32K | - | unknown | 32K | 32K |
| L1i cache | 32K | - | unknown | 48K | 32K |
| L2 cache | 256K | - | unknown | 1024K | 256K |
| L3 cache | 4096K | - | - | - | 10240K |

Before running anything, we need to make sure we measure only the time consumed by the algorithm itself. So I had to change the provided code to capture the start and end timestamps at the right points, compute the elapsed time and display it. Here is an example.
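As a minimal sketch of the idea (in Python rather than the lab's own code), the timestamps go immediately before and after the algorithm, so that generating the input samples is not counted. The function `scale_samples`, the sample count and the volume factor are all hypothetical stand-ins, not the lab's actual values.

```python
import random
import time

SAMPLES = 500_000  # hypothetical sample count, not the lab's value
VOLUME = 0.75      # hypothetical volume factor


def scale_samples(samples, volume):
    """Stand-in for a volume algorithm: scale each 16-bit sample."""
    return [int(s * volume) for s in samples]


def timed_run(n=SAMPLES, volume=VOLUME):
    # Setup: generating the input is deliberately outside the measurement.
    rng = random.Random(1)
    samples = [rng.randint(-32768, 32767) for _ in range(n)]

    start = time.perf_counter()   # timestamp right before the algorithm
    scaled = scale_samples(samples, volume)
    end = time.perf_counter()     # timestamp right after the algorithm

    elapsed_ms = (end - start) * 1000.0
    return elapsed_ms, scaled


if __name__ == "__main__":
    elapsed_ms, scaled = timed_run()
    print(f"Elapsed: {elapsed_ms:.2f} ms for {len(scaled)} samples")
```

Using a monotonic clock such as `time.perf_counter` avoids jumps that a wall-clock date can suffer if the system time is adjusted during a run.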


To get the most accurate data possible, I chose to run each algorithm 100 times, with a delay of 5 minutes between executions. I scheduled the runs to start around 10 pm so I could collect the data the next morning. From the data, I extracted the average elapsed time, along with the fastest and slowest runs. Here is my script to do the hard work for me.
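A rough sketch of what such a runner could look like is below. It is not the original script: it assumes the benchmark binary prints its elapsed time in milliseconds as the last line of its output, and the names `run_once`, `collect` and `summarize` are made up for illustration.

```python
import statistics
import subprocess
import sys
import time


def run_once(cmd):
    """Run the benchmark once; assumes the last output line is the elapsed ms."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return float(result.stdout.strip().splitlines()[-1])


def collect(cmd, runs=100, delay_s=300):
    """Run cmd `runs` times, sleeping `delay_s` seconds between executions."""
    times = []
    for i in range(runs):
        times.append(run_once(cmd))
        if i < runs - 1:          # no need to wait after the final run
            time.sleep(delay_s)
    return times


def summarize(times):
    """Return (fastest, slowest, average) of the collected timings."""
    return min(times), max(times), statistics.mean(times)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python bench.py ./my_volume_program
    fastest, slowest, average = summarize(collect(sys.argv[1:]))
    print(f"Min {fastest:.2f}  Max {slowest:.2f}  Avg {average:.2f}")
```

Keeping the summary separate from the collection step means the same statistics can be recomputed later from saved timings without rerunning the benchmark overnight.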



Here are the results (numbers in milliseconds):



| Method | Stat | AARCHIE | BBETTY | CCHARLIE | ISRAEL | XERXES |
|---|---|---|---|---|---|---|
| Multiplication Method | Min | 7571.00 | 933.00 | 1715.00 | 1455.00 | 340.00 |
| | Max | 11548.00 | 942.00 | 1722.00 | 1456.00 | 394.00 |
| | Avg | 7622.43 | 934.68 | 1716.26 | 1455.53 | 353.02 |
| Lookup Table Method | Min | 12732.00 | 1376.00 | 2220.00 | 2558.00 | 268.00 |
| | Max | 34445.00 | 1390.00 | 2574.00 | 2591.00 | 348.00 |
| | Avg | 13083.50 | 1379.64 | 2406.17 | 2572.67 | 281.33 |
| Binary Math Method | Min | 4079.00 | 782.00 | 1231.00 | 503.00 | 211.00 |
| | Max | 4442.00 | 795.00 | 1237.00 | 505.00 | 254.00 |
| | Avg | 4101.68 | 782.91 | 1232.35 | 503.02 | 218.30 |
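To make the averages easier to compare within each machine, the ratios between methods can be computed directly from the Avg rows of the results. The sketch below does only that; the short keys `mul`, `lut` and `bin` and the `speedup` helper are labels invented for this example.

```python
# Average elapsed times in milliseconds, copied from the Avg rows of the results.
AVG_MS = {
    "AARCHIE":  {"mul": 7622.43, "lut": 13083.50, "bin": 4101.68},
    "BBETTY":   {"mul": 934.68,  "lut": 1379.64,  "bin": 782.91},
    "CCHARLIE": {"mul": 1716.26, "lut": 2406.17,  "bin": 1232.35},
    "ISRAEL":   {"mul": 1455.53, "lut": 2572.67,  "bin": 503.02},
    "XERXES":   {"mul": 353.02,  "lut": 281.33,   "bin": 218.30},
}


def speedup(baseline_ms, other_ms):
    """How many times faster `other` is than `baseline` (>1 means faster)."""
    return baseline_ms / other_ms


if __name__ == "__main__":
    for machine, t in AVG_MS.items():
        print(
            f"{machine:9s} "
            f"binary vs multiplication: {speedup(t['mul'], t['bin']):.2f}x, "
            f"lookup vs multiplication: {speedup(t['mul'], t['lut']):.2f}x"
        )
```

A ratio above 1 means the method is faster than multiplication on that machine, and below 1 means it is slower. The ratios are only meaningful within a single machine.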

We can see a clear difference between the algorithms. The binary math method is the fastest on every machine. The surprise is that on the aarch64 machines the multiplication method beats the lookup table, while on x86_64 it is the opposite: the lookup table beats multiplication. However, we can't directly compare timings between machines, since they differ in CPU model, core count, cache sizes and OS. See you!
