
Benchmarking

Photo by Alex manlyx on Unsplash

Time to get our hands dirty and do some benchmarking. The goal of Lab 6 is to run the different sound volume algorithms described in my last post on five different machines and compare the results.
I covered the algorithms in the last post, so now it's time to talk about the machines. Here they are:


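|                    | AARCHIE    | BBETTY     | CCHARLIE  | ISRAEL       | XERXES                                    |
|--------------------|------------|------------|-----------|--------------|-------------------------------------------|
| OS                 | Fedora 28  | Fedora 31  | Fedora 30 | Ubuntu 19.04 | Fedora 30                                 |
| Architecture       | aarch64    | aarch64    | aarch64   | aarch64      | x86_64                                    |
| CPU(s)             | 24         | 8          | 8         | 16           | 8                                         |
| Thread(s) per core | 1          | 1          | 1         | 1            | 2                                         |
| Model name         | Cortex-A53 | Cortex-A57 | X-Gene    | Cortex-A72   | Intel(R) Xeon(R) CPU E5-1630 v4 @ 3.70GHz |
| L1d cache          | 32K        | -          | unknown   | 32K          | 32K                                       |
| L1i cache          | 32K        | -          | unknown   | 48K          | 32K                                       |
| L2 cache           | 256K       | -          | unknown   | 1024K        | 256K                                      |
| L3 cache           | 4096K      | -          | -         | -            | 10240K                                    |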

Before running anything, we need to make sure we measure only the time consumed by the algorithm itself. So I had to change the provided code to take the start and end timestamps at the right points, compute the elapsed time, and print it. Here is an example.

    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>
    #include "vol.h"

    // Function to scale a sound sample using a volume_factor
    // in the range of 0.00 to 1.00.
    static inline int16_t scale_sample(int16_t sample, float volume_factor) {
        return (int16_t) (volume_factor * (float) sample);
    }

    // ADDED FUNCTION
    long timediff(clock_t t1, clock_t t2) {
        long elapsed;
        elapsed = ((double)t2 - t1) / CLOCKS_PER_SEC * 1000;
        return elapsed;
    }

    int main() {
        clock_t t1, t2; // ADDED VARIABLES
        long elapsed;

        // Allocate memory for large data array
        int16_t* data;
        data = (int16_t*) calloc(SAMPLES, sizeof(int16_t));

        int x;
        int ttl = 0;

        // Seed the pseudo-random number generator
        srand(1);

        // Fill the array with random data
        for (x = 0; x < SAMPLES; x++) {
            data[x] = (rand()%65536)-32768;
        }

        // ######################################
        // This is the interesting part!
        // Scale the volume of all of the samples
        t1 = clock(); // ADDED initial date
        for (x = 0; x < SAMPLES; x++) {
            data[x] = scale_sample(data[x], 0.75);
        }
        t2 = clock(); // ADDED final date
        elapsed = timediff(t1, t2); // ADDED elapsed time
        printf("%ld ", elapsed); // ADDED print time
        // ######################################

        // Sum up the data
        for (x = 0; x < SAMPLES; x++) {
            ttl = (ttl+data[x])%1000;
        }

        // Print the sum
        printf("Result: %d\n", ttl);

        return 0;
    }

To get the most accurate data possible, I chose to run each program 100 times, with a delay of 5 minutes between executions. I started the runs around 10 pm and collected the data the next morning. From the data, I extracted the average elapsed time, along with the fastest and slowest runs. Here is my script to do the hard work for me.

    #!/bin/bash
    # Usage: ./tester.sh <program>
    QTY=100

    if [[ -n "$1" ]]
    then
        FILE_EXE="$1_exe.txt"
        FILE_RPT="$1_rpt.txt"
        echo "Testing: $1"
        echo "Executions: $FILE_EXE"
        echo "Report: $FILE_RPT"

        # Start with an empty executions file
        > "$FILE_EXE"

        for ((n = 0; n < QTY; n++))
        do
            echo "$(date +"%Y-%m-%d %H:%M:%S,%3N") - INI: $n"
            "./$1" >> "$FILE_EXE"
            echo "$(date +"%Y-%m-%d %H:%M:%S,%3N") - FIN: $n"
            echo ''
            sleep 5m
        done

        # The first field of each output line is the elapsed time in ms
        > "$FILE_RPT"
        awk '{
            if (min == "") { min = max = $1 };
            if ($1 > max) { max = $1 };
            if ($1 < min) { min = $1 };
            sum += $1
        } END {
            print "Executions: " NR "\nMin Time : " min "\nMax Time : " max "\nAvg time : " sum/NR
        }' "$FILE_EXE" > "$FILE_RPT"
    else
        echo "Please inform the program to be tested"
    fi


Here are the results (numbers in milliseconds):



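| Method                | Statistic | AARCHIE  | BBETTY  | CCHARLIE | ISRAEL  | XERXES |
|-----------------------|-----------|----------|---------|----------|---------|--------|
| Multiplication Method | Min       | 7571.00  | 933.00  | 1715.00  | 1455.00 | 340.00 |
| Multiplication Method | Max       | 11548.00 | 942.00  | 1722.00  | 1456.00 | 394.00 |
| Multiplication Method | Avg       | 7622.43  | 934.68  | 1716.26  | 1455.53 | 353.02 |
| Lookup Table Method   | Min       | 12732.00 | 1376.00 | 2220.00  | 2558.00 | 268.00 |
| Lookup Table Method   | Max       | 34445.00 | 1390.00 | 2574.00  | 2591.00 | 348.00 |
| Lookup Table Method   | Avg       | 13083.50 | 1379.64 | 2406.17  | 2572.67 | 281.33 |
| Binary Math Method    | Min       | 4079.00  | 782.00  | 1231.00  | 503.00  | 211.00 |
| Binary Math Method    | Max       | 4442.00  | 795.00  | 1237.00  | 505.00  | 254.00 |
| Binary Math Method    | Avg       | 4101.68  | 782.91  | 1232.35  | 503.02  | 218.30 |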

We can see clear differences between the algorithms. The binary math method is the fastest on every machine. The surprise is in how the other two rank: on the aarch64 machines the multiplication method beats the lookup table, while on x86_64 (XERXES) it is the opposite, with the lookup table beating multiplication. However, the absolute times shouldn't be compared directly between machines, since they differ in CPU model, core count, cache sizes and OS. See you!
