Skip to main content

Project Stage 2

Photo by SpaceX on Unsplash

Hey! Were you curious about the results of profiling AWK? Me too! Quick recap, what is profiling, and how to do it? Profiling is a technique to map the time execution of each part of the application. We can add instrumentation to the executable, or use interruption sampling to generate that map. Here, I’ll use both. Click here for more details on profiling.

For the instrumentation, we have to tell the compiler to add the tools needed to collect the execution data. So, I’ve changed the “makefile” file, CFLAGS variable with “-g -Og -pg” and ran the make command. Then, I just ran the awk the same way I did to benchmark it. Here is the command line:

./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "count: " a;}' bmark-data.txt

This awk version, instrumented, generates a file gmon.out, which contains all execution data. This is the raw material to create a profile report using gprof. I did this on all class machines, and they produced somewhat the same results. I have the text and the graph report to show you, gprof.png and gprof.txt below.

For the sampling, we don’t need to add anything to the executable. So, I’ve changed the “makefile” file, CFLAGS variable with “-g -Og” and ran the make command. No -pg param. However, to collect the execution data, we need to execute a different command:

perf record ./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "count: " a;}' bmark-data.txt

This command generated a file called perf.data with our execution data to make the profiling report. This tool doesn’t create a visual graph, only creates a textual and interactive report. Unfortunately, it is not possible to show the interactive one, but the text is. It is the perf.txt below.

Analyzing the reports, I found the hot spots. Those are executed many times and spend the most execution time. There are two strings:

1 – execute -> fldbld  -> refldbld -> nematch
2 – execute -> program -> getrec   -> readrec

Those are the names of the functions inside the source code. I locate them, and now it is time to think about how to improve them. But this will be on my next posts. Say tuned! See you.

GPROF.png



GPROF.txt

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total          
 time   seconds   seconds    calls   s/call   s/call  name   
 31.45     43.67    43.67 240000000     0.00     0.00  nematch
 12.10     60.47    16.80 30000001     0.00     0.00  readrec
 11.51     76.44    15.98 30000000     0.00     0.00  refldbld
  7.81     87.29    10.84 240000008     0.00     0.00  execute
  5.44     94.84     7.55 30000000     0.00     0.00  fldbld
  5.41    102.35     7.51 150000001     0.00     0.00  setfval
  5.02    109.32     6.97 270000030     0.00     0.00  is_number
  3.16    113.71     4.39 120000009     0.00     0.00  get_str_val
  2.70    117.46     3.76 30000001     0.00     0.00  getrec
  1.29    119.26     1.80 30000000     0.00     0.00  relop
  1.22    120.96     1.70 30000000     0.00     0.00  makedfa
  1.15    122.55     1.59 30000000     0.00     0.00  pastat
  1.07    124.04     1.49 30000001     0.00     0.00  tfree
  0.98    125.40     1.36 120000008     0.00     0.00  getsval
  0.97    126.75     1.35 60000000     0.00     0.00  getfval
  0.88    127.96     1.22 30000000     0.00     0.00  incrdecr
  0.86    129.16     1.20 30000001     0.00     0.00  gettemp
  0.79    130.26     1.10 30000000     0.00     0.00  setlastfld
  0.72    131.26     1.00 30000000     0.00     0.00  indirect
  0.66    132.18     0.92 30000000     0.00     0.00  ifstat
  0.64    133.07     0.89 30000000     0.00     0.00  fieldadr
  0.55    133.83     0.76 60000000     0.00     0.00  cleanfld
  0.52    134.55     0.72 30000019     0.00     0.00  adjbuf
  0.50    135.24     0.69        2     0.35     0.35  isclvar
  0.49    135.92     0.68        1     0.68    32.35  program
  0.45    136.54     0.62 30000001     0.00     0.00  savefs
  0.42    137.13     0.59                             newfld
  0.28    137.52     0.39                             funnyvar
  0.21    137.81     0.29        1     0.29     0.29  cat
  0.14    138.00     0.19        2     0.10     0.10  getargv
  0.12    138.16     0.16                             matchop
  0.11    138.31     0.15        2     0.08     0.08  setsval
  0.11    138.46     0.15                             pmatch
  0.06    138.55     0.09        1     0.09     0.60  initgetrec
  0.05    138.62     0.07                             condexpr
  0.05    138.69     0.07                             nawk_tolower
  0.03    138.73     0.05                             arith
  0.03    138.78     0.05                             getnf
  0.03    138.82     0.04        1     0.04     0.04  recinit
  0.01    138.84     0.02                             freefa
  0.01    138.85     0.02                             boolop
  0.00    138.85     0.00      125     0.00     0.00  hash
  0.00    138.85     0.00      125     0.00     0.00  tostring
  0.00    138.85     0.00       89     0.00     0.00  input
  0.00    138.85     0.00       63     0.00     0.00  ptoi
  0.00    138.85     0.00       61     0.00     0.00  lookup
  0.00    138.85     0.00       58     0.00     0.00  resize_state
  0.00    138.85     0.00       58     0.00     0.00  setsymtab
  0.00    138.85     0.00       41     0.00     0.00  gettok
  0.00    138.85     0.00       30     0.00     0.00  yylex
  0.00    138.85     0.00       28     0.00     0.00  cgoto
  0.00    138.85     0.00       27     0.00     0.00  nodealloc
  0.00    138.85     0.00       16     0.00     0.00  unput
  0.00    138.85     0.00       14     0.00     0.00  node2
  0.00    138.85     0.00       13     0.00     0.00  op2
  0.00    138.85     0.00       10     0.00     0.00  node1
  0.00    138.85     0.00        9     0.00     0.00  intalloc
  0.00    138.85     0.00        8     0.00     0.00  celltonode
  0.00    138.85     0.00        7     0.00     0.00  binsearch
  0.00    138.85     0.00        7     0.00     0.00  linkum
  0.00    138.85     0.00        7     0.00     0.00  word
  0.00    138.85     0.00        6     0.00     0.00  peek
  0.00    138.85     0.00        6     0.00     0.00  relex
  0.00    138.85     0.00        5     0.00     0.00  first
  0.00    138.85     0.00        5     0.00     0.00  follow
  0.00    138.85     0.00        3     0.00     0.00  concat
  0.00    138.85     0.00        3     0.00     0.00  itonp
  0.00    138.85     0.00        3     0.00     0.00  makesymtab
  0.00    138.85     0.00        3     0.00     0.00  node3
  0.00    138.85     0.00        3     0.00     0.00  primary
  0.00    138.85     0.00        3     0.00     0.00  stat3
  0.00    138.85     0.00        3     0.00     0.00  string
  0.00    138.85     0.00        3     0.00     0.00  unary
  0.00    138.85     0.00        3     0.00     0.00  unputstr
  0.00    138.85     0.00        2     0.00     0.00  exptostat
  0.00    138.85     0.00        2     0.00     0.00  makefields
  0.00    138.85     0.00        2     0.00     0.00  op1
  0.00    138.85     0.00        2     0.00     0.00  rehash
  0.00    138.85     0.00        2     0.00     0.00  setfree
  0.00    138.85     0.00        1     0.00     0.00  alt
  0.00    138.85     0.00        1     0.00     0.00  arginit
  0.00    138.85     0.00        1     0.00     0.08  assign
  0.00    138.85     0.00        1     0.00     0.00  cfoll
  0.00    138.85     0.00        1     0.00     0.00  closeall
  0.00    138.85     0.00        1     0.00     0.00  envinit
  0.00    138.85     0.00        1     0.00     0.00  freetr
  0.00    138.85     0.00        1     0.00     0.00  getpssval
  0.00    138.85     0.00        1     0.00     0.00  growfldtab
  0.00    138.85     0.00        1     0.00     0.00  makeinit
  0.00    138.85     0.00        1     0.00     0.00  mkdfa
  0.00    138.85     0.00        1     0.00     0.00  notnull
  0.00    138.85     0.00        1     0.00     0.00  penter
  0.00    138.85     0.00        1     0.00     0.00  printstat
  0.00    138.85     0.00        1     0.00     0.00  regexp
  0.00    138.85     0.00        1     0.00     0.00  reparse
  0.00    138.85     0.00        1     0.00     0.00  resizesetvec
  0.00    138.85     0.00        1     0.00   137.26  run
  0.00    138.85     0.00        1     0.00     0.00  stat2
  0.00    138.85     0.00        1     0.00     0.00  stdinit
  0.00    138.85     0.00        1     0.00     0.00  syminit
  0.00    138.85     0.00        1     0.00     0.00  yyparse

PERF.txt

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 972K of event 'cycles:uppp'
# Event count (approx.): 236100649774
#
# Overhead  Command  Shared Object     Symbol                           
# ........  .......  ................  ..................................
#
    17.46%  awkPerf  awkPerf           [.] nematch
    16.84%  awkPerf  libc-2.27.so      [.] __GI_____strtod_l_internal
     8.52%  awkPerf  libc-2.27.so      [.] _IO_getc
     7.58%  awkPerf  awkPerf           [.] readrec
     5.78%  awkPerf  awkPerf           [.] refldbld
     4.30%  awkPerf  awkPerf           [.] execute
     3.78%  awkPerf  libc-2.27.so      [.] str_to_mpn.isra.0
     3.63%  awkPerf  libc-2.27.so      [.] __GI_strlen
     3.19%  awkPerf  awkPerf           [.] fldbld
     2.89%  awkPerf  awkPerf           [.] setfval
     2.89%  awkPerf  awkPerf           [.] is_number
     2.10%  awkPerf  libc-2.27.so      [.] __GI___strnlen
     1.77%  awkPerf  awkPerf           [.] getrec
     1.71%  awkPerf  libc-2.27.so      [.] strncpy
     1.64%  awkPerf  libc-2.27.so      [.] round_and_return
     1.57%  awkPerf  awkPerf           [.] get_str_val
     1.53%  awkPerf  libc-2.27.so      [.] __memcpy_generic
     1.06%  awkPerf  awkPerf           [.] _IO_getc@plt
     0.93%  awkPerf  libc-2.27.so      [.] strtod
     0.84%  awkPerf  libc-2.27.so      [.] strcmp
     0.73%  awkPerf  libc-2.27.so      [.] __errno_location
     0.64%  awkPerf  awkPerf           [.] makedfa
     0.60%  awkPerf  awkPerf           [.] incrdecr
     0.60%  awkPerf  awkPerf           [.] relop
     0.60%  awkPerf  awkPerf           [.] getfval
     0.53%  awkPerf  awkPerf           [.] ifstat
     0.52%  awkPerf  awkPerf           [.] gettemp
     0.49%  awkPerf  libc-2.27.so      [.] strcpy
     0.42%  awkPerf  awkPerf           [.] strtod@plt
     0.42%  awkPerf  awkPerf           [.] pastat
     0.39%  awkPerf  libc-2.27.so      [.] __mpn_construct_double
     0.39%  awkPerf  awkPerf           [.] cleanfld
     0.38%  awkPerf  awkPerf           [.] getsval
     0.37%  awkPerf  libc-2.27.so      [.] __mpn_lshift
     0.37%  awkPerf  awkPerf           [.] savefs
     0.35%  awkPerf  awkPerf           [.] setlastfld
     0.31%  awkPerf  awkPerf           [.] indirect
     0.31%  awkPerf  awkPerf           [.] program
     0.30%  awkPerf  awkPerf           [.] fieldadr
     0.28%  awkPerf  awkPerf           [.] adjbuf
     0.25%  awkPerf  awkPerf           [.] tfree
     0.23%  awkPerf  awkPerf           [.] __errno_location@plt
     0.17%  awkPerf  awkPerf           [.] strncpy@plt
     0.15%  awkPerf  awkPerf           [.] strlen@plt
     0.08%  awkPerf  awkPerf           [.] strcmp@plt
     0.07%  awkPerf  awkPerf           [.] strcpy@plt
     0.01%  awkPerf  libc-2.27.so      [.] _IO_file_underflow@@GLIBC_2.17
     0.01%  awkPerf  [unknown]         [k] 0xffff000010096654
     0.01%  awkPerf  libc-2.27.so      [.] __uflow
     0.01%  awkPerf  libc-2.27.so      [.] _IO_default_uflow
     0.01%  awkPerf  libc-2.27.so      [.] _IO_switch_to_get_mode
     0.01%  awkPerf  libc-2.27.so      [.] read
     0.00%  awkPerf  libc-2.27.so      [.] _IO_file_read
     0.00%  awkPerf  libc-2.27.so      [.] read_alias_file
     0.00%  awkPerf  libc-2.27.so      [.] _IO_vsnprintf
     0.00%  awkPerf  ld-2.27.so        [.] _dl_runtime_resolve
     0.00%  awkPerf  libc-2.27.so      [.] __strchrnul
     0.00%  awkPerf  libc-2.27.so      [.] _dl_addr
     0.00%  awkPerf  ld-2.27.so        [.] do_lookup_x
     0.00%  awkPerf  ld-2.27.so        [.] strcmp
     0.00%  awkPerf  ld-2.27.so        [.] _dl_relocate_object
     0.00%  awkPerf  ld-2.27.so        [.] _dl_check_all_versions
     0.00%  awkPerf  ld-2.27.so        [.] read
     0.00%  awkPerf  ld-2.27.so        [.] dl_main
     0.00%  awkPerf  ld-2.27.so        [.] __GI___tunables_init
     0.00%  awkPerf  ld-2.27.so        [.] _dl_start
     0.00%  awkPerf  ld-2.27.so        [.] _start
     0.00%  awkPerf  [unknown]         [k] 0xffff000010081544


#
# (Cannot load tips.txt file, please install perf!)
#

Comments

Popular posts from this blog

SIMD - Single Instruction Multiple Data

Photo by  Vladimir Patkachakov  on  Unsplash Hi! Today’s lecture, we learned SIMD - Single Instruction Multiple Data. This is a great tool to process data in a bulk fashion. So, instead of doing one by one, based on the variable size, we can do 16, 8, 4 or 2 at the time. This technique is called auto-vectorization resources, and it falls into the category of machine instruction optimization that I mentioned in my last post. If the machine is SIMD enabled, the compiler can use it when translating a sum loop, for example. If we are summing 8 bits numbers, using SIMD, it will be 16 times faster. However, the compiler can figure that it is not safe to use SIMD due to overlapping or non-aligned data. In fact, the compiler will not apply SIMD in most cases, so we need to get our hands dirty and inject some assembly. I’ll show you how to do it in a second. Here are the lanes of the 128-bit AArch64 Advanced SIMD: 16 x 8 bits 8 x 16 bits 4 x 32 bits 2 x 64 bits 1 x ...

Going Faster

Photo by  Anders Jildén  on  Unsplash Today’s topic is compiler optimizations. Besides translating our source code into machine binary executable, the compiler, based on optimization parameters, can also produce faster executables. Just by adding some parameters to the compiler, we can get a better performance or a smaller executable, for example. There are hundreds of those parameters which we can turn on or off using the prefix -f and -fno. However, instead of doing one by one, we can use the features mode using the -O param. It ranges from 0 (no optimization – the default) to 3 (the highest). Using those parameters has a cost —usually, the faster, the larger executable. How does the compiler make it faster if my code is perfect? I’m going to put some methods here, but if you want, here is more detail . Also, bear in mind that most of the optimizations are done in the intermediate representation of the program. So, the examples below are rewritten just to...

Colour Selection

Photo by Scott Webb  on  Unsplash Today I'll talk about Lab 4. We had to pick two tasks out of four and develop the solution using my least favour language: assembly. Our group chose the options 2 (data input form) and 4 (screen colour selector) thinking that would be the easiest ones. The other options were adding calculator and hexdump. This post will talk about the colour selector, and my next will be the input form. The colour selector project was quite easy to do relatively. There are only 16 colours available (0 to F in hex). We have to list them in the text area and allow the selection using the cursor (up and down). Once the colour is selected, we have to paint the graph area. The graph area we did before. Basically, we have to store the colour code for every pixel in the display using the memory location between $0200 and $05FF. In the last post, we deal with up and down keys to change the numeric display. However, we never dealt with character display bef...