Skip to main content

Project Stage 2

Photo by SpaceX on Unsplash

Hey! Were you curious about the results of profiling AWK? Me too! Quick recap, what is profiling, and how to do it? Profiling is a technique to map the time execution of each part of the application. We can add instrumentation to the executable, or use interruption sampling to generate that map. Here, I’ll use both. Click here for more details on profiling.

For the instrumentation, we have to tell the compiler to add the tools needed to collect the execution data. So, I’ve changed the “makefile” file, CFLAGS variable with “-g -Og -pg” and ran the make command. Then, I just ran the awk the same way I did to benchmark it. Here is the command line:

./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "count: " a;}' bmark-data.txt

This awk version, instrumented, generates a file gmon.out, which contains all execution data. This is the raw material to create a profile report using gprof. I did this on all class machines, and they produced somewhat the same results. I have the text and the graph report to show you, gprof.png and gprof.txt below.

For the sampling, we don’t need to add anything to the executable. So, I’ve changed the “makefile” file, CFLAGS variable with “-g -Og” and ran the make command. No -pg param. However, to collect the execution data, we need to execute a different command:

perf record ./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "count: " a;}' bmark-data.txt

This command generated a file called perf.data with our execution data to make the profiling report. This tool doesn’t create a visual graph, only creates a textual and interactive report. Unfortunately, it is not possible to show the interactive one, but the text is. It is the perf.txt below.

Analyzing the reports, I found the hot spots. Those are executed many times and spend the most execution time. There are two strings:

1 – execute -> fldbld  -> refldbld -> nematch
2 – execute -> program -> getrec   -> readrec

Those are the names of the functions inside the source code. I locate them, and now it is time to think about how to improve them. But this will be on my next posts. Say tuned! See you.

GPROF.png



GPROF.txt

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total          
 time   seconds   seconds    calls   s/call   s/call  name   
 31.45     43.67    43.67 240000000     0.00     0.00  nematch
 12.10     60.47    16.80 30000001     0.00     0.00  readrec
 11.51     76.44    15.98 30000000     0.00     0.00  refldbld
  7.81     87.29    10.84 240000008     0.00     0.00  execute
  5.44     94.84     7.55 30000000     0.00     0.00  fldbld
  5.41    102.35     7.51 150000001     0.00     0.00  setfval
  5.02    109.32     6.97 270000030     0.00     0.00  is_number
  3.16    113.71     4.39 120000009     0.00     0.00  get_str_val
  2.70    117.46     3.76 30000001     0.00     0.00  getrec
  1.29    119.26     1.80 30000000     0.00     0.00  relop
  1.22    120.96     1.70 30000000     0.00     0.00  makedfa
  1.15    122.55     1.59 30000000     0.00     0.00  pastat
  1.07    124.04     1.49 30000001     0.00     0.00  tfree
  0.98    125.40     1.36 120000008     0.00     0.00  getsval
  0.97    126.75     1.35 60000000     0.00     0.00  getfval
  0.88    127.96     1.22 30000000     0.00     0.00  incrdecr
  0.86    129.16     1.20 30000001     0.00     0.00  gettemp
  0.79    130.26     1.10 30000000     0.00     0.00  setlastfld
  0.72    131.26     1.00 30000000     0.00     0.00  indirect
  0.66    132.18     0.92 30000000     0.00     0.00  ifstat
  0.64    133.07     0.89 30000000     0.00     0.00  fieldadr
  0.55    133.83     0.76 60000000     0.00     0.00  cleanfld
  0.52    134.55     0.72 30000019     0.00     0.00  adjbuf
  0.50    135.24     0.69        2     0.35     0.35  isclvar
  0.49    135.92     0.68        1     0.68    32.35  program
  0.45    136.54     0.62 30000001     0.00     0.00  savefs
  0.42    137.13     0.59                             newfld
  0.28    137.52     0.39                             funnyvar
  0.21    137.81     0.29        1     0.29     0.29  cat
  0.14    138.00     0.19        2     0.10     0.10  getargv
  0.12    138.16     0.16                             matchop
  0.11    138.31     0.15        2     0.08     0.08  setsval
  0.11    138.46     0.15                             pmatch
  0.06    138.55     0.09        1     0.09     0.60  initgetrec
  0.05    138.62     0.07                             condexpr
  0.05    138.69     0.07                             nawk_tolower
  0.03    138.73     0.05                             arith
  0.03    138.78     0.05                             getnf
  0.03    138.82     0.04        1     0.04     0.04  recinit
  0.01    138.84     0.02                             freefa
  0.01    138.85     0.02                             boolop
  0.00    138.85     0.00      125     0.00     0.00  hash
  0.00    138.85     0.00      125     0.00     0.00  tostring
  0.00    138.85     0.00       89     0.00     0.00  input
  0.00    138.85     0.00       63     0.00     0.00  ptoi
  0.00    138.85     0.00       61     0.00     0.00  lookup
  0.00    138.85     0.00       58     0.00     0.00  resize_state
  0.00    138.85     0.00       58     0.00     0.00  setsymtab
  0.00    138.85     0.00       41     0.00     0.00  gettok
  0.00    138.85     0.00       30     0.00     0.00  yylex
  0.00    138.85     0.00       28     0.00     0.00  cgoto
  0.00    138.85     0.00       27     0.00     0.00  nodealloc
  0.00    138.85     0.00       16     0.00     0.00  unput
  0.00    138.85     0.00       14     0.00     0.00  node2
  0.00    138.85     0.00       13     0.00     0.00  op2
  0.00    138.85     0.00       10     0.00     0.00  node1
  0.00    138.85     0.00        9     0.00     0.00  intalloc
  0.00    138.85     0.00        8     0.00     0.00  celltonode
  0.00    138.85     0.00        7     0.00     0.00  binsearch
  0.00    138.85     0.00        7     0.00     0.00  linkum
  0.00    138.85     0.00        7     0.00     0.00  word
  0.00    138.85     0.00        6     0.00     0.00  peek
  0.00    138.85     0.00        6     0.00     0.00  relex
  0.00    138.85     0.00        5     0.00     0.00  first
  0.00    138.85     0.00        5     0.00     0.00  follow
  0.00    138.85     0.00        3     0.00     0.00  concat
  0.00    138.85     0.00        3     0.00     0.00  itonp
  0.00    138.85     0.00        3     0.00     0.00  makesymtab
  0.00    138.85     0.00        3     0.00     0.00  node3
  0.00    138.85     0.00        3     0.00     0.00  primary
  0.00    138.85     0.00        3     0.00     0.00  stat3
  0.00    138.85     0.00        3     0.00     0.00  string
  0.00    138.85     0.00        3     0.00     0.00  unary
  0.00    138.85     0.00        3     0.00     0.00  unputstr
  0.00    138.85     0.00        2     0.00     0.00  exptostat
  0.00    138.85     0.00        2     0.00     0.00  makefields
  0.00    138.85     0.00        2     0.00     0.00  op1
  0.00    138.85     0.00        2     0.00     0.00  rehash
  0.00    138.85     0.00        2     0.00     0.00  setfree
  0.00    138.85     0.00        1     0.00     0.00  alt
  0.00    138.85     0.00        1     0.00     0.00  arginit
  0.00    138.85     0.00        1     0.00     0.08  assign
  0.00    138.85     0.00        1     0.00     0.00  cfoll
  0.00    138.85     0.00        1     0.00     0.00  closeall
  0.00    138.85     0.00        1     0.00     0.00  envinit
  0.00    138.85     0.00        1     0.00     0.00  freetr
  0.00    138.85     0.00        1     0.00     0.00  getpssval
  0.00    138.85     0.00        1     0.00     0.00  growfldtab
  0.00    138.85     0.00        1     0.00     0.00  makeinit
  0.00    138.85     0.00        1     0.00     0.00  mkdfa
  0.00    138.85     0.00        1     0.00     0.00  notnull
  0.00    138.85     0.00        1     0.00     0.00  penter
  0.00    138.85     0.00        1     0.00     0.00  printstat
  0.00    138.85     0.00        1     0.00     0.00  regexp
  0.00    138.85     0.00        1     0.00     0.00  reparse
  0.00    138.85     0.00        1     0.00     0.00  resizesetvec
  0.00    138.85     0.00        1     0.00   137.26  run
  0.00    138.85     0.00        1     0.00     0.00  stat2
  0.00    138.85     0.00        1     0.00     0.00  stdinit
  0.00    138.85     0.00        1     0.00     0.00  syminit
  0.00    138.85     0.00        1     0.00     0.00  yyparse

PERF.txt

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 972K of event 'cycles:uppp'
# Event count (approx.): 236100649774
#
# Overhead  Command  Shared Object     Symbol                           
# ........  .......  ................  ..................................
#
    17.46%  awkPerf  awkPerf           [.] nematch
    16.84%  awkPerf  libc-2.27.so      [.] __GI_____strtod_l_internal
     8.52%  awkPerf  libc-2.27.so      [.] _IO_getc
     7.58%  awkPerf  awkPerf           [.] readrec
     5.78%  awkPerf  awkPerf           [.] refldbld
     4.30%  awkPerf  awkPerf           [.] execute
     3.78%  awkPerf  libc-2.27.so      [.] str_to_mpn.isra.0
     3.63%  awkPerf  libc-2.27.so      [.] __GI_strlen
     3.19%  awkPerf  awkPerf           [.] fldbld
     2.89%  awkPerf  awkPerf           [.] setfval
     2.89%  awkPerf  awkPerf           [.] is_number
     2.10%  awkPerf  libc-2.27.so      [.] __GI___strnlen
     1.77%  awkPerf  awkPerf           [.] getrec
     1.71%  awkPerf  libc-2.27.so      [.] strncpy
     1.64%  awkPerf  libc-2.27.so      [.] round_and_return
     1.57%  awkPerf  awkPerf           [.] get_str_val
     1.53%  awkPerf  libc-2.27.so      [.] __memcpy_generic
     1.06%  awkPerf  awkPerf           [.] _IO_getc@plt
     0.93%  awkPerf  libc-2.27.so      [.] strtod
     0.84%  awkPerf  libc-2.27.so      [.] strcmp
     0.73%  awkPerf  libc-2.27.so      [.] __errno_location
     0.64%  awkPerf  awkPerf           [.] makedfa
     0.60%  awkPerf  awkPerf           [.] incrdecr
     0.60%  awkPerf  awkPerf           [.] relop
     0.60%  awkPerf  awkPerf           [.] getfval
     0.53%  awkPerf  awkPerf           [.] ifstat
     0.52%  awkPerf  awkPerf           [.] gettemp
     0.49%  awkPerf  libc-2.27.so      [.] strcpy
     0.42%  awkPerf  awkPerf           [.] strtod@plt
     0.42%  awkPerf  awkPerf           [.] pastat
     0.39%  awkPerf  libc-2.27.so      [.] __mpn_construct_double
     0.39%  awkPerf  awkPerf           [.] cleanfld
     0.38%  awkPerf  awkPerf           [.] getsval
     0.37%  awkPerf  libc-2.27.so      [.] __mpn_lshift
     0.37%  awkPerf  awkPerf           [.] savefs
     0.35%  awkPerf  awkPerf           [.] setlastfld
     0.31%  awkPerf  awkPerf           [.] indirect
     0.31%  awkPerf  awkPerf           [.] program
     0.30%  awkPerf  awkPerf           [.] fieldadr
     0.28%  awkPerf  awkPerf           [.] adjbuf
     0.25%  awkPerf  awkPerf           [.] tfree
     0.23%  awkPerf  awkPerf           [.] __errno_location@plt
     0.17%  awkPerf  awkPerf           [.] strncpy@plt
     0.15%  awkPerf  awkPerf           [.] strlen@plt
     0.08%  awkPerf  awkPerf           [.] strcmp@plt
     0.07%  awkPerf  awkPerf           [.] strcpy@plt
     0.01%  awkPerf  libc-2.27.so      [.] _IO_file_underflow@@GLIBC_2.17
     0.01%  awkPerf  [unknown]         [k] 0xffff000010096654
     0.01%  awkPerf  libc-2.27.so      [.] __uflow
     0.01%  awkPerf  libc-2.27.so      [.] _IO_default_uflow
     0.01%  awkPerf  libc-2.27.so      [.] _IO_switch_to_get_mode
     0.01%  awkPerf  libc-2.27.so      [.] read
     0.00%  awkPerf  libc-2.27.so      [.] _IO_file_read
     0.00%  awkPerf  libc-2.27.so      [.] read_alias_file
     0.00%  awkPerf  libc-2.27.so      [.] _IO_vsnprintf
     0.00%  awkPerf  ld-2.27.so        [.] _dl_runtime_resolve
     0.00%  awkPerf  libc-2.27.so      [.] __strchrnul
     0.00%  awkPerf  libc-2.27.so      [.] _dl_addr
     0.00%  awkPerf  ld-2.27.so        [.] do_lookup_x
     0.00%  awkPerf  ld-2.27.so        [.] strcmp
     0.00%  awkPerf  ld-2.27.so        [.] _dl_relocate_object
     0.00%  awkPerf  ld-2.27.so        [.] _dl_check_all_versions
     0.00%  awkPerf  ld-2.27.so        [.] read
     0.00%  awkPerf  ld-2.27.so        [.] dl_main
     0.00%  awkPerf  ld-2.27.so        [.] __GI___tunables_init
     0.00%  awkPerf  ld-2.27.so        [.] _dl_start
     0.00%  awkPerf  ld-2.27.so        [.] _start
     0.00%  awkPerf  [unknown]         [k] 0xffff000010081544


#
# (Cannot load tips.txt file, please install perf!)
#

Comments

Popular posts from this blog

Going Faster

Photo by  Anders Jildén  on  Unsplash Today’s topic is compiler optimizations. Besides translating our source code into machine binary executable, the compiler, based on optimization parameters, can also produce faster executables. Just by adding some parameters to the compiler, we can get a better performance or a smaller executable, for example. There are hundreds of those parameters which we can turn on or off using the prefix -f and -fno. However, instead of doing one by one, we can use the features mode using the -O param. It ranges from 0 (no optimization – the default) to 3 (the highest). Using those parameters has a cost —usually, the faster, the larger executable. How does the compiler make it faster if my code is perfect? I’m going to put some methods here, but if you want, here is more detail . Also, bear in mind that most of the optimizations are done in the intermediate representation of the program. So, the examples below are rewritten just to...

Colour Selection

Photo by Scott Webb  on  Unsplash Today I'll talk about Lab 4. We had to pick two tasks out of four and develop the solution using my least favour language: assembly. Our group chose the options 2 (data input form) and 4 (screen colour selector) thinking that would be the easiest ones. The other options were adding calculator and hexdump. This post will talk about the colour selector, and my next will be the input form. The colour selector project was quite easy to do relatively. There are only 16 colours available (0 to F in hex). We have to list them in the text area and allow the selection using the cursor (up and down). Once the colour is selected, we have to paint the graph area. The graph area we did before. Basically, we have to store the colour code for every pixel in the display using the memory location between $0200 and $05FF. In the last post, we deal with up and down keys to change the numeric display. However, we never dealt with character display bef...

Project Stage 3

Photo by  NASA  on  Unsplash Hello! In this post, I’ll make a list of optimization opportunities that I identified on the AWK project based on what I’ve learned in the SPO600 classes. There are two types of optimizations: portable and platform-specific. Portable optimizations are the ones that work everywhere, like better algorithms and implementations, and also compiler building flags. Platform-specific, on the other hand, works only for a targeted architecture. Like the SIMD instructions available only on Arch64 and many others specific for x86_64. It is possible to “force” the usage of such instructions according to the targeted hardware. We can do that on compilation time, and also on run-time. Now that we know our options, let’s dig in. According to my previous post , the functions nematch and readrec are the hotspots. Here is the command line used to run the awk: ./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "cou...