Skip to main content

Project Stage 2

Photo by SpaceX on Unsplash

Hey! Were you curious about the results of profiling AWK? Me too! Quick recap, what is profiling, and how to do it? Profiling is a technique to map the time execution of each part of the application. We can add instrumentation to the executable, or use interruption sampling to generate that map. Here, I’ll use both. Click here for more details on profiling.

For the instrumentation, we have to tell the compiler to add the tools needed to collect the execution data. So, I’ve changed the “makefile” file, CFLAGS variable with “-g -Og -pg” and ran the make command. Then, I just ran the awk the same way I did to benchmark it. Here is the command line:

./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "count: " a;}' bmark-data.txt

This awk version, instrumented, generates a file gmon.out, which contains all execution data. This is the raw material to create a profile report using gprof. I did this on all class machines, and they produced somewhat the same results. I have the text and the graph report to show you, gprof.png and gprof.txt below.

For the sampling, we don’t need to add anything to the executable. So, I’ve changed the “makefile” file, CFLAGS variable with “-g -Og” and ran the make command. No -pg param. However, to collect the execution data, we need to execute a different command:

perf record ./awk 'BEGIN {FS = "<|:|=";} {if ($8 == "DDD>") a ++;} END {print "count: " a;}' bmark-data.txt

This command generated a file called perf.data with our execution data to make the profiling report. This tool doesn’t create a visual graph, only creates a textual and interactive report. Unfortunately, it is not possible to show the interactive one, but the text is. It is the perf.txt below.

Analyzing the reports, I found the hot spots. Those are executed many times and spend the most execution time. There are two strings:

1 – execute -> fldbld  -> refldbld -> nematch
2 – execute -> program -> getrec   -> readrec

Those are the names of the functions inside the source code. I locate them, and now it is time to think about how to improve them. But this will be on my next posts. Say tuned! See you.

GPROF.png



GPROF.txt

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total          
 time   seconds   seconds    calls   s/call   s/call  name   
 31.45     43.67    43.67 240000000     0.00     0.00  nematch
 12.10     60.47    16.80 30000001     0.00     0.00  readrec
 11.51     76.44    15.98 30000000     0.00     0.00  refldbld
  7.81     87.29    10.84 240000008     0.00     0.00  execute
  5.44     94.84     7.55 30000000     0.00     0.00  fldbld
  5.41    102.35     7.51 150000001     0.00     0.00  setfval
  5.02    109.32     6.97 270000030     0.00     0.00  is_number
  3.16    113.71     4.39 120000009     0.00     0.00  get_str_val
  2.70    117.46     3.76 30000001     0.00     0.00  getrec
  1.29    119.26     1.80 30000000     0.00     0.00  relop
  1.22    120.96     1.70 30000000     0.00     0.00  makedfa
  1.15    122.55     1.59 30000000     0.00     0.00  pastat
  1.07    124.04     1.49 30000001     0.00     0.00  tfree
  0.98    125.40     1.36 120000008     0.00     0.00  getsval
  0.97    126.75     1.35 60000000     0.00     0.00  getfval
  0.88    127.96     1.22 30000000     0.00     0.00  incrdecr
  0.86    129.16     1.20 30000001     0.00     0.00  gettemp
  0.79    130.26     1.10 30000000     0.00     0.00  setlastfld
  0.72    131.26     1.00 30000000     0.00     0.00  indirect
  0.66    132.18     0.92 30000000     0.00     0.00  ifstat
  0.64    133.07     0.89 30000000     0.00     0.00  fieldadr
  0.55    133.83     0.76 60000000     0.00     0.00  cleanfld
  0.52    134.55     0.72 30000019     0.00     0.00  adjbuf
  0.50    135.24     0.69        2     0.35     0.35  isclvar
  0.49    135.92     0.68        1     0.68    32.35  program
  0.45    136.54     0.62 30000001     0.00     0.00  savefs
  0.42    137.13     0.59                             newfld
  0.28    137.52     0.39                             funnyvar
  0.21    137.81     0.29        1     0.29     0.29  cat
  0.14    138.00     0.19        2     0.10     0.10  getargv
  0.12    138.16     0.16                             matchop
  0.11    138.31     0.15        2     0.08     0.08  setsval
  0.11    138.46     0.15                             pmatch
  0.06    138.55     0.09        1     0.09     0.60  initgetrec
  0.05    138.62     0.07                             condexpr
  0.05    138.69     0.07                             nawk_tolower
  0.03    138.73     0.05                             arith
  0.03    138.78     0.05                             getnf
  0.03    138.82     0.04        1     0.04     0.04  recinit
  0.01    138.84     0.02                             freefa
  0.01    138.85     0.02                             boolop
  0.00    138.85     0.00      125     0.00     0.00  hash
  0.00    138.85     0.00      125     0.00     0.00  tostring
  0.00    138.85     0.00       89     0.00     0.00  input
  0.00    138.85     0.00       63     0.00     0.00  ptoi
  0.00    138.85     0.00       61     0.00     0.00  lookup
  0.00    138.85     0.00       58     0.00     0.00  resize_state
  0.00    138.85     0.00       58     0.00     0.00  setsymtab
  0.00    138.85     0.00       41     0.00     0.00  gettok
  0.00    138.85     0.00       30     0.00     0.00  yylex
  0.00    138.85     0.00       28     0.00     0.00  cgoto
  0.00    138.85     0.00       27     0.00     0.00  nodealloc
  0.00    138.85     0.00       16     0.00     0.00  unput
  0.00    138.85     0.00       14     0.00     0.00  node2
  0.00    138.85     0.00       13     0.00     0.00  op2
  0.00    138.85     0.00       10     0.00     0.00  node1
  0.00    138.85     0.00        9     0.00     0.00  intalloc
  0.00    138.85     0.00        8     0.00     0.00  celltonode
  0.00    138.85     0.00        7     0.00     0.00  binsearch
  0.00    138.85     0.00        7     0.00     0.00  linkum
  0.00    138.85     0.00        7     0.00     0.00  word
  0.00    138.85     0.00        6     0.00     0.00  peek
  0.00    138.85     0.00        6     0.00     0.00  relex
  0.00    138.85     0.00        5     0.00     0.00  first
  0.00    138.85     0.00        5     0.00     0.00  follow
  0.00    138.85     0.00        3     0.00     0.00  concat
  0.00    138.85     0.00        3     0.00     0.00  itonp
  0.00    138.85     0.00        3     0.00     0.00  makesymtab
  0.00    138.85     0.00        3     0.00     0.00  node3
  0.00    138.85     0.00        3     0.00     0.00  primary
  0.00    138.85     0.00        3     0.00     0.00  stat3
  0.00    138.85     0.00        3     0.00     0.00  string
  0.00    138.85     0.00        3     0.00     0.00  unary
  0.00    138.85     0.00        3     0.00     0.00  unputstr
  0.00    138.85     0.00        2     0.00     0.00  exptostat
  0.00    138.85     0.00        2     0.00     0.00  makefields
  0.00    138.85     0.00        2     0.00     0.00  op1
  0.00    138.85     0.00        2     0.00     0.00  rehash
  0.00    138.85     0.00        2     0.00     0.00  setfree
  0.00    138.85     0.00        1     0.00     0.00  alt
  0.00    138.85     0.00        1     0.00     0.00  arginit
  0.00    138.85     0.00        1     0.00     0.08  assign
  0.00    138.85     0.00        1     0.00     0.00  cfoll
  0.00    138.85     0.00        1     0.00     0.00  closeall
  0.00    138.85     0.00        1     0.00     0.00  envinit
  0.00    138.85     0.00        1     0.00     0.00  freetr
  0.00    138.85     0.00        1     0.00     0.00  getpssval
  0.00    138.85     0.00        1     0.00     0.00  growfldtab
  0.00    138.85     0.00        1     0.00     0.00  makeinit
  0.00    138.85     0.00        1     0.00     0.00  mkdfa
  0.00    138.85     0.00        1     0.00     0.00  notnull
  0.00    138.85     0.00        1     0.00     0.00  penter
  0.00    138.85     0.00        1     0.00     0.00  printstat
  0.00    138.85     0.00        1     0.00     0.00  regexp
  0.00    138.85     0.00        1     0.00     0.00  reparse
  0.00    138.85     0.00        1     0.00     0.00  resizesetvec
  0.00    138.85     0.00        1     0.00   137.26  run
  0.00    138.85     0.00        1     0.00     0.00  stat2
  0.00    138.85     0.00        1     0.00     0.00  stdinit
  0.00    138.85     0.00        1     0.00     0.00  syminit
  0.00    138.85     0.00        1     0.00     0.00  yyparse

PERF.txt

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 972K of event 'cycles:uppp'
# Event count (approx.): 236100649774
#
# Overhead  Command  Shared Object     Symbol                           
# ........  .......  ................  ..................................
#
    17.46%  awkPerf  awkPerf           [.] nematch
    16.84%  awkPerf  libc-2.27.so      [.] __GI_____strtod_l_internal
     8.52%  awkPerf  libc-2.27.so      [.] _IO_getc
     7.58%  awkPerf  awkPerf           [.] readrec
     5.78%  awkPerf  awkPerf           [.] refldbld
     4.30%  awkPerf  awkPerf           [.] execute
     3.78%  awkPerf  libc-2.27.so      [.] str_to_mpn.isra.0
     3.63%  awkPerf  libc-2.27.so      [.] __GI_strlen
     3.19%  awkPerf  awkPerf           [.] fldbld
     2.89%  awkPerf  awkPerf           [.] setfval
     2.89%  awkPerf  awkPerf           [.] is_number
     2.10%  awkPerf  libc-2.27.so      [.] __GI___strnlen
     1.77%  awkPerf  awkPerf           [.] getrec
     1.71%  awkPerf  libc-2.27.so      [.] strncpy
     1.64%  awkPerf  libc-2.27.so      [.] round_and_return
     1.57%  awkPerf  awkPerf           [.] get_str_val
     1.53%  awkPerf  libc-2.27.so      [.] __memcpy_generic
     1.06%  awkPerf  awkPerf           [.] _IO_getc@plt
     0.93%  awkPerf  libc-2.27.so      [.] strtod
     0.84%  awkPerf  libc-2.27.so      [.] strcmp
     0.73%  awkPerf  libc-2.27.so      [.] __errno_location
     0.64%  awkPerf  awkPerf           [.] makedfa
     0.60%  awkPerf  awkPerf           [.] incrdecr
     0.60%  awkPerf  awkPerf           [.] relop
     0.60%  awkPerf  awkPerf           [.] getfval
     0.53%  awkPerf  awkPerf           [.] ifstat
     0.52%  awkPerf  awkPerf           [.] gettemp
     0.49%  awkPerf  libc-2.27.so      [.] strcpy
     0.42%  awkPerf  awkPerf           [.] strtod@plt
     0.42%  awkPerf  awkPerf           [.] pastat
     0.39%  awkPerf  libc-2.27.so      [.] __mpn_construct_double
     0.39%  awkPerf  awkPerf           [.] cleanfld
     0.38%  awkPerf  awkPerf           [.] getsval
     0.37%  awkPerf  libc-2.27.so      [.] __mpn_lshift
     0.37%  awkPerf  awkPerf           [.] savefs
     0.35%  awkPerf  awkPerf           [.] setlastfld
     0.31%  awkPerf  awkPerf           [.] indirect
     0.31%  awkPerf  awkPerf           [.] program
     0.30%  awkPerf  awkPerf           [.] fieldadr
     0.28%  awkPerf  awkPerf           [.] adjbuf
     0.25%  awkPerf  awkPerf           [.] tfree
     0.23%  awkPerf  awkPerf           [.] __errno_location@plt
     0.17%  awkPerf  awkPerf           [.] strncpy@plt
     0.15%  awkPerf  awkPerf           [.] strlen@plt
     0.08%  awkPerf  awkPerf           [.] strcmp@plt
     0.07%  awkPerf  awkPerf           [.] strcpy@plt
     0.01%  awkPerf  libc-2.27.so      [.] _IO_file_underflow@@GLIBC_2.17
     0.01%  awkPerf  [unknown]         [k] 0xffff000010096654
     0.01%  awkPerf  libc-2.27.so      [.] __uflow
     0.01%  awkPerf  libc-2.27.so      [.] _IO_default_uflow
     0.01%  awkPerf  libc-2.27.so      [.] _IO_switch_to_get_mode
     0.01%  awkPerf  libc-2.27.so      [.] read
     0.00%  awkPerf  libc-2.27.so      [.] _IO_file_read
     0.00%  awkPerf  libc-2.27.so      [.] read_alias_file
     0.00%  awkPerf  libc-2.27.so      [.] _IO_vsnprintf
     0.00%  awkPerf  ld-2.27.so        [.] _dl_runtime_resolve
     0.00%  awkPerf  libc-2.27.so      [.] __strchrnul
     0.00%  awkPerf  libc-2.27.so      [.] _dl_addr
     0.00%  awkPerf  ld-2.27.so        [.] do_lookup_x
     0.00%  awkPerf  ld-2.27.so        [.] strcmp
     0.00%  awkPerf  ld-2.27.so        [.] _dl_relocate_object
     0.00%  awkPerf  ld-2.27.so        [.] _dl_check_all_versions
     0.00%  awkPerf  ld-2.27.so        [.] read
     0.00%  awkPerf  ld-2.27.so        [.] dl_main
     0.00%  awkPerf  ld-2.27.so        [.] __GI___tunables_init
     0.00%  awkPerf  ld-2.27.so        [.] _dl_start
     0.00%  awkPerf  ld-2.27.so        [.] _start
     0.00%  awkPerf  [unknown]         [k] 0xffff000010081544


#
# (Cannot load tips.txt file, please install perf!)
#

Comments

Popular posts from this blog

Two-digit Numeric Display - Final

Photo by  Nick Hillier  on  Unsplash In this post, I’ll continue the two-digit numeric display. If you miss it, click here and check it out . To finish this project, we just need to show the numbers in the matrix-pixel (the black-box in the 6502 emulator ). To kickstart, our instructor gave us one example of how to display graphs, which was a lot helpful. The first thing that I’ve noticed was the bitmap table at the bottom. So, I mimic it and made ten tables like that to represent each number (zero to nine). So far, so good! Then I grabbed the logic to display one digit, and then my nightmares just started. How to place two graphs (one for each digit)? How to switch from one number to another? How to reuse code? Where is my coffee?! To emulate some if-elseif-else statements, I used jmp (jump). They are all over the place! However, the 6502 limits the jump range from -127 to 128. That means moving the code-blocks to satisfy all jumps limit. For e...

Project Stage 1

Photo by  SpaceX  on  Unsplash Hello! This is my SPO 600 blog, and this post will be long – sorry. The goal is to pick one project that is CPU intensive, written in C or C++, and experiment different compiler options and present the results. That’s why it will be long – lots of data to show. I choose the AWK project ( https://github.com/onetrueawk/awk ). It is a handy tool to process files. Parse, sort, and filter are some trivial operations that are CPU intensive. To make it harder, I created a huge XML file to parse it and count the tags. I've described the machines in my last post, if you miss it, here it is . I also created a script to run and collect the data. I planned to run each candidate 10 times, but a few attempts didn’t receive any data. So, I decided to nest the loop in a way that even if someone kills my process, the data could be used. Guess what? It happened! To produce the candidates, I just changed the CFLAGS inside the makefile and ran the...

Assembly?

Photo by  Jonas Svidras  on  Unsplash Last week on my SPO course, I had my first experience writing Assembly code. I won’t lie; it was struggling. For me, Assembly is like the Latin of the codding languages and “carpe diem” wasn’t my first lesson. Hexadecimal, binary and a list of instructions is a must know to guarantee survival. Our instructor introduced us to the 6502 processor: it is an old school chip that was used in many home solutions such as PCs and video games. Internally, it has three general-purpose registers, three special-purpose registers, memory and input and output ports. Fortunately, there are emulators on the internet that helps us to focus on the development, hiding the electronic part from us. http://6502.cdot.systems/ Using the emulator, our first task was to copy, paste and execute a piece of code to change the colour of every pixel in the display matrix. That was easy! The result was a yellow screen. Then we were asked t...