I did some quick and dirty profiling of libpng decoding on a BeagleBoard-xM.
This is the result with one image:
    46.18%  pngbench  pngbench            [.] inflate_fast
    26.12%  pngbench  pngbench            [.] png_read_filter_row
     7.81%  pngbench  pngbench            [.] inflate
     5.65%  pngbench  pngbench            [.] memcpy
     4.26%  pngbench  pngbench            [.] adler32
     2.39%  pngbench  pngbench            [.] crc32
     1.78%  pngbench  [kernel.kallsyms]   [k] __copy_to_user
     1.76%  pngbench  [kernel.kallsyms]   [k] __do_softirq
     1.40%  pngbench  pngbench            [.] inflate_table
     1.02%  pngbench  [kernel.kallsyms]   [k] __memzero
And another:
    64.79%  pngbench  pngbench            [.] inflate_fast
     8.61%  pngbench  pngbench            [.] memcpy
     7.46%  pngbench  pngbench            [.] adler32
     5.10%  pngbench  pngbench            [.] crc32
     3.49%  pngbench  pngbench            [.] inflate
     3.16%  pngbench  [kernel.kallsyms]   [k] __copy_to_user
     1.33%  pngbench  [kernel.kallsyms]   [k] __memzero
And a third:
    47.00%  pngbench  pngbench            [.] png_read_filter_row
    28.52%  pngbench  pngbench            [.] inflate_fast
     5.12%  pngbench  pngbench            [.] memcpy
     4.23%  pngbench  pngbench            [.] crc32
     3.85%  pngbench  pngbench            [.] adler32
     1.60%  pngbench  [kernel.kallsyms]   [k] __memzero
     1.56%  pngbench  pngbench            [.] inflate_table
     1.50%  pngbench  [kernel.kallsyms]   [k] __copy_to_user
     1.38%  pngbench  [kernel.kallsyms]   [k] __do_softirq
     0.78%  pngbench  pngbench            [.] inflate
Two of these images are encoded with predictive filters, so png_read_filter_row() accounts for a substantial share of the decoding time. PNG defines several filter types of differing cost, which explains the varying amounts of time spent in that function above. When no filter is used, decoding time is dominated by zlib decompression.
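To illustrate what png_read_filter_row() has to do, here is a minimal sketch of reconstructing the PNG "Sub" filter (filter type 1 in the PNG specification); this follows the spec, not libpng's actual implementation. Each output byte depends on the reconstructed byte one pixel earlier in the row, and that serial dependency is exactly what a NEON version would have to work around.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of PNG "Sub" filter reconstruction, per the PNG spec:
 *   Recon(x) = Filt(x) + Recon(x - bpp)   (modulo 256)
 * where bpp is the number of bytes per pixel. The first bpp bytes
 * have no left neighbour and are left unchanged. */
void unfilter_sub(uint8_t *row, size_t rowbytes, size_t bpp)
{
    for (size_t i = bpp; i < rowbytes; i++)
        row[i] = (uint8_t)(row[i] + row[i - bpp]);
}
```

The other filter types (Up, Average, Paeth) additionally reference the previous row, which makes them somewhat friendlier to vectorisation than Sub.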
Two checksum functions feature in these profiles: Adler-32 is the checksum zlib uses to verify data integrity, and CRC-32 is the one used by the PNG container.
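For reference, Adler-32 as defined in RFC 1950 amounts to two running sums modulo 65521; the sketch below is a straightforward rendering of that definition, not zlib's optimised adler32(), which defers the expensive modulo across many bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16, per RFC 1950 */

/* Naive Adler-32 over a buffer, starting from the standard
 * initial value a = 1, b = 0. */
uint32_t adler32_simple(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

Both sums are independent of everything except the byte stream itself, so the checksums are comparatively easy to vectorise, and at a few percent each they are minor contributors anyway.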
Optimising the png_read_filter_row() function with NEON is possible in principle, although the effort of hooking this up in libpng might be non-trivial. Assuming a 4x speedup of this function, the overall decoding performance improvement would be up to ~1.6x, depending on the image. This should definitely be investigated further.
A worryingly large amount of time is also spent in memcpy(). If some of these calls could be eliminated, a further ~10% might be gained, though this is likely to be quite difficult.
Optimising zlib is of course also possible in theory, but is probably even more difficult.