[KimDaBa] Profiling Kimdaba startup

Robert L Krawitz rlk at alum.mit.edu
Wed Jan 5 12:19:12 GMT 2005

   Date: Wed, 05 Jan 2005 07:58:04 +0200
   From: David Fraser <davidf at sjsoft.com>
   Cc: kimdaba at klaralvdalens-datakonsult.se

   Jody Harris wrote:

   > A humble thought on making start time shorter:
   > One way to compress the load time would be to compress the xml file. 
   > Because of XML's inherent "spaciness," there is some (large) 
   > percentage of the file that is read from disk, then "tossed" (white 
   > spaces, tags, etc) as data is put into an in-memory data structure.
   > Since much of this information is highly redundant, you should be able 
   > to gain a high amount of compression on the data even at the default 
   > (6?) compression level.  Time saved reading the data should more than 
   > offset time decompressing the data.
   > Some tests:
   > My original file: 5743 images, 3.2MB
   > Copy index.xml from A to B:
   > bigpig@~ $ time cp photo/index.xml tmp
   > real    0m0.194s
   > user    0m0.001s
   > sys     0m0.028s
   > Compression step with gzip, default settings:
   > bigpig@~ $ time gzip tmp/index.xml
   > real    0m0.107s
   > user    0m0.092s
   > sys     0m0.006s
   > Resulting compressed file: 209KB (!)
   > Make a copy of the compressed file with gzip:
   > bigpig@~ $ time cp tmp/index.xml.gz tmp/index.xml.gz.2
   > real    0m0.006s
   > user    0m0.002s
   > sys     0m0.003s
   > Unzip the compressed file:
   > bigpig@~ $ time gzip -d tmp/index.xml.gz
   > real    0m0.034s
   > user    0m0.020s
   > sys     0m0.010s
   > Granted those are "one-time tests."  To get a good benchmark, you'd 
   > need to run each step enough times to get a system average, but if 
   > these numbers are even close, we're looking at 0.040s (real) vs 0.194s 
   > (real), a savings of ~79%.
   > Jesper?  What do you think?  Robert?  Michael?  Others?
   > jody

   Most of the time isn't reading the file from disk etc but parsing the xml...

David's right.  In addition, there's another problem with this kind of
benchmark: the file will be cached after it's been used once.  So the
original copy (cp photo/index.xml tmp) might measure the actual time
needed to read the file from disk, but the later steps (such as cp
tmp/index.xml.gz tmp/index.xml.gz.2) won't.  Such numbers are
deceptive unless the cache is taken into account.

When I ran my startup test, I was careful to benchmark the second
consecutive run I did, not the first, to ensure that everything was in
memory.  While this isn't entirely realistic (the first run, which is
probably the one you care most about, is much slower), it gives the
most reproducible results.  My
observation was that it took about 2.5 seconds for Qt to parse the 3.7
MB file, or about 1.5 MB/sec.  Considering that my known disk
bandwidth is between 15 and 30 MB/sec (depending upon where on the
disk the file resides), this process clearly isn't disk-bandwidth
limited.

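The read-vs-parse gap is easy to reproduce in a few lines.  A minimal
Python sketch (synthetic, hypothetical XML in the general flavour of an
index.xml; the real KimDaBa file and Qt's parser will of course differ,
so the timings are illustrative only):

```python
# Compare copying a few MB of bytes with parsing the same bytes as XML,
# using Python's xml.etree.  The file content is synthetic.
import time
import xml.etree.ElementTree as ET

# Build a few MB of repetitive, index.xml-flavoured records.
rows = "".join(
    '<image file="img%06d.jpg" label="Image %d" angle="0"/>\n' % (i, i)
    for i in range(50000)
)
data = ('<?xml version="1.0"?>\n<images>\n%s</images>\n' % rows).encode("utf-8")

t0 = time.perf_counter()
copy = bytes(data)            # "read": just copy the in-memory bytes
t_read = time.perf_counter() - t0

t0 = time.perf_counter()
root = ET.fromstring(data)    # parse the same bytes into a tree
t_parse = time.perf_counter() - t0

print("size: %.1f MB" % (len(data) / 1e6))
print("copy bytes: %.4fs   parse XML: %.4fs" % (t_read, t_parse))
```

On any machine the parse will dwarf the raw copy by a couple of orders
of magnitude, which is the same disproportion as 2.5 s of parsing
against a fraction of a second of I/O.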
To demonstrate this, I did the following test.  (It's necessary to use
two cat processes in a pipeline: because of the way cat works, using
memory-mapped I/O, it does almost no work if the destination is
/dev/null.)  Note that the second time, when the file was already in
memory, the time spent was completely negligible.

$ /usr/bin/time cat index.xml | cat > /dev/null
0.00user 0.01system 0:00.58elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+156minor)pagefaults 0swaps
$ /usr/bin/time cat index.xml | cat > /dev/null
0.00user 0.01system 0:00.01elapsed 83%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+156minor)pagefaults 0swaps
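For completeness, Jody's compression figures are also easy to
reproduce.  A short sketch with the same synthetic, hypothetical
redundant XML (real ratios on an actual index.xml will differ) shows
both the high compression ratio and the cheap decompression; it just
doesn't help, because neither reading nor decompressing is where the
time goes:

```python
# Gzip a few MB of repetitive XML-like text and time the decompression.
import gzip
import time

rows = "".join(
    '<image file="img%06d.jpg" label="Image %d" angle="0"/>\n' % (i, i)
    for i in range(50000)
)
data = ('<?xml version="1.0"?>\n<images>\n%s</images>\n' % rows).encode("utf-8")

packed = gzip.compress(data)          # default compression level
t0 = time.perf_counter()
unpacked = gzip.decompress(packed)
t_unzip = time.perf_counter() - t0

print("raw: %d bytes, gzipped: %d bytes (%.0f%% smaller)"
      % (len(data), len(packed), 100 * (1 - len(packed) / len(data))))
print("decompress: %.4fs" % t_unzip)
```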

Robert Krawitz                                     <rlk at alum.mit.edu>

Tall Clubs International  --  http://www.tall.org/ or 1-888-IM-TALL-2
Member of the League for Programming Freedom -- mail lpf at uunet.uu.net
Project lead for Gimp Print   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton
