[KimDaBa] KimDaBa 2.0 is released.

Thu Oct 21 12:53:57 BST 2004

   From: "Jesper K. Pedersen" <blackie at blackie.dk>
   Date: Thu, 21 Oct 2004 08:26:45 +0200

   | Just checking with strace, it appears that no more than 2 seconds of
   | the 7 seconds is spent loading the database (it takes 13 seconds
   | running under strace; 2 seconds of that appears to be related to
   | loading the database).

   I did profile using valgrind/kcachegrind, and it showed that almost
   all start up time was spent reading in the XML file.

How much time was that (as measured by kcachegrind)?  Unless you have
a very slow processor, I doubt it was more than 2 seconds.

   | About half of the rest of the time (with my database) is spent
   | scanning the directories for new files (and stat'ing the existing
   | files) and the rest is spent groveling fonts, icons, etc.  Making
   | the initial scan optional would speed up startup; most of the
   | time I'm not interested in searching for new files every time
   | KimDaBa starts up.

   Interesting, valgrind will of course not see the time spent in
   system libraries. I'll look into that.

It will see time spent in *libraries*, but not time spent in the
kernel.  If you're going to scan 5000 or 50000 files on startup, it
isn't going to be able to start quickly.  Period.

Also, try looking at how long even a fairly minimal KDE application
(konqueror on an empty directory) takes to start up.  On my laptop, it
takes about 2 seconds of wallclock time.  An RDBMS isn't going to
start up instantly, either.
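
One rough way to see how the directory scan splits between user-space
and kernel time is to compare CPU times taken before and after it.
This is a minimal sketch in Python, not KimDaBa's code: scan_tree and
timed are hypothetical names, and the assumption is that the startup
scan amounts to walking the image tree and stat'ing each file.

```python
import os
import time


def scan_tree(root):
    """Walk root and stat every file; return how many files were seen."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                os.stat(os.path.join(dirpath, name))  # work done in the kernel
            except OSError:
                continue  # e.g. a dangling symlink; skip it
            count += 1
    return count


def timed(func, *args):
    """Call func(*args); return (result, wall, user_cpu, system_cpu) in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    result = func(*args)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    user = cpu_after.user - cpu_before.user
    system = cpu_after.system - cpu_before.system
    return result, wall, user, system


if __name__ == "__main__":
    files, wall, user, system = timed(scan_tree, ".")
    print(f"{files} files: wall {wall:.2f}s, user {user:.2f}s, system {system:.2f}s")
```

If wall time is much larger than user plus system time, the process
was most likely waiting on the disk, which a CPU profiler will not
show either.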

   | It might be possible to speed up the load and save with a faster
   | XML library.  Gimp-Print uses mxml, which Mike Sweet of Easy
   | Software (www.easysw.com) wrote for this purpose.

   I tried rewriting part of the loading code to use SAX rather than
   DOM, which gave me almost no speedup, but in any case I don't want
   peanuts, I want instant start up (tm) ;-)

   | Above maybe 50,000 entries it might be worthwhile finding a faster
   | storage mechanism.  That's not an unreasonable number of images for a
   | professional photographer, but there are a lot of other things we
   | should be looking at first.  The problem's more likely to be memory
   | consumption than outright performance.

   Memory consumption is the other side of the coin: given that I
   store things in XML, I read the whole database into memory, which
   will therefore grow. I'm not really sure how bad this will be even
   with 50.000 images.

The memory requirements are ultimately going to be similar.  An
advantage of an RDBMS might be that you don't have to keep it all in
memory at any one time.  The downside would be that reading it in from
disk (in the RDBMS) will add latency to each operation, but perhaps
that latency won't be too much.
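
To make the trade-off concrete, here is a minimal sketch of fetching
only the matching records from disk on each query instead of holding
everything in memory.  It uses Python's sqlite3 module purely as an
example of a lightweight RDBMS; the table layout and the names
open_db, add_image and find_by_keyword are hypothetical and have
nothing to do with KimDaBa's actual data model.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the image database; the schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " path TEXT PRIMARY KEY, taken TEXT, keyword TEXT)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS images_keyword ON images (keyword)")
    return db


def add_image(db, path, taken, keyword):
    """Insert or update one image record and commit it."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO images (path, taken, keyword) VALUES (?, ?, ?)",
            (path, taken, keyword),
        )


def find_by_keyword(db, keyword):
    """Yield matching paths one at a time instead of loading every record."""
    cursor = db.execute(
        "SELECT path FROM images WHERE keyword = ? ORDER BY path", (keyword,)
    )
    for (path,) in cursor:
        yield path
```

Each call to find_by_keyword pays for a query, which is the per-operation
latency mentioned above; in exchange, nothing has to be parsed at
startup.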

   Anyway, let me elaborate on my decision of going for a DB: KimDaBa
   takes approx 10 sec to start for my db of ~ 5000 images, and I
   expect that in the coming years it will not be unusual to see
   people with 50.000 images; just look at your image collection and
   tell me how fast it grows. Loading 50.000 images will then take
   100 sec, almost 1:20 minutes. I fear that people will not be
   willing to use KimDaBa when loading takes almost 1 1/2 minutes.

I wouldn't like that either.  Again, figure out how much of that is
really due to loading the XML file vs. how much time is spent scanning
images, loading shared libraries, etc.  The KDE-related startup won't
scale with the number of images, of course, but the images scan at
startup will.
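
The scaling argument can be written as a simple linear model: a fixed
cost that does not depend on the collection, plus a per-image cost.
The sketch below is in Python and only illustrative.  The 10 seconds
for 5000 images comes from the message quoted above; the 2 second
fixed cost is an assumption borrowed from the konqueror figure, which
was measured on a different machine.  Under those assumptions the
estimate for 50,000 images comes out near 82 seconds.

```python
def estimate_startup(n_images, fixed_seconds, per_image_seconds):
    """Startup time under a linear model: fixed cost plus a per-image cost."""
    return fixed_seconds + per_image_seconds * n_images


def per_image_cost(measured_seconds, n_images, fixed_seconds):
    """Back out the per-image cost from one measurement and an assumed fixed cost."""
    return (measured_seconds - fixed_seconds) / n_images


if __name__ == "__main__":
    fixed = 2.0  # assumed KDE/library startup cost, not a measurement of KimDaBa
    cost = per_image_cost(10.0, 5000, fixed)
    for n in (5000, 50000):
        print(n, round(estimate_startup(n, fixed, cost), 1))
```

Only the per-image term can be reduced by skipping the scan or by
changing the storage format; the fixed term stays whatever it is.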

I'd recommend that the first thing you do is turn off the image scan
at startup and see how long it takes.  You can always offer a checkbox
to scan images at startup and/or manually scan.  You would of course
need to scan when the user asks to find all images not on disk, but
you can be lazy about scanning.
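
A minimal sketch of that lazy approach, in Python for illustration
only: ImageIndex, scan_on_startup and the two find_* methods are
hypothetical names, and the database is reduced to a set of known
paths.

```python
import os


class ImageIndex:
    """Known image paths plus an on-demand comparison against the disk."""

    def __init__(self, root, known_paths=(), scan_on_startup=False):
        self.root = root
        self.known = set(known_paths)
        self.new_files = None      # None means "not scanned yet"
        self.missing_files = None
        if scan_on_startup:        # the checkbox; off by default in this sketch
            self.scan()

    def scan(self):
        """Walk the tree once and compare it with the known paths."""
        found = set()
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                found.add(os.path.join(dirpath, name))
        self.new_files = found - self.known      # on disk, not in the database
        self.missing_files = self.known - found  # in the database, not on disk

    def find_new_files(self):
        """Return files not yet in the database, scanning only if needed."""
        if self.new_files is None:
            self.scan()
        return self.new_files

    def find_missing_files(self):
        """Return database entries with no file on disk, scanning only if needed."""
        if self.missing_files is None:
            self.scan()
        return self.missing_files
```

With scan_on_startup left off, the cost of walking the tree is paid
the first time someone actually asks for new or missing files.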

   The second thing is an idea someone sent me a year ago or so:
   How about making kimdaba the KDE global image database? When you
   see a new image in, say, konqueror, or scan in a new image, it
   should drop directly into kimdaba. For this to work, I need a KPart
   for kimdaba, but a KPart that takes 1:20 for each image you want to
   add is unrealistic; even one that takes 10 sec to load is.

Agreed, but if you have to start up an RDBMS and scan the directory
tree each time someone drops an image on it, you're going to have very
sluggish response too.  For something like that, you need to have
kimdaba running as a daemon.  What you're looking for in this case is
something more like a transaction processing system, and a lightweight
RDBMS may well be a better choice.  But whatever you do, if you
want snappy interactive response you'll need a very different
architecture.
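
As a rough illustration of the daemon idea, the sketch below keeps one
long-lived worker that pays the load cost once and then handles each
"add image" request cheaply.  It is Python, and a thread plus a queue
stand in for a separate daemon process and whatever IPC mechanism it
would sit behind; ImageService and load_database are hypothetical
names, and the database is reduced to a set of paths.

```python
import queue
import threading


class ImageService:
    """Long-lived worker that loads the database once, then serves requests."""

    def __init__(self, load_database):
        self._requests = queue.Queue()
        self._load_database = load_database
        self.database = None
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        self.database = self._load_database()  # slow step, paid once
        while True:
            path = self._requests.get()
            if path is None:  # sentinel: shut down
                break
            self.database.add(path)  # cheap per-request work

    def add_image(self, path):
        """Queue an image; returns immediately without waiting for the worker."""
        self._requests.put(path)

    def stop(self):
        """Let the worker finish the queued requests and exit, then wait for it."""
        self._requests.put(None)
        self._worker.join()
```

The caller of add_image never waits for the load, which is the
property a drop target in konqueror would need.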

-- 
Robert Krawitz                                     <rlk at alum.mit.edu>

Tall Clubs International  --  http://www.tall.org/ or 1-888-IM-TALL-2
Member of the League for Programming Freedom -- mail lpf at uunet.uu.net
Project lead for Gimp Print   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton