[KimDaBa] KimDaBa 2.0 is released.

Thu Oct 21 01:23:04 BST 2004

   From: "Jesper K. Pedersen" <blackie at blackie.dk>
   Date: Wed, 20 Oct 2004 21:06:08 +0200

   On Wednesday 20 October 2004 20:00, William Holland wrote:

   | It's not that it's slow for the data - a 10 Meg xml file should
   | be relatively slow. (from one filter, when I click on the 'home'
   | icon, it takes 3 seconds to re-load the root interface, on an
   | athlon 1GHz, 512MB) I now have 14000 entries in the database, so
   | that's not exactly slow.
   |
   | Great that you're putting in a true DB backend for 2.1 (I hope
   | you will still keep the option of xml files for those of us too
   | lazy to set up a proper database).

   Ohhhh yes indeed, and for all of those (like me) who would shy away
   from kimdaba before ever giving it a try, if it requires them to
   install a db.

   It will be possible to start with an XML file, and then later
   change to a db, and later even back again.

I really don't understand the urgency of doing this.  With over 5000
entries, it takes about 7 seconds for KimDaBa to start up (PIII-1000,
512 MB) if I've just run it (if I haven't run it previously, it takes
much longer, because it stats all of the files and scans the directory
for new files).  As for operations within KimDaBa taking a long time,
that's not the fault of the XML data file, which after all is simply
an external representation of the data.  You can always use whatever
internal representation for the data you please.
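
As a rough illustration of that separation, the sketch below parses an
XML file once and builds an in-memory keyword index from it.  The
element and attribute names are a simplified, hypothetical format
chosen for the example, not KimDaBa's actual file layout, and
load_index is an invented helper.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified file format; not KimDaBa's real schema.
SAMPLE = """
<images>
  <image file="a.jpg" date="2004-10-01">
    <keyword>beach</keyword>
    <keyword>family</keyword>
  </image>
  <image file="b.jpg" date="2004-10-02">
    <keyword>beach</keyword>
  </image>
</images>
"""

def load_index(xml_text):
    """Parse the XML once, then return a keyword -> set-of-files index."""
    root = ET.fromstring(xml_text)
    by_keyword = defaultdict(set)
    for image in root.iter("image"):
        for kw in image.iter("keyword"):
            by_keyword[kw.text].add(image.get("file"))
    return by_keyword

# After loading, lookups never touch the XML again.
index = load_index(SAMPLE)
```

Once the index exists, the speed of a lookup depends on the in-memory
structure, not on the on-disk format.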

Relational databases are valuable when the relationships between
different types of data are more complicated.  In this case, they're
pretty basic -- an image has a number of fairly simple attributes
(date, some number of keywords, and such).  Try drawing an entity
relationship diagram for all of this, and you'll see that there's one
very big table (images) with a lot of fixed attributes and several
very small tables (keywords, people, and locations).  You're not going
to be performing any queries that can't be expressed as very simple
logic.
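
To make "very simple logic" concrete, here is a minimal sketch of that
shape: one large mapping of images plus a few small indexes, queried
with plain set operations.  All of the names and sample data are
hypothetical and only meant to show the idea.

```python
# Hypothetical in-memory data: one big "images" table and small indexes.
images = {
    "a.jpg": {"date": "2004-10-01"},
    "b.jpg": {"date": "2004-10-02"},
    "c.jpg": {"date": "2004-10-03"},
}
keywords = {"beach": {"a.jpg", "b.jpg"}, "family": {"a.jpg", "c.jpg"}}
people = {"Anna": {"a.jpg", "c.jpg"}}

def query(all_of=(), none_of=()):
    """Return images present in every set of all_of and in no set of none_of."""
    result = set(images)      # start from every known image
    for wanted in all_of:
        result &= wanted      # AND: keep only images that match
    for unwanted in none_of:
        result -= unwanted    # NOT: drop images that match
    return result

beach_with_anna = query(all_of=[keywords["beach"], people["Anna"]])
family_not_beach = query(all_of=[keywords["family"]],
                         none_of=[keywords["beach"]])
```

Intersection and difference cover AND and NOT; OR would be a set union
of the relevant indexes before passing them in.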

There are a lot of advantages to storing the database as a flat file.
For one, it's a lot easier to fix if something goes wrong.  If you
arrange to keep a few copies as backup (which KimDaBa could very
easily do), you'd have your safety factor.
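
One possible way to keep a few backup copies is a numbered rotation
performed before each save.  This is only a sketch; rotate_backups and
the ".1", ".2", ... naming scheme are assumptions made for the example,
not something KimDaBa does.

```python
import os
import shutil

def rotate_backups(path, keep=3):
    """Keep up to `keep` numbered copies: path.1 is newest, path.<keep> oldest."""
    # Shift existing backups up by one, oldest first, so nothing is
    # overwritten before it has been moved.  The copy in the last slot
    # is replaced and thereby dropped.
    for n in range(keep - 1, 0, -1):
        older = f"{path}.{n}"
        if os.path.exists(older):
            os.replace(older, f"{path}.{n + 1}")
    # Copy the current file into the newest backup slot.
    if os.path.exists(path):
        shutil.copy2(path, f"{path}.1")
```

Calling rotate_backups("index.xml") just before writing a new version
would leave the previous three versions alongside the live file.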

Just checking with strace, it appears that no more than 2 seconds of
the 7 seconds is spent loading the database (it takes 13 seconds
running under strace; 2 seconds of that appears to be related to
loading the database).  About half of the rest of the time (with my
database) is spent scanning the directories for new files (and
stat'ing the existing files) and the rest is spent groveling through fonts,
icons, etc.  Making the initial scan optional would speed up startup;
most of the time I'm not interested in searching for new files every
time KimDaBa starts up.
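
A minimal sketch of what an optional scan could look like follows.  The
scan_for_new_files flag, the function names, and the extension list are
all hypothetical; they stand in for whatever setting the application
would actually expose.

```python
import os

def find_new_files(root, known, extensions=(".jpg", ".jpeg", ".png")):
    """Walk `root` and return image paths (relative to root) not in `known`."""
    new = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if name.lower().endswith(extensions) and rel not in known:
                new.append(rel)
    return sorted(new)

def startup(root, known, scan_for_new_files=False):
    """Skip the directory walk entirely unless the user asked for it."""
    if not scan_for_new_files:
        return []
    return find_new_files(root, known)
```

With the flag off, startup does no directory traversal at all; the scan
can still be triggered on demand later.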

It might be possible to speed up the load and save with a faster XML
library.  Gimp-Print uses mxml, which Mike Sweet of Easy Software
(www.easysw.com) wrote for this purpose.

Above maybe 50,000 entries it might be worthwhile finding a faster
storage mechanism.  That's not an unreasonable number of images for a
professional photographer, but there are a lot of other things we
should be looking at first.  The problem's more likely to be memory
consumption than outright performance.

-- 
Robert Krawitz                                     <rlk at alum.mit.edu>

Tall Clubs International  --  http://www.tall.org/ or 1-888-IM-TALL-2
Member of the League for Programming Freedom -- mail lpf at uunet.uu.net
Project lead for Gimp Print   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton