[KimDaBa] KimDaBa 2.0 is released.

Thu Oct 21 07:26:45 BST 2004

| I really don't understand the urgency of doing this.  With over 5000
| entries, it takes about 7 seconds for KimDaBa to start up (PIII-1000,
| 512 MB) if I've just run it (if I haven't run it previously, it takes
| much longer, because it stats all of the files and scans the directory
| for new files).  As far as it taking a long time to perform operations
| within KimDaBa, that's not the fault of the XML data file, which after
| all is simply an external representation of the data.  You can always
| use whatever internal representation for the data you please.
|
| Relational databases are valuable when the relationships between
| different types of data are more complicated.  In this case, they're
| pretty basic -- an image has a number of fairly simple attributes
| (date, some number of keywords, and such).  Try drawing an entity
| relationship diagram for all of this, and you'll see that there's one
| very big table (images) with a lot of fixed attributes and several
| very small tables (keywords, people, and locations).  You're not going
| to be performing any queries that can't be expressed as very simple
| logic.
|
| There are a lot of advantages to storing the database as a flat file.
| For one, it's a lot easier to fix if something goes wrong.  If you
| arrange to keep a few copies as backup (which KimDaBa could very
| easily do), you'd have your safety factor.
As I said, switching between backends should be as easy as scratching your 
back, which means you can still ask KimDaBa for a flat file for backup or 
recovery purposes.
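To make the backend argument concrete, here is a minimal sketch. It assumes an SQLite database (through Python's sqlite3 module) laid out the way the entity relationship description quoted above suggests: one large images table, a small keywords table, and a link table between them. `SqliteStore`, `export_flat_file` and the table names are hypothetical illustrations, not KimDaBa's actual code. The point it shows is that, whatever the backend, a flat file for backup or recovery can still be produced on demand.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema following the ER description quoted above: one big
# table of images, a small table of keywords, and a link table between them.
SCHEMA = """
CREATE TABLE images   (id INTEGER PRIMARY KEY, file TEXT UNIQUE NOT NULL,
                       date TEXT NOT NULL);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE image_keywords (
    image_id   INTEGER REFERENCES images(id),
    keyword_id INTEGER REFERENCES keywords(id),
    PRIMARY KEY (image_id, keyword_id)
);
"""


class SqliteStore:
    """Hypothetical database backend for the image collection."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def add_image(self, file: str, date: str, keywords: list[str]) -> None:
        image_id = self.conn.execute(
            "INSERT INTO images (file, date) VALUES (?, ?)", (file, date)).lastrowid
        for name in set(keywords):  # set() avoids duplicate link rows
            self.conn.execute("INSERT OR IGNORE INTO keywords (name) VALUES (?)", (name,))
            (keyword_id,) = self.conn.execute(
                "SELECT id FROM keywords WHERE name = ?", (name,)).fetchone()
            self.conn.execute("INSERT INTO image_keywords VALUES (?, ?)",
                              (image_id, keyword_id))
        self.conn.commit()

    def images_with_keyword(self, name: str) -> list[str]:
        # A typical query: two joins through the link table, nothing fancier.
        rows = self.conn.execute(
            """SELECT i.file FROM images i
               JOIN image_keywords ik ON ik.image_id = i.id
               JOIN keywords k ON k.id = ik.keyword_id
               WHERE k.name = ? ORDER BY i.file""", (name,))
        return [file for (file,) in rows]


def export_flat_file(store: SqliteStore) -> str:
    """Dump the whole database to one flat XML document for backup or recovery."""
    root = ET.Element("images")
    images = store.conn.execute(
        "SELECT id, file, date FROM images ORDER BY file").fetchall()
    for image_id, file, date in images:
        elem = ET.SubElement(root, "image", file=file, date=date)
        for (name,) in store.conn.execute(
                """SELECT k.name FROM keywords k
                   JOIN image_keywords ik ON ik.keyword_id = k.id
                   WHERE ik.image_id = ? ORDER BY k.name""", (image_id,)):
            ET.SubElement(elem, "keyword").text = name
    return ET.tostring(root, encoding="unicode")
```

People and locations would presumably get their own small tables and link tables of the same shape; they are left out to keep the sketch short.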

| Just checking with strace, it appears that no more than 2 seconds of
| the 7 seconds is spent loading the database (it takes 13 seconds
| running under strace; 2 seconds of that appears to be related to
| loading the database).
I profiled it using valgrind/kcachegrind, and it showed that almost all of the 
startup time was spent reading in the XML file.

| About half of the rest of the time (with my 
| database) is spent scanning the directories for new files (and
| stat'ing the existing files) and the rest is spent groveling fonts,
| icons, etc.  Making the initial scan optional would speed up startup;
| most of the time I'm not interested in searching for new files every
| time KimDaBa starts up.
Interesting. Valgrind will of course not see the time spent inside system 
calls, such as stat'ing files and reading directories. I'll look into that.
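The suggestion quoted above, making the initial scan for new files optional, might look roughly like the following minimal sketch. `find_new_files`, `startup`, the `scan_on_startup` flag and the extension list are hypothetical names chosen for illustration, not existing KimDaBa code or settings; the idea is simply that the directory walk is only performed when asked for.

```python
import os
from pathlib import Path

# Hypothetical set of file extensions treated as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_new_files(root: Path, known: set[str]) -> list[str]:
    """Return image paths under `root` (relative, '/'-separated) not yet in `known`."""
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if Path(name).suffix.lower() not in IMAGE_EXTENSIONS:
                continue  # skip non-image files such as notes or thumbnails
            rel = Path(dirpath, name).relative_to(root).as_posix()
            if rel not in known:
                new_files.append(rel)
    return sorted(new_files)


def startup(root: Path, known: set[str], scan_on_startup: bool) -> list[str]:
    """Only pay for the directory walk when the (hypothetical) setting asks for it."""
    if not scan_on_startup:
        return []
    return find_new_files(root, known)
```

With the flag off, startup never touches the image directories; the user could then trigger the same scan explicitly from a menu entry when new images have been added.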

| It might be possible to speed up the load and save with a faster XML
| library.  Gimp-Print uses mxml, which Mike Sweet of Easy Software
| (www.easysw.com) wrote for this purpose.
I tried rewriting part of the loading code to use SAX rather than DOM, which 
gave me almost no speedup. In any case I don't want peanuts, I want instant 
startup (tm) ;-)
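For readers unfamiliar with the distinction: a DOM parser builds the whole document tree in memory before the program looks at it, whereas a SAX parser streams through the file and calls back into the program for each element. Below is a minimal sketch using Python's xml.sax against a hypothetical `<image file="..."><keyword>...</keyword></image>` layout, which is an assumption for illustration and not KimDaBa's real file format. Presumably SAX gained so little because both approaches still have to read and tokenize the entire file on every start.

```python
import xml.sax
from typing import Optional


class ImageIndexHandler(xml.sax.ContentHandler):
    """Collects {file: [keywords]} while streaming through the document."""

    def __init__(self) -> None:
        super().__init__()
        self.images: dict[str, list[str]] = {}
        self._current: Optional[str] = None  # file of the <image> being parsed
        self._in_keyword = False
        self._text: list[str] = []

    def startElement(self, name, attrs):
        if name == "image":
            self._current = attrs["file"]
            self.images[self._current] = []
        elif name == "keyword":
            self._in_keyword = True
            self._text = []

    def characters(self, content):
        # Text may be delivered in several chunks, so accumulate it.
        if self._in_keyword:
            self._text.append(content)

    def endElement(self, name):
        if name == "keyword" and self._current is not None:
            self.images[self._current].append("".join(self._text))
            self._in_keyword = False
        elif name == "image":
            self._current = None


def load_index(data: bytes) -> dict[str, list[str]]:
    """Parse the whole index in one streaming pass; no DOM tree is built."""
    handler = ImageIndexHandler()
    xml.sax.parseString(data, handler)
    return handler.images
```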

| Above maybe 50,000 entries it might be worthwhile finding a faster
| storage mechanism.  That's not an unreasonable number of images for a
| professional photographer, but there are a lot of other things we
| should be looking at first.  The problem's more likely to be memory
| consumption than outright performance.
Memory consumption is the other side of the coin: since I store things in XML, 
I read the whole database into memory, so memory usage grows with the 
collection. I'm not really sure how bad this will be, even with 50,000 images.

Anyway, let me elaborate on my decision to go for a DB:
KimDaBa takes approx. 10 sec to start with my database of ~5000 images. I 
expect that in the coming years it will not be unusual to see people with 
50,000 images; just look at your own image collection and tell me how fast it 
grows. Assuming load time grows linearly, loading 50,000 images will then take 
about 100 sec, i.e. 1:40 minutes. I fear that people will not be willing to use 
KimDaBa when loading takes well over 1 1/2 minutes.
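Written out, the estimate is a linear extrapolation from the measured 10 sec for about 5000 images. It assumes load time is proportional to the number of images n, which the measurement alone does not prove:

$$
t(n) \approx \frac{10\ \text{s}}{5000}\, n = 2\ \text{ms} \cdot n
\quad\Rightarrow\quad
t(50\,000) \approx 2\ \text{ms} \times 50\,000 = 100\ \text{s} = 1\ \text{min}\ 40\ \text{s}
$$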

The second reason is an idea someone sent me a year or so ago: how about 
making KimDaBa the global KDE image database? When you see a new image in, 
say, Konqueror, or scan in a new image, it should drop directly into KimDaBa. 
For this to work I need a KPart for KimDaBa, but a KPart that takes 1:40 to 
load for each image you want to add is unrealistic; even one that takes 10 sec 
to load is.

Hope that clarifies my reasoning. 
-- 
Having trouble finding a given image in your collection containing
thousands of images?

http://ktown.kde.org/kimdaba might be the answer.