[KPhotoAlbum] Performance issue: KPA too slow when clicking on "big" supercategories

Robert L Krawitz rlk at alum.mit.edu
Wed Dec 27 17:27:07 GMT 2006


   Date: Thu, 28 Dec 2006 02:51:56 +0100 (CET)
   From: "Lars Clausen" <lars at raeder.dk>

   Robert L Krawitz said:
   >    From: "Jesper K. Pedersen" <blackie at blackie.dk>
   >    Date: Tue, 26 Dec 2006 12:05:05 +0100
   >
   >    One of my goals is to push all this onto a worker thread, so it is
   >    backgrounded. Esp md5sum calculation is tedious when new images
   >    are found.
   >
   > The time required to actually import the images isn't bad.  The
   > problem is the time spent scanning for images when you're only
   > importing a few.
   >
   > Importing 100 images or so only takes about a minute.  Scanning my
   > filesystem for new images, with 17000 already present, currently takes
   > about 100 seconds.  This is likely to scale linearly with the number
   > of images already present (specifically, with the number of files in
   > the image directory and all of its children).

   I think I mentioned this before: If you store for each directory
   involved whether it's a leaf directory, you can skip those leaf
   directories that haven't changed.  I'd wager in most setups, this
   would skip the majority of old images.

The issue is minimizing the number of filesystem accesses.  How would
you propose determining whether a leaf directory hasn't changed
without looking at the contents of the directory?

The problem here is that QDir insists on stat'ing every directory
entry, even if there's no filter of any kind.  What we need for
starters is a way to get the directory entries without calling stat()
(or access(), which also has to read the inode).  KPA should eliminate
all entries it knows it doesn't care about -- entries that already
exist in the database or that can otherwise be identified by *name
alone* as being uninteresting.  The reason I say "name alone" is that
retrieving the name of a file doesn't incur the overhead of reading
its inode from disk.
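
As a minimal sketch of the "name alone" filtering described above (in
Python rather than KPA's own C++/Qt, and not KPA's actual code): the
function name candidate_names, the known_names set standing in for the
database, and the IGNORED_SUFFIXES list are all hypothetical.
os.listdir() returns only the entry names and does not stat them.

```python
import os

# Hypothetical list of suffixes that mark a file as uninteresting.
IGNORED_SUFFIXES = {".txt", ".xml", ".bak"}

def candidate_names(directory, known_names, ignored=IGNORED_SUFFIXES):
    """Return the entry names that cannot be ruled out by name alone."""
    candidates = []
    # os.listdir() yields names only; no per-entry stat() is issued here.
    for name in os.listdir(directory):
        if name in known_names:
            continue  # already recorded (stand-in for a database lookup)
        suffix = os.path.splitext(name)[1].lower()
        if suffix in ignored:
            continue  # recognisably uninteresting from its name
        candidates.append(name)
    return sorted(candidates)
```

Only the names that survive this filter would then need a stat(), for
example to tell new images from new subdirectories.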

If the inodes are already in main memory, there isn't much of a
problem -- stat'ing all of the entries takes less than a second.  The
two du runs below demonstrate this: the first takes over two minutes
of elapsed time, while the repeat takes 0.17 seconds.

$ /usr/bin/time du -sk
75758251        .
0.12user 1.79system 2:05.38elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+247minor)pagefaults 0swaps
$ /usr/bin/time du -sk
75758251        .
0.04user 0.12system 0:00.17elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+248minor)pagefaults 0swaps
$ find . -print |wc
  24966   25106  746406
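
A rough way to compare a names-only pass against a pass that stats
every entry is sketched below.  The function time_scan and its
parameters are hypothetical, and the numbers it reports depend heavily
on whether the inodes are already cached.  It assumes the filesystem
reports the entry type through readdir(), in which case
entry.is_dir() usually needs no extra system call; where it does not,
is_dir() falls back to a stat of its own.

```python
import os
import time

def time_scan(root, do_stat):
    """Walk root iteratively; return (entry_count, elapsed_seconds)."""
    count = 0
    start = time.perf_counter()
    stack = [root]
    while stack:
        directory = stack.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                count += 1
                if do_stat:
                    os.lstat(entry.path)  # forces the inode to be read
                # Usually answered from the directory entry itself.
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return count, time.perf_counter() - start

if __name__ == "__main__":
    for do_stat in (False, True):
        n, secs = time_scan(".", do_stat)
        print(f"stat={do_stat}: {n} entries in {secs:.2f}s")
```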



