[KPhotoAlbum] Performance issue: KPA too slow when clicking on "big" supercategories

Lars Clausen lars at raeder.dk
Thu Dec 28 05:46:27 GMT 2006


Robert L Krawitz said:
>    Date: Thu, 28 Dec 2006 02:51:56 +0100 (CET)
>    From: "Lars Clausen" <lars at raeder.dk>
>
>    Robert L Krawitz said:
>    >    From: "Jesper K. Pedersen" <blackie at blackie.dk>
>    >    Date: Tue, 26 Dec 2006 12:05:05 +0100
>    >
>    >    One of my goals is to push all this onto a worker thread, so it is
>    >    backgrounded. md5sum calculation in particular is tedious when new
>    >    images are found.
>    >
>    > The time required to actually import the images isn't bad.  The
>    > problem is the time spent scanning for images when you're only
>    > importing a few.
>    >
>    > Importing 100 images or so only takes about a minute.  Scanning my
>    > filesystem for new images, with 17000 already present, currently
>    > takes about 100 seconds.  This is likely to scale linearly with the number
>    > of images already present (specifically, with the number of files in
>    > the image directory and all of its children).
>
>    I think I mentioned this before: If you store for each directory
>    involved whether it's a leaf directory, you can skip those leaf
>    directories that haven't changed.  I'd wager in most setups, this
>    would skip the majority of old images.
>
> The issue is minimizing the number of filesystem accesses.  How would
> you propose determining whether a leaf directory hasn't changed
> without looking at the contents of the directory?

The point is that we're looking for *new* images, not changed images.  If
a directory had no subdirectories at the last scan and its modification
timestamp is unchanged, no new files or directories can have been added to
it, and we can skip the directory entirely.
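A minimal sketch of that skip rule in Python, assuming a hypothetical in-memory cache that maps each directory path to the modification time (`st_mtime_ns`) and leaf status recorded at the previous scan; KPA's actual data structures are not shown in this thread, and the function name is made up for illustration. Adding, removing or renaming an entry updates the directory's own mtime, but changes deeper in the tree do not propagate upward, which is why only leaf directories can be skipped outright.

```python
import os
from typing import Dict, List, Set, Tuple

# Hypothetical cache from a previous scan: directory path -> (mtime_ns, was_leaf).
ScanCache = Dict[str, Tuple[int, bool]]


def scan_for_new_files(root: str, known: Set[str], cache: ScanCache) -> List[str]:
    """Return file paths under root that are not in `known`,
    skipping leaf directories whose mtime has not changed since the last scan."""
    new_files: List[str] = []
    stack = [root]
    while stack:
        directory = stack.pop()
        # stat() the directory itself *before* listing it, so an entry added
        # between the two calls is either seen now or changes the mtime for next time.
        mtime = os.stat(directory).st_mtime_ns
        cached = cache.get(directory)
        if cached is not None and cached[1] and cached[0] == mtime:
            # Leaf at the last scan and timestamp unchanged: nothing can have been added.
            continue
        has_subdirs = False
        with os.scandir(directory) as entries:
            for entry in entries:
                # is_dir() normally uses the file type reported by readdir() and
                # only falls back to stat() when that information is unavailable.
                if entry.is_dir(follow_symlinks=False):
                    has_subdirs = True
                    stack.append(entry.path)
                elif entry.path not in known:
                    new_files.append(entry.path)
        cache[directory] = (mtime, not has_subdirs)
    return new_files
```

Non-leaf directories are still listed on every scan. On filesystems with coarse (e.g. one-second) timestamps, a file added within the same tick as an earlier modification could still be overlooked, so this is a heuristic rather than a guarantee.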

> The problem here is that QDir insists on stat'ing every directory
> entry, even if there's no filter of any kind.  What we need for
> starters is a way to get the directory entries without calling stat()
> (or access(), which also has to read the inode).  KPA should eliminate
> all entries it knows it doesn't care about -- entries that already
> exist in the database or that can otherwise be identified by *name
> alone* as being uninteresting.  The reason I say "name alone" is that
> retrieving the name of a file doesn't incur the overhead of reading
> its inode from disk.
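To illustrate the name-only filtering, here is a Python sketch. On POSIX systems `os.listdir()` is built on `readdir()` and returns bare names without stat'ing each entry, so everything rejected here costs no inode read. The function name and the suffix blacklist are hypothetical, and `known_names` stands in for whatever KPA's database already holds for this directory.

```python
import os
from typing import Iterable, List

# Hypothetical suffixes treated as uninteresting; not taken from KPA.
IGNORED_SUFFIXES = (".xml", ".txt", ".thm", ".bak")


def names_needing_stat(directory: str, known_names: Iterable[str]) -> List[str]:
    """Drop entries that can be rejected by name alone; return the rest sorted."""
    known = set(known_names)
    survivors = []
    for name in os.listdir(directory):  # names only, no per-entry stat()
        if name in known:
            continue  # already in the database
        if name.startswith("."):
            continue  # hidden files and directories
        if name.lower().endswith(IGNORED_SUFFIXES):
            continue  # identifiably uninteresting by suffix
        survivors.append(name)
    return sorted(survivors)
```

Only the survivors (new images plus subdirectory names, which cannot be recognised from the name alone) would then need a `stat()`.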

Is it QDir::entryList you are looking for?  Since it just returns strings,
I imagine it might not stat() everything, and the docs claim that it is
more efficient than the other listing functions.

-Lars