[KPhotoAlbum] Performance issue: KPA too slow when clicking on "big" supercategories
lars at raeder.dk
Thu Dec 28 05:46:27 GMT 2006
Robert L Krawitz said:
> Date: Thu, 28 Dec 2006 02:51:56 +0100 (CET)
> From: "Lars Clausen" <lars at raeder.dk>
> Robert L Krawitz said:
> > From: "Jesper K. Pedersen" <blackie at blackie.dk>
> > Date: Tue, 26 Dec 2006 12:05:05 +0100
> > One of my goals is to push all this onto a worker thread, so it is
> > backgrounded. Esp. md5sum calculation is tedious when new images
> > are found.
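For illustration only, the worker-thread idea could look roughly like this. It is a Python sketch rather than KPA's actual C++/Qt code, and the names md5_of_file and start_checksums are made up for the example. The checksums are computed on a pool thread, so the caller is not blocked until it asks for a result.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def start_checksums(executor, paths):
    """Queue one checksum job per path (hypothetical helper).

    Returns a dict mapping path -> Future; call .result() on a future
    only when the checksum is actually needed.
    """
    return {path: executor.submit(md5_of_file, path) for path in paths}
```

A caller would create one ThreadPoolExecutor at startup, hand newly found files to start_checksums, and collect the results later, for example when the import dialog is shown.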
> > The time required to actually import the images isn't bad. The
> > problem is the time spent scanning for images when you're only
> > importing a few.
> > Importing 100 images or so only takes about a minute. Scanning my
> > filesystem for new images, with 17000 already present, currently
> > takes about 100 seconds. This is likely to scale linearly with the number
> > of images already present (specifically, with the number of files in
> > the image directory and all of its children).
> I think I mentioned this before: If you store for each directory
> involved whether it's a leaf directory, you can skip those leaf
> directories that haven't changed. I'd wager in most setups, this
> would skip the majority of old images.
> The issue is minimizing the number of filesystem accesses. How would
> you propose determining whether a leaf directory hasn't changed
> without looking at the contents of the directory?
The point is that we're looking for *new* images, not changed images. If
a directory had no subdirectories at the last scan, and its modification
time is unchanged since then, no files or directories can have been added
to it, and we can skip the directory entirely.
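To make that concrete, here is a minimal sketch of the skip logic in Python (not KPA code). The function name scan_for_new_files and the cache layout are assumptions made for the example. The cache stands in for whatever KPA would store per directory: the mtime seen at the last scan and whether the directory was a leaf then.

```python
import os


def scan_for_new_files(root, cache):
    """Walk `root`, skipping leaf directories whose mtime is unchanged.

    `cache` maps directory path -> (mtime_ns, was_leaf) from the previous
    scan and is updated in place. Returns the paths of all non-directory
    entries in the directories that had to be listed; the caller still
    has to compare these against the database.
    """
    found = []
    stack = [root]
    while stack:
        directory = stack.pop()
        mtime = os.stat(directory).st_mtime_ns  # one stat per directory
        if cache.get(directory) == (mtime, True):
            continue  # unchanged leaf: nothing can have been added
        subdirs = []
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                else:
                    found.append(entry.path)
        cache[directory] = (mtime, not subdirs)
        stack.extend(subdirs)
    return found
```

Non-leaf directories are always listed again, because a new file deeper in the tree does not change the parent's mtime. One caveat of this sketch: a file added within the same timestamp granule as the previous scan could be missed, so a real implementation would probably want some safety margin.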
> The problem here is that QDir insists on stat'ing every directory
> entry, even if there's no filter of any kind. What we need for
> starters is a way to get the directory entries without calling stat()
> (or access(), which also has to read the inode). KPA should eliminate
> all entries it knows it doesn't care about -- entries that already
> exist in the database or that can otherwise be identified by *name
> alone* as being uninteresting. The reason I say "name alone" is that
> retrieving the name of a file doesn't incur the overhead of reading
> its inode from disk.
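The "name alone" filtering described above might be sketched like this, in Python rather than Qt. os.listdir returns bare names and does not stat each entry. The function name candidate_names, the extension set, and the known_names set (standing in for a lookup in the KPA database) are all assumptions made for the example.

```python
import os

# Assumed set of interesting extensions; the real list would come from KPA.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def candidate_names(directory, known_names):
    """Return entry names that might be new images, judged by name only.

    `known_names` is a set of file names already in the database for this
    directory. No per-entry stat() is done here; only the survivors would
    need one later.
    """
    candidates = []
    for name in os.listdir(directory):
        if name in known_names:
            continue  # already in the database
        if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
            continue  # uninteresting by name alone
        candidates.append(name)
    return candidates
```

This sketch also drops subdirectories that lack an image extension, so directory traversal would have to be handled separately.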
Is it QDir::entryList you are looking for? Since it just returns strings,
I imagine it might not stat() everything, and the docs claim that it is
more efficient than the other listing functions.