[KPhotoAlbum] Startup performance: the final frontier

Fri Jun 2 04:51:05 BST 2017

On Thu, 1 Jun 2017 22:36:38 +0200, Andreas Schleth wrote:
> Am 31.05.2017 um 04:22 schrieb Robert Krawitz:
>> The last big performance problem I have with kpa is
> ...
>
> Hi Robert,
>
> I have been following your exchanges with Johannes about the performance problems of KPA for a while and would like to add my two cents (or is it five?).
>
> First, changing the structure of the index.xml at this point would be a really bad move from my point of view.  Why?
>
> I still use a canned version of KPA 4.1.whatever with an ancient Suse 11.2 in a virtual machine if I have to time shift some images from a foreign source. And from there I work directly on my original database via host-only network.  In that old KPA version (KDE4) the kipi-plugin for time shifting still works. In the current version it is out of business because of some design decisions of the digicam community.  AFAIK Martin Füssel is looking into this, but this might take a while.

If you're using essentially a KPA appliance, none of this should
matter -- you're going to be using a fixed version of KPA.  There's no
suggestion that KPA would not be able to recognize an older data file.

I wasn't actually suggesting any particular changes in the XML file
format; I wanted to look at startup time overall.  It looks like the
time spent reading the XML file is only a bit more than half of the
total time (about 10 seconds out of 18 total to start up).  I tried
replacing the code that actually reads the images with code that does
just enough parsing to get through the file; that got the read time
down to 3 seconds, so that doesn't look all that fruitful.
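
For anyone who wants to repeat that kind of measurement outside KPA,
here is a rough sketch in Python (not KPA's code, which is C++).  It
streams through an XML file, does nothing but count elements, and
reports the elapsed time.  The element name "image" and the helper
names are assumptions made for illustration only.

```python
import io
import time
import xml.etree.ElementTree as ET


def count_elements(stream, tag):
    """Stream through the XML and count elements named `tag`, keeping nothing."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop the element's contents so memory use stays flat.
        elem.clear()
    return count


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Tiny stand-in document; a real test would open the actual file.
    sample = b"<db><images><image file='a.jpg'/><image file='b.jpg'/></images></db>"
    found, seconds = timed(count_elements, io.BytesIO(sample), "image")
    print(found, "elements in", seconds, "seconds")
```

The time this reports is roughly the floor for any reader that still
has to walk the whole file, whatever it does with the contents.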

I'm not a big fan of using XML for this purpose, period.  It would be a
much better fit for a relational database, IMO.  The schema is pretty
straightforward.
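
To give an idea of what I mean, here is a minimal sketch using
Python's sqlite3 module.  This is purely hypothetical: the table and
column names (image, tag, image_tag) are made up for illustration and
are not a proposal for an actual KPA schema.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical layout: images, tags, and a link table."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS image (
            id       INTEGER PRIMARY KEY,
            path     TEXT NOT NULL UNIQUE,
            taken_at TEXT
        );
        CREATE TABLE IF NOT EXISTS tag (
            id       INTEGER PRIMARY KEY,
            category TEXT NOT NULL,
            name     TEXT NOT NULL,
            UNIQUE (category, name)
        );
        CREATE TABLE IF NOT EXISTS image_tag (
            image_id INTEGER NOT NULL REFERENCES image(id),
            tag_id   INTEGER NOT NULL REFERENCES tag(id),
            PRIMARY KEY (image_id, tag_id)
        );
    """)


def add_image(conn, path, taken_at, tags):
    """Insert one image and attach (category, name) tags to it."""
    image_id = conn.execute(
        "INSERT INTO image (path, taken_at) VALUES (?, ?)", (path, taken_at)
    ).lastrowid
    for category, name in tags:
        conn.execute(
            "INSERT OR IGNORE INTO tag (category, name) VALUES (?, ?)",
            (category, name),
        )
        tag_id = conn.execute(
            "SELECT id FROM tag WHERE category = ? AND name = ?",
            (category, name),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO image_tag (image_id, tag_id) VALUES (?, ?)",
            (image_id, tag_id),
        )
    return image_id


def images_with_tag(conn, category, name):
    """Return the paths of all images carrying the given tag, sorted."""
    rows = conn.execute(
        """SELECT image.path FROM image
           JOIN image_tag ON image.id = image_tag.image_id
           JOIN tag ON tag.id = image_tag.tag_id
           WHERE tag.category = ? AND tag.name = ?
           ORDER BY image.path""",
        (category, name),
    )
    return [row[0] for row in rows]
```

With something like this, a query by tag is a simple join and nothing
has to be parsed at startup.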

> The charming thing about KPA at the moment is, that my database operations with this ancient KPA version on index.xml files (maintained normally with the most recent git master) do not corrupt the database.
> Thus, this is a feature (backward compatibility), that would be lovely to keep for the time being.
> [[I probably only understood some 10% of what you wrote about, but seeing mentions of xml structure and parsing made me listen up ...]]

Again, I'm not proposing breaking back compatibility.  Don't confuse
back compatibility with forward compatibility.  Your old version of
kpa might not be able to read a new index.xml file (and certainly
wouldn't be able to handle the EXIF database, if for no other reason
than you probably don't have sqlite3 around), so there may not be
forward compatibility.

> Second, as you are looking at performance ... did you perchance ever look at thumbnail generation?  My database of some 30000 images resides on an NFS share and rebuilding thumbnails still takes close to 30 minutes with KPA not being very responsive in the meantime, even tending to crash in the (longish) interval until the progress bar first shows up - this is after Johannes made some improvements to the process already.  So, this would be a point where you could harvest many minutes and not seconds of performance gain :-)  [[Also the files in the .thumbnails folder always have the wrong (for my setup) permissions (rw-r--r-- while everything else is rw-rw-r--) - but this might be a problem existing between keyboard and (my) chair]]

It's very hard for me to evaluate the numbers without more
information.  How big are your images, how fast is your network (both
bandwidth and latency), what's behind the NFS server (flash or
spinning rust of some variety), what's your CPU utilization while
you're building the thumbnails, how long does it take using local
disk?  I don't remember exactly how long it takes me to build
thumbnails, but offhand that doesn't sound very slow to me.  That's
about 17 images/second you're thumbnailing; considering that each
image requires a variety of NFS operations, some of which hit the
disk, you may not be able to do a lot better.  I did look at the code,
and the thumbnails are batched up and only written every 100 images,
so the code's avoiding small writes (which are very bad over NFS).
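
For reference, the arithmetic behind that rate and the general idea of
batching can be sketched as follows.  This is an illustration in
Python, not the KPA code; the class and function names are invented,
and only the batch size of 100 comes from what I saw in the source.

```python
import io


def images_per_second(image_count, minutes):
    """Average throughput for a job handling image_count images in `minutes`."""
    return image_count / (minutes * 60)


class BatchedWriter:
    """Collect records in memory and issue one write per full batch."""

    def __init__(self, stream, batch_size=100):
        self.stream = stream
        self.batch_size = batch_size
        self.pending = []
        self.write_calls = 0  # number of write() calls actually issued

    def add(self, record):
        """Queue one record; write the batch out once it is full."""
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write any queued records in a single call."""
        if self.pending:
            self.stream.write(b"".join(self.pending))
            self.write_calls += 1
            self.pending = []


if __name__ == "__main__":
    print(round(images_per_second(30000, 30), 1))  # 30000 images in 30 minutes
    writer = BatchedWriter(io.BytesIO())
    for _ in range(250):
        writer.add(b"x" * 4)
    writer.flush()  # write out the final partial batch
    print(writer.write_calls)
```

250 records end up as three writes instead of 250, which is the whole
point when each write is a round trip to the server.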

Also, why do you frequently rebuild your thumbnails?  That's usually
something you do once and forget, unless you want to keep changing
your thumbnail size, which is going to be inefficient whatever you do.

The thumbnail storage is optimized for reading (viewing), and in my
experience, optimized very well indeed -- you can jump scroll and the
thumbnails keep up, even if you jump around all over the place.  I
remember that there was a lot of work done on it.  They're stored as
fixed-size (uncompressed) bitmaps in a hashed file, so reading a
thumbnail is one seek and one read.
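
Why fixed-size records make that cheap can be shown with a small
sketch.  This is not KPA's on-disk format: the record size, the
in-memory name-to-slot index and the class name are all assumptions
for illustration.  The point is only that the offset is computed, not
searched for.

```python
import io

# Hypothetical record size: 64x64 pixels, 3 bytes per pixel, uncompressed.
RECORD_SIZE = 64 * 64 * 3


class ThumbnailStore:
    """Fixed-size bitmaps in one file, located by slot number."""

    def __init__(self, stream):
        self.stream = stream
        self.slots = {}  # image name -> slot number (kept in memory)

    def put(self, name, bitmap):
        """Store a bitmap, reusing the slot if the name is already known."""
        if len(bitmap) != RECORD_SIZE:
            raise ValueError("bitmap must be exactly RECORD_SIZE bytes")
        slot = self.slots.setdefault(name, len(self.slots))
        self.stream.seek(slot * RECORD_SIZE)
        self.stream.write(bitmap)

    def get(self, name):
        """Fetch a bitmap with one seek and one read."""
        slot = self.slots[name]
        self.stream.seek(slot * RECORD_SIZE)
        return self.stream.read(RECORD_SIZE)


if __name__ == "__main__":
    store = ThumbnailStore(io.BytesIO())
    store.put("a.jpg", bytes([1]) * RECORD_SIZE)
    store.put("b.jpg", bytes([2]) * RECORD_SIZE)
    print(store.get("b.jpg")[:3])
```

Because every record is the same length, jumping to any thumbnail
costs the same as reading the next one in sequence.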

NFS works best for streaming I/O.  That's true for most high-latency
protocols.  It is not good for small random I/O.
-- 
Robert Krawitz                                     <rlk at alum.mit.edu>

***  MIT Engineers   A Proud Tradition   http://mitathletics.com  ***
Member of the League for Programming Freedom  --  http://ProgFree.org
Project lead for Gutenprint   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton