[KPhotoAlbum] Speaking of performance...

Robert Krawitz rlk at alum.mit.edu
Sun Feb 17 00:06:16 GMT 2019

On Sun, 27 Jan 2019 10:28:38 -0500, Robert Krawitz wrote:
> We really do need to attack the startup performance somehow, for
> people (like myself) with big collections.  I currently have a little
> over 300,000 shots in my collection, and depending upon how well we do
> in the postseason that could increase by another 10% over the next few
> months.  It takes about 13 seconds for kpa to start up, largely due to
> the XML parsing.  Since the XML file is "only" 57 MB, it's clearly not
> I/O-limited.
> I understand (and agree with) the desire for a readable and editable
> file format.  I've fixed things up myself on occasion.  But I don't
> want to pay that kind of startup price every time.
> What I'm thinking in terms of is to save the file in two formats, a
> fast format (which could be an SQL database, a binary serialization,
> or such) and the XML format.  The fast format would have an embedded
> timestamp; if the XML file were newer, it would be used instead, or
> the user would be prompted to choose which.
> Autosave would save only the fast format (possibly only a delta, but
> that would likely be quite difficult).  Full save would save both
> formats; if we were really clever, we might be able to parallelize the
> two operations.

So one possible way to eke out a little improvement might be to use
single-character XML tags rather than the verbose ones we currently
use.  I haven't actually tried this to evaluate the performance
difference, but it saved about 20% (44 MB vs. 55 MB) in size.  It
would at least allow dispatching via a switch on the first character
of the tag rather than having to do a strcmp.
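
To illustrate the dispatch idea, here is a minimal sketch.  The tag
letters ('i', 'o', 'v'), the Tag enum, and classifyTag() are all
hypothetical names made up for this example; they are not taken from
the current kpa file format or code.

```cpp
#include <string_view>

// Hypothetical element kinds; the single-letter tags are illustrative only.
enum class Tag { Image, Option, Value, Unknown };

// Classify a single-character tag name with one switch instead of a
// chain of string comparisons.
Tag classifyTag(std::string_view name)
{
    if (name.size() != 1)
        return Tag::Unknown;   // anything longer is not a short-form tag
    switch (name[0]) {
    case 'i': return Tag::Image;
    case 'o': return Tag::Option;
    case 'v': return Tag::Value;
    default:  return Tag::Unknown;
    }
}
```

Whether this is measurably faster than comparing full tag names would
still have to be benchmarked.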

We could also store the times as simple decimal (or hex?) seconds
since the epoch.  In decimal, these would be 10 or (for old photos) 9
bytes; in hex they would be 8 bytes for quite a while yet.  That
compares to 19 currently, plus the more complex parsing needed.
Saving about 10 bytes per photo, with 300,000 photos, would represent
another 3 MB or so, which would bring the total saving to about 25%
over the current format (although that would make dates unreadable).
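
As a rough sketch of what the encoding could look like, assuming
C++17 and <charconv>: encodeEpoch() and decodeEpoch() are
hypothetical helper names, not existing kpa functions, and the sample
value used below (1550361976) is just a timestamp from February 2019.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Format seconds since the epoch in the given base (10 or 16).
std::string encodeEpoch(std::int64_t seconds, int base)
{
    char buf[32];
    auto res = std::to_chars(buf, buf + sizeof buf, seconds, base);
    return std::string(buf, res.ptr);
}

// Parse the text back; returns nullopt unless the whole string is a
// valid number in the given base.
std::optional<std::int64_t> decodeEpoch(std::string_view text, int base)
{
    std::int64_t value = 0;
    const char *end = text.data() + text.size();
    auto res = std::from_chars(text.data(), end, value, base);
    if (res.ec != std::errc() || res.ptr != end)
        return std::nullopt;
    return value;
}
```

For the sample value, encodeEpoch(1550361976, 10) yields the 10-byte
string "1550361976" and encodeEpoch(1550361976, 16) yields the 8-byte
string "5c68a578", compared with 19 bytes for the current date
format.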

I suspect the improvement in load time would be small, very likely not
enough to really matter.  In any event, it will be a while yet before
I have any chance to look at this.
Robert Krawitz                                     <rlk at alum.mit.edu>

***  MIT Engineers   A Proud Tradition   http://mitathletics.com  ***
Member of the League for Programming Freedom  --  http://ProgFree.org
Project lead for Gutenprint   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton
