[KPhotoAlbum] Speaking of performance...

Sun Jan 27 19:55:19 GMT 2019

On Sun, 27 Jan 2019 19:56:51 +0100, Bruno Pagani wrote:
> On 27/01/2019 at 16:28, Robert Krawitz wrote:
>> We really do need to attack the startup performance somehow, for
>> people (like myself) with big collections.  I currently have a little
>> over 300,000 shots in my collection, and depending upon how well we do
>> in the postseason that could increase by another 10% over the next few
>> months.  It takes about 13 seconds for kpa to start up, largely due to
>> the XML parsing.  Since the XML file is "only" 57 MB, it's clearly not
>> I/O-limited.
>>
>> I understand (and agree with) the desire for a readable and editable
>> file format.  I've fixed things up myself on occasion.  But I don't
>> want to pay that kind of startup price every time.
>>
>> What I'm thinking in terms of is to save the file in two formats, a
>> fast format (which could be an SQL database, a binary serialization,
>> or such) and the XML format.  The fast format would have an embedded
>> timestamp; if the XML file were newer, it would be used instead, or
>> the user would be prompted to choose which.
>>
>> Autosave would save only the fast format (possibly only a delta, but
>> that would likely be quite difficult).  Full save would save both
>> formats; if we were really clever, we might be able to parallelize the
>> two operations.
>
> I don't have a lot of knowledge in that domain, but this might be of
> interest:
>
> https://github.com/hughsie/libxmlb/

XPath isn't the right model for this.  We're not looking to select
nodes matching certain criteria from the database at startup; we're
looking to load the entire database into memory.  This might be
workable (possibly at some runtime cost) if all queries were based on
criteria in the index file, but they aren't -- we also need to query
on EXIF data, which is stored in a separate database.

This, of course, raises the question of why the EXIF data is stored in
a separate database rather than in the index.xml file (or, of course,
vice versa).  The EXIF database has only two tables (the trivial
settings table, and the exif table, which stores all of the data).
All of the queries are simple conjunction queries against that one
table (we don't even have tables for lenses and cameras).
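
As a rough illustration of what a simple conjunction query against a
single table looks like, here is a minimal sketch using Python's
sqlite3 module.  The column names (filename, camera_model, iso,
focal_length), the sample rows, and the find_images helper are all
made up for the example and are not the actual kpa schema or code; the
only point is that every criterion is ANDed against the one exif
table, with no joins.

```python
import sqlite3


def find_images(conn, **criteria):
    """Return filenames whose exif row matches every criterion (AND)."""
    # Column names arrive as keyword arguments, so check them against the
    # table definition rather than pasting arbitrary text into the SQL.
    known = {row[1] for row in conn.execute("PRAGMA table_info(exif)")}
    unknown = set(criteria) - known
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # No criteria at all means "match everything".
    where = " AND ".join(f"{col} = ?" for col in criteria) or "1"
    sql = f"SELECT filename FROM exif WHERE {where} ORDER BY filename"
    return [row[0] for row in conn.execute(sql, tuple(criteria.values()))]


# Hypothetical single-table layout, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exif ("
    " filename TEXT PRIMARY KEY,"
    " camera_model TEXT,"
    " iso INTEGER,"
    " focal_length REAL)"
)
conn.executemany(
    "INSERT INTO exif VALUES (?, ?, ?, ?)",
    [
        ("a.jpg", "CamA", 400, 200.0),
        ("b.jpg", "CamA", 1600, 200.0),
        ("c.jpg", "CamB", 400, 70.0),
    ],
)

# Both conditions must hold for a row to be returned.
matches = find_images(conn, camera_model="CamA", iso=400)
```

Nothing here needs more than one table, which is why keeping a
separate database just for this looks questionable.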

The index.xml file actually does represent something more like a
relational database.  It contains four tables (categories, images,
blocklist, and member-groups), with a raft of relationships between
all except the blocklist.  If the index file is stored in the compact
format, the keys that link those tables are very explicit.
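
To make the relational reading concrete, here is a minimal sketch of
how those four collections might look as SQL tables, again in Python
with sqlite3.  Every table and column name below is an assumption made
for illustration, as are the two extra helper tables (category_values
and image_tags) used to express the relationships; none of it is taken
from the actual index.xml layout.  It only shows the shape: images
refer to category values by key, member groups relate one category
value to another, and the blocklist stands alone.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE category_values (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    value       TEXT NOT NULL
);
CREATE TABLE images (
    id   INTEGER PRIMARY KEY,
    file TEXT NOT NULL UNIQUE
);
CREATE TABLE image_tags (
    image_id INTEGER NOT NULL REFERENCES images(id),
    value_id INTEGER NOT NULL REFERENCES category_values(id),
    PRIMARY KEY (image_id, value_id)
);
CREATE TABLE member_groups (
    group_value_id  INTEGER NOT NULL REFERENCES category_values(id),
    member_value_id INTEGER NOT NULL REFERENCES category_values(id),
    PRIMARY KEY (group_value_id, member_value_id)
);
CREATE TABLE blocklist (
    file TEXT PRIMARY KEY
);
"""


def images_in_group(conn, group_name):
    """Return files tagged with any direct member of the named group."""
    sql = """
        SELECT DISTINCT i.file
        FROM images i
        JOIN image_tags t      ON t.image_id = i.id
        JOIN member_groups m   ON m.member_value_id = t.value_id
        JOIN category_values g ON g.id = m.group_value_id
        WHERE g.value = ?
        ORDER BY i.file
    """
    return [row[0] for row in conn.execute(sql, (group_name,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO categories VALUES (1, 'People')")
conn.executemany(
    "INSERT INTO category_values VALUES (?, 1, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Family")],
)
# "Family" is a group whose members are "Alice" and "Bob".
conn.executemany("INSERT INTO member_groups VALUES (3, ?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [(1, "a.jpg"), (2, "b.jpg"), (3, "c.jpg")],
)
conn.executemany("INSERT INTO image_tags VALUES (?, ?)", [(1, 1), (2, 2)])

family_files = images_in_group(conn, "Family")
```

Unlike the EXIF queries, a lookup like this one genuinely needs joins
across several tables.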
-- 
Robert Krawitz                                     <rlk at alum.mit.edu>

***  MIT Engineers   A Proud Tradition   http://mitathletics.com  ***
Member of the League for Programming Freedom  --  http://ProgFree.org
Project lead for Gutenprint   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton