[KPhotoAlbum] Speaking of performance...

Sun Jan 27 17:00:13 GMT 2019

On Sun, 27 Jan 2019 17:31:03 +0100, Andreas Schleth wrote:
> Hi Robert,
>
> 300000 shots in one database is a lot of images...
> Do you really sift through *all* of them every other day?

By no means, but if I want to look at even a few (e.g. a shot from
the last game), I have to start kpa.

About 60% of my shots are from sports (I'm the official unofficial
photographer for MIT football and men's and women's basketball).  I
probably should toss a lot more than I do, but I don't.  These days I
typically take about 2500 frames at a basketball game and select maybe
300-350 (after making an adjustment to the AF settings on my Canon
7DmkII, I get a lot more keepers).  These, of course, are all JPEG to
begin with.  The only processing I do is crop and adjust angle, for
which I use RawTherapee (I'd prefer Darktable because it's faster on
the output side, but the crop workflow is less efficient), and then
add a watermark.

The feedback I get from team members, coaches, and parents is that
they really like having a lot of action shots.  Getting good action
shots means shooting a lot of frames.

> I attacked the sheer quantity of images by using three tiers:
> 1. raw development: only shots that are basically OK (and not
> duplicates) get through, all the rest is discarded permanently (this
> takes care of 1/3 to 1/2 of the shots).

See above about RAW.  I find the JPEG engine these days to be good
enough that I seldom need to use RAW, really only when I have dynamic
range issues.

> 2. KPA then only gets the jpegs in a parallel folder structure
> (20xx/01_xx_topic). As I rarely go back to redevelop a raw image, some
> searching in parallel folders is OK for me. This KPA is my personal DB.
> If this DB gets too large for my tastes, I spawn a new one and keep the
> old one as an archive.

I prefer to keep images in the same layout as how they come out of the
camera -- again, it's a lot faster that way.

> A better handling of parallel DBs (like syncing categories and tags
> between them) could help. e.g.: split the DB into one (or more) common
> tag/category-section(s) with multiple image sections that only get read
> on demand. Each img-DB should then be linked to a certain cat/tag-DB to
> enable different sets of categories for different purposes.

That wouldn't be too helpful, since I'd still have one big database of
(currently) 190K frames for sports.

> So, there might be other avenues to attack the question of startup time
> than just doing things the same as before, but only faster. But you are
> right: KPA currently has inherent limitations when the DB becomes too
> large - this will hit each of us. Some earlier, some later.

The best time to solve problems is before they hit.

> Just my 2 cents ...
> Last but not least: I really appreciate your recent work on startup time!

You're welcome.

> Am 27.01.19 um 16:28 schrieb Robert Krawitz:
>> We really do need to attack the startup performance somehow, for
>> people (like myself) with big collections.  I currently have a little
>> over 300,000 shots in my collection, and depending upon how well we do
>> in the postseason that could increase by another 10% over the next few
>> months.  It takes about 13 seconds for kpa to start up, largely due to
>> the XML parsing.  Since the XML file is "only" 57 MB, it's clearly not
>> I/O-limited.
>>
>> I understand (and agree with) the desire for a readable and editable
>> file format.  I've fixed things up myself on occasion.  But I don't
>> want to pay that kind of startup price every time.
>>
>> What I have in mind is to save the file in two formats, a
>> fast format (which could be an SQL database, a binary serialization,
>> or such) and the XML format.  The fast format would have an embedded
>> timestamp; if the XML file were newer, it would be used instead, or
>> the user would be prompted to choose which.
>>
>> Autosave would save only the fast format (possibly only a delta, but
>> that would likely be quite difficult).  Full save would save both
>> formats; if we were really clever, we might be able to parallelize the
>> two operations.
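
To make the two-format idea above a bit more concrete, here is a rough
sketch in Python.  It is not kpa code: the function names, the toy
{file: [tags]} data model, the XML layout, and the use of pickle as
the "fast format" are all assumptions made for illustration.  A full
save writes both files in parallel and embeds a timestamp in the fast
file; loading uses the fast file unless the XML file has been modified
since (for example by a hand edit), in which case it falls back to the
XML.

```python
import os
import pickle
import tempfile
import time
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def write_xml(images, xml_path):
    """Write the readable format (hypothetical layout, not kpa's index.xml)."""
    root = ET.Element("images")
    for name, tags in images.items():
        node = ET.SubElement(root, "image", file=name)
        node.set("tags", ",".join(tags))
    ET.ElementTree(root).write(xml_path, encoding="utf-8")


def read_xml(xml_path):
    """Slow path: parse the XML back into {file: [tags]}."""
    root = ET.parse(xml_path).getroot()
    return {n.get("file"): [t for t in n.get("tags").split(",") if t]
            for n in root.iter("image")}


def write_cache(images, cache_path, stamp_ns):
    """Write the fast format with an embedded timestamp (pickle as stand-in)."""
    with open(cache_path, "wb") as fh:
        pickle.dump({"stamp_ns": stamp_ns, "images": images}, fh)


def full_save(images, xml_path, cache_path):
    """Save both formats in parallel, then pin the XML mtime to the stamp."""
    # Whole seconds, so coarse filesystem timestamps cannot round it upward.
    stamp_ns = int(time.time()) * 10**9
    with ThreadPoolExecutor(max_workers=2) as pool:
        jobs = [pool.submit(write_xml, images, xml_path),
                pool.submit(write_cache, images, cache_path, stamp_ns)]
        for job in jobs:
            job.result()  # re-raise any error from either writer
    os.utime(xml_path, ns=(stamp_ns, stamp_ns))


def load(xml_path, cache_path):
    """Return (images, source); use the cache unless the XML is newer."""
    try:
        with open(cache_path, "rb") as fh:
            cached = pickle.load(fh)
    except (OSError, pickle.UnpicklingError, EOFError):
        cached = None  # missing or unreadable cache: fall back to XML
    if cached is not None and os.stat(xml_path).st_mtime_ns <= cached["stamp_ns"]:
        return cached["images"], "cache"
    return read_xml(xml_path), "xml"
```

Pinning the XML file's mtime to the embedded stamp after both writes
finish means the two writers never have to agree on a clock reading,
and any later edit to the XML shows up as "newer" at load time.  In
Python the two writers would mostly take turns because of the
interpreter lock; the threading here only shows the shape of the idea.
Prompting the user instead of silently preferring the XML would go in
the last branch of load().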

-- 
Robert Krawitz                                     <rlk at alum.mit.edu>

***  MIT Engineers   A Proud Tradition   http://mitathletics.com  ***
Member of the League for Programming Freedom  --  http://ProgFree.org
Project lead for Gutenprint   --    http://gimp-print.sourceforge.net

"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton