[KPhotoAlbum] Speaking of performance...
Robert Krawitz
rlk at alum.mit.edu
Sun Jan 27 17:00:13 GMT 2019
On Sun, 27 Jan 2019 17:31:03 +0100, Andreas Schleth wrote:
> Hi Robert,
>
> 300000 shots in one database is a lot of images...
> Do you really sift through *all* of them every other day?
By no means, but if I want to look at even a few (e.g. for a shot
from the last game), I have to start kpa.
About 60% of my shots are from sports (I'm the official unofficial
photographer for MIT football and men's and women's basketball). I
probably should toss a lot more than I do, but I don't. These days I
typically take about 2500 frames at a basketball game and select maybe
300-350 (after making an adjustment to the AF settings on my Canon
7DmkII, I get a lot more keepers). These, of course, are all JPEG to
begin with. The only processing I do is crop and adjust angle, for
which I use RawTherapee (I'd prefer Darktable because it's faster on
the output side, but the crop workflow is less efficient), and then
add a watermark.
The feedback I get from team members, coaches, and parents is that
they really like having a lot of action shots. Getting good action
shots means shooting a lot of frames.
> I attacked the sheer quantity of images by using three tiers:
> 1. raw development: only shots that are basically OK (and not
> duplicates) get through, all the rest is discarded permanently (this
> takes care of 1/3 to 1/2 of the shots).
See above about RAW. I find the JPEG engine these days to be good
enough that I seldom need to use RAW, really only when I have dynamic
range issues.
> 2. KPA then only gets the jpegs in a parallel folder structure
> (20xx/01_xx_topic). As I rarely go back to redevelop a raw image, some
> searching in parallel folders is OK for me. This KPA is my personal DB.
> If this DB gets too large for my tastes, I spawn a new one and keep the
> old one as an archive.
I prefer to keep images in the same layout as how they come out of the
camera -- again, it's a lot faster that way.
> A better handling of parallel DBs (like syncing categories and tags
> between them) could help. e.g.: split the DB into one (or more) common
> tag/category-section(s) with multiple image sections that only get read
> on demand. Each img-DB should then be linked to a certain cat/tag-DB to
> enable different sets of categories for different purposes.
That wouldn't be too helpful, since I'd still have one big database of
(currently) 190K frames for sports.
> So, there might be other avenues to attack the question of startup time
> than just doing things the same as before, but only faster. But you are
> right: KPA currently has inherent limitations when the DB becomes too
> large - this will hit each of us. Some earlier, some later.
The best time to solve problems is before they hit.
> Just my 2 cents ...
> Last but not least: I really appreciate your recent work on startup time!
You're welcome.
> On 27.01.19 at 16:28, Robert Krawitz wrote:
>> We really do need to attack the startup performance somehow, for
>> people (like myself) with big collections. I currently have a little
>> over 300,000 shots in my collection, and depending upon how well we do
>> in the postseason that could increase by another 10% over the next few
>> months. It takes about 13 seconds for kpa to start up, largely due to
>> the XML parsing. Since the XML file is "only" 57 MB, it's clearly not
>> I/O-limited.
>>
>> I understand (and agree with) the desire for a readable and editable
>> file format. I've fixed things up myself on occasion. But I don't
>> want to pay that kind of startup price every time.
>>
>> What I have in mind is to save the file in two formats, a
>> fast format (which could be an SQL database, a binary serialization,
>> or such) and the XML format. The fast format would have an embedded
>> timestamp; if the XML file were newer, it would be used instead, or
>> the user would be prompted to choose which.
>>
>> Autosave would save only the fast format (possibly only a delta, but
>> that would likely be quite difficult). Full save would save both
>> formats; if we were really clever, we might be able to parallelize the
>> two operations.
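The timestamp check described above could be sketched roughly as
follows. This is Python rather than KPhotoAlbum's C++, purely for
illustration, and it is a sketch under assumptions, not anything that
exists in kpa: the function names write_cache and choose_source, the
"xml_mtime_ns" field, and the use of JSON as a stand-in for the fast
format are all hypothetical. A real fast format would presumably be an
SQL database or a binary serialization, as suggested above.

```python
import json
from pathlib import Path


def write_cache(cache_path: Path, records: list, stamp_ns: int) -> None:
    """Write the fast-format file with an embedded timestamp.

    Hypothetical layout: JSON stands in for the real fast format.
    After a full save, stamp_ns would be the mtime of the XML file
    that was just written; after an autosave, the current time.
    """
    payload = {"xml_mtime_ns": stamp_ns, "records": records}
    cache_path.write_text(json.dumps(payload), encoding="utf-8")


def choose_source(xml_path: Path, cache_path: Path) -> str:
    """Return "cache" if the fast file may be used, otherwise "xml".

    The XML file wins whenever it was modified after the timestamp
    embedded in the cache (e.g. it was edited by hand), or when the
    cache is missing or unreadable.
    """
    try:
        payload = json.loads(cache_path.read_text(encoding="utf-8"))
        stamp_ns = payload["xml_mtime_ns"]
    except (OSError, ValueError, KeyError, TypeError):
        return "xml"  # no usable cache: fall back to parsing the XML
    if xml_path.stat().st_mtime_ns > stamp_ns:
        return "xml"  # XML is newer than the cache
    return "cache"
```

Instead of silently returning "xml" in the newer-XML case, the caller
could prompt the user to choose, which is the other option mentioned
above.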
--
Robert Krawitz <rlk at alum.mit.edu>
*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net
"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton