[KPhotoAlbum] Speed up new image load time
Johannes Zarl-Zierl
johannes at zarl-zierl.at
Tue May 30 20:20:13 BST 2017
Hi Robert,
Thanks for providing these patches! They are appreciated ;-)
I'm a little sleep deprived right now, so please bear with me if I don't merge
them right away.
@Tobias: If you have time to review and merge Robert's patches, I won't mind
:)
Cheers,
Johannes
On Monday, 29 May 2017 19:35:50 CEST Robert Krawitz wrote:
> On Mon, 29 May 2017 19:05:47 -0400 (EDT), Robert Krawitz wrote:
> > On Mon, 29 May 2017 18:47:05 -0400 (EDT), Robert Krawitz wrote:
> >> On Mon, 29 May 2017 17:27:49 -0400 (EDT), Robert Krawitz wrote:
> >>> Some timings, for loading 1133 images:
> >>>          Old    New
> >>>
> >>> 20 MP    5:41   0:32
> >>
> >> ...
> >>
> >>> It looks like storing the EXIF data in the database takes about 3
> >>> seconds. The next big time consumer is file version detection; if I
> >>> turn that off, the total time drops off to about 7 seconds. At that
> >>> point, in a realistic scenario, I'd likely be I/O-bound; if I were
> >>> loading 3000 images (30 GB, typically), I'd need on the order of
> >>> 250-300 seconds just to read the data from disk. But if someone were
> >>> storing their images on NVMe, it might matter.
> >>
> >> Well, there's some very low hanging fruit here: the modified file
> >> detection computes the MD5 checksum of each file twice! It's a very
> >> simple matter to get rid of one of those; the time drops to about 20
> >> seconds (which is consistent with what I saw running md5sum on all of
> >> the files: it took about 10 seconds).
> >
> > If I take out MD5 checksumming altogether it drops to about 8 seconds,
> > as would be expected.
> >
> > Of that time, about 3-4 seconds is spent in what looks like saving the
> > EXIF data, 2-3 seconds scanning the filesystem, and 2-3 seconds
> > reading the files in (when I interrupted gdb several times during
> > that, it looked like most of it was library routines scanning the EXIF
> > headers).
> >
> > So, 20'ish seconds to read in 1100 files, which would normally be
> > around 10 GB. And that's with a fairly slow processor; with a
> > contemporary fast processor it would be more like 10. With a large
> > amount of data, that would be completely I/O-bound unless you had an
> > NVMe drive.
> >
> > I think this problem is solved.
>
> I tried the same experiment on my server (i7-5820K, with single
> threads a bit more than twice as fast as my laptop). The first time I
> loaded the new files, it was on a pace to take something like a
> minute. When I repeated it, it took 15 seconds. That's I/O-bound,
> and short of not computing the MD5, there's not much we can do.
>
> One option, if "detect duplicate files" isn't turned on, would be to
> compute the MD5 checksum only when the thumbnails are created or the
> image viewed. Since the working set of the images is frequently
> larger than RAM, this would save on I/O. But it would be rather
> complicated, I suspect.
>
> This may not be entirely accurate, because I ran it to a remote
> display (my laptop). But I suspect it's not off by much.
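
To illustrate the two ideas in the quoted thread (hash each file only
once, and optionally defer the hash until it is actually needed), here
is a minimal sketch. It is written in Python rather than KPhotoAlbum's
C++, and it is not the code from Robert's patches; ChecksumCache, its
md5() method and the reads counter are hypothetical names used only for
illustration. It assumes that an unchanged size and modification time
mean an unchanged file.

```python
import hashlib
import os


class ChecksumCache:
    """Compute a file's MD5 lazily, and at most once per file version.

    Hypothetical helper: a file "version" is identified by its size and
    modification time, which is an assumption, not a guarantee.
    """

    def __init__(self):
        self._cache = {}  # path -> ((size, mtime_ns), hex digest)
        self.reads = 0    # number of full file reads, for illustration

    def md5(self, path, chunk_size=1 << 20):
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns)
        hit = self._cache.get(path)
        if hit is not None and hit[0] == key:
            # Already hashed this version of the file: no disk read.
            return hit[1]
        digest = hashlib.md5()
        # Stream the file in chunks so large images are never held in
        # memory as a whole.
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        self.reads += 1
        result = digest.hexdigest()
        self._cache[path] = (key, result)
        return result
```

Nothing is read from disk until md5() is first called for a path, so a
caller could postpone that call until thumbnail creation or viewing, as
suggested above; any later call for the same unchanged file is answered
from the cache.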