[KPhotoAlbum] Benchmarks on Robert's preload settings
rlk at alum.mit.edu
Sun Oct 27 04:16:57 GMT 2019
On Sat, 26 Oct 2019 21:27:13 +0200, Andreas Schleth wrote:
> Hi Robert,
> I did quite a few runs in a more or less controlled setting (normal KDE
> desktop with a few terminals open but no "real" activity):
> ~1400 images to read
> rm index.xml exif-info.db .thumbnails/*
> kphotoalbum -c index.xml
> Before each run, I cleared the cache in a separate window (as root):
> sync;echo 3 > /proc/sys/vm/drop_caches
> These are the results sorted by duration (only the time it takes to
> read/process the images after rescan...):
Thank you for collecting all of this information.
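The per-run reset Andreas describes (delete the database, the EXIF cache and the thumbnails, then sync and drop the page cache) can be bundled into one shell function. This is a minimal sketch, not something shipped with KPhotoAlbum. The name `prepare_run` is hypothetical, and it assumes the directory you pass is the one that holds index.xml. Writing 3 to /proc/sys/vm/drop_caches frees the page cache plus dentries and inodes, and it only works as root. The function therefore reports a failure instead of quietly letting you benchmark against cached, too-fast data.

```shell
#!/bin/sh
# prepare_run DIR: hypothetical helper (not part of KPhotoAlbum) that
# repeats the per-run reset for the image directory DIR.
prepare_run() {
    album_dir=$1
    # Remove the database, the EXIF cache and the thumbnails so that the
    # next "kphotoalbum -c index.xml" has to rescan everything.
    rm -f "$album_dir/index.xml" "$album_dir/exif-info.db"
    rm -rf "$album_dir/.thumbnails"/*
    # Write out dirty pages first: drop_caches only frees clean pages.
    sync
    # 3 = drop page cache plus dentries and inodes. Needs root; the braces
    # let us silence the shell's error message if the write is refused.
    if { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null; then
        echo "caches dropped"
    else
        echo "warning: could not drop caches (not root?); timings will be too fast" >&2
        return 1
    fi
}

# Example (as root, from the image directory):
#   prepare_run . && kphotoalbum -c index.xml
```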
> storage       preset  preload  Thr  Thr pre-thumbs  Thr thumbs  seconds  avg CPU %
> NFS (repeat)  manual  x        8    8               8           13       96.7
> SSD           manual  x        4    4               4           18       69.2
> SSD           manual  x        4    4               0           18       92.6
> SSD           manual  x        2    2               2           19       44.2
> NFS (repeat)  NET     x        4    4               4           19       75.2
> SSD           manual  0        1    1               4           20       61.0
> SSD           manual  0        1    1               0           20       76.1
> SSD (repeat)  SATA    x        4    4               4           20       76.3
> SSD           manual  x        8    8               8           20       78.6
> SSD           manual  x        2    4               0           20       87.4
> SSD           manual  x        1    1               0           23       73.7
> SSD           SATA    x        4    4               4           36       43.9
> NFS           manual  0        4    4               4           154      13.1
> NFS           manual  x        4    4               4           154      16.6
> NFS           manual  x        8    8               8           154      16.8
> NFS           manual  x        4    4               0           155      16.5
> NFS           manual  x        2    2               2           162      15.7
> NFS           manual  0        1    1               4           175      10.8
> NFS           manual  0        1    1               0           175      10.9
> NFS           NET     0        1    1               0           176      11.0
> It took a while until I found out that the thread settings have an
> effect only if the "do preload" checkbox is checked.
That shouldn't be the case; that checkbox simply controls whether MD5
checksums are computed during the preload (scout) phase or separately.
My own experiments have shown that this does make a difference,
particularly with mechanical hard drives and NVMe, IIRC. It's possible
that it doesn't make as much difference on SATA SSDs and NFS. The
direction of the effect varies a lot, though: computing checksums
during preload greatly improves matters on NVMe drives but hurts on
spinning rust.
It also validates my hypothesis that more threads help with NFS,
which I assume is because they allow disk I/O on the server to overlap
with network transfers. The other parameters don't appear to make any
difference with NFS.
> There are 3 lines with "repeat" in the table. These runs were done
> without clearing the cache beforehand.
It's interesting that in one case you get faster results with the
"repeat" NFS run than with the SSD, even with otherwise identical
parameters.
> If the images were copied or otherwise read by any process before
> KPA reads them, the files get transferred from memory instead of from
> disk, provided you have enough memory.
Right, which is the point of the scout (aka "preload") threads.
The interesting thing with NVMe devices (at least the one I have, a
midrange gen3 x4 device) is that they're so fast that it's actually
faster (in terms of time to get control back) to let the MD5 and
EXIF processing get so far ahead of thumbnail generation that the
thumbnail generation has to re-read the data. Of course, getting
control back doesn't mean that all of the thumbnails have been
generated, but if you start from the top and scroll down you'll either
stay behind the thumbnail generation or it will be so fast that it
> My NFS is on a RAID1 spinning-rust server connected via Gigabit
> Ethernet. While reading from NFS, the transfer rate was around 70 MB/s
> in all of these cases. My SSD delivers around 480 MB/s.
OK, so it's a SATA SSD.
> Thus, the "repeat" results represent an ideal disk (reading from memory).
> My machine has an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Skylake, 4
> physical cores with hyperthreading enabled, reported as 8 cores in top)
> and 32 GB of memory.
> The measurements were done with: dstat -a --output xxx.csv
> This gives one reading per second. I just counted the seconds with high
> transfer or CPU rates. Thus the values might be off by a second or two.
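Counting the busy seconds in the dstat CSV by hand can be automated with awk. The sketch below is a hypothetical helper (`count_busy_seconds` is not part of dstat). It assumes one sample per second, which is dstat's default interval, and that you look up the 1-based index of the column you care about (for example disk read or network receive) in the header rows of your own CSV, since the column layout depends on the dstat version and options. Header and metadata rows are skipped because the selected field is not a plain number.

```shell
#!/bin/sh
# count_busy_seconds CSV COL THRESHOLD
# Hypothetical helper: prints how many samples in a dstat CSV have a
# value greater than THRESHOLD in 1-based column COL. With dstat's
# default 1-second interval, each sample is one second.
count_busy_seconds() {
    csv=$1 col=$2 threshold=$3
    awk -F',' -v col="$col" -v thr="$threshold" '
        {
            v = $col
            gsub(/"/, "", v)    # header rows are quoted strings
            # Only plain numbers count; header/metadata rows are skipped.
            if (v ~ /^[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$/ && v + 0 > thr + 0)
                busy++
        }
        END { print busy + 0 }
    ' "$csv"
}
```

For example, `count_busy_seconds xxx.csv 7 10000000` would count samples above 10,000,000 bytes/s (about 10 MB/s), assuming column 7 holds disk reads in bytes per second in your file. It counts every matching sample rather than one contiguous run, so isolated background spikes are included as well.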
> My conclusions from this test are:
> - preloading speeds things up, at least a little.
It's always preloading; the issue is whether the MD5 calculations are
done at preload time or afterwards.
> - Funnily enough, NFS seems to gain a bit more (175 to 155 s) than the
> SSD (20 to 18 s).
That doesn't surprise me, actually; read my comment in
> - Using as many threads as there are logical CPUs does not clog the
> pipes: NFS just delivers its usual 70 MB/s and takes its time without
> any adverse effects on the rest of the system (apart from using a lot
> of bandwidth). Thus, my worries were not justified.
I suspect this will depend a lot on network characteristics, physical
storage backend, etc.
> - The presets are slower than manual settings.
I deliberately made the presets a bit conservative, and I expect that
careful tuning for one's own system will yield some improvement. But it
looks like I do want to retune at least the network filesystem preset
to use at least two scout threads.
> - I will keep the x-8-8-8 scheme for the time being (even while working
> over NFS). I might just find a few files in the cache.
> - It might be a good idea to start up KPA directly after copying the
> images into their folders: copying also loads the files into the cache,
> where KPA finds them faster than on disk. (I tested this.)
Depending upon the size of your image load and RAM, yes.
> - It is absolutely necessary to flush the cache before you do any of
> these tests. Otherwise you get overly fast results.
Yes (unless you want to see just what happens with the images in RAM,
although it's difficult at best to tell what is and isn't in RAM).
> - for tests of different SSDs you would need more images to get a better
Yes. I'm doing my test with about 10K images totaling over 100GB.
> I like what I saw and will keep the current version (v5.5-150-g323e2b29)
> "in production" (until the next tests are due).
> Thanks for working on and improving KPA!
Robert Krawitz <rlk at alum.mit.edu>
*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net
"Linux doesn't dictate how I work, I dictate how Linux works."