[KPhotoAlbum] NVMe

Sun Oct 20 18:57:21 BST 2019

So it looks like I've got the following numbers.  I'm showing 2
significant figures here; in reality, probably no more than 1, maybe
1.5, are really significant in most cases.

* PCIe gen3/x4 NVMe:

  4 scout/preload MD5: 1.9 GB/sec
  4 scout/no preload: 490 MB/sec (75-80% CPU)
  1 scout/preload MD5: 480 MB/sec
  1 scout/no preload: 480 MB/sec (75-80% CPU)
  2 scout/preload MD5: 1.2 GB/sec
  5 scout/preload MD5: 1.9 GB/sec (maybe slightly faster than 4 scouts)
  6 scout/preload MD5: 1.75 GB/sec

  All of these were about 90-95% CPU consumption except as noted,
  regardless of I/O throughput.  What I think is happening is that at
  the lower throughput the extra CPU is going toward building
  thumbnails.

* HDD:

  4 scout/preload MD5: 70-75 MB/sec (490 IO/sec)
  4 scout/no preload: 75-80 MB/sec (115 IO/sec)
  1 scout/preload MD5: 95 MB/sec (900 IO/sec)
  1 scout/no preload: 95-98 MB/sec (150 IO/sec)
  2 scout/no preload: 65-70 MB/sec

  All generally <20% CPU

* SATA SSD:

  4 scout/preload MD5: 380 MB/sec
  4 scout/no preload: 480 MB/sec
  1 scout/preload MD5: 200 MB/sec
  1 scout/no preload: 370 MB/sec
  2 scout/no preload: 440 MB/sec
  3 scout/no preload: 470 MB/sec
  5 scout/no preload: 470 MB/sec

  CPU varied considerably, generally in parallel with I/O throughput.
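
To make the scaling easier to see, here is a small sketch that turns
some of the figures above into speedups relative to a single scout.
The throughput values are copied from the lists above (midpoints where
a range was given); the dictionary names and the speedup() helper are
illustrative only and are not part of KPhotoAlbum.

```python
# Throughput in MB/sec keyed by number of scout threads, taken from the
# measurements above. Ranges are replaced by their midpoint.
nvme_preload_md5 = {1: 480, 2: 1200, 4: 1900, 5: 1900, 6: 1750}
sata_no_preload = {1: 370, 2: 440, 3: 470, 4: 480, 5: 470}
hdd_preload_md5 = {1: 95, 4: 72.5}


def speedup(results, baseline=1):
    """Return throughput relative to the run with `baseline` scouts."""
    base = results[baseline]
    return {scouts: round(rate / base, 2) for scouts, rate in results.items()}


for name, results in [
    ("NVMe, preload MD5", nvme_preload_md5),
    ("SATA SSD, no preload", sata_no_preload),
    ("HDD, preload MD5", hdd_preload_md5),
]:
    print(name, speedup(results))
```

On these numbers, 4 scouts come out at roughly 4x a single scout on
NVMe, about 1.3x on the SATA SSD, and below 1x on the HDD.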

So the general themes are:

* On NVMe devices (fast ones, at any rate), more scout threads (up to
  4 on my system, which coincidentally or not is the number of cores)
  and computing MD5 during preload give a big benefit.  It appears
  that throughput scales with threads up to the number of cores
  available.  I won't have a chance this week, but at some point I'll
  have to try on my Ryzen 2700X (with 8 cores that are at least
  somewhat faster than those on my laptop).  I know that my NVMe can
  do better than 2 GB/sec.

* On SATA SSDs, more scout threads are a benefit, although the gain
  levels off, but computing MD5 on preload is distinctly detrimental.

* On HDDs, more scout threads are detrimental, but when the MD5 is
  computed is of little import.
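
The themes above could be condensed into a simple tuning rule.  The
following is only a sketch of that idea: suggest_settings() and its
device labels are hypothetical names, not KPhotoAlbum settings, and
the cap of 4 scouts for SATA SSDs is an assumption drawn from where
the figures above level off on one machine.

```python
import os


def suggest_settings(device, cores=None):
    """Return (scout_threads, md5_during_preload) for a device class.

    Hypothetical helper based on the measurements in this message.
    """
    cores = cores or os.cpu_count() or 1
    if device == "nvme":
        # Throughput scaled with scouts up to the core count, and
        # computing MD5 during preload was a clear win.
        return cores, True
    if device == "sata_ssd":
        # More scouts helped but levelled off around 3-4 (assumed cap);
        # MD5 during preload hurt.
        return min(cores, 4), False
    if device == "hdd":
        # Extra scouts hurt; MD5 timing made little difference, so
        # False here is an arbitrary choice.
        return 1, False
    raise ValueError(f"unknown device class: {device!r}")


print(suggest_settings("nvme", cores=4))
print(suggest_settings("sata_ssd", cores=8))
print(suggest_settings("hdd"))
```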

It would be interesting to see what would happen on slower and faster
NVMe devices and on slower, faster, or higher-core-count CPUs.  It
would also be
interesting to see what happens on network filesystems, if someone
wants to try, but if you do, make sure to record information about the
server, network, and remote filesystem location/type in addition to
the client.
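
For anyone who wants a rough, standalone way to compare storage
devices outside KPA, the sketch below reads a set of files with a
configurable number of reader threads and optionally computes MD5
while reading.  It is not KPhotoAlbum's scout implementation; the
names scout() and run_benchmark() and the 1 MiB chunk size are
assumptions made for illustration.  Results only mean something if
the files are not already in the page cache (on Linux, root can drop
it by writing 3 to /proc/sys/vm/drop_caches between runs).

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # read in 1 MiB pieces (arbitrary choice)


def scout(path, with_md5):
    """Read one file completely, optionally hashing it while reading."""
    digest = hashlib.md5() if with_md5 else None
    nbytes = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            nbytes += len(chunk)
            if digest is not None:
                digest.update(chunk)
    return nbytes, (digest.hexdigest() if digest is not None else None)


def run_benchmark(paths, scouts, with_md5):
    """Read all paths using `scouts` threads.

    Returns (total_bytes, elapsed_seconds, digests), with digests in
    the same order as paths (None entries when with_md5 is False).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=scouts) as pool:
        results = list(pool.map(lambda p: scout(p, with_md5), paths))
    elapsed = time.perf_counter() - start
    total = sum(n for n, _ in results)
    return total, elapsed, [d for _, d in results]
```

A run might look like
total, secs, _ = run_benchmark(image_paths, scouts=4, with_md5=True),
after which total / secs / 1e6 gives MB/sec for that configuration.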

The main benefits of this work are probably for initial impression,
initial database load, and loading large numbers of images.  For
someone who wants to try out KPA or start a large database, having
very fast load times will make for a good first impression.  The
thumbnails won't all be built by the end of load, but from a user
interaction standpoint that likely doesn't matter; if they start from
the top and scroll down, the thumbnail building will probably already
be ahead.  For loading many gigabytes of images onto fast storage, the
benefits of correct tuning are obvious.
-- 
Robert Krawitz                                     <rlk at alum.mit.edu>
