[KPhotoAlbum] NVMe
Robert Krawitz
rlk at alum.mit.edu
Tue Oct 22 01:36:05 BST 2019
On Mon, 21 Oct 2019 20:59:43 +0200, Andreas Schleth wrote:
> Hi Robert,
> what do I do to obtain your branch, and how do I set the thread
> numbers/preload options?
git clone git@git.kde.org:/kphotoalbum.git
cd kphotoalbum
git checkout parallel-md5
The two tunables are IMAGE_SCOUT_THREAD_COUNT and PRELOAD_MD5 in
DB/NewImageFinder.cpp. This is, of course, a prototype.
> How do you instrument the code to obtain such detailed performance numbers?
> My knowledge does not go much further than "time executable"...
> That said, I'll be happy to provide some numbers on my setup here.
> Cheers, Andreas
I simply ran
iostat 5
to get I/O throughput and total CPU consumption.
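For anyone without iostat handy, roughly the same read-throughput figure can be derived on Linux from two samples of /proc/diskstats. This is only a sketch: the function names are made up for illustration, and it relies on the kernel convention that the sectors-read field is counted in 512-byte units.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_read(diskstats_text, device):
    """Return the cumulative sectors-read counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major minor name reads reads_merged sectors_read ...
        if len(fields) >= 6 and fields[2] == device:
            return int(fields[5])
    raise KeyError(device)


def read_mb_per_sec(before, after, seconds):
    """Convert two sectors-read samples into MB/sec (1 MB = 10**6 bytes)."""
    return (after - before) * SECTOR_BYTES / seconds / 1e6


def sample(device, interval=5.0, path="/proc/diskstats"):
    """Measure read throughput of `device` over one interval (Linux only)."""
    with open(path) as f:
        before = sectors_read(f.read(), device)
    time.sleep(interval)
    with open(path) as f:
        after = sectors_read(f.read(), device)
    return read_mb_per_sec(before, after, interval)
```

Calling sample() with the block device name (as it appears in /proc/diskstats) while the load is running gives one reading per interval, much like one line of iostat output.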
That said, I also need to time the loading with a stopwatch: if I add
more threads for thumbnail generation, the throughput doesn't drop, but
it looks like fewer images are being read per second, probably because
after a while the thumbnailing itself starts reading more data.
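The stopwatch part is simple arithmetic once the wall-clock time is known. A minimal sketch follows; the helper names are hypothetical, and the numbers in the usage note are invented purely to show the calculation, not measurements.

```python
import time


def load_rates(n_images, total_bytes, elapsed_sec):
    """Return (images/sec, MB/sec) for a load that took elapsed_sec."""
    if elapsed_sec <= 0:
        raise ValueError("elapsed time must be positive")
    return n_images / elapsed_sec, total_bytes / elapsed_sec / 1e6


def timed(func):
    """Run func() and return (result, wall-clock seconds)."""
    start = time.monotonic()
    result = func()
    return result, time.monotonic() - start
```

For example, load_rates(1000, 5_000_000_000, 10.0) works out to 100 images/sec at 500 MB/sec. Tracking both figures across runs is what would show the effect described above: MB/sec holding steady while images/sec falls.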
I've done profiling (via kcachegrind) in earlier phases of this work
(Load-performance, elide_unnecessary_metadata, exifdb_improvements,
startup-performance, no-statvfs) because user CPU was involved in a
lot of those improvements. This work has very little to do with user
CPU; it's a function of I/O throughput and to some extent scheduling,
which profiling won't help with.
> Am 20.10.19 um 19:57 schrieb Robert Krawitz:
>> So it looks like I've got the following numbers. I'm showing 2
>> significant figures here; in reality, probably no more than 1, maybe
>> 1.5, are really significant in most cases.
>>
>> * PCIe gen3/x4 NVMe:
>>
>> 4 scout/preload MD5: 1.9 GB/sec
>> 4 scout/no preload: 490 MB/sec (75-80% CPU)
>> 1 scout/preload MD5: 480 MB/sec
>> 1 scout/no preload: 480 MB/sec (75-80% CPU)
>> 2 scout/preload MD5: 1.2 GB/sec
>> 5 scout/preload MD5: 1.9 GB/sec (maybe slightly faster than 4 scouts)
>> 6 scout/preload MD5: 1.75 GB/sec
>>
>> All of these were about 90-95% CPU consumption except as noted,
>> regardless of I/O throughput. What I think is happening is that at
>> the lower throughput the extra CPU is going toward building
>> thumbnails.
>>
>> * HDD:
>>
>> 4 scout/preload MD5: 70-75 MB/sec (490 IO/sec)
>> 4 scout/no preload: 75-80 MB/sec (115 IO/sec)
>> 1 scout/preload MD5: 95 MB/sec (900 IO/sec)
>> 1 scout/no preload: 95-98 MB/sec (150 IO/sec)
>> 2 scout/no preload: 65-70 MB/sec
>>
>> All generally <20% CPU
>>
>> * SATA SSD:
>>
>> 4 scout/preload MD5: 380 MB/sec
>> 4 scout/no preload: 480 MB/sec
>> 1 scout/preload MD5: 200 MB/sec
>> 1 scout/no preload: 370 MB/sec
>> 2 scout/no preload: 440 MB/sec
>> 3 scout/no preload: 470 MB/sec
>> 5 scout/no preload: 470 MB/sec
>>
>> CPU varied considerably, generally in parallel with I/O throughput.
>>
>> So the general themes are:
>>
>> * On NVMe devices (fast ones, at any rate), more scout threads (up to
>> 4 on my system, which coincidentally or not is the number of cores)
>> and computing MD5 during preload gives a big benefit. It appears
>> that throughput scales with threads up to the number of cores
>> available. I won't have a chance this week, but at some point I'll
>> have to try on my Ryzen 2700X (with 8 cores that are at least
>> somewhat faster than those on my laptop). I know that my NVMe can
>> do better than 2 GB/sec.
>>
>> * On SATA SSDs, more scout threads is a benefit although it levels
>> off, but computing MD5 on preload is distinctly detrimental.
>>
>> * On HDDs, more scout threads is detrimental, but when the MD5 is
>> computed is of little import.
>>
>> It would be interesting to see what would happen on slower and faster
>> NVMe devices and slower/faster/more core CPUs. It would also be
>> interesting to see what happens on network filesystems, if someone
>> wants to try, but if you do, make sure to record information about the
>> server, network, and remote filesystem location/type in addition to
>> the client.
>>
>> The main benefits to this work are probably for initial impression,
>> initial database load, and loading large numbers of images. For
>> someone who wants to try out KPA or start a large database, having
>> very fast load times will make for a good first impression. The
>> thumbnails won't all be built by the end of load, but from a user
>> interaction standpoint that likely doesn't matter; if they start from
>> the top and scroll down, the thumbnail building will probably already
>> be ahead. For loading many gigabytes of images onto fast storage, the
>> benefits of correct tuning are obvious.
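The general themes in the quoted summary could be turned into a starting-point heuristic for picking the two tunables. This is a sketch only: suggest_settings and the device labels are hypothetical, nothing like it exists in the branch, and the thresholds are read off measurements from a single 4-core laptop, so anything beyond that (8 cores, for instance) is extrapolation.

```python
def suggest_settings(device, cores):
    """Return (scout_threads, preload_md5) for 'nvme', 'sata_ssd' or 'hdd'."""
    cores = max(1, cores)
    if device == "nvme":
        # Throughput scaled with scout threads up to the core count,
        # and computing MD5 during preload gave a big benefit.
        return cores, True
    if device == "sata_ssd":
        # More scouts helped but leveled off at about 3-4 (assumed cap of 4);
        # computing MD5 during preload was detrimental.
        return min(4, cores), False
    if device == "hdd":
        # Extra scouts hurt; when the MD5 is computed mattered little,
        # so preload is simply left off here.
        return 1, False
    raise ValueError(f"unknown device type: {device!r}")
```

On the 4-core system measured above, this yields 4 scouts with MD5 preload for NVMe, 4 scouts without preload for the SATA SSD, and a single scout for the HDD.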
--
Robert Krawitz <rlk at alum.mit.edu>
*** MIT Engineers A Proud Tradition http://mitathletics.com ***
Member of the League for Programming Freedom -- http://ProgFree.org
Project lead for Gutenprint -- http://gimp-print.sourceforge.net
"Linux doesn't dictate how I work, I dictate how Linux works."
--Eric Crampton