<html><head></head><body>Hi Robert,<br>Out there (actually here) we also have the use case of slow spinning-rust storage over the network (NFS). Too much concurrent I/O could effectively kill my server, so an option to limit the number of I/O threads manually would be nice. Thumbnail generation could also tap into the system-generated thumbnails and reuse them...<br>Cheers, Andreas<br><br><div class="gmail_quote">On 20 October 2019 07:09:54 CEST, Robert Krawitz &lt;rlk@alum.mit.edu&gt; wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">

<pre class="k9mail">On Sun, 13 Oct 2019 19:39:52 -0400 (EDT), Robert Krawitz wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 1ex 0.8ex; border-left: 1px solid #729fcf; padding-left: 1ex;"> Unfortunately, I'm not getting a lot of benefit from use of an NVMe;<br> it looks like I'm hitting other limits right away (MD5 checksumming<br> and thumbnail extraction) even with everything cached in memory.<br><br> With increasing thread counts, it would be very uesful to be able to<br> farm out checksumming and thumbnail generation.  Checksum generation<br> shouldn't be much of a problem; it could be computed by the scout<br> thread and stored in a hash, and only computed by the loader if it<br> doesn't exist.  The good news (subject to verifying that it did the<br> checksum correctly) is that even 3-way parallelism (3 scout threads)<br> got me to 1.8 GB/sec I/O rate, something like 200 files/sec.<br> Unfortunately, this then gets ahead of thumbnail generation, with the<br> result that images have size (-1, -1) until their thumbnails get<br> created.  Still need to figure out how to deal with that.<br></blockquote><br>I've created a parallel-md5 branch that prototypes this.  With fast<br>backing store, such as the Inland Premium 2TB NVMe, image loading<br>really flies; I'm getting about 1.9 GB/sec loading images, limited by<br>MD5 checksum generation on a processor that's not especially fast by<br>recent standards (Xeon E3-1505M, 4x2 Skylake at 2.8/3.7 GHz).  I'm<br>using 4 scout threads to get there, with the scouts doing the MD5<br>calculation.  With RAID gen4 NVMe on a Threadripper or higher thread<br>count Epyc the results would be interesting.<br><br>Thumbnail generation of course lags badly on my hardware.  The result<br>is that I'm actually doing about 2x as much total I/O, but the user<br>gets control back very quickly.  
I managed to get the image size<br>during preload, so the -1 problem went away.<br><br>The hard part's going to be figuring out how to autotune the number of<br>scout threads.</pre></blockquote></div><br>-- <br>This message was sent from my Android device with K-9 Mail.</body></html>