Scanner benchmark

Jeff Mitchell mitchell at kde.org
Wed Nov 17 00:48:56 CET 2010


On 11/13/2010 03:07 PM, Leo Franchi wrote:
> Hello,
> 
> Below are my observations too, just to see how other users' results compare.
> 
> On Sat, Nov 13, 2010 at 4:06 AM, Mikko C. <mikko.cal at gmail.com> wrote:
>> Hi,
>> I found some time to run some tests with the new scanner.
>>
>> Amarok from git master of today:
>> Full rescan with the collection already being present on the external
>> MySQL database.
>>
>> - 11:30 mins for the first scanning part (up to 50% in the progress bar)
>> - 2:50 mins for the last part (remaining 50%)
>>
>> Total time: around 14:20 mins.
>>
>> tracks found: 21113
>> albums found: 1703
>> artists found: 1013
> 
> Rescan with empty mysql database:
> 
> 11:00 amarokcollectionscanner run
> 16:00 scan result processing / committing
> 
> total of 26:00
> 
> 47 636 tracks.
> 
> Old scanner:
> 
> 11:30 total time for amarokcollectionscanner + committing.

The old scanner's speed advantage is almost certainly due to the way
that insertions and other DB accesses were handled in the old scanning
code.

I did a lot of work, doing everything I possibly could, to minimize DB
calls, because they were by far the slowest part of scanning other than
actual I/O access on the drives. The end result was a lot of really
nasty data structures that emulated, in memory, the behavior of running
various SQL calls. These data structures would store all the information
to be committed, and then that information would be committed in one go,
using the largest packet size possible. This made the code quite
complex, yes -- but it also made it extremely fast. You've probably seen
these posts before, but see e.g.
http://jefferai.org/2009/07/db-changes-call-for-benchmarkers/ and
http://jefferai.org/2009/10/speed-never-gets-old-at-least-in-software/
and especially
http://jefferai.org/2009/11/the-collection-scanners-ultimate-speed-bump-and-cases/
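
To make the "store everything, then commit in one go" idea concrete,
here is a minimal sketch. It is not the Amarok code: it uses Python's
bundled sqlite3 module as a stand-in for MySQL, and the table layout,
the commit_batched() helper and the sample data are all made up for
illustration. The MySQL packet-size tuning mentioned above has no
equivalent here; the sketch only shows the buffering and the single
transaction.

```python
import sqlite3


def commit_batched(conn, tracks):
    """Write all buffered rows inside a single transaction.

    Hypothetical helper: 'tracks' is a list of (url, title, artist) tuples.
    """
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO tracks (url, title, artist) VALUES (?, ?, ?)",
            tracks,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks (url TEXT PRIMARY KEY, title TEXT, artist TEXT)"
)

# Buffer everything the scan found in memory first (made-up data) ...
buffered = [
    (f"file:///music/{i}.mp3", f"Track {i}", f"Artist {i % 10}")
    for i in range(1000)
]

# ... then hand it to the database in one go.
commit_batched(conn, buffered)
count = conn.execute("SELECT COUNT(*) FROM tracks").fetchone()[0]
```

The point of the design is that the database sees one large write
instead of a thousand small ones, so the per-call overhead is paid once.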

I haven't seen any proper query logs for the new scanner, because when I
was last looking at them with Leo there were logic problems in the new
scanner that were messing up the queries -- hopefully those have been
fixed. But I'm guessing from what I *did* see that each track uses
several database accesses -- an INSERT or two into various tables and
several SELECT queries. If so, this is going to be the big bottleneck
and the main reason for the slowdown.
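
As a rough illustration of how per-track accesses add up, the sketch
below counts the statements sent to the database for the same made-up
scan result, once with a lookup per track and once with the artist ids
cached in memory. Again this is only a sketch under assumptions: sqlite3
stands in for MySQL, and the CountingDb wrapper, the two-table schema
and the scan_* functions are hypothetical, not taken from either
scanner.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts statements sent to the database."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE artists (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
        )
        self.conn.execute(
            "CREATE TABLE tracks (url TEXT PRIMARY KEY, artist_id INTEGER)"
        )
        self.queries = 0  # schema setup above is not counted

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def artist_id(db, name):
    """Look up an artist id, inserting the artist if it is missing."""
    row = db.execute(
        "SELECT id FROM artists WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    return db.execute(
        "INSERT INTO artists (name) VALUES (?)", (name,)
    ).lastrowid


def scan_per_track(db, tracks):
    """Ask the database for the artist id on every single track."""
    for url, artist in tracks:
        db.execute(
            "INSERT INTO tracks (url, artist_id) VALUES (?, ?)",
            (url, artist_id(db, artist)),
        )


def scan_cached(db, tracks):
    """Ask the database once per artist and remember the answer."""
    cache = {}
    for url, artist in tracks:
        if artist not in cache:
            cache[artist] = artist_id(db, artist)
        db.execute(
            "INSERT INTO tracks (url, artist_id) VALUES (?, ?)",
            (url, cache[artist]),
        )


# Made-up scan result: 100 tracks spread over 5 artists.
tracks = [(f"file:///music/{i}.mp3", f"Artist {i % 5}") for i in range(100)]

naive, cached = CountingDb(), CountingDb()
scan_per_track(naive, tracks)
scan_cached(cached, tracks)
print(naive.queries, cached.queries)
```

Even this small cache removes the repeated SELECTs; the remaining
per-track INSERTs are what the batching described earlier would
collapse further.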

--Jeff
