I need *your* benchmarking help!

Jeff Mitchell mitchell at kde.org
Fri Oct 23 02:37:57 UTC 2009


No apologies for the cross-post  :-)

Over the last few days I've been working extremely hard -- just ask my
poor neglected wife -- on solving one of the most longstanding issues in
A2 -- and, for that matter, A1 -- scanning performance.

I've done all kinds of tweaks in the amarokcollectionscanner binary
itself (
http://blog.jefferai.org/2009/10/14/speed-never-gets-old-at-least-in-software-1129
) but the problem remained that, when it came down to it, we were still
just accessing the database too damn much.

So, I finally bit the bullet and did what Leo and I had figured a while
back was the only way to solve this problem -- replace all SQL queries
in the middle of the scan with batch queries at the front and back and a
series of hashes with types like
QHash<int, QLinkedList<QStringList*> *>. In other words, the
ScanResultProcessor now runs a bunch of code to populate the hashes
from SQL at the start, then uses *only* the hashes during all of the
"inserts" and queries, and then writes out all the hashes to SQL at the
end. While maintaining cache coherency the whole way. If I didn't mess
up. Which I didn't. I think. Pretty sure. Possibly.
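
For anyone who wants the shape of that in code, here's a rough sketch of
the load-once / work-in-memory / write-once pattern. This is *not* the
actual ScanResultProcessor code: FakeDatabase, ScanCache, idFor and
demoScan are all names made up for illustration, and it uses a plain
std::unordered_map instead of the nested QHash types so that it builds
without Qt.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Row = std::pair<std::string, int>; // (track URL, track id)

// Hypothetical stand-in for the SQL backend; it only counts round trips.
struct FakeDatabase {
    std::vector<Row> rows;
    int roundTrips = 0;

    std::vector<Row> selectAll() {
        ++roundTrips; // one batch read at the start of the scan
        return rows;
    }

    void replaceAll(const std::vector<Row> &newRows) {
        ++roundTrips; // one batch write at the end of the scan
        rows = newRows;
    }
};

// Hypothetical cache: every lookup and "insert" during the scan hits the
// hash, never the database.
class ScanCache {
public:
    explicit ScanCache(FakeDatabase &db) : m_db(db) {
        for (const Row &row : m_db.selectAll()) {
            m_idByUrl[row.first] = row.second;
            if (row.second >= m_nextId)
                m_nextId = row.second + 1;
        }
    }

    // Returns the known id for a URL, or assigns a fresh one in memory.
    int idFor(const std::string &url) {
        auto it = m_idByUrl.find(url);
        if (it != m_idByUrl.end())
            return it->second;
        const int id = m_nextId++;
        m_idByUrl.emplace(url, id);
        return id;
    }

    // Writes the whole hash back in a single batch.
    void flush() {
        std::vector<Row> out(m_idByUrl.begin(), m_idByUrl.end());
        m_db.replaceAll(out);
    }

private:
    FakeDatabase &m_db;
    std::unordered_map<std::string, int> m_idByUrl;
    int m_nextId = 1;
};

// Runs a pretend scan over the given URLs and returns the number of
// database round trips it took.
int demoScan(FakeDatabase &db, const std::vector<std::string> &urls) {
    ScanCache cache(db);
    for (const std::string &url : urls)
        cache.idFor(url);
    cache.flush();
    return db.roundTrips;
}
```

However many tracks you feed demoScan, the database is only touched
twice: once to fill the hash and once to write it back. The real code
has to keep several such hashes consistent with each other, which is
where the cache coherency worry comes in.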

If you didn't understand that, don't worry. All you have to know is that
it was an absolute fuckload of work, and now I need some help
determining whether or not it was all worth it, which means I need
people benchmarking.

If you'd like to help out -- and please, do help out -- you'll need a
large enough collection that a normal full scan takes a noticeable
amount of time, and you'll need to be running Amarok built from Git.
Here's what you do:

1) Update master and build. Open Amarok and run two full rescans
(Settings->Collection->Fully Rescan Collection). So click, let it run
until the progress bar reaches 100%, then click again. The second time,
time it with a stopwatch or some such thing. (The reason it's done twice
is so that the effects of disk caching can be reasonably ignored between
this version and the new one.)

2) Add my clone as a remote (use Google if you need help). My clone is
at git://gitorious.org/~jefferai/amarok/jefferai-work.git and the branch
you want is called "uidhash". Build it.

3) Run two full rescans again, timing the second one.

4) Close Amarok and re-open it.*

5) If you see any oddities (that aren't fixed by switching back to
master and rebuilding, then running a full rescan, then closing and
opening Amarok -- yes, all those steps) please be sure to report them
along with your benchmark results.

Many thanks in advance to those that help out.
--Jeff

* The reason you need to close and reopen Amarok is that there are
longstanding bugs in the collection browser that cause it to not be
updated properly when new data is scanned. So it can *look* like your
collection is messed up when what's really happening is that the browser
is using bad cached data. Since the browser reloads from SQL every time
you open Amarok, closing and opening Amarok ensures that you're seeing
the browser without these bugs interfering, so you can actually see
whether or not something is truly not working right.
