PATCH: Impoving import of Amarok 1.4 statistics

Sun Jul 26 11:41:23 CEST 2009

Hi Michael,
All is looking good - except for one problem.

Running the query:

QString sql =
      QString( "SELECT lastmountpoint, S.url, S.createdate,
accessdate, percentage, rating, playcounter, lyrics, title, A.name,
R.name, C.name, G.name, Y.name, track, discnumber, filesize "
        "FROM statistics S "
        "LEFT OUTER JOIN devices D "
        "  ON S.deviceid = D.id "
        "LEFT OUTER JOIN lyrics L "
        "  ON L.deviceid = S.deviceid "
        "  AND L.url = S.url "
        "LEFT OUTER JOIN tags T "
        "  ON T.deviceid = S.deviceid "
        "  AND T.url = S.url "
        "LEFT OUTER JOIN album A "
        "  ON T.album == A.id "
        "LEFT OUTER JOIN artist R "
        "  ON T.artist == R.id "
        "LEFT OUTER JOIN composer C "
        "  ON T.composer == C.id "
        "LEFT OUTER JOIN genre G "
        "  ON T.genre == G.id "
        "LEFT OUTER JOIN year Y "
        "  ON T.year == Y.id "
        "ORDER BY lastmountpoint, S.url" );

Gives me an error in mysql:

Error: Could not execute import query: You have an error in your SQL
syntax; check the manual that corresponds to your MySQL server version
for the right syntax to use near '== A.id LEFT OUTER JOIN artist R ON
T.artist == R.id LEFT OUTER JOIN composer ' at line 1 QMYSQL: Unable
to execute query

I'm not really sure what's going on here, since the query looks good.
Playing around with the query on the command line seems to indicate
that perhaps mysql can't handle so many left outer joins. Can't really
find any useful info on the web though.

Do you know anything about this, Michael?

Cheers,
Seb

2009/7/22 Michael Reiher <redm at gmx.de>:
> Hi
>
> No problem. I just wondered, why no one cared to respond... :)
>
> In the meanwhile I figured it might be helpful to redo the patch against
> current SVN (Sorry, no idea about git yet;), as the original was against
> 2.1.1. So attached is an updated version.
>
> There are two comments regarding the patch, that I didn't mention in my
> original mail:
>
> One thing which I'm unsure about is, that I added a few specialized signals
> and slots to the generic importer classes in order to get it to display more
> then an "Imported" message.
>
> The other thing is that I made the queries to find appropriate tracks in the
> new collection synchronous by waiting for the query thread. Reasons:
> - There might be (performance) problems firing off hundreds of parallel
> queries.
> - I'd somehow have to make sure they all finish before the FastForwardWorker
> thread finishes (or continues with db sync and coverfetch)
> - I'd have to implement some extensive bookkeeping to keep track of the
> queries and data access from the result slots.
> So it may be not exactly elegant, but IMHO it does the job in this case. And
> it's sufficiently simple to not introduce too many weird bugs by
> parallelization.
>
>
> I also updated the escape patchlet, one case is already fixed in SVN.
>
> On Tuesday 21 July 2009 07:47:41 you wrote:
>> Hi Michael,
>> Sorry I've taken a while in responding.
>>
>> Thanks for the patch - I'll look at it when I get a chance (currently
>> very busy).
>>
>> The problems you faced are certainly no outlier case, and they way you
>> have outlined your solution seems both sane and intelligent.
>>
>> 2009/7/17 Michael Reiher <redm at gmx.de>:
>> > Hi
>> >
> [snip]
>
>> > Problem 2: Import statistics even if files have moved. Here I added an
>> > option to try to determine the appropriate entries in the new collection
>> > by evaluating the tags of the original files.
>>
>> Do you have any estimate as to potential performance degradations
>> here? Opening files and reading tags is no quick task. However, given
>> that this should be a one off operation a perf degradation would be
>> worth it.
>
> Well, I don't do any opening of files. I simply query them in the collection
> (QueryMaker). And even though I made it synchronous (see above), I was
> surprised that it's actually not too much a slow down. I just measured it and
> it's 51s with only using URLs and 1min31s with tag based matching. This is
> with about 2600 entries in the old statistics database on a not really fast
> machine. But as you say, it's more or less a one time operation anyway.
>
>>
>> > A remaining problem I came across during my experiments is, whether it's
>> > useful to import statistics for files that exist, but are not in the
>> > collection. I stumbled over this when I set a symlink from the old prefix
>> > (/data/music) to the new one (~/Music) in an attempt to get the import
>> > working. So a part of the files were accessible again with their old
>> > paths. The new collection however contained only the new paths... All I
>> > got however were a load of double entries in the collection... So it a)
>> > can be confusing and b) I don't actually see a use for this. So I would
>> > suggest only to import statistics for files which are in the collection.
>> > Does anything speak against that?
>>
>> That is exactly the plan and what we were aiming to do.
>>
> Are you sure? FastForwardWorker explicitly maintains a tracksForInsert list
> with tracks from the old statistics table that exist by URL but are *not* in
> the new collection. So this seems rather indended for me. However I don't see
> the point for this. When a track is not in the new collection, I'd simply
> ignore it. As otherwise the user would have included these files in the new
> collection...
>
> Greets Michael
>

-- 
Seb Ruiz

http://www.sebruiz.net/
http://amarok.kde.org/