Incremental scanning (Bart Cerneels)

Tue Feb 7 08:19:15 UTC 2012

On Tue, Feb 7, 2012 at 01:23, Ralf Engels <ralf-engels at gmx.de> wrote:
>
>> Date: Mon, 6 Feb 2012 10:20:29 +0100
>> From: Bart Cerneels <bart.cerneels at kde.org>
>> To: amarok-devel <amarok-devel at kde.org>
>> Subject: Re: Incremental scanning
>> Message-ID:
>>       <CAMnMsScAKu3JgfEw43_rRzYQhqBZjmWP2Dwo6d9Qo4Cp5FFFmw at mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> On Wed, Feb 1, 2012 at 14:31, Ville Ranki <ville.ranki at iki.fi> wrote:
>> >
>> > Hello,
>> >
>> > I've been implementing incremental scanning for Amarok.
>> >
>
> Hi Ville,
> the Amarok collection scanner has a non-obvious but very logical reason
> for not being incremental.
>
> It does not rely on a specific file system layout and decides
> whether an album is a compilation or not based on the artists of its
> tracks.
>
> Take a directory that looks like this:
> Michael Jackson
>  Bad
>  Bad
> Quincy Jones
>  Bad
>  Comments to Bad
>
> You will only be able to determine that "Bad" belongs in the "Bad"
> compilation album after having scanned the track from Quincy Jones.
>
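
To illustrate the point above, here is a minimal sketch of why a
per-directory commit can misclassify "Bad". It assumes a simplified rule
(an album is a compilation when its tracks span more than one artist);
the real scanner's heuristic is more involved, and the tuple layout and
function names here are hypothetical, not Amarok code.

```python
from collections import defaultdict

# Hypothetical (directory, artist, album, title) tuples mirroring the
# directory listing quoted above.
TRACKS = [
    ("Michael Jackson", "Michael Jackson", "Bad", "Bad"),
    ("Quincy Jones", "Quincy Jones", "Bad", "Comments to Bad"),
]


def classify(tracks):
    """Map each album name to True if its tracks span more than one artist."""
    artists_by_album = defaultdict(set)
    for _directory, artist, album, _title in tracks:
        artists_by_album[album].add(artist)
    return {album: len(artists) > 1 for album, artists in artists_by_album.items()}


def classify_per_directory(tracks):
    """Classify after each directory, seeing only that directory's tracks."""
    by_directory = defaultdict(list)
    for track in tracks:
        by_directory[track[0]].append(track)
    return {name: classify(group) for name, group in by_directory.items()}


# Whole-collection scan: both artists are known before deciding.
full_scan = classify(TRACKS)

# Per-directory commit: each decision is made with partial knowledge.
incremental = classify_per_directory(TRACKS)

print(full_scan)
print(incremental)
```

With the whole collection in hand, "Bad" is seen to have two artists; a
decision taken after the first directory alone would have to be revised
once the Quincy Jones track turns up.
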
>>> scanning is interrupted. Even if it takes only a few minutes
>>> to scan the entire collection, it is good usability to be
>>> able to start playing something as soon as Amarok is
>>> started.
>
> I agree completely.
> Even with a normal collection and a scanning time of only five minutes
> it's really stupid that I have to wait that long.
>
> We could change the behaviour in the way you propose, directly
> committing the tracks after every directory. That would, however,
> increase the overall scanning time and make it impossible to reliably
> detect compilations.
>
> To change the old design decision I would propose to open a bug/wish
> entry and collect votes.
>
>
>> >
>> > I have modified amarokcollectionscanner to output one
>> > XML block for each scanned directory and ScanManager
>> > to parse this data on the fly.
>> >
>> > Everything works well up to the point where I give
>> > Directory instances to the result processor. In this
>> > example I have 2 directories with files in the database.
>> >
>> > (scanner scans first directory)
>> >
>> > The following errors are output for each file:
>> >
>> > [WARNING] [SqlScanResultProcessor] Found urls entry without directory. A
>> > phantom track. Removing
>> > "amarok-sqltrackuid://1ca15c03e1fe38d324e128f81afc39a0"
>> > amarok:     [SqlScanResultProcessor] deleteTrack
>> > "amarok-sqltrackuid://1ca15c03e1fe38d324e128f81afc39a0" url id 107
>> > amarok:     [WARNING] [MountPointManager] Device 0 not in database,
>> > this should never happen!
>> >
>> > (second directory is scanned)
>> >
>> > For each track:
>> > [SqlScanResultProcessor] deleteTrack
>> > "amarok-sqltrackuid://a15cea27a3d60e37bcee8493e5efcbec" url id 101
>> >
>
> I would need to see the xml data that you output.
> I imagine that you just broke something :O
>
>> > In the GUI only the second directory is visible. The
>> > documentation on SqlScanResultProcessor is a bit vague. I suppose
>> > I am using it wrong.
>> >
>
> There is a nice auto test case available.
> I propose to first try to get the test case running.
>
>> > The following is done for each directory. I understood that
>> > ScanResultProcessor shouldn't be re-used so it's instantiated
> ...
>> > delete processor;
>> >
>> > Any ideas what might be wrong?
>> >
>
> Sorry, no idea. Maybe if I see your code.
> Do you have a git repository that I can look at (maybe on gitorious?)
>
>> > --
>>
>> I've recently used the collectionscanner for the USB mass storage
>> plugin and scratched my head at ScanResultProcessor myself. Which is
>> why I ended up not using it.
>> As far as I can tell the reason it does not do true incremental
>> scanning is atomicity of the scanning operation. Either everything is
>> applied or the entire scan result is rolled back. I wonder if that
>> last case actually happens enough for us to have such a complex system
>> however.
>
> Actually once the committing has been started nothing can be rolled
> back.
> I also hated the system at first but I really can't see any other
> solution right now.
>
> Cheers,
> Ralf
>
>

I think we can massively simplify and speed up the collection scanning
if we remove some architectural complexity. At least, that is what my
experience with the UMS use of CollectionScanner suggests. Perhaps
it's time to re-factor it?

Bart