Review Request: Restore heuristics to guess whether album is a compilation

Ralf Engels ralf-engels at gmx.de
Mon Mar 19 21:57:22 UTC 2012



> On March 18, 2012, 9:39 p.m., Ralf Engels wrote:
> > Are the unit test running?
> > 
> > There is one specific scenario that the current scanner is handling (since years. No new change).
> > 
> > ITunes stile collections put tracks from the same album but different artist into different directories like this:
> > 
> > Michael Jackson
> >  + Bad
> >     + Smooth Criminal
> > Quincy Jones
> >  + Bad
> >     + Commentary
> > 
> > If your fix doesn't break the unit test I am all for committing the fix.
> > Also you might want to make a new unit test to prevent somebody later on to change the behaviour for the worse.
> 
> Alexey Neyman wrote:
>     Ralf,
>     
>     I wasn't able to run the unit test; even though I installed google mock 1.6.0 ("apt-get install google-mock") on Kubuntu 12.04, cmake does not detect it:
>     
>     -----------------------------------------------------------------------------
>     -- The following REQUIRED packages could NOT be located on your system.
>     -- You must install these packages before continuing.
>     -----------------------------------------------------------------------------
>        * gmock (1.4 or higher)  <http://code.google.com/p/googlemock/>
>          Used in Amarok's tests.
>     
>     -----------------------------------------------------------------------------
>     
>     For the same reason, I didn't add a new unit test.
>     
>     But I am afraid this is exactly the behavior I am trying to fix. With this change, your example would result in two (or more separate albums): "Michael Jackson - Bad" and "Quincy Jones - Bad".
>     
>     So, your change was fixing iTunes but breaking the heuristics Amarok used before. Could you think of a solution that would also handle the following scenario:
>     
>     ABBA
>      + The Album
>         + Eagle
>         + Take a chance on me
>         + One man one woman
>     Chris Norman
>      + The Album
>         + As good as it gets
>         + Every little thing
>         + Red hot screaming love
>     
>     I.e., two completely separate albums being merged? One solution I could think of would be to merge tracks into single compilation if they all share the same year (is it the case with iTunes?). However, this is still prone to errors, for example if the tracks in both directories lack the year tag.
>     
>     Perhaps, make it a config option? Something like:
>     
>     [ ] Detect iTunes-style compilations
>
> 
> Alexey Neyman wrote:
>     By the way, is it the case that if one had both albums from my example in iTunes collection, they would also be merged in a compilation?
> 
> Alexey Neyman wrote:
>     And, just a few examples of such "coincidental" clashes in album names:
>     
>     http://en.wikipedia.org/wiki/The_Album_%28disambiguation%29
>     http://en.wikipedia.org/wiki/The_White_Album_%28disambiguation%29
>     http://en.wikipedia.org/wiki/The_Black_Album_%28disambiguation%29
>     http://en.wikipedia.org/wiki/The_Brown_Album_%28disambiguation%29
>     http://en.wikipedia.org/wiki/Animal_%28disambiguation%29
>     http://en.wikipedia.org/wiki/Red_%28disambiguation%29
>     http://en.wikipedia.org/wiki/Big_Bang_%28disambiguation%29
>     http://en.wikipedia.org/wiki/The_Hits_%28disambiguation%29
>     http://en.wikipedia.org/wiki/Blues_%28disambiguation%29
>     http://en.wikipedia.org/wiki/Monolith_%28disambiguation%29
>     ...
>     
>     And a lots more, googling by [disambiguation "album by" site:wikipedia.org]. That it, if one had two albums named "Monolith" from iTunes, they would also be merged into a single "compilation".
> 
> Sam Lade wrote:
>     Here's a thought: if it's a single compilation album, you won't get multiple copies of a given track number/disk number pair. If they're two different albums, you will. You'd have to account for the disk number tag not being present, and it's still not perfect, but it should be a better heuristic for both cases discussed above than either system.
> 
> Alexey Neyman wrote:
>     Thanks Sam, for detection of compilations/regular albums that sounds better. So, I think the algorithm should be:
>     - if there are tracks with disc# tag and without disc# tag, they are separate albums
>     - if there are matching (disk#,track#) tuples, excluding tracks without track# tag, they are separate
>     - otherwise, it is a compilation album
>     
>     This would solve the detection issue, but the issue of separation remains. If they are detected to be separate albums, should they be separated by the directory? Or by some more complex rules? There are several possible types of clashes:
>     
>     a. Regular album (in a single dir) vs. regular album
>     b. Regular album vs. iTunes-style compilation
>     c. iTunes-style compilation vs. iTunes-style compilation
>     d. Multi-way clashes (e.g. two regular albums and iTunes-style compilation)
>     
>     Current code handles none of these. Separation by directory would handle (a) and partially (b)/(d) for regular albums, but would be overzealous in separating iTunes-style compilations. However, I don't think cases (b) and (c) can be reliably handled in any way. Imagine the following structure:
>     
>     Artist_1
>      + Album_A
>        + 1_Foo
>        + 2_Far
>        + 3_Faz
>     Artist_2
>      + Album_A
>        + 4_Zeb
>     Artist_3
>      + Album_A
>        + 1_Joo
>        + 2_Jar
>        + 3_Jaz
>     
>     In this case, Amarok would be able to detect clash by track# tags, but how could one separate them? Is it (compilation of artists 1&2 + regular album of artist 3)? Or (compilation of artists 2&3 + regular album by artist 1)? Or (regular albums by all three, with tracks missing from artist 2's album)?
>
> 
> Alexey Neyman wrote:
>     Ralf,
>     
>     I just thought: shouldn't the iTunes compilation that you gave as an example have a proper "Album Artist" tag? In your example,
>     
>     Michael Jackson
>      + Bad
>         + Smooth Criminal
>     Quincy Jones
>      + Bad
>         + Commentary
>     
>     if both tracks had album artist set to "Michael Jackson" (even with different track artist tags), they would be properly merged into a single album by the following code in ScanResultProcessor::commit(), even before the compilations are analyzed:
>     
>             for( int i = tracks.count() - 1; i >= 0; --i )
>             {
>                 CollectionScanner::Album *album = sortTrack( tracks.at( i ) );
>                 if( album )
>                 {
>                     dirAlbums.insert( album );
>                     dirAlbumNames.insert( album->name() );
>                     tracks.removeAt( i );
>                 }
>             }
>     
>     On other hand, in my example:
>     
>     ABBA
>      + The Album
>         + Eagle
>         + Take a chance on me
>         + One man one woman
>     Chris Norman
>      + The Album
>         + As good as it gets
>         + Every little thing
>         + Red hot screaming love
>     
>     the albums will be improperly merged into a compilation despite properly set tags (all ABBA's tracks have album artist and track artist set to "ABBA", all Chris Norman's tracks have album artist and track artist tags set to "Chris Norman").
>     
>     So the question becomes: it is okay to mishandle properly tagged collections to somewhat improve handling of improperly tagged ones?

Hi Alexey,

regarding the google mock. Have you tried to remove the ccache file? Sometimes that remembers when libraries are not present and prevents cmake from rechecking.

Regaring the behaviour,
sometimes design decisions later on turn out to be bad.
In this case I am not yet sure but I remember thinking about some more heuristics,
e.g. if you just have too many tracks in an album.

Also I implemented the "Best Of" heuristic and the compilation flag two years ago.

I think that going by the amount of tracks would really help further.
So add a rule: If ceil(<tracks> / 15.0) >= <albums with the same name but different artists> then don't split them up.


- Ralf


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/104294/#review11582
-----------------------------------------------------------


On March 16, 2012, 12:13 a.m., Alexey Neyman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/104294/
> -----------------------------------------------------------
> 
> (Updated March 16, 2012, 12:13 a.m.)
> 
> 
> Review request for Amarok.
> 
> 
> Description
> -------
> 
> Older amarok used the following heuristics to determine whether a particular
> album is a compilation (from a comment in 
> amarok-2.2.1/src/collection/sqlcollection/ScanResultProcessor.cpp):
> 
>     //using the following heuristics:
>     //if more than one album is in the dir, use the artist of each track as albumartist
>     //if all tracks have the same artist, use it as albumartist
>     //try to find the albumartist A: tracks must have the artist A or A feat. B (and variants)
>     //if no albumartist could be found, it's a compilation
> 
> 
> However, more recent Amarok versions started to merge different albums
> with different artist in separate directories together, as explained above.
> Amarok started to assume all albums with same name to be compilations
> (even if in separate directories) since the following commit:
> 
> dfd8b457d7094144563c51b2528afdbe23ffc344
> Ralf Engels
> Fix all collection scanner auto tests.
> 
> Now, amarok first scans all directories (sorting albums by the name)
> and then tries to process *album names*, one at a time. If it finds
> more than one instance of an album name, it assumes it to be a compilation.
> Thus, it lost the heuristics in employed before ("if more than one album
> is in the dir...").
> 
> While it is still possible to force the right behavior
> by selecting "Do not show under Various Artists" for each of the erroneous
> albums, it would still be better to restore the original heuristics as there
> may be lots of albums merged this way. I think the old heuristics made sense
> (why would albums be put into separate directories otherwise, if they are
> a single compilation album?).
> 
> The attached patch restores the following logic: If any given directory
> contains tracks that were sorted into a single album and and that album
> was not created as a compilation (i.e. it has non-empty artists), this
> album is excluded from being merged with other albums to create a "compilation".
> 
> 
> Diffs
> -----
> 
>   src/core-impl/collections/db/ScanResultProcessor.cpp 4f02a16 
> 
> Diff: http://git.reviewboard.kde.org/r/104294/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alexey Neyman
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/amarok-devel/attachments/20120319/c6c6913b/attachment-0001.html>


More information about the Amarok-devel mailing list