[Nepomuk] Duplicate resources because of using different sources and creators/performers

Sebastian Trüg trueg at kde.org
Fri Oct 14 14:06:15 UTC 2011


Check here: https://git.reviewboard.kde.org/r/102862/

On 10/14/2011 03:46 PM, Sebastian Trüg wrote:
> The Vorbis comments spec http://xiph.org/vorbis/doc/v-comment.html
> states that for popular music "artist" is normally the performing band
> and "performer" is omitted. But for classical music "artist" would be
> the composer while "performer" would be the, well, performer. :P
> 
> So the question is: how do we handle this in the flac (and also
> vorbis, btw) analyzer plugin? Do something like "if performer is set,
> map artist to nco:creator, otherwise to nmm:performer"? That seems
> reasonable. Opinions?
> 
> As for resource merging: we have an API to merge two resources in the
> data management service, which is also easily accessible through DBus.
> But there is no code yet that tries to find duplicates.
> 
> Cheers,
> Sebastian
> 
> On 10/14/2011 02:19 PM, Ignacio Serantes wrote:
>> Hi,
>>
>> As my Nepomuk database grows, I have found that duplicate resources
>> are a growing problem, and I'm not referring to the indexing bug.
>>
>> As Nepomuk collects data from several sources (in my case Strigi,
>> Bangarang and me), the collected data is not always coherent, which
>> is expected. One simple example with actress/singer Shibasaki Kou
>> (柴咲コウ) <http://en.wikipedia.org/wiki/Kou_Shibasaki>:
>>
>> ignacio at misaki:~> nepoogle --nogui contacts:shibasaki or tag:shibasaki
>> 柴咲コウ (Shibasaki Kou)
>> shibasaki kou, 柴咲コウ
>> Kô Shibasaki, Kô Shibasaki
>>
>> and there are more combinations (Kō Shibasaki, Shibasaki Kô, Kou
>> Shibasaki, etc.), all more or less valid. This is in fact a common
>> problem with Asian names, but Western names are not immune either,
>> for example ELO and Electric Light Orchestra.
>>
>> Obviously, if you search only for "shibasaki" you find what you're
>> looking for, but also other resources you don't want.
>>
>> This is not a bug; it is simply a logical problem that comes from
>> meeting the Real World™, so I wonder whether merging all these
>> records is implemented, planned, or has been discussed.
>>
>>
>> Another, different question is the fact that there are "performers"
>> and "creators". Both are the singers of a music file, and the main
>> difference is whether the data was collected from mp3 files or flac
>> files. To my surprise, I found yesterday that Strigi is indexing flac
>> files again. Terrific :).
>>
>> So, for example, if I want to search for all Shibasaki Kou's songs I
>> must type:
>>
>> nepoogle --nogui performer:shibasaki or creator:shibasaki
>>
>> Of course I could search by contact
>>
>> nepoogle --nogui contact:shibasaki
>>
>> but in this case I also get movies and other records, and I am only
>> looking for music.
>>
>> I'm thinking of adding a shortcut named "singer", which implements
>> "performer or creator", to nepoogle, but I want to confirm whether
>> this will be changed or not.
>>
>> -- 
>> Best wishes,
>> Ignacio
>>
>>
>>
>>
>> _______________________________________________
>> Nepomuk mailing list
>> Nepomuk at kde.org
>> https://mail.kde.org/mailman/listinfo/nepomuk

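The rule floated in the quoted message for the flac/vorbis analyzer ("if
performer is set, artist is the creator, otherwise artist is the
performer") can be sketched roughly as below. This is only an
illustration in Python, not the Strigi analyzer code: the function name
map_vorbis_artist and the input shape (a list of (field, value) pairs)
are assumptions made for the example, and the classical-music values are
placeholders. Only the property names nco:creator and nmm:performer come
from the thread.

```python
def map_vorbis_artist(comments):
    """Map ARTIST/PERFORMER Vorbis comments to (property, value) pairs.

    `comments` is a list of (field_name, value) tuples. This input shape
    is an assumption for the sketch, not an existing analyzer interface.
    """
    # Vorbis comment field names are case-insensitive, so normalize them.
    fields = {}
    for name, value in comments:
        fields.setdefault(name.upper(), []).append(value)

    artists = fields.get("ARTIST", [])
    performers = fields.get("PERFORMER", [])

    result = []
    if performers:
        # PERFORMER is present (typical for classical music):
        # treat ARTIST as the composer, i.e. the creator.
        result += [("nco:creator", artist) for artist in artists]
        result += [("nmm:performer", performer) for performer in performers]
    else:
        # PERFORMER is omitted (typical for popular music):
        # ARTIST is the performing band or singer.
        result += [("nmm:performer", artist) for artist in artists]
    return result


# Popular music: only ARTIST is set.
pop = map_vorbis_artist([("artist", "Electric Light Orchestra")])

# Classical music: both fields are set (placeholder values).
classical = map_vorbis_artist(
    [("ARTIST", "Some Composer"), ("Performer", "Some Orchestra")]
)
```

With this rule a pop track yields a single nmm:performer entry, while a
track carrying both tags yields nco:creator for the artist and
nmm:performer for the performer.
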
