[Nepomuk] Duplicate resources because of using different sources and creators/performers

Ignacio Serantes kde at aynoa.net
Fri Oct 14 12:19:29 UTC 2011


Hi,

As my Nepomuk database grows, I have found that duplicate resources are a
growing problem, and I'm not referring to the indexing bug.

Since Nepomuk collects data from several sources (in my case Strigi, Bangarang
and me), the collected data is not consistent, which is to be expected. One
simple example is the actress/singer Shibasaki Kou
(柴咲コウ) <http://en.wikipedia.org/wiki/Kou_Shibasaki>:

ignacio at misaki:~> nepoogle --nogui contacts:shibasaki or tag:shibasaki
柴咲コウ (Shibasaki Kou)
shibasaki kou, 柴咲コウ
Kô Shibasaki, Kô Shibasaki

and there are more combinations (Kō Shibasaki, Shibasaki Kô, Kou Shibasaki,
etc.), all more or less valid. This is, in fact, a common problem with Asian
names, but Western names are not immune either, for example ELO and Electric
Light Orchestra.

Obviously, if you search only for "shibasaki" you find what you're looking
for, but you also get other resources you don't want.

This is not a bug; it is a simple logical problem that appears as soon as we
meet the Real World™. So I wonder whether anything for merging all these
records is implemented, planned, or has been discussed.
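
To make the merge question more concrete, here is a minimal sketch of how candidate duplicates among romanized names could be detected. It is not Nepomuk code: the function names (name_key, group_variants) and the normalization rules are assumptions made for illustration only. It strips diacritics, treats a trailing "ou" as "o", and ignores name order. It does not link 柴咲コウ to its romanized forms, and it cannot know that ELO is Electric Light Orchestra.

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Return a crude comparison key for a romanized name (illustrative only)."""
    # Decompose accented letters (e.g. ô, ō) and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = stripped.lower().split()
    # Assumption: a trailing "ou" is a long "o" (Kou ~ Kô ~ Kō).
    # This rule is deliberately naive and can produce false positives.
    tokens = [t[:-1] if t.endswith("ou") else t for t in tokens]
    # Ignore family-name/given-name order.
    return tuple(sorted(tokens))


def group_variants(names):
    """Group names that share a key; return only groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    labels = [
        "Kô Shibasaki",
        "Kō Shibasaki",
        "Shibasaki Kô",
        "Kou Shibasaki",
        "shibasaki kou",
        "Electric Light Orchestra",
    ]
    for members in group_variants(labels):
        print(members)
```

A key like this would only propose candidates; actually merging the resources would still need a decision by the user or by Nepomuk itself.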


A different question is the fact that there are both "performers" and
"creators". Both refer to the singers of a music file; the main difference is
whether the data was collected from mp3 files or flac files. To my surprise,
I found yesterday that Strigi is indexing flac files again. Terrific :).

So, for example, if I want to search for all Shibasaki Kou's songs I must
type:

nepoogle --nogui performer:shibasaki or creator:shibasaki

Of course I could search by contact:

nepoogle --nogui contact:shibasaki

but in this case I also get movies and other records, and I'm only looking
for music.

I'm thinking of adding a shortcut named "singer" to nepoogle, which would
implement "performer or creator", but first I want to confirm whether this
distinction is going to be changed or not.
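
As a rough illustration of the "singer" shortcut, the sketch below rewrites the query string before it is parsed. It does not reflect how nepoogle actually handles queries: the ALIASES table and the expand_aliases function are hypothetical names, and only the field:value syntax shown in the commands above is assumed.

```python
import re

# Hypothetical alias table: shortcut name -> fields it stands for.
ALIASES = {"singer": ("performer", "creator")}


def expand_aliases(query):
    """Rewrite alias:value terms into 'field1:value or field2:value'."""

    def replace_term(match):
        field, value = match.group(1), match.group(2)
        targets = ALIASES.get(field.lower())
        if targets is None:
            # Not an alias: leave the term untouched.
            return match.group(0)
        return " or ".join(f"{target}:{value}" for target in targets)

    # Match simple field:value terms with no spaces in the value.
    return re.sub(r"\b(\w+):(\S+)", replace_term, query)


if __name__ == "__main__":
    print(expand_aliases("singer:shibasaki"))
```

With this table, "singer:shibasaki" expands to the same "performer:shibasaki or creator:shibasaki" query typed by hand above. How the expanded "or" should bind when combined with other terms is left open here.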

-- 
Best wishes,
Ignacio