[Nepomuk] Bangarang player media library

Evgeny Egorochkin phreedom.stdin at gmail.com
Mon Feb 1 23:37:21 CET 2010


On Monday 01 February 2010 12:52:43 Sebastian Trüg wrote:
> Hi guys,
> 
> On Sunday 31 January 2010 18:12:32 Jamboarder wrote:
> > > From: Evgeny Egorochkin
> > > I have uncovered another issue with Bangarang. It seems to have its
> > > own media library scanner and metadata extractor which also marks song
> > > metadata as user-created, and not automatically generated.
> >
> > Bangarang doesn't really have an explicit media scanner function.  It
> > just updates nepomuk whenever a file with extractable metadata is opened
> > or when a user updates metadata for a media resource.  It uses the
> > methods (setProperty, addType and setTypes) of Nepomuk::Resource to
> > set/update data. Am I using the api incorrectly?
> 
> You do. It is just that we try to keep data that can be extracted from
>  files separate from the rest, i.e. have it in one graph which can easily
>  be removed and recreated. That is what the Strigi indexer service does.
> 
> > > A much better solution would be to tell nepomuk which files to index
> > > and add any of your custom data extractors into libstreamanalyzer or at
> > > least refactor them as libstreamanalyzer plugins.
> >
> > Sounds good. That makes sense for metadata that is technically
> > extractable from the file itself.  Since there's quite a lot of metadata
> > that may or may not be extractable there will still be a need for
> > Bangarang to maintain the in-context update of the nepomuk resource.  At
> > the moment, when updating nepomuk data, Bangarang doesn't distinguish
> > between technically extractable file metadata and non-extractable
> > metadata. Video metadata is the prime example.  The entirety of the
> > video metadata in Bangarang right now is user created. I hope to add some
> > support of video file metadata formats in the 2.0 version.  I like the
> > idea of refactoring the file metadata extractors as libstreamanalyzer
> > plugins, but I'll still need to maintain functionality for in-context
> > user-created metadata that is not technically extractable.
> 
> The problem here is that Bangarang uses Taglib which cannot be used in
>  Strigi. Strigi is stream-based while Taglib is not. This is a well known
>  and old problem which leads to so much rewriting of code for Strigi.
> So converting the Bangarang analyzer to lsa is not an option, at least not
> until the latter becomes a non-stream-based API.

This isn't at all correct. It isn't a big problem to use taglib or whatever 
else for files. However, one of the features of libstreamanalyzer is that it 
can analyze many other things besides plain files. So even if we had a taglib 
analyzer, libstreamanalyzer would produce no data for e.g. email attachments.
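
To illustrate the difference, here is a minimal Python sketch of an analyzer 
that only needs a byte stream and so works on data that never exists as a 
file on disk. It is not libstreamanalyzer or taglib code: the function name 
id3v2_header and the sample bytes are made up for the example. The only 
format detail it relies on is the 10-byte ID3v2 header layout.

```python
import io
from typing import BinaryIO, Optional


def id3v2_header(stream: BinaryIO) -> Optional[dict]:
    """Read an ID3v2 header from the current position of any byte stream.

    Hypothetical helper for illustration; it never asks for a file path.
    """
    head = stream.read(10)
    if len(head) < 10 or head[:3] != b"ID3":
        return None
    major, revision, flags = head[3], head[4], head[5]
    # The tag size is a "synchsafe" integer: 4 bytes, 7 useful bits each.
    size = 0
    for byte in head[6:10]:
        size = (size << 7) | (byte & 0x7F)
    return {"version": (major, revision), "flags": flags, "tag_size": size}


# Bytes as they might sit inside an email attachment (made-up sample data).
attachment = io.BytesIO(b"ID3" + bytes([3, 0, 0]) + bytes([0, 0, 2, 1]) + b"\x00" * 257)
info = id3v2_header(attachment)

# A stream that does not start with an ID3v2 tag yields no data.
not_tagged = id3v2_header(io.BytesIO(b"RIFF" + b"\x00" * 16))
```

An extractor that insists on opening a path itself cannot be fed the 
attachment above without first writing it to a temporary file.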

What is even nastier is the fact that Taglib internally uses streams to 
handle files but doesn't expose this :(

Also, you guys can expect media analyzers to get much better soon. I'm 
working on it right now.

> > > You would get the same or better functionality, contribute to nepomuk
> > > and avoid lots of pitfalls like your custom extractors and file
> > > indexing service adding what essentially is 2 copies of the same data.
> >
> > That seems a little odd.  Does that mean a duplicate triple for the same
> > resource is created when setProperty is called on an existing resource
> > property? I can kinda see the reasoning for setting it to user-created
> > > but it seems like it shouldn't duplicate it.  Is there any way for an app
> > > to just update the automatically generated triples since, whether strigi
> > > or Bangarang or any other app does it, the data is still automatically
> > > generated from the file metadata?
> 
> Resource::setProperty overwrites any existing triple with that
> subject/property pair.
> 
> > In the short term I'd like to sort out this duplicate issue as a
> > potential bugfix.  Then, if it still makes sense, in the medium term
> > (version 2.0) I'd like to try for the libstreamanalyzer solution.
> 
> There is no duplication of data at the moment. The only thing that needs
> fixing, if I saw correctly, is that Bangarang uses plain strings for
> artists instead of nco:Contact resources.

> This is another problem (not of Bangarang but lsa): it would be good to
>  reuse these contacts instead of recreating them every time. The latter
>  results in a lot of Contact resources which confuses KDEPIM.

This isn't a problem. It is a design decision which has been extensively 
discussed. It was decided that:
* nco:Contact is a low-level concept that shouldn't be exposed to the user at 
all. Users deal with pimo:Persons
* analyzers have no way to query the database and therefore can't meaningfully 
create links between IEs.

=> re-linking IEs should be done at higher-level.

Another example where this happens: albums are created for each and every 
song.
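
As a rough idea of what re-linking at a higher level could look like, here 
is a small Python sketch that runs after extraction and maps per-file contact 
resources with the same name onto one canonical resource. Everything in it is 
an assumption made for illustration: the function relink_contacts, the 
"contact:N" identifiers, and the choice to match on case-folded, 
whitespace-normalized names. It is not how Nepomuk or KDEPIM actually do it.

```python
def relink_contacts(contacts):
    """Map each per-file contact resource to one canonical resource.

    `contacts` is an iterable of (resource_id, full_name) pairs, as an
    analyzer might emit them with no knowledge of existing resources.
    Returns {resource_id: canonical_id}; the canonical id is the first
    resource seen for a given normalized name. Hypothetical helper.
    """
    canonical_by_name = {}
    mapping = {}
    for resource_id, full_name in contacts:
        # Normalize whitespace and case so trivial variants collapse.
        key = " ".join(full_name.split()).casefold()
        mapping[resource_id] = canonical_by_name.setdefault(key, resource_id)
    return mapping


# Made-up analyzer output: one contact resource per indexed song.
extracted = [
    ("contact:1", "Some Artist"),
    ("contact:2", "some  artist"),
    ("contact:3", "Another Artist"),
]
links = relink_contacts(extracted)
```

The same pass could be applied to the per-song album resources mentioned 
above. The point is only that it needs to see the whole database, which an 
analyzer cannot.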

> > > I can help you with strigi part, and I hope Sebastian can find a way to
> > > let Bangarang extend the media collection via some nepomuk API call.
> 
> I have been thinking about this many times but so far did not come up with
>  a good solution yet. The Strigi service does create the one graph which is
>  marked as the index graph for that particular file[1]. This graph contains
>  only data that can be recreated by re-indexing the file. So in theory
>  Bangarang would need to add its own graph with data only extracted from
>  the file. But then we need to sync that data. If we were to put it in the
>  same graph that is used by Strigi then Strigi would delete that data again
>  on update. Also not a perfect solution. But maybe better since media files
>  almost never change....

It's better to have the file indexing service do the dirty work; then the only 
thing Bangarang would need to do is ask the service to index the needed file(s).
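
To make the graph separation Sebastian describes concrete, here is a toy 
Python model of it. It is a sketch under stated assumptions, not the Nepomuk 
or Strigi API: the class TinyQuadStore, the "index:" and "user" graph names, 
the property names and the file URL are all invented for the example. It 
shows only the behaviour discussed above: re-indexing throws away and 
rebuilds the file's index graph, while data kept in another graph survives.

```python
class TinyQuadStore:
    """Toy store of (subject, predicate, object) triples grouped by graph."""

    def __init__(self):
        self.graphs = {}  # graph name -> set of triples

    def add(self, graph, subject, predicate, obj):
        self.graphs.setdefault(graph, set()).add((subject, predicate, obj))

    def reindex(self, file_url, extracted):
        """Drop the file's index graph and rebuild it from fresh data."""
        graph = "index:" + file_url
        self.graphs.pop(graph, None)
        for predicate, obj in extracted.items():
            self.add(graph, file_url, predicate, obj)

    def properties(self, subject):
        """All (predicate, object) pairs for a subject, across all graphs."""
        return {
            (p, o)
            for triples in self.graphs.values()
            for (s, p, o) in triples
            if s == subject
        }


store = TinyQuadStore()
song = "file:///music/song.ogg"  # hypothetical file URL

# The indexer extracts metadata from the file into the index graph.
store.reindex(song, {"title": "Old Title", "artist": "Some Artist"})
# The application stores user-created data in a separate graph.
store.add("user", song, "rating", 8)
# The file's tags change and the application asks for a re-index.
store.reindex(song, {"title": "New Title", "artist": "Some Artist"})

props = store.properties(song)
```

After the second reindex the old title is gone and the rating is still 
there, without the application having written any extractable data itself.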

> > Thanks much Evgeny.   A quick first thought is that since a
> > libstreamanalyzer plugin would result in a split in the code path for
> > writing to nepomuk in Bangarang, to keep the user feedback consistent
> > I'll probably need some progress feedback. Does that already exist? 
> > Also, probably a silly question but can the plugin be delivered at app
> > install time? Or does it have to be delivered at strigi install time?
> 
> Like most plugin systems Strigi also loads all available plugins at runtime
> which means that you can install any plugin you want at any time. KDE
>  actually contains a few of them.

-- 
Evgeny

