[Nepomuk] Re: Handling multiple sources of metadata

Sebastian Trüg trueg at kde.org
Tue May 3 11:14:26 CEST 2011


Hi Bruce,

this is what is done:
- We store everything in one db
- We index file metadata such as ID3 or XMP tags
- We have a GSoC project for metadata writeback, i.e. changed metadata in
the db will be written back to the file if possible

about extended attributes:
- AFAIK most distributions disable them by default.
- They are not supported by file systems such as FAT, which is used on
most USB keys. Thus, they do not improve interoperability much.
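
Whether a given mount accepts extended attributes can be probed at run
time. The following is only a minimal sketch, assuming Python on Linux
(os.setxattr is not available elsewhere); the function name and the
"user.probe" attribute name are made up for illustration and are not
part of Nepomuk:

```python
import os
import tempfile


def supports_user_xattrs(directory: str) -> bool:
    """Return True if a probe file in `directory` accepts a user.* attribute."""
    # os.setxattr / os.getxattr only exist on Linux builds of Python.
    if not hasattr(os, "setxattr"):
        return False
    fd, probe = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        # "user.probe" is a hypothetical attribute name used only for this test.
        os.setxattr(probe, "user.probe", b"1")
        return os.getxattr(probe, "user.probe") == b"1"
    except OSError:
        # Typically raised when the file system or mount options
        # do not allow user attributes.
        return False
    finally:
        os.remove(probe)
```

A caller would run this once per mount point and choose another storage
location when it returns False.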

The idea we have is to store the metadata on the filesystem itself in a
cross-platform way. This has been looked into but we need someone to
really do it. Are you interested?

Cheers,
Sebastian

On 05/03/2011 01:33 AM, Bruce Adams wrote:
> 
> Hi,
>     I am revisiting the idea of file tagging again. 
> There are potentially several places to store meta data.
> 
> 1 Stored in a database.
> 2 Embedded in the file format, e.g. XMP/EXIF
> 3 Stored in extended file attributes
> 4 Stored in a special meta-data file associated with the original file.
> 
> Embedded data is explicitly mentioned here:
> 
> http://api.kde.org/4.0-api/kdelibs-apidocs/nepomuk/html/index.html
> 
> 
> with ID3 tags used as the example.
> What about XMP tags added, for example, in digikam?
> 
> Unless I am mistaken Nepomuk currently only uses its own database.
> I understand the reason for this approach is that the database solution is the 
> only one that works for all cases.
> (Though the link to the FAQ where I was looking was broken)
> I personally think it is wrong to make it the primary location as losing 
> metadata when you copy files around is broken behaviour.
> 
> I was wondering (especially with a sprint potentially coming) what the ideal
> system would be.
> This is revisiting old ground, but bitrot seems to have affected my Google
> search results, so forgive me for re-asking old questions.
> 
> If you have multiple sources of the same data and they disagree, which should
> be considered primary?
> Who is responsible for syncing them if they disagree?
> 
> My thinking is as follows:
> 
> File embedded data is primary.
> Extended file attributes are secondary and should only be used for data when
> the file format does not allow for embedding.
> A metadata file associated with the original file is a simulation of the
> above and hence comes next.
> 
> The database is last but definitely not least.
> If it is able to, the server should sync the data.
> 
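
The ordering proposed above can be sketched roughly as follows. This is
only an illustration under assumptions: the source names, the function
names and the "None means the source has no data" convention are all
hypothetical, not an existing Nepomuk API.

```python
from typing import Dict, List, Optional, Tuple

# Highest priority first, following the ordering proposed in the quoted mail.
PRIORITY = ["embedded", "xattr", "sidecar", "database"]

Tags = Optional[List[str]]


def pick_primary(values: Dict[str, Tags]) -> Tuple[Optional[str], Tags]:
    """Return (source, tags) for the highest-priority source that has data."""
    for source in PRIORITY:
        tags = values.get(source)
        if tags is not None:
            return source, tags
    return None, None


def out_of_sync(values: Dict[str, Tags]) -> List[str]:
    """List the lower-priority sources whose tags differ from the primary."""
    source, primary = pick_primary(values)
    if source is None:
        return []
    return [
        s for s in PRIORITY
        if s != source
        and values.get(s) is not None
        and set(values[s]) != set(primary)
    ]
```

A sync service could then rewrite every source returned by out_of_sync()
from the primary one; who runs that service is the open question.
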
> For example:
>  Given an image tagged in nepomuk (e.g. via gwenview), nepomuk or a service
>  on its behalf should add the embedded tags itself (on gwenview's behalf,
>  assuming gwenview didn't do it).
> 
>  Given an image tagged outside of nepomuk (e.g. in digikam), nepomuk should
>  import the tags into its database the next time it needs to query the file
>  (or when indexing it).
> 
> Similarly, I think extended file attributes should be imported/exported where
> the file system supports them, with an optional fallback to simulating them
> with .metadata files or similar.
> 
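
The ".metadata files or similar" fallback could look roughly like this.
It is a sketch under assumptions only: the "<file>.metadata" naming, the
JSON layout and the function names are invented for illustration.

```python
import json


def sidecar_path(path: str) -> str:
    """Hypothetical naming scheme: photo.jpg -> photo.jpg.metadata."""
    return path + ".metadata"


def write_tags(path: str, tags) -> None:
    """Store the tags for `path` in a JSON sidecar file next to it."""
    with open(sidecar_path(path), "w", encoding="utf-8") as f:
        json.dump({"tags": sorted(tags)}, f)


def read_tags(path: str):
    """Return the tags from the sidecar file, or [] if there is none."""
    try:
        with open(sidecar_path(path), encoding="utf-8") as f:
            return json.load(f).get("tags", [])
    except FileNotFoundError:
        return []
```

Note that a sidecar file only survives a copy if whatever does the copy
knows to take it along, which is the same weakness the database has.
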
> I read something suggesting that extended file attributes are unsuitable for
> nepomuk data as they are stored as pairs whereas nepomuk uses triples.
> Hyperlinks to the details were either missing or broken.
> I'm not sure I understand the problem. Surely triples and pairs can be
> converted between easily enough when the base representations are strings?
> E.g. "A" "B" "C" becomes either "A:B" "C" or "A" "B:C"
> 
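
The "A:B" "C" mapping above can be sketched as below. One caveat: a plain
join is ambiguous when the strings themselves contain the separator, as
URIs do, so this sketch percent-encodes the two parts of the key first.
The function names and the sample values are illustrative assumptions.

```python
from urllib.parse import quote, unquote

SEP = ":"


def triple_to_pair(subject: str, predicate: str, obj: str):
    """Fold subject and predicate into one key: ("A", "B", "C") -> ("A:B", "C")."""
    # safe="" makes quote() escape ":" and "/" too, so SEP stays unambiguous.
    key = quote(subject, safe="") + SEP + quote(predicate, safe="")
    return key, obj


def pair_to_triple(key: str, value: str):
    """Inverse of triple_to_pair()."""
    subject, predicate = key.split(SEP)
    return unquote(subject), unquote(predicate), value
```

With the escaping in place the conversion round-trips for arbitrary
strings; without it, a subject such as "file:///a.jpg" would break the
split.
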
> 
> Are there some other limitations on extended file attributes that I'm not aware 
> of?
> 
> At the risk of re-opening old wounds I notice beagle uses extended attributes so 
> I assume xesam does too.
> This would be another way to promote interoperability in spite of incompatible 
> ontologies.
> For the case of simple file tagging  nepomuk <-> XMP <-> xesam might work for 
> example with only the simple
> digikam ontology as an interchange.  xmp.digikam.TagsList for example.
> 
> Any thoughts?
> 
> Regards,
> 
> Bruce.
> _______________________________________________
> Nepomuk mailing list
> Nepomuk at kde.org
> https://mail.kde.org/mailman/listinfo/nepomuk
> 

