[Nepomuk] Second try: classes and properties for describing excerpts (parts of text, parts of images)
Sebastian Trüg
trueg at kde.org
Fri Sep 23 14:19:19 UTC 2011
On 09/23/2011 04:08 PM, Sam Thursfield wrote:
>> this is the second time I send this email. Previously it had the subject
>> "Bookmarking - rebooted" and I fear that it scared away some people that
>> should actually be interested in this topic. The thing is just that from
>> my point of view the concepts are the same. So here goes again:
>>
>> Bookmarking is a rather simple concept. Typically used for URLs in web
>> browsers or file managers. In the Nepomuk ontologies (NFO) we have a
>> more or less direct mapping of the old bookmarking concept to classes:
>>
>> nfo:BookmarkFolder contains several nfo:Bookmarks which nfo:bookmarks
>> some nie:DataObjects.
>>
>> This is fine for the most basic kind of bookmarking: web urls and
>> files/folders. However, quickly the need for finer grained bookmarking
>> arose - a position in a text, a stream, and so on. Thus, properties like
>> nfo:pageNumber were created which give some information on the position
>> in the data object.
>>
>> From my point of view this is not a great solution. For starters I do
>> not even like the concept of bookmarks. For me a bookmark is nothing
>> more than a piece of information that has been marked as interesting.
>> And with semantic search and friends I see no need for the organization
>> into bookmark folders anymore anyway.
>>
>> That aside I also think that we should not try to describe where in some
>> document our bookmark points to but we should rather properly define the
>> excerpt that we want to remember - a piece of text, part of an image,
>> and so on. Thus, we need to describe part of a nie:InformationElement.
>> Part of a nfo:PlainTextDocument for example is a piece of text which
>> starts at a certain character offset and has a certain length. To state
>> that a person is depicted in an image we should describe the part of the
>> image and then simply link that to the person.
>>
>> To me all this seems to happen on the nie:InformationElement level
>> rather than on the nie:DataObject level. We are interested in the
>> information, not the container. Thus, such a part of the document would
>> be nie:isLogicalPartOf the main information element.
>
> I fully agree with what you're saying. We need to strongly define
> 'excerpt' and 'bookmark' in this environment: an excerpt is a part of a
> logical resource, and a bookmark is essentially an annotation of this
> excerpt. The special case of bookmarks in a web browser sense is a bit of
> a trap because one annotates the whole resource (web page) instead of an
> excerpt.
>
>> What I am not sure about, however, is whether part of, say, a
>> nfo:RasterImage is a nfo:RasterImage again or if we need a dedicated new
>> type or if we would double-type.
>
> If we ignore any annotations to the excerpt (which count as bookmarks)
> there's no useful difference between nfo:RasterImage the whole image and
> nfo:RasterImage which is part of a larger one.
>
> I've been looking at this from an audio POV (I got here from
> https://sourceforge.net/apps/trac/oscaf/ticket/123 ) and the case of
> one audio file containing a whole album. A generic extract of a larger piece
> of audio is still nfo:Audio in my opinion. In the case that the extract is
> an entire track we can represent that by giving it type nmm:MusicPiece as well.
I agree.
>> In any case I would like to kick off the discussion of this topic which
>> is important in many situations by simply throwing some draft at you.
>> Have a look, comment on it, tell me that it is utter bs or that you like
>> the approach. Let's discuss.
>>
>> The draft:
>> ==========================================
>>
>> nie:Excerpt a rdfs:Class ;
>> rdfs:subClassOf nie:InformationElement .
>>
>> nie:containsExcerpt a rdf:Property ;
>> rdfs:subPropertyOf nao:hasSubResource, nie:hasLogicalPart ;
>> nrl:inverseProperty nie:isExcerptOf .
>>
>> nie:isExcerptOf a rdf:Property ;
>> rdfs:subPropertyOf nao:hasSuperResource, nie:isLogicalPartOf ;
>> nrl:inverseProperty nie:containsExcerpt .
>>
>>
>> nfo:TextExcerpt a rdfs:Class ;
>> rdfs:subClassOf nie:Excerpt .
>>
>>
>> # can this be a nfo:Visual?
>> nfo:ImageRegion a rdfs:Class ;
>> rdfs:subClassOf nie:Excerpt .
>>
>> nfo:RectangularImageRegion a rdfs:Class ;
>> rdfs:subClassOf nfo:ImageRegion .
>>
>> nfo:offsetX a rdf:Property ;
>> rdfs:domain nfo:RectangularImageRegion ;
>> rdfs:range xsd:integer .
>>
>> nfo:offsetY a rdf:Property ;
>> rdfs:domain nfo:RectangularImageRegion ;
>> rdfs:range xsd:integer .
>> ==========================================
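To make the draft a bit more concrete, here is a minimal sketch of how
instance data could look if the classes and properties above were adopted
as written. The file URL, the ex: namespace, ex:depicts and the person
resource are made-up placeholders and not part of any ontology. The region
only carries the two offsets since the draft does not define a size yet.

```turtle
@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/sketch#> .   # hypothetical namespace

# The whole image (placeholder URL).
<file:///home/user/party.png> a nfo:RasterImage ;
    nie:containsExcerpt ex:region1 .          # property proposed in the draft

# A rectangular part of that image, using the draft's classes and properties.
ex:region1 a nfo:RectangularImageRegion ;
    nie:isExcerptOf <file:///home/user/party.png> ;
    nfo:offsetX "40"^^xsd:integer ;
    nfo:offsetY "50"^^xsd:integer ;
    ex:depicts ex:alice .                     # hypothetical link to the person
```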
>
> I'm not convinced that the separate hierarchy for excerpts
> is necessary. nfo:Image is an ugly case because it's defined
> as "a file containing an image". I see something more like
I think we could simply change that.
> this would make sense:
>
> nie:Image a rdfs:Class ;
> rdfs:subClassOf nie:InformationElement .
>
> nie:RasterImage a rdfs:Class ;
> rdfs:subClassOf nie:Image .
>
> nfo:ImageFile a rdfs:Class ;
> rdfs:subClassOf nfo:FileDataObject .
actually no. There is the file and then there are its contents. The
former is a FileDataObject and the latter a RasterImage. By using
double-typing as we do now everything already works out fine. We only
need to change the rdfs:comment on the image class (which is wrong
anyway since an image can also be embedded in an attachment or an
archive, so it does not have to be a file)
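
As a small illustration of the double-typing: one resource carries both
the file type and the content type, so no nie:isStoredAs link is needed.
This is only a sketch, and the URL and file name are placeholders.

```turtle
@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .

# One resource, two types: the container (file) and its content (image).
<file:///home/user/party.png> a nfo:FileDataObject, nfo:RasterImage ;
    nfo:fileName "party.png" ;
    nie:mimeType "image/png" .
```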
> We use an nie:Image to represent the whole file, linked to the
> nfo:ImageFile resource using nie:isStoredAs. We can then use
nie:isStoredAs is conceptually correct but in Nepomuk we do not use it
as we combine the DataObject and InformationElement into one - as
mentioned above.
> nie:isLogicalPartOf to link image excerpts to the larger image,
> using a set of new properties to describe the containment:
>
> # Vector offset [0, 1]
> nie:imageOffsetX a rdf:Property ;
> rdfs:domain nie:Image ;
> rdfs:range xsd:double .
> nie:imageOffsetY a rdf:Property ;
> rdfs:domain nie:Image ;
> rdfs:range xsd:double .
>
> # Pixel offset
> nie:rasterImageOffsetX a rdf:Property ;
> rdfs:domain nie:RasterImage ;
> rdfs:range xsd:integer .
> nie:rasterImageOffsetY a rdf:Property ;
> rdfs:domain nie:RasterImage ;
> rdfs:range xsd:integer .
could be sub-properties of the above.
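Roughly like this, reusing the property names from your proposal. This is
just a sketch: none of these properties exist in NIE today, and whether an
integer pixel range is compatible with a super-property ranging over
xsd:double would still need to be checked.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix nie:  <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .

# Pixel offsets declared as specialisations of the generic image offsets.
nie:rasterImageOffsetX a rdf:Property ;
    rdfs:subPropertyOf nie:imageOffsetX ;     # proposed, not yet in NIE
    rdfs:domain nie:RasterImage ;
    rdfs:range xsd:integer .

nie:rasterImageOffsetY a rdf:Property ;
    rdfs:subPropertyOf nie:imageOffsetY ;     # proposed, not yet in NIE
    rdfs:domain nie:RasterImage ;
    rdfs:range xsd:integer .
```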
>
> For audio, a similar thing works:
>
> # Offset in seconds
> nie:audioOffset a rdf:Property ;
> rdfs:domain nfo:Audio ;
> rdfs:range xsd:double .
sounds good to me.
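For the album-in-one-file case this could look roughly as follows. A
sketch only: nie:audioOffset is the property proposed above and does not
exist yet, and the file URL, the ex: namespace, the title and the offset
value are invented for illustration.

```turtle
@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .
@prefix nmm: <http://www.semanticdesktop.org/ontologies/2009/02/19/nmm#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/sketch#> .   # hypothetical namespace

# The whole album stored in a single file (placeholder URL).
<file:///home/user/album.flac> a nfo:Audio .

# One track inside it: still nfo:Audio, and also a nmm:MusicPiece
# because the extract is an entire track.
ex:track2 a nfo:Audio, nmm:MusicPiece ;
    nie:isLogicalPartOf <file:///home/user/album.flac> ;
    nie:title "Second Track" ;
    nie:audioOffset "215.4"^^xsd:double .     # proposed property, seconds from start
```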
> (As an aside, the other conceptual way to implement this
> would be to define URI schemes for specifying regions, e.g.
>
> file:///foo.mp3#15502.33 (seconds)
> file:///foo.jpg#40,50-500,600 (pixels)
I would like to avoid parsing of URIs actually.
> This is inherently less robust than specifying the properties in RDF because
> we can't specify units, range etc and there will have to be some special URI parsing
> code to handle them. I'm bringing it up pre-emptively to show it's a bad idea :)
hehe
> Hope these thoughts are in some way helpful.
very. For the moment this seems like a good idea to me. I suppose I need
to write it down and propose it as a patch before I can get a final view
on things.
> Sam
>
>
> PS. my underlying motivation here is https://bugzilla.gnome.org/show_bug.cgi?id=657183 -
> I'm adding support to Tracker for parsing .flac files with embedded cue sheets and the
> like and recognising the tracks inside.
Ah, I see. Very nice.
Cheers,
Sebastian