[Nepomuk] Second try: classes and properties for describing excerpts (parts of text, parts of images)

Sam Thursfield sam.thursfield at codethink.co.uk
Fri Sep 23 16:06:49 UTC 2011


On Fri, 2011-09-23 at 16:19 +0200, Sebastian Trüg wrote:
> On 09/23/2011 04:08 PM, Sam Thursfield wrote:
> >> this is the second time I am sending this email. Previously it had the subject
> >> "Bookmarking - rebooted" and I fear that it scared away some people that
> >> should actually be interested in this topic. The thing is just that from
> >> my point of view the concepts are the same. So here goes again:
> >>
> >> Bookmarking is a rather simple concept. Typically used for URLs in web
> >> browsers or file managers. In the Nepomuk ontologies (NFO) we have a
> >> more or less direct mapping of the old bookmarking concept to classes:
> >>
> >> nfo:BookmarkFolder contains several nfo:Bookmarks which nfo:bookmarks
> >> some nie:DataObjects.
> >>
> >> This is fine for the most basic kind of bookmarking: web urls and
> >> files/folders. However, quickly the need for finer grained bookmarking
> >> arose - a position in a text, a stream, and so on. Thus, properties like
> >> nfo:pageNumber were created which give some information on the position
> >> in the data object.
> >>
> >> From my point of view this is not a great solution. For starters I do
> >> not even like the concept of bookmarks. For me a bookmark is nothing
> >> more than a piece of information that has been marked as interesting.
> >> And with semantic search and friends I see no need for the organization
> >> into bookmark folders anymore anyway.
> >>
> >> That aside I also think that we should not try to describe where in some
> >> document our bookmark points to but we should rather properly define the
> >> excerpt that we want to remember - a piece of text, part of an image,
> >> and so on. Thus, we need to describe part of a nie:InformationElement.
> >> Part of a nfo:PlainTextDocument for example is a piece of text which
> >> starts at a certain character offset and has a certain length. To state
> >> that a person is depicted in an image we should describe the part of the
> >> image and then simply link that to the person.
> >>
> >> To me all this seems to happen on the nie:InformationElement level
> >> rather than on the nie:DataObject level. We are interested in the
> >> information, not the container. Thus, such a part of the document would
> >> be nie:isLogicalPartOf the main information element.
> > 
> > I fully agree with what you're saying. We need to strongly define
> > 'excerpt' and 'bookmark' in this environment: an excerpt is a part of a
> > logical resource, and a bookmark is essentially an annotation of this
> > excerpt. The special case of bookmarks in a web browser sense is a bit of
> > a trap because one annotates the whole resource (web page) instead of an
> > excerpt.
> > 
> >> What I am not sure about, however, is whether part of, say, a
> >> nfo:RasterImage is a nfo:RasterImage again or if we need a dedicated new
> >> type or if we would double-type.
> > 
> > If we ignore any annotations to the excerpt (which count as bookmarks)
> > there's no useful difference between nfo:RasterImage the whole image and
> > nfo:RasterImage which is part of a larger one.
> > 
> > I've been looking at this from an audio POV (I got here from
> > https://sourceforge.net/apps/trac/oscaf/ticket/123 ) and the case of
> > one audio file containing a whole album. A generic extract of a larger piece
> > of audio is still nfo:Audio in my opinion. In the case that the extract is
> > an entire track we can represent that by giving it type nmm:MusicPiece as well.
> 
> I agree.
> 
> >> In any case I would like to kick off the discussion of this topic which
> >> is important in many situations by simply throwing some draft at you.
> >> Have a look, comment on it, tell me that it is utter bs or that you like
> >> the approach. Let's discuss.
> >>
> >> The draft:
> >> ==========================================
> >>
> >> nie:Excerpt a rdfs:Class ;
> >>   rdfs:subClassOf nie:InformationElement .
> >>
> >> nie:containsExcerpt a rdf:Property ;
> >>   rdfs:subPropertyOf nao:hasSubResource, nie:hasLogicalPart ;
> >>   nrl:inverseProperty nie:isExcerptOf .
> >>
> >> nie:isExcerptOf a rdf:Property ;
> >>   rdfs:subPropertyOf nao:hasSuperResource, nie:isLogicalPartOf ;
> >>   nrl:inverseProperty nie:containsExcerpt .
> >>
> >>
> >> nfo:TextExcerpt a rdfs:Class ;
> >>   rdfs:subClassOf nie:Excerpt .
> >>
> >>
> >> # can this be a nfo:Visual?
> >> nfo:ImageRegion a rdfs:Class ;
> >>   rdfs:subClassOf nie:Excerpt .
> >>
> >> nfo:RectangularImageRegion a rdfs:Class ;
> >>   rdfs:subClassOf nfo:ImageRegion .
> >>
> >> nfo:offsetX a rdf:Property ;
> >>   rdfs:domain nfo:RectangularImageRegion ;
> >>   rdfs:range xsd:integer .
> >>
> >> nfo:offsetY a rdf:Property ;
> >>   rdfs:domain nfo:RectangularImageRegion ;
> >>   rdfs:range xsd:integer .
> >> ==========================================
> > 
> > I'm not convinced that the separate hierarchy for excerpts
> > is necessary. nfo:Image is an ugly case because it's defined
> > as "a file containing an image". I see something more like
> 
> I think we could simply change that.

Yes, we could have both nie:Image and nfo:Image, which is a little
nasty but would avoid breaking compatibility.

> > this would make sense:
> > 
> > nie:Image a rdfs:Class ;
> >   rdfs:subClassOf nie:InformationElement .
> > 
> > nie:RasterImage a rdfs:Class ;
> >   rdfs:subClassOf nie:Image .
> > 
> > nfo:ImageFile a rdfs:Class ;
> >   rdfs:subClassOf nfo:FileDataObject .
> 
> actually no. There is the file and then there are its contents. The
> former is a FileDataObject and the latter a RasterImage. By using
> double-typing as we do now everything already works out fine. We only
> need to change the rdfs:comment on the image class (which is wrong
> anyway since an image can also be embedded in an attachment or an
> archive, so it does not have to be a file)

I think we're agreeing here :)

> > We use an nie:Image to represent the whole file, linked to the
> > nfo:ImageFile resource using nie:isStoredAs. We can then use
> 
> nie:isStoredAs is conceptually correct but in Nepomuk we do not use it
> as we combine the DataObject and InformationElement into one - as
> mentioned above.

Interesting, in Tracker we do the same but also have

 _:file nie:isStoredAs _:file
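
As a rough illustration of that self-reference, here is a minimal
sketch. The file URI is made up and the typing simply follows the
double-typing described above; it is not a dump from Tracker.

```turtle
@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .

# Hypothetical file: a single resource carries both the data object
# type and the information element type, and names itself as storage.
<file:///home/user/photo.jpg>
    a nfo:FileDataObject, nfo:RasterImage ;
    nie:isStoredAs <file:///home/user/photo.jpg> .
```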

> > nie:isLogicalPartOf to link image excerpts to the larger image,
> > using a set of new properties to describe the containment:
> > 
> > # Vector offset [0, 1]
> > nie:imageOffsetX a rdf:Property ;
> >   rdfs:domain nie:Image ;
> >   rdfs:range xsd:double .
> > nie:imageOffsetY a rdf:Property ;
> >   rdfs:domain nie:Image ;
> >   rdfs:range xsd:double .
> > 
> > # Pixel offset
> > nie:rasterImageOffsetX a rdf:Property ;
> >   rdfs:domain nie:RasterImage ;
> >   rdfs:range xsd:integer .
> > nie:rasterImageOffsetY a rdf:Property ;
> >   rdfs:domain nie:RasterImage ;
> >   rdfs:range xsd:integer .
> 
> could be sub-properties of the above.

Very true
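
To make the pixel-offset variant concrete, here is a sketch of how an
image excerpt might be described. nie:RasterImage and the
nie:rasterImageOffset* properties are only proposals from this thread,
and the ex: resources and values are invented for the example. Only the
offset is shown, since no properties for the extent of the region have
been proposed here.

```turtle
@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

# The whole picture (hypothetical resource).
ex:photo a nie:RasterImage .

# An excerpt of it: same class, linked to the larger image and
# positioned by its pixel offset within that image.
ex:photoRegion a nie:RasterImage ;
    nie:isLogicalPartOf ex:photo ;
    nie:rasterImageOffsetX "40"^^xsd:integer ;
    nie:rasterImageOffsetY "50"^^xsd:integer .
```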

> > 
> > For audio, a similar thing works:
> > 
> > # Offset in seconds
> > nie:audioOffset a rdf:Property ;
> >   rdfs:domain nfo:Audio ;
> >   rdfs:range xsd:double . 
> 
> sounds good to me.
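
For the album-in-one-file case that might look roughly like the sketch
below. nie:audioOffset is only the property proposed above, and the ex:
resources and the offset value are made up for illustration.

```turtle
@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .
@prefix nmm: <http://www.semanticdesktop.org/ontologies/2009/02/19/nmm#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

# The whole album as one piece of audio (hypothetical resource).
ex:album a nfo:Audio .

# One track inside it: still nfo:Audio, and also a music piece
# because the extract happens to be an entire track.
ex:track2 a nfo:Audio, nmm:MusicPiece ;
    nie:isLogicalPartOf ex:album ;
    # Offset in seconds from the start of ex:album (proposed property).
    nie:audioOffset "215.5"^^xsd:double .
```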

> > (As an aside, the other conceptual way to implement this
> > would be to define URI schemes for specifying regions, e.g.
> > 
> > file:///foo.mp3#15502.33  (seconds)
> > file:///foo.jpg#40,50-500,600  (pixels)
> 
> I would like to avoid parsing of URIs actually.
> 
> > This is inherently less robust than specifying the properties in RDF because
> > we can't specify units, range, etc., and there would have to be some special URI parsing
> > code to handle them. I'm bringing it up pre-emptively to show it's a bad idea :)
> 
> hehe
> 
> > Hope these thoughts are in some way helpful.
> 
> very. For the moment this seems like a good idea to me. I suppose I need
> to write it down and propose it as a patch before I can get a final view
> on things.

I like the sound of everything here and look forward to seeing the
patch :)

Sam


