[Nepomuk] ontology related advice request

Laura Dragan aprilush at gmail.com
Wed Oct 21 16:17:50 CEST 2009


Evgeny Egorochkin wrote:
> On Wednesday 21 October 2009 14:58:40, Laura Dragan wrote:
>> I'm looking at updating the Note class used in SemNotes and I could
>> use a second opinion from somebody who knows the ontologies and API
>> better.
>>
>> Currently the notes are of type pimo:Note. They have the following
>> properties (not exclusively):
>>
>> - title -> dc:title, also sets the nao:prefLabel to the same value
>> - creation time -> nao:created
>> - last write -> nao:lastModified
>> - tags -> nao:hasTag
>> - referenced resources -> pimo:isRelated
>> - content -> semnotes:htmlContent
>>
>> Implementation-wise, the Note class is a subclass of
>> Nepomuk::Resource, but it will be changed to a subclass of
>> Nepomuk::Thing after all this.
>>
>> There are 2 questions:
>>
>> 1. What is the best choice to replace the ugly semnotes:htmlContent
>>  property?
>>
>> I would like to replace it with some existing property in an existing
>> ontology. This would allow me to delete the ontology that comes with
>> the application. I thought for a while that pimo:wikiText might do the
>> job, but after some consideration I'm not so sure any more.
>
> Formatted text is a big pain. The closest thing we have in NIE is
> nmo:htmlMessageContent.
>
> There's no generic property like this. For HTML we could add it though if it's
> needed.

I used HTML links in the text to store the references to related
resources. I will move away from this and start using the annotation
plugins to store the references separately from the text. However, HTML
is also useful for formatted text, which is what I actually need (like
the nmo:htmlMessageContent you mentioned above). So a new property that
allows that would be good.

>
> As to wiki, the only standard thing in wiki formatting is the wiki word.
> Everything else seems to be done in as many different ways as possible. So I
> don't see the point of having any wiki properties at all.
>
> Much easier would be to translate between a subset of html and the wiki syntax
> your app has to deal with. Easier != easy though :(

I wouldn't want to get into wiki syntax, precisely because of the
endless number of standards and subsets of standards, although some
users who are familiar with it commented that it would be a nice
addition to the tool.

>
>> 2. Should I keep storing the notes in the RDF store or should I use
>> files on disk?
>>
>> Currently the note and all its properties are stored in the
>> repository, including the content. Initially I was expecting that
>> notes would be small, therefore not really worth storing in
>> independent files on disk. But after looking at the way that the few
>> users I know (including myself) take notes, I found that some notes
>> can be quite long and elaborate. That's why I'm now wondering if it
>> wouldn't be better to just store them in files and let Strigi index
>> the files. This way the indexing of note content is not lost.
>>
>> This question makes the first one redundant in a way, because if notes
>> should not be stored in the repository, the semnotes:htmlContent would
>> be removed anyway and the corresponding file would become the
>> pimo:groundingOccurrence for the note instance.
>
> You can store notes in the RDF store as long as you do sopranocmd export on a
> regular basis.

What do you mean by this? The tool provides a backup utility that
saves the notes and all the information related to them as RDF, and
it is possible to restore notes from such a file. It is also possible
to export the notes as files, but even when exported, they are still
stored in the repository.

>
> It doesn't matter if the notes are elaborate as long as you write them
> yourself instead of using your note-taking app as a dumpster for random
> content from the web. You are very unlikely to write more than several MB of
> text during your lifetime ;)
>

Some might do note-taking by copy/pasting :) I don't do that, and my
notes total almost 1 MB currently, with the first one dating from March.
So your estimate is very accurate :)

For now I will keep storing the content in the rdf repo.

> A rule of thumb is: the fulltext index of a plaintext file is about the same
> size as the file itself. So if you store the file in the RDF store, you double
> its DB size.
>
> A fail-safe approach would be to introduce a generic htmlContent property,
> create a nie:InformationElement to hold this property and make it a
> groundingOccurrence of your pimo:Note. This doesn't mean this information
> element has to be a file on the disk. But it means it can become a file on the
> disk if necessary without breaking anything.

So, if I understand correctly, this would be right:

[ - notes are of type pimo:Note and keep the title, creation time,
last modification time and tag properties as they are ]
- the note has a nie:InformationElement as its pimo:groundingOccurrence
- the nie:InformationElement has an htmlContent property
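To make the structure above concrete, here is a minimal Python sketch of the triples it would produce. It assumes the proposed (not yet existing) nie:htmlContent property and a made-up URI scheme for the note and its content element; terms are written as prefixed names and prefix declarations are left out. The content_type parameter lets you compare nie:InformationElement with nfo:HtmlDocument (question 1 below). This is only an illustration of the modelling, not the Nepomuk API.

```python
import json
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


def iri(uri: str) -> str:
    # Wrap a (hypothetical) resource URI in angle brackets, Turtle style.
    return f"<{uri}>"


def literal(value: str, datatype: str = "") -> str:
    # json.dumps gives a double-quoted, escaped string, close enough to a
    # Turtle string literal for this illustration.
    text = json.dumps(value)
    return f"{text}^^{datatype}" if datatype else text


@dataclass
class Note:
    uri: str            # hypothetical URI of the pimo:Note resource
    title: str
    created: str        # ISO 8601 date-time
    last_modified: str  # ISO 8601 date-time
    html: str
    tags: List[str] = field(default_factory=list)  # URIs of nao:Tag resources


def to_triples(note: Note, content_type: str = "nie:InformationElement") -> List[Triple]:
    """Metadata stays on the pimo:Note; the HTML body lives on a separate
    information element that is the note's grounding occurrence."""
    n = iri(note.uri)
    c = iri(note.uri + "-content")  # hypothetical URI for the content element
    triples: List[Triple] = [
        (n, "rdf:type", "pimo:Note"),
        (n, "dc:title", literal(note.title)),
        (n, "nao:prefLabel", literal(note.title)),
        (n, "nao:created", literal(note.created, "xsd:dateTime")),
        (n, "nao:lastModified", literal(note.last_modified, "xsd:dateTime")),
    ]
    triples += [(n, "nao:hasTag", iri(tag)) for tag in note.tags]
    triples += [
        (n, "pimo:groundingOccurrence", c),
        (c, "rdf:type", content_type),
        (c, "nie:htmlContent", literal(note.html)),  # proposed property
    ]
    return triples


def to_turtle_lines(triples: List[Triple]) -> str:
    # One statement per line; prefix declarations omitted for brevity.
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    note = Note(
        uri="nepomuk:/res/note-1",
        title="Ontology questions",
        created="2009-10-21T16:17:50+02:00",
        last_modified="2009-10-21T16:17:50+02:00",
        html="<p>Ask about <b>htmlContent</b></p>",
        tags=["nepomuk:/res/tag-nepomuk"],
    )
    print(to_turtle_lines(to_triples(note)))
```

The point of the extra hop is that nothing on the pimo:Note itself refers to the HTML; if the content element later becomes a real file, only the content element's triples change.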

New questions:

1. Should I use nie:InformationElement directly, or would
nfo:HtmlDocument make a better type, although there is no actual
DataObject for it?

2. Should I create a ticket for adding the nie:htmlContent property in Trac?
It would be a super-property of nmo:htmlMessageContent.
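What the super-property relation buys, as a hedged illustration: if nmo:htmlMessageContent is declared rdfs:subPropertyOf the proposed generic property, then under the RDFS entailment rules rdfs7 (a statement using a sub-property also holds for its super-property) and rdfs5 (rdfs:subPropertyOf is transitive), anything that asks for the generic property also finds e-mail HTML bodies. The sketch below implements just those two rules over an in-memory set of triples; nie:htmlContent and the resource names are hypothetical, and a real store would do this through its own inference support.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUB = "rdfs:subPropertyOf"


def close_subproperties(graph: Set[Triple]) -> Set[Triple]:
    """Return the graph closed under rdfs5 and rdfs7."""
    closed = set(graph)
    while True:
        sub = {(s, o) for s, p, o in closed if p == SUB}
        new: Set[Triple] = set()
        # rdfs5: p sub q and q sub r  =>  p sub r
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUB, d))
        # rdfs7: p sub q and (s p o)  =>  (s q o)
        for s, p, o in closed:
            for child, parent in sub:
                if p == child:
                    new.add((s, parent, o))
        if new <= closed:  # fixpoint reached, nothing new inferred
            return closed
        closed |= new


def values_of(graph: Set[Triple], prop: str) -> Set[Tuple[str, str]]:
    # All (subject, object) pairs stated or inferred for a property.
    return {(s, o) for s, p, o in graph if p == prop}


if __name__ == "__main__":
    graph = {
        ("nmo:htmlMessageContent", SUB, "nie:htmlContent"),  # proposed axiom
        ("<mail-1>", "nmo:htmlMessageContent", '"<p>Hi</p>"'),
        ("<note-1-content>", "nie:htmlContent", '"<p>Note</p>"'),
    }
    print(sorted(values_of(close_subproperties(graph), "nie:htmlContent")))
```

The inference only runs upward: the note content never acquires nmo:htmlMessageContent, which is why the generic property has to be the super-property and not the other way round.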

>
>> Sorry for the lengthy email :)
>> Thanks for reading,
>
> But no thanks for replying? :)

Now I thank you for replying and hope you will reply some more :)


Laura

