[Nepomuk] word lists - strigi? nepomuk?
Dean Perry
happy.heyoka at gmail.com
Wed Jul 18 12:42:07 UTC 2012
Hi,
I originally posted this here:
<http://forum.kde.org/viewtopic.php?f=43&t=106919>
but the forum admin said I should try you directly... if you feel like
answering, post to the forum or mail me and I'll copy it there; I
can't be the only one who has wondered about this:
I have an idea for an application to automatically categorise and tag
documents based on their contents.
To do this I need a frequency distribution of the words in the
document.
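
To make "frequency distribution" concrete, here is a minimal sketch in
plain Python. Nothing in it is Nepomuk- or Strigi-specific: the function
name word_frequencies and the crude regex tokenizer are only assumptions
for illustration, and a real indexer would handle Unicode, stemming and
stop words properly.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its count."""
    # Crude tokenizer (an assumption): runs of ASCII letters, digits
    # and apostrophes count as words; everything else is a separator.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    freq = word_frequencies("The cat sat on the mat. The mat was flat.")
    # The two most frequent words with their counts.
    print(freq.most_common(2))
```

This is the per-document data I would like to get without reading each
file a second time.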
I have played around with the Nepomuk examples and have a few
clues about the tagging and RDF storage.
I can't find much information on a per-document word list, though:
neither nepsak nor nepoogle appears to show it, so maybe it's not
stored in Virtuoso?
Is there a word list stored (e.g. an inverted vector index)? How does
the full text search in Dolphin do its thing?
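
The kind of index I have in mind maps each word to the documents that
contain it, along with a count. The sketch below is a toy illustration in
plain Python and not a claim about how Nepomuk, Strigi or Virtuoso
actually store anything; build_inverted_index and the file names are
made up for the example.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to {document id: occurrences in that document}.

    docs is a dict of document id -> text (a hypothetical input shape).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Same crude tokenizer assumption: ASCII letters, digits, apostrophes.
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


if __name__ == "__main__":
    index = build_inverted_index({
        "a.txt": "red fish blue fish",
        "b.txt": "blue sky",
    })
    # Which documents contain "blue", and how often?
    print(index["blue"])
```

If something like this already exists, a per-document word list could be
read out of it instead of being recomputed.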
Do I need to produce this list myself using libstreamanalyzer? I'd
prefer not to do a second indexing pass.