[Nepomuk] [RFC] New File Indexer
happy.heyoka at gmail.com
Thu Sep 13 14:48:32 UTC 2012
On Tue, 11 Sep 2012 09:06:19 PM Vishesh Handa wrote:
> Do you also have email indexing enabled? Cause that is handled separately by kdepim, though pushing the data does make
> virtuoso act up.
I tried killing akonadi from the console; not much impact. I can see the nepomukindexer processes coming and going, but
virtuoso is still doing all the work. This is the same run I mentioned on Wednesday, plus one reboot: 48 hours...
Kubuntu hasn't got the latest build of virtuoso yet - I recall there was mention of it having some efficiency
> > If the nepomukindexer process crashes, then that file is ignored, and we continue on the next file.
> > Even I like the concept of systemd. Currently half of the Nepomuk communication happens over a local socket, and the
> > other half over dbus. Eventually, I would like to move completely to the local socket, but that's for later. And
> > it's only when I profile and discover that dbus actually is a limiting factor.
> > Imagine the simplest indexer that adds only resource/tag/value triplets - it becomes just two nested loops:
> > - iterate over resources
> > -- iterate over meta data items.
> > --- Test if resource contains item 1 (eg: jpeg/exif exposure), output triple for item 1
> > --- Test if resource contains item 2 (eg: jpeg/exif iso), output triple for item 2
> > - exit.
> I'm not sure I understand what you mean over here.
What I was thinking of was something like this: as an equivalent to the scanning part, nepomukindexer launches something
that would look like this from a shell:
fredmetaparser file1.frd | nepomukgraphdigester [file1.frd]
Where 'fredmetaparser' knows how to extract metadata from a '.frd' file and output a graph to stdout.
'nepomukgraphdigester' knows how to use the Soprano stuff to parse the graph and add it to the storage.
You could give 'nepomukgraphdigester' a verbose and/or non-storage mode (eg: debugging mode); it might also need to know
the file URI (?)
Building the 'fredmetaparser' is then just a matter of writing the graph to stdout in one of the simpler forms (xml?).
That should be straightforward enough and easy to debug: you just run the parser on a file and look at the output.
Stage two: you pipe it into 'nepomukgraphdigester' in debug mode and see how happy it is with the graph.
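
To make the idea concrete, here is a rough sketch of what a 'fredmetaparser' might look like. Everything in it is
made up for illustration: the '.frd' format is pretended to be plain 'key=value' lines, the predicate URIs are
placeholders rather than real Nepomuk ontology terms, and the output is N-Triples (one triple per line) instead of
xml, simply because it is easy to eyeball. Whether the digester side would accept that form is an assumption.

```python
import sys
from pathlib import Path

# Hypothetical mapping from metadata keys to predicate URIs.
# These are placeholders, not real Nepomuk ontology terms.
PREDICATES = {
    "exposure": "http://example.org/meta#exposure",
    "iso": "http://example.org/meta#iso",
}


def extract_metadata(path):
    """Stand-in for real format parsing: read 'key=value' lines."""
    meta = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            key, sep, value = line.strip().partition("=")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def escape_literal(value):
    """Escape backslash, quote and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def triples_for(path, meta):
    """Yield one N-Triples line per known metadata item present in meta."""
    subject = Path(path).resolve().as_uri()
    for key, predicate in PREDICATES.items():   # iterate over metadata items
        if key in meta:                         # test if resource contains item
            yield f'<{subject}> <{predicate}> "{escape_literal(meta[key])}" .'


def main(argv):
    for path in argv[1:]:                       # iterate over resources
        for line in triples_for(path, extract_metadata(path)):
            print(line)


if __name__ == "__main__":
    main(sys.argv)
```

Running it on a file and reading stdout is the whole debugging story for stage one; piping that same output into
the digester would be stage two.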
afaik pipes are two or three times faster than anything else, but the downside is of course that the "protocol" is raw bytes.
Again, I have no idea whether the bottleneck is the metadata parsing, the IPC or the storage (virtuoso); maybe
instrumenting the processes would be a good thing?
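
By instrumenting I mean something as crude as the sketch below: wrap each stage in a timer and add up wall-clock
time per stage. The stage names ('parse', 'ipc', 'store') are just labels I picked; nothing here is existing
Nepomuk code, and it only measures time as seen from inside one process, so time spent waiting on virtuoso would
show up under whichever stage is blocked on it.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage label.
totals = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the elapsed wall-clock time of the enclosed block to totals[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[stage] += time.perf_counter() - start


def report():
    """Return one summary line per stage, slowest stage first."""
    grand = sum(totals.values()) or 1.0
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{stage:10s} {secs:8.3f}s {100 * secs / grand:5.1f}%"
            for stage, secs in ordered]
```

Usage would be along the lines of `with timed("parse"): ...` around the metadata extraction and
`with timed("store"): ...` around the call that pushes the graph, then printing `report()` at the end of a run.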