[Nepomuk] [RFC] File Indexers

Vishesh Handa me at vhanda.in
Wed Mar 20 14:27:45 UTC 2013


On Wed, Mar 20, 2013 at 7:39 PM, <phreedom at yandex.ru> wrote:

> On Tuesday 19 March 2013 23:35:42 Vishesh Handa wrote:
> > As you guys might remember, we moved away from Strigi for the 4.10
> > release. Our solution, however, still does not support any document
> > formats apart from PDF. We need to change that and support other
> > formats. There are 2 possible ways to go about this -
> >
> > 1. We use Okular which supports a number of popular formats
> > 2. We write our own indexers by using the relevant library.
>
> I know I risk starting a flamewar, or more likely, there's no risk, and
> instead a 100% guarantee, but:
>

Not really. It was mostly just a decision taken by me.


>   3. Use libStreamAnalyzer.
>
> Take a look back at how many tiny issues and corner cases had to be fixed
> so far, how many lib quirks had to be accounted for? This was also the most
> significant source of troubles for libstreamanalyzer.
>

The main reason I'm against this is that Strigi does not have a maintainer.
Bugs keep cropping up: it doesn't handle all kinds of ODF files, docs files,
etc. I do not want to have to fix them. Also, we're fundamentally
duplicating work. Libraries already exist to parse those file formats, and
they are actively being used all across KDE. We can just reuse those
libraries instead of writing and maintaining our own parsers.
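
A minimal sketch of the library-reuse idea, assuming one small extractor per
MIME type that delegates parsing to an existing library. This is not Nepomuk
code: the registry, the function names and the returned keys are all
hypothetical, and Python's standard `json` module merely stands in for "a
library that already parses the format".

```python
import json
from typing import Callable, Dict

# Hypothetical registry: MIME type -> function that extracts indexable data.
EXTRACTORS: Dict[str, Callable[[bytes], dict]] = {}


def extractor(mimetype: str):
    """Register the decorated function as the extractor for `mimetype`."""
    def register(func: Callable[[bytes], dict]) -> Callable[[bytes], dict]:
        EXTRACTORS[mimetype] = func
        return func
    return register


@extractor("application/json")
def index_json(data: bytes) -> dict:
    # Parsing is delegated to an existing library (here: the json module);
    # the extractor only maps the parsed result onto index fields.
    doc = json.loads(data.decode("utf-8"))
    if not isinstance(doc, dict):
        return {}
    return {"title": doc.get("title", ""), "text": doc.get("body", "")}


@extractor("text/plain")
def index_plain(data: bytes) -> dict:
    return {"title": "", "text": data.decode("utf-8", errors="replace")}


def index_file(mimetype: str, data: bytes) -> dict:
    """Dispatch to the registered extractor; unknown formats yield nothing."""
    handler = EXTRACTORS.get(mimetype)
    return handler(data) if handler is not None else {}
```

Under this layout, supporting another format would mean registering one more
small function around whichever library already handles that format, rather
than maintaining a parser.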

> What has this duplication of effort accomplished so far? And what happens
> if or hopefully when Nepomuk outgrows this file-based sandbox?
>

The duplication of effort has been quite small.

Currently, all of the indexing code in Nepomuk, which does 80% of Strigi's
job, is about 1400 lines of code. In comparison, the code required just to
interface with Strigi in Nepomuk was a good 700 lines. Also, now with our
two-tier approach, Strigi would be giving us data which has already been
pushed. One could remove that data, but it's just not something I want to
do.
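
To illustrate the overlap problem, here is a rough sketch that assumes the
first tier pushes only cheap facts about a file and the second tier adds
extracted data. The function names and property keys are hypothetical and do
not correspond to Nepomuk's actual ontology or API. It shows the kind of
filtering that would be needed if an external analyzer returned properties
the first tier had already pushed.

```python
import mimetypes
import os


def first_tier(path: str) -> dict:
    """Cheap pass: facts that are available without opening the file."""
    mimetype, _ = mimetypes.guess_type(path)
    return {
        "fileName": os.path.basename(path),
        "mimeType": mimetype or "application/octet-stream",
    }


def second_tier(extracted: dict, already_pushed: dict) -> dict:
    """Keep only the properties that the first tier did not push."""
    return {key: value for key, value in extracted.items()
            if key not in already_pushed}
```

For example, if the first tier pushed `fileName` and `mimeType`, and an
analyzer later returned those two properties plus a page count, only the page
count would survive `second_tier`.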

I'm not sure when we will outgrow this file-based sandbox, but based on our
current requirements, we do not need anything more than file handling. The
other additional stuff that Strigi used to provide was just discarded.


> -- Evgeny
>



-- 
Vishesh Handa