<br><br><div class="gmail_quote">On Tue, Sep 11, 2012 at 8:18 PM, Sebastian Trüg <span dir="ltr"><<a href="mailto:sebastian@trueg.de" target="_blank">sebastian@trueg.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I like this.<br></blockquote><div><br>Good. Then I'll start working on this.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

But I would vote for a plugin system nonetheless. A simple one though. A plugin can register for one or more mimetypes and then it simply gets the file path and returns a SimpleResourceGraph. You merge all and are done. Plugins should never deal with file size, mimetype, or any of those basic things the framework can handle.<br>
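<br>
To make the idea concrete, here is a rough Python sketch (not Nepomuk code). PluginRegistry, id3_plugin and the plain dict standing in for SimpleResourceGraph are all made up for illustration, and the property names are just example keys.<br>
<pre>
```python
from collections import defaultdict


class PluginRegistry:
    """Hypothetical registry: plugins sign up for mimetypes, the framework does the rest."""

    def __init__(self):
        self._by_mimetype = defaultdict(list)

    def register(self, mimetypes, extract):
        # One plugin may register for one or more mimetypes.
        for mimetype in mimetypes:
            self._by_mimetype[mimetype].append(extract)

    def index(self, path, mimetype, basic):
        # "basic" holds what the framework already knows (size, mimetype, ...).
        # A dict of {uri: {property: value}} stands in for SimpleResourceGraph.
        merged = {path: dict(basic)}
        for extract in self._by_mimetype.get(mimetype, []):
            # The plugin only ever sees the file path.
            for uri, props in extract(path).items():
                merged.setdefault(uri, {}).update(props)
        return merged


def id3_plugin(path):
    # A real plugin would parse the file; the value is hard-coded here.
    return {path: {"nmm:performer": "Unknown Artist"}}


registry = PluginRegistry()
registry.register(["audio/mpeg"], id3_plugin)
graph = registry.index(
    "/music/a.mp3",
    "audio/mpeg",
    {"nie:mimeType": "audio/mpeg", "nfo:fileSize": 1234},
)
print(graph)
```
</pre>
The framework supplies size and mimetype itself, so the plugin function never has to look at them.<br>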

</blockquote><div><br>Of course. No point duplicating code.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

This means that the first sweep is done without plugins, the second one would call the plugins, and the third one, well, that could be yet another plugin system which uses RDF types instead of mimetypes. For example: the TV show plugin handles nfo:Video. The framework thus calls the plugin, provides the path and a handle to the existing metadata. The plugin can then simply run its filename analysis and continue from there.<br>
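<br>
A rough Python sketch of such a third sweep, purely as an illustration: the handles decorator, TYPE_PLUGINS, third_sweep and the "season"/"episode" keys are invented names, and the filename pattern is only one guess at what "filename analysis" could look like.<br>
<pre>
```python
import re

# Hypothetical registry: RDF type mapped to the plugins that handle it.
TYPE_PLUGINS = {}


def handles(rdf_type):
    """Register a plugin for an RDF type instead of a mimetype."""
    def decorator(func):
        TYPE_PLUGINS.setdefault(rdf_type, []).append(func)
        return func
    return decorator


@handles("nfo:Video")
def tv_show_plugin(path, metadata):
    # Filename analysis: look for patterns such as "S02E05".
    match = re.search(r"[Ss](\d+)[Ee](\d+)", path)
    if match is None:
        return {}
    return {"season": int(match.group(1)), "episode": int(match.group(2))}


def third_sweep(path, metadata):
    # The framework passes the path plus the metadata gathered so far.
    result = dict(metadata)
    for rdf_type in metadata.get("rdf:type", []):
        for plugin in TYPE_PLUGINS.get(rdf_type, []):
            result.update(plugin(path, result))
    return result


enriched = third_sweep("/videos/Show.S02E05.mkv", {"rdf:type": ["nfo:Video"]})
print(enriched)
```
</pre>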

</blockquote><div><br>For now I'm just going to implement the first two steps, with the data saved in one go. I'll implement all the plugins first; after that we can see about splitting it up into multiple phases, as you and Alex have described.<br>

<br>Having multiple steps will also help with scheduling. We can do the basic indexing (stat the file + URL + mimetype) instantly, and the rest can be scheduled based on system usage. (For people who don't know: the current approach is a queue whose delay is controlled by system usage.)<br>
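<br>
For illustration only, a queue whose delay follows system usage might look roughly like this in Python. This is not the actual indexer code: IndexingQueue, delay_for_load, the thresholds and the use of the load average as the "system usage" signal are all assumptions.<br>
<pre>
```python
import os
from collections import deque


def delay_for_load(load, idle_delay=0.0, busy_delay=5.0, busy_load=2.0):
    """Scale the delay linearly from idle_delay up to busy_delay.

    All defaults are made-up numbers; loads at or above busy_load get busy_delay.
    """
    fraction = min(max(load / busy_load, 0.0), 1.0)
    return idle_delay + fraction * (busy_delay - idle_delay)


class IndexingQueue:
    """Hypothetical queue for the slower second step."""

    def __init__(self, load_source=None):
        self._pending = deque()
        # Assumption: the 1-minute load average (Unix only) is the usage signal.
        self._load_source = load_source or (lambda: os.getloadavg()[0])

    def add(self, path):
        self._pending.append(path)

    def next_delay(self):
        # Seconds to wait before indexing the next file.
        return delay_for_load(self._load_source())

    def pop(self):
        return self._pending.popleft()


queue = IndexingQueue(load_source=lambda: 1.0)  # fixed load for the example
queue.add("/home/user/a.pdf")
print(queue.next_delay())
```
</pre>
The basic indexing would bypass this queue entirely, and only the expensive step would wait out the delay.<br>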

 <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

OK, one issue we have here is the following: the tv show extractor for example works better when run on sets of video files, preferably a whole season. Then it only needs to get feedback from the user once or can even do its job automatically. This, however, means that third-sweep plugins would need an option "can-handle-more-than-one-file-at-a-time".<br>
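<br>
A rough Python sketch of how such an option could be honoured by the framework. Everything here is hypothetical: the handles_batches flag, the plugin classes, dispatch, and especially the choice of grouping files by directory as a stand-in for "a whole season".<br>
<pre>
```python
import os
from collections import defaultdict


class TvShowPlugin:
    # The "can-handle-more-than-one-file-at-a-time" option.
    handles_batches = True

    def process(self, paths):
        # A real plugin could ask the user once per batch; here we only
        # record how many files were handed over together.
        return {path: {"batch_size": len(paths)} for path in paths}


class SingleFilePlugin:
    handles_batches = False

    def process(self, paths):
        return {path: {"batch_size": len(paths)} for path in paths}


def dispatch(plugin, paths):
    if plugin.handles_batches:
        # Assumption: files in the same directory belong together.
        groups = defaultdict(list)
        for path in paths:
            groups[os.path.dirname(path)].append(path)
        batches = list(groups.values())
    else:
        batches = [[path] for path in paths]

    results = {}
    for batch in batches:
        results.update(plugin.process(batch))
    return results


files = ["/tv/s1/e1.mkv", "/tv/s1/e2.mkv", "/tv/s2/e1.mkv"]
print(dispatch(TvShowPlugin(), files))
print(dispatch(SingleFilePlugin(), files))
```
</pre>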


<br>

My 2 cents.<div class="HOEnZb"><div class="h5"><br>

<br>

On 09/11/2012 04:06 PM, Alex Fiestas wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I think we've discussed this somewhere but I don't remember the outcome of the<br>

discussion xD<br>

<br>

I think it would be really interesting to have an indexer that uses a two-pass<br>

strategy.<br>

<br>

First pass will index only basic data such as name, dates, mimetype.<br>

<br>

Second pass will index specific stuff, previews, texts, tags...<br>

<br>

Doing this, we can even add third-party "information fetchers" as a third pass,<br>

for example to get information about tv shows and such.<br>

<br>

Let's put an example:<br>

<br>

-New file in my Download folder detected<br>

-Quick super fast indexer indexes data, name, mimetype<br>

          From this point, this file is already usable in Nepomuk<br>

-Second pass, indexing tags, previews<br>

-Third pass (this can be onDemand via GUI) information from the internetz is<br>

fetched.<br></blockquote></div></div></blockquote><div><br>+1<br><br>Seems like a good idea.<br> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb">

<div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

I got this idea from Spotlight (the OS X metadata indexer); the most obvious<br>

way of seeing this in OS X is when a new external storage device is plugged in: files<br>

will get indexed super fast but all you will get if you perform a search is<br>

going to be filenames, not even mimetypes!<br>

<br>

Cheerz.<br>

______________________________<u></u>_________________<br>

Nepomuk mailing list<br>

<a href="mailto:Nepomuk@kde.org" target="_blank">Nepomuk@kde.org</a><br>

<a href="https://mail.kde.org/mailman/listinfo/nepomuk" target="_blank">https://mail.kde.org/mailman/<u></u>listinfo/nepomuk</a><br>

<br>

</blockquote>


</div></div></blockquote></div><br><br clear="all"><br>-- <br><span style="color:rgb(192,192,192)">Vishesh Handa</span><br><br>