Sounds good on paper, but it will take a lot of work, and a logic layer will be necessary to avoid bad values: the all-or-nothing Nepomuk issue. I'm on vacation right now, but I will write a more detailed opinion when I get back to Spain.<br>
<br>On Monday, 10 September 2012, Vishesh Handa wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hey everyone<br><br>This month I'm focusing on the file indexing part of Nepomuk, and right now it takes forever for Strigi to index all my files. Additionally, it doesn't do a very good job of it. I have tons of mp3 files whose metadata Strigi does not output correctly. As a result, Nepomuk does not index those files.<br>
<br>I realize this is a big change, but I would like to stop using Strigi. Here is why:<br><br>* Doesn't always handle PDFs or Microsoft document formats<br>* Doesn't always handle ID3 tags properly<br>* Seeks into video files, thereby slowing down the extraction<br>
* Implements its own parsers for archives and UTF handling<br>* Goes berserk handling some large video files<br>* Large code base<br>* Difficult to contribute to<br>* Very little documentation<br>* Unmaintained<br>* We have hacks on the Nepomuk side to get the correct types<br>
* We use KDE's mimetype detection instead of Strigi's<br><br><br>I'm not the only one with this problem. We already have another project called the nepomuk-metadata-extractor [1], which implements the following indexers:<br>
* PDF (Poppler-based)<br>* Audio files (uses Taglib)<br>* Videos (based only on the file name)<br><br>I would like to move these indexers into nepomuk-core, and create light wrappers to handle whatever file types are missing. Just to be clear, I am not proposing a fancy plugin-based architecture like Strigi's. We would just detect the mimetype using KMimeType and then call the appropriate indexing class (if one exists), which would populate the SimpleResourceGraph, or it would just add the appropriate RDF types.<br>
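<br>The dispatch described above could look roughly like the sketch below. It is only an illustration in Python, not the proposed nepomuk-core code. The standard <code>mimetypes</code> module stands in for KMimeType, plain dicts stand in for a SimpleResourceGraph, and the function names, the fallback table, and the type strings are all hypothetical. It also assumes that adding only the RDF types is the fallback when no indexing class exists.<br>

```python
import mimetypes

# Hypothetical stand-ins for the real indexers (Poppler, Taglib, ...).
# Each returns a plain dict in place of a SimpleResourceGraph.
def index_pdf(path):
    return {"url": path, "types": ["nfo:PaginatedTextDocument"], "indexer": "pdf"}

def index_audio(path):
    return {"url": path, "types": ["nfo:Audio"], "indexer": "audio"}

# Exact mimetype mapped to an indexing function; no plugin loading involved.
INDEXERS = {
    "application/pdf": index_pdf,
    "audio/mpeg": index_audio,
}

# Used when no indexer exists: only a type is recorded (illustrative names).
FALLBACK_TYPES = {
    "video": "nfo:Video",
    "image": "nfo:Image",
    "text": "nfo:TextDocument",
}

def index_file(path):
    # mimetypes guesses from the file name only; KMimeType is not used here.
    mime, _encoding = mimetypes.guess_type(path)
    indexer = INDEXERS.get(mime)
    if indexer is not None:
        return indexer(path)
    # No indexing class for this mimetype: just add the basic types.
    family = mime.split("/")[0] if mime else None
    types = ["nfo:FileDataObject"]
    if family in FALLBACK_TYPES:
        types.append(FALLBACK_TYPES[family])
    return {"url": path, "types": types, "indexer": None}
```

Adding support for a new format in this sketch means adding one entry to the lookup table, which is the main difference from a plugin-based design.<br>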
<br>I've created a simple page listing some of the common file formats [2] and how we would handle them. I obviously still need to figure out how we would handle document files. I would love to reuse the code in Calligra + Okular instead of rolling our own. Apart from that, it seems fairly straightforward.<br>
<br>What do you guys think?<br><br>I don't think this entire port should take me more than a week. <br><br>[1] <a href="https://projects.kde.org/projects/playground/base/nepomuk-metadata-extractor" target="_blank">https://projects.kde.org/projects/playground/base/nepomuk-metadata-extractor</a><br>
[2] <a href="http://community.kde.org/Projects/Nepomuk/FileIndexing" target="_blank">http://community.kde.org/Projects/Nepomuk/FileIndexing</a><br><br>-- <br><span style="color:rgb(192,192,192)">Vishesh Handa</span><br><br>
</blockquote><br><br>-- <br>Best wishes,<div>Ignacio</div><div><br></div><br>