[Nepomuk] Review Request 112712: MIME/mbox, vCard and "web actions" file extractors

Simeon Bird bladud at gmail.com
Wed Oct 9 16:07:13 UTC 2013


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/112712/#review41434
-----------------------------------------------------------



>> Check the resource is non-empty before merging it

> How can the resource be empty? Every e-mail has at least a title and plain-text content. Do you want me to check that the e-mail is actually valid and not an empty e-mail (a corrupted MIME file, for instance)?

Yup, that's right. Experience shows that any possible corrupt file will be out there somewhere.
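
Something along these lines is what I have in mind. This is only a rough sketch in Python rather than the C++ of the patch, and `has_content` is a hypothetical helper name, not something that exists in nepomuk-core. It treats a parsed message as empty when it has neither a usable header nor any body text:

```python
from email import message_from_string


def has_content(raw: str) -> bool:
    """Return True if the parsed e-mail has at least one header or some body text."""
    msg = message_from_string(raw)
    # A corrupted file often parses without error but yields no usable headers.
    has_headers = any(
        (msg.get(name) or "").strip() for name in ("Subject", "From", "Message-ID")
    )
    # walk() visits the message itself and every MIME part below it.
    has_body = any(
        isinstance(part.get_payload(), str) and part.get_payload().strip()
        for part in msg.walk()
    )
    return has_headers or has_body
```

The extractor would then skip the merge whenever the check fails, instead of storing an empty resource.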

- Simeon Bird


On Oct. 9, 2013, 9:06 a.m., Denis Steckelmacher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/112712/
> -----------------------------------------------------------
> 
> (Updated Oct. 9, 2013, 9:06 a.m.)
> 
> 
> Review request for Nepomuk.
> 
> 
> Repository: nepomuk-core
> 
> 
> Description
> -------
> 
> This patch adds three new file extractors to Nepomuk. Two of them are of general use, and the third (which can be removed if it does not belong in Nepomuk) is specific to the use case described in http://steckdenis.be/post-2013-09-06-a-nepomuk-integration-plugin-for-firefox.html .
> 
> The MIME/mbox file extractor takes an mbox file or MIME files (as found in Maildir directory trees) and indexes them as NMO:Message objects. The full content of each e-mail is indexed along with its title, sender, receiver, CC/BCC, date and message ID. NCO:Contact and NCO:EmailAddress resources are created when needed. The main use of this indexer is to index e-mails managed by mutt, Thunderbird or any other e-mail client that does not use Akonadi.
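
To make the list of indexed fields concrete, here is a minimal sketch in Python (the patch itself is C++). The function name `extract_fields` and the dictionary keys are assumptions made for illustration; the mapping to NMO/NCO properties is not shown:

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


def extract_fields(raw: str) -> dict:
    """Collect the headers the description lists: title, people, date, message ID."""
    msg = message_from_string(raw)
    try:
        date = parsedate_to_datetime(msg.get("Date", ""))
    except (TypeError, ValueError):
        # Missing or malformed Date header: leave the date unset.
        date = None
    return {
        "subject": msg.get("Subject", ""),
        # getaddresses() returns (display name, address) pairs.
        "from": getaddresses(msg.get_all("From", [])),
        "to": getaddresses(msg.get_all("To", [])),
        "cc": getaddresses(msg.get_all("Cc", [])),
        "bcc": getaddresses(msg.get_all("Bcc", [])),
        "date": date,
        "message_id": msg.get("Message-ID", "").strip(),
    }
```

Each (name, address) pair is the kind of data from which a contact and an e-mail address resource could be created.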
> 
> This indexer is a bit special because it also queries the Nepomuk server. mbox files are typically huge and change every time the user adds or removes a mail. This can trigger many re-indexing operations, and because the file is big, each one can take quite a long time. To speed up the process, the file indexer looks for already-indexed e-mails with the same message ID as the e-mails to be indexed. If a mail was already indexed, it is skipped. This reduces the amount of data transferred to the Nepomuk server (the full text of the mail doesn't have to be sent to the server only for it to detect a duplicate message), and an mbox file that took several minutes to index now only requires a couple of seconds.
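
The skip-if-already-indexed idea can be sketched as follows, in Python for brevity. The set `indexed_ids` stands in for the result of the query against the Nepomuk server, which is not reproduced here, and `new_messages` is a hypothetical name:

```python
import mailbox


def new_messages(mbox_path, indexed_ids):
    """Yield (message_id, message) for every mail whose ID is not already indexed."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            message_id = (msg.get("Message-ID") or "").strip()
            # Mails without a Message-ID cannot be matched, so they are always yielded.
            if message_id and message_id in indexed_ids:
                continue
            yield message_id, msg
    finally:
        box.close()
```

Only the messages that come out of this generator would need their full text sent to the server.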
> 
> The vCard indexer parses vCard files using the KABC library and stores all the information found in them in NCO:Contact objects. vCard files containing more than one contact are supported. This allows users to export their contacts from a webmail service or a contact-management application, and to have them indexed in Nepomuk.
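
For illustration only, here is how a file holding several contacts splits into one record per contact. This Python sketch is deliberately simplified and is not how KABC parses vCards: it ignores line folding, property groups and value escaping. The name `split_vcards` is an assumption:

```python
def split_vcards(text: str) -> list:
    """Return one {PROPERTY: [values]} dict per BEGIN:VCARD/END:VCARD block."""
    cards = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        upper = line.upper()
        if upper == "BEGIN:VCARD":
            current = {}
        elif upper == "END:VCARD":
            if current is not None:
                cards.append(current)
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            # Drop parameters such as ";TYPE=HOME" and normalise the property name.
            key = name.split(";", 1)[0].upper()
            current.setdefault(key, []).append(value)
    return cards
```

Each dictionary in the returned list would correspond to one contact resource.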
> 
> The last indexer reads .webaction files, which consist of one line naming the action ("DOWNLOAD"), followed by one parameter per line. This file indexer is used by the Nepomuk Integration plugin for Firefox, which uses this kind of file to link a downloaded file to its original location on the Internet. If you don't want such a specific file indexer to be part of the Nepomuk Libraries, it can be removed from this patch.
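
The file layout is simple enough to sketch in a few lines of Python. `WebAction` and `parse_webaction` are hypothetical names, and the parameters are kept as an ordered list because their individual meanings are not spelled out here:

```python
from typing import NamedTuple, Tuple


class WebAction(NamedTuple):
    action: str
    params: Tuple[str, ...]


def parse_webaction(text: str) -> WebAction:
    """Split a .webaction file into its action line and its parameter lines."""
    # Blank lines are ignored; this is an assumption, not documented behaviour.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty .webaction file")
    return WebAction(action=lines[0], params=tuple(lines[1:]))
```

A caller would dispatch on `action` and reject any value it does not know.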
> 
> All these file indexers create resources but don't touch the indexed file itself. The reason is that an mbox file is not an e-mail, a vCard file is not a contact (it describes a contact), and these files can be temporary (for instance, the Firefox add-on creates a temporary MIME file whenever the user reads a mail on a webmail service, and this file is deleted when the computer is shut down).
> 
> 
> Diffs
> -----
> 
>   CMakeLists.txt 6e55d5e 
>   services/fileindexer/indexer/CMakeLists.txt bcf8da2 
>   services/fileindexer/indexer/mimeextractor.h PRE-CREATION 
>   services/fileindexer/indexer/mimeextractor.cpp PRE-CREATION 
>   services/fileindexer/indexer/nepomukmimeextractor.desktop PRE-CREATION 
>   services/fileindexer/indexer/nepomukvcardextractor.desktop PRE-CREATION 
>   services/fileindexer/indexer/nepomukwebactionextractor.desktop PRE-CREATION 
>   services/fileindexer/indexer/vcardextractor.h PRE-CREATION 
>   services/fileindexer/indexer/vcardextractor.cpp PRE-CREATION 
>   services/fileindexer/indexer/webactionextractor.h PRE-CREATION 
>   services/fileindexer/indexer/webactionextractor.cpp PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/112712/diff/
> 
> 
> Testing
> -------
> 
> Nepomuk Core builds with this patch applied. MIME, mbox (as produced by Thunderbird), vCard (exported from Yahoo! Mail) and webaction files are correctly indexed. If you want to test the webaction indexer, create a file somewhere (say "/tmp/test.txt"), and then put this in a .webaction file:
> 
> DOWNLOAD
> http://www.example.com
> http://www.example.com/test.txt
> /tmp/test.txt
> 
> Then, use "nepomukindexer" to index the .webaction file. Use "nepomukshow" on the /tmp/test.txt file, and check that everything is okay. You can also open Dolphin and see that the "downloaded from" information of the test.txt file is correctly displayed.
> 
> 
> Thanks,
> 
> Denis Steckelmacher
> 
>
