[Nepomuk] Review Request 112712: MIME/mbox, vCard and "web actions" file extractors

Denis Steckelmacher steckdenis at yahoo.fr
Fri Sep 13 12:38:06 UTC 2013


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/112712/
-----------------------------------------------------------

Review request for Nepomuk.


Description
-------

This patch adds three new file extractors to Nepomuk. Two of them are of general use, and the third (which can be removed if it does not belong in Nepomuk) is specific to the use case described in http://steckdenis.be/post-2013-09-06-a-nepomuk-integration-plugin-for-firefox.html .

The MIME/mbox file extractor takes an mbox file or MIME files (as found in Maildir directory trees) and indexes them as NMO:Message objects. The full content of each e-mail is indexed along with its subject, sender, recipients, CC/BCC, date and message ID. NCO:Contact and NCO:EmailAddress resources are created when needed. The main use of this indexer is to index e-mails managed by mutt, Thunderbird or any other e-mail client that does not use Akonadi.
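
To make the list of indexed fields concrete, here is a minimal sketch of the same extraction done with Python's standard "email" package. It is only an illustration and not the code in mimeextractor.cpp; the function name extract_message_fields and the dictionary keys are made up for this example, and a malformed Date header is not handled.

```python
from email import message_from_string, policy
from email.utils import getaddresses, parsedate_to_datetime


def extract_message_fields(raw):
    """Return the header fields and plain-text body of one MIME message.

    Hypothetical helper for illustration; the keys below are not Nepomuk
    property names.
    """
    msg = message_from_string(raw, policy=policy.default)

    def addresses(header):
        # A header may appear several times; collect every address in it.
        values = [str(v) for v in msg.get_all(header, [])]
        return [addr for _name, addr in getaddresses(values) if addr]

    date_header = msg["Date"]
    # Prefer the text/plain part; messages without one yield an empty text.
    body = msg.get_body(preferencelist=("plain",))
    return {
        "subject": str(msg["Subject"] or ""),
        "sender": addresses("From"),
        "to": addresses("To"),
        "cc": addresses("Cc"),
        "bcc": addresses("Bcc"),
        "date": parsedate_to_datetime(str(date_header)) if date_header else None,
        "message_id": str(msg["Message-ID"] or "").strip(),
        "text": body.get_content() if body is not None else "",
    }
```

Each address returned here would correspond to one NCO:EmailAddress in the indexer; the sketch stops at plain strings.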

This indexer is a bit special because it also queries the Nepomuk server. mbox files are typically huge, and change every time the user adds or removes a mail. This can trigger many re-indexing operations, and because the file is big, each one can take quite a long time. To speed up the process, the file indexer looks for already-indexed e-mails with the same message ID as the e-mails to be indexed. If a mail was already indexed, it is skipped. This reduces the amount of data transferred to the Nepomuk server (the full text of the mail doesn't have to be sent to the server only for it to detect a duplicate message), and an mbox file that took several minutes to index now only requires a couple of seconds.
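
The skip-if-already-indexed idea can be sketched as follows with Python's standard "mailbox" module. This is an assumption-laden illustration, not the patch itself: the real extractor asks the Nepomuk server which message IDs it already knows, whereas here a plain set (indexed_ids) stands in for that query, and the function name new_messages is invented for the example.

```python
import mailbox


def new_messages(mbox_path, indexed_ids):
    """Return (message_id, message) pairs that still need to be indexed.

    indexed_ids is a set of Message-ID strings; in this sketch it replaces
    the lookup that the real indexer performs against the Nepomuk server.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        pending = []
        for msg in box:
            message_id = (msg.get("Message-ID") or "").strip()
            # Skip only when the ID is known; messages that lack a
            # Message-ID cannot be matched, so they are always kept.
            if message_id and message_id in indexed_ids:
                continue
            pending.append((message_id, msg))
        return pending
    finally:
        box.close()
```

Only the messages returned by such a filter need their full text sent for indexing, which is where the time saving described above comes from.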

The vCard indexer parses vCard files using the KABC library and stores all the information found in them in NCO:Contact objects. vCard files containing more than one contact are supported. This allows users to export their contacts from a webmail service or a contact-management application, and to have them indexed in Nepomuk.
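
For readers unfamiliar with multi-contact vCard files, the sketch below shows how one file can be split into individual contacts. It is a deliberately simplified stand-in written with the Python standard library only; the patch relies on KABC for the actual parsing, and this version ignores property parameters, groups and value escaping. The name split_vcards is hypothetical.

```python
def split_vcards(text):
    """Split vCard text into one dict per contact (property name -> values).

    Simplified illustration: parameters such as TYPE=INTERNET are dropped
    and values are kept as raw strings.
    """
    # Unfold continuation lines: a line starting with a space or tab
    # continues the previous line.
    lines = []
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and lines:
            lines[-1] += line[1:]
        else:
            lines.append(line)

    cards, current = [], None
    for line in lines:
        marker = line.strip().upper()
        if marker == "BEGIN:VCARD":
            current = {}
        elif marker == "END:VCARD":
            if current is not None:
                cards.append(current)
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            name = key.split(";", 1)[0].upper()
            current.setdefault(name, []).append(value)
    return cards
```

Each dictionary returned would map to one NCO:Contact in the indexer.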

The last indexer reads .webaction files, which consist of one line naming the action ("DOWNLOAD"), followed by one parameter per line. This file indexer is used by the Nepomuk Integration plugin for Firefox, which uses this kind of file to establish a link between a downloaded file and its original location on the Internet. If you don't want such a specific file indexer to be part of the Nepomuk libraries, it can be removed from this patch.
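
The file layout is simple enough to sketch a reader in a few lines of Python. This is an illustration only, not the code in webactionextractor.cpp: the function name parse_webaction is invented, blank lines are assumed to be insignificant, and because this description does not spell out the meaning of each parameter, the sketch keeps them as an ordered list instead of naming them.

```python
def parse_webaction(text):
    """Return (action, parameters) from the content of a .webaction file.

    The first non-empty line is the action name; every following non-empty
    line is one parameter, kept in file order.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty .webaction content")
    return lines[0], lines[1:]
```

Applied to the sample file given in the Testing section, this returns the action "DOWNLOAD" and three parameters.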

All these file indexers create resources but don't touch the indexed file itself. The reasons are that an mbox file is not an e-mail, a vCard file is not a contact (it describes a contact), and that these files can be temporary (for instance, the Firefox add-on creates a temporary MIME file whenever the user reads a mail in a webmail client, and this file is deleted when the computer is shut down).


Diffs
-----

  CMakeLists.txt 6e55d5e 
  services/fileindexer/indexer/CMakeLists.txt bcf8da2 
  services/fileindexer/indexer/mimeextractor.h PRE-CREATION 
  services/fileindexer/indexer/mimeextractor.cpp PRE-CREATION 
  services/fileindexer/indexer/nepomukmimeextractor.desktop PRE-CREATION 
  services/fileindexer/indexer/nepomukvcardextractor.desktop PRE-CREATION 
  services/fileindexer/indexer/nepomukwebactionextractor.desktop PRE-CREATION 
  services/fileindexer/indexer/vcardextractor.h PRE-CREATION 
  services/fileindexer/indexer/vcardextractor.cpp PRE-CREATION 
  services/fileindexer/indexer/webactionextractor.h PRE-CREATION 
  services/fileindexer/indexer/webactionextractor.cpp PRE-CREATION 

Diff: http://git.reviewboard.kde.org/r/112712/diff/


Testing
-------

Nepomuk Core builds with this patch applied. MIME, mbox (as produced by Thunderbird), vCard (exported from Yahoo! Mail) and .webaction files are correctly indexed. To test the webaction indexer, create a file somewhere (say "/tmp/test.txt"), and then put this in a .webaction file:

DOWNLOAD
http://www.example.com
http://www.example.com/test.txt
/tmp/test.txt

Then, use "nepomukindexer" to index the .webaction file. Use "nepomukshow" on the /tmp/test.txt file, and check that everything is okay. You can also open Dolphin and see that the "downloaded from" information of the test.txt file is correctly displayed.


Thanks,

Denis Steckelmacher
