[Nepomuk] Review Request 109811: Metadata extractor for archive files (.zip, .tar.*, .ar)

Denis Steckelmacher steckdenis at yahoo.fr
Mon Apr 1 17:36:56 UTC 2013


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/109811/
-----------------------------------------------------------

(Updated April 1, 2013, 5:36 p.m.)


Review request for Nepomuk.


Changes
-------

Mention in the description that KRunner can now display the files extracted from archive files. Clicking on one of them opens it successfully.


Description
-------

This patch adds a file metadata extractor for archive files. This extractor handles any file that can be read using KArchive.

The extracted metadata are the uncompressed size of the whole archive (shown in Dolphin, but as a plain number rather than formatted like a file size with KB or MB suffixes) and the list of files it contains. The extractor creates one Nepomuk resource per file or directory in the archive (root directory included). These resources have the types nfo:ArchiveItem, and nfo:FileDataObject (for files) or nfo:Folder (for directories). They also have their nie:url property set to a URL that can be used with the Archive KIO (for instance, "zip:/home/me/archive.zip/one/file" or "tar:/usr/src/linux-3.7.2.tar.xz"). For files, nfo:fileSize is set to the uncompressed size of the file.
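
As a rough illustration of the data described above (this is not the patch's code, which is C++ and uses KArchive), the following Python sketch walks a zip archive with the standard zipfile module and builds one record per entry, plus the total uncompressed size. The function name describe_archive, the record layout, and the trailing "/" on sub-directory URLs are assumptions made for the example, and only .zip is handled.

```python
import zipfile
from urllib.parse import quote


def describe_archive(archive, archive_path):
    """Return (total_uncompressed_size, records) for a zip archive.

    `archive` is anything zipfile.ZipFile accepts (a path or a file object).
    `archive_path` is the absolute path used to build the zip: URLs.
    Only the archive's directory listing is read, never the file contents.
    """
    base = "zip:" + quote(archive_path) + "/"

    def folder(path):
        # Assumption: directory URLs end with "/", like the archive root.
        url = base + quote(path) + "/" if path else base
        return {"url": url, "types": ["ArchiveItem", "Folder"]}

    # The root directory of the archive always gets a record.
    records = {"": folder("")}
    total = 0

    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            name = info.filename.rstrip("/")
            if not name:
                continue
            # Record every parent directory, even when the archive
            # has no explicit entry for it.
            parts = name.split("/")
            for depth in range(1, len(parts)):
                parent = "/".join(parts[:depth])
                records.setdefault(parent, folder(parent))
            if info.is_dir():
                records.setdefault(name, folder(name))
            else:
                records[name] = {
                    "url": base + quote(name),
                    "types": ["ArchiveItem", "FileDataObject"],
                    "fileSize": info.file_size,  # uncompressed size
                }
                total += info.file_size

    return total, records
```

Calling describe_archive("/home/me/archive.zip", "/home/me/archive.zip") would give the summed uncompressed size and a dictionary keyed by the path inside the archive, with "" standing for the root directory.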

The archived files themselves are neither read nor uncompressed. I haven't found a way to recursively extract metadata of archived files (for instance, launching the PlainTextExtractor on any plain text file found in the archive).


Diffs
-----

  services/fileindexer/indexer/CMakeLists.txt 97bedfd 
  services/fileindexer/indexer/archiveextractor.h PRE-CREATION 
  services/fileindexer/indexer/archiveextractor.cpp PRE-CREATION 
  services/fileindexer/indexer/nepomukarchiveextractor.desktop PRE-CREATION 

Diff: http://git.reviewboard.kde.org/r/109811/diff/


Testing (updated)
-------

nepomukindexer seems to work. nepomukshow displays meaningful information about the indexed files, both the archive itself and the files contained in it. For a test archive, nepomukshow displays this information:

$ nepomukshow test.zip
<nepomuk:/res/e5eddbdb-995b-472f-9ef1-3a4ba4c9999d>  # Note this ID
        rdf:type              nfo:FileDataObject                                             
        rdf:type              nfo:Archive                                                    
        rdf:type              nie:InformationElement                                         
        nao:created           2013-04-01T13:57:16.586Z                                       
        nao:lastModified      2013-04-01T13:57:17.414Z                                       
        nie:lastModified      2013-02-28T20:49:24Z                                           
        nie:url               file:///home/steckdenis/test.zip  
        nie:mimeType          application/zip                                                
        nie:created           2013-02-28T20:49:24Z                                           
        nfo:fileSize          3368744                                                        
        nfo:uncompressedSize  4171547                                                        
        nfo:fileName          test.zip                                                     
        kext:indexingLevel    2

The metadata of a file contained in the archive can be displayed by passing a URL to nepomukshow:

$ nepomukshow 'zip:/home/steckdenis/test.zip/'
<nepomuk:/res/71458f55-898c-4374-ad00-6ac5b1d9c9e7>  # Note this ID: it identifies the root directory inside the archive
        rdf:type                nfo:ArchiveItem                                                
        rdf:type                nfo:Folder                                                     
        rdf:type                nfo:FileDataObject                                             
        rdf:type                nfo:DataContainer                                              
        nao:created             2013-04-01T13:57:17.416Z                                       
        nao:lastModified        2013-04-01T13:57:17.416Z                                       
        nie:url                 <zip:/home/steckdenis/test.zip/>  
        nie:created             1970-01-01T00:00:00Z                                           
        nfo:belongsToContainer  nepomuk:/res/e5eddbdb-995b-472f-9ef1-3a4ba4c9999d # ID of the archive file itself

$ nepomukshow 'zip:/home/steckdenis/test.zip/6 My account1.png'
<nepomuk:/res/ed73aabc-ce18-4ac7-9db7-f301ce07ffc5>
        rdf:type                nfo:ArchiveItem                                                                     
        rdf:type                nfo:FileDataObject                                                                  
        nao:created             2013-04-01T13:57:17.417Z                                                            
        nao:lastModified        2013-04-01T13:57:17.417Z                                                            
        nie:url                 <zip:/home/steckdenis/test.zip/6%20My%20account1.png>  
        nie:created             2012-11-21T08:21:08Z                                                                
        nfo:fileSize            330923                                            # Uncompressed size                         
        nfo:belongsToContainer  nepomuk:/res/71458f55-898c-4374-ad00-6ac5b1d9c9e7 # ID of the root directory
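
The nfo:belongsToContainer values in the output above form a chain from the PNG entry to the archive's root directory and then to the archive file itself. A minimal Python sketch of following that chain, using only the resource IDs printed above (the BELONGS_TO dictionary and the container_chain function are hypothetical names for this illustration, not part of Nepomuk):

```python
# nfo:belongsToContainer links copied from the nepomukshow output above.
PNG_ENTRY = "nepomuk:/res/ed73aabc-ce18-4ac7-9db7-f301ce07ffc5"
ROOT_DIR = "nepomuk:/res/71458f55-898c-4374-ad00-6ac5b1d9c9e7"
ARCHIVE = "nepomuk:/res/e5eddbdb-995b-472f-9ef1-3a4ba4c9999d"

BELONGS_TO = {
    PNG_ENTRY: ROOT_DIR,  # 6 My account1.png -> root directory
    ROOT_DIR: ARCHIVE,    # root directory -> test.zip
}


def container_chain(resource, links):
    """Follow belongsToContainer links from `resource` up to the top."""
    chain = [resource]
    seen = {resource}
    while chain[-1] in links:
        parent = links[chain[-1]]
        if parent in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError("cycle in belongsToContainer links")
        chain.append(parent)
        seen.add(parent)
    return chain


if __name__ == "__main__":
    for uri in container_chain(PNG_ENTRY, BELONGS_TO):
        print(uri)
```

The last element of the returned list is the resource of the archive file, which carries the file: URL and nfo:uncompressedSize.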

Entering "6 My account1.png" in KRunner shows the file as an "Archive entry". Clicking on it launches Gwenview, which displays the image.


Thanks,

Denis Steckelmacher
