[Nepomuk] Review Request 109811: Metadata extractor for archive files (.zip, .tar.*, .ar)

Vishesh Handa me at vhanda.in
Mon Jul 7 09:59:57 UTC 2014


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/109811/#review61787
-----------------------------------------------------------


Denis, would it be okay if this was discarded? Nepomuk is no longer used, and porting this to Baloo is not going to be simple. Even if we did manage to port it to Baloo, we would still have the same problem with file paths that are not local.

- Vishesh Handa


On May 11, 2013, 8:48 a.m., Denis Steckelmacher wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/109811/
> -----------------------------------------------------------
> 
> (Updated May 11, 2013, 8:48 a.m.)
> 
> 
> Review request for Nepomuk.
> 
> 
> Repository: nepomuk-core
> 
> 
> Description
> -------
> 
> This patch adds a file metadata extractor for archive files. This extractor handles any file that can be read using KArchive.
> 
> The metadata extracted are the uncompressed size of the whole archive (shown in Dolphin, but not formatted like a file size using KB or MB suffixes), and the list of files it contains. The extractor creates one Nepomuk resource per file or directory in the archive (root directory included). These resources have the types ArchiveEntry, and FileDataObject (for files) or Folder (for directories). They also have their nie:url property set to a URL that can be used with the Archive KIO (for instance, "zip:/home/me/archive.zip/one/file" or "tar:/usr/src/linux-3.7.2.tar.xz"). For files, their fileSize is set to the uncompressed size of the file.
> 
> The files themselves are neither read nor uncompressed. I haven't found a way to recursively extract metadata of archived files (for instance, launching the PlainTextExtractor on any plain text file found in the archive).
> 
> 
> Diffs
> -----
> 
>   services/fileindexer/indexer/CMakeLists.txt 3474a43 
>   services/fileindexer/indexer/archiveextractor.h PRE-CREATION 
>   services/fileindexer/indexer/archiveextractor.cpp PRE-CREATION 
>   services/fileindexer/indexer/nepomukarchiveextractor.desktop PRE-CREATION 
> 
> Diff: https://git.reviewboard.kde.org/r/109811/diff/
> 
> 
> Testing
> -------
> 
> nepomukindexer seems to work. nepomukshow displays meaningful information about the files indexed, the archive itself and the files contained in it. For a test archive, nepomukshow displays this information:
> 
> $ nepomukshow test.zip
> <nepomuk:/res/e5eddbdb-995b-472f-9ef1-3a4ba4c9999d>  # Note this ID
>         rdf:type              nfo:FileDataObject                                             
>         rdf:type              nfo:Archive                                                    
>         rdf:type              nie:InformationElement                                         
>         nao:created           2013-04-01T13:57:16.586Z                                       
>         nao:lastModified      2013-04-01T13:57:17.414Z                                       
>         nie:lastModified      2013-02-28T20:49:24Z                                           
>         nie:url               file:///home/steckdenis/test.zip  
>         nie:mimeType          application/zip                                                
>         nie:created           2013-02-28T20:49:24Z                                           
>         nfo:fileSize          3368744                                                        
>         nfo:uncompressedSize  4171547                                                        
>         nfo:fileName          test.zip                                                     
>         kext:indexingLevel    2
> 
> Displaying the metadata of a file contained in the archive can be done by passing a URL to nepomukshow:
> 
> $ nepomukshow 'zip:/home/steckdenis/test.zip/'
> <nepomuk:/res/71458f55-898c-4374-ad00-6ac5b1d9c9e7>  # Note this ID, it is the ID of the archive's root directory
>         rdf:type                nfo:ArchiveItem                                                
>         rdf:type                nfo:Folder                                                     
>         rdf:type                nfo:FileDataObject                                             
>         rdf:type                nfo:DataContainer                                              
>         nao:created             2013-04-01T13:57:17.416Z                                       
>         nao:lastModified        2013-04-01T13:57:17.416Z                                       
>         nie:url                 <zip:/home/steckdenis/test.zip/>  
>         nie:created             1970-01-01T00:00:00Z                                           
>         nfo:belongsToContainer  nepomuk:/res/e5eddbdb-995b-472f-9ef1-3a4ba4c9999d # ID of the archive file itself
> 
> $ nepomukshow 'zip:/home/steckdenis/test.zip/6 My account1.png'
> <nepomuk:/res/ed73aabc-ce18-4ac7-9db7-f301ce07ffc5>
>         rdf:type                nfo:ArchiveItem                                                                     
>         rdf:type                nfo:FileDataObject                                                                  
>         nao:created             2013-04-01T13:57:17.417Z                                                            
>         nao:lastModified        2013-04-01T13:57:17.417Z                                                            
>         nie:url                 <zip:/home/steckdenis/test.zip/6%20My%20account1.png>  
>         nie:created             2012-11-21T08:21:08Z                                                                
>         nfo:fileSize            330923                                            # Uncompressed size                         
>         nfo:belongsToContainer  nepomuk:/res/71458f55-898c-4374-ad00-6ac5b1d9c9e7 # ID of the root directory
> 
> When entering "6 My account1.png" in KRunner, the file is shown as an "Archive entry". When clicking on it, Gwenview is launched and displays the image.
> 
> 
> Thanks,
> 
> Denis Steckelmacher
> 
>
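
For readers without the patch at hand, the listing step described in the quoted request can be approximated as follows. This is only a rough sketch, not the code from the review: it uses Python's standard zipfile module as a stand-in for KArchive, handles zip files only, and the names describe_archive and build_demo_zip are made up for illustration. It sums the uncompressed sizes from the archive's directory without decompressing any member, and builds percent-encoded "zip:" URLs in the same shape as the nie:url values shown above.

```python
import io
import zipfile
from urllib.parse import quote


def describe_archive(archive_path, data):
    """Return (total uncompressed size, list of entry dicts) for a zip archive.

    archive_path is only used to build the URLs; data is a file-like object.
    Hypothetical helper: it mirrors the idea of the extractor, not its code.
    """
    entries = []
    total = 0
    with zipfile.ZipFile(data) as zf:
        for info in zf.infolist():
            # Sizes come from the zip directory; no member is decompressed.
            total += info.file_size
            entries.append({
                # quote() keeps "/" and encodes spaces as %20.
                "url": "zip:" + quote(archive_path + "/" + info.filename),
                "is_dir": info.is_dir(),
                "size": None if info.is_dir() else info.file_size,
            })
    return total, entries


def build_demo_zip():
    """Create a small in-memory archive so the sketch runs on its own."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("6 My account1.png", b"x" * 10)
        zf.writestr("docs/readme.txt", b"hello")
    buf.seek(0)
    return buf


total, entries = describe_archive("/home/me/test.zip", build_demo_zip())
for entry in entries:
    print(entry["url"], entry["size"])
print("uncompressed total:", total)
```

Because the URLs are derived from a local path, a sketch like this runs into the same limitation raised in the reply: it has nothing sensible to produce for archives that are not on a local file system.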
