<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 solid;">
<tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="http://git.reviewboard.kde.org/r/109811/">http://git.reviewboard.kde.org/r/109811/</a>
</td>
</tr>
</table>
<br />
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<p style="margin-top: 0;">On April 1st, 2013, 4:27 p.m. UTC, <b>Ignacio Serantes</b> wrote:</p>
<blockquote style="margin-left: 1em; border-left: 2px solid #d0d0d0; padding-left: 10px;">
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Don't you think it would be better to add
resource.addProperty(NFO::fileName(), entry->name());
to the existing
if (entry->isFile())
block instead of adding a new if?
What happens to nfo:fileName for compressed files? With your change, no value is stored for this property.</pre>
</blockquote>
</blockquote>
<pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I really need to get used to these "Publish" buttons: this is already the second response of mine that did not get published because I did not notice the button before you posted a new comment.
The if I added prevents the extractor from setting the nfo:fileName property of the archive's root directory ("/"). This property does not seem very useful for the root directory of an archive: either the user is looking for the archive itself (indexed by the SimpleExtractor), or for a file or directory inside it.
I put the if there because files and directories both need a file name. With my test archives, I don't see any empty file names: files get their file name, and directories have their name in this property. For example, an archive containing an Align directory with an align.py file in it has nfo:fileName set to "Align" for the directory and "align.py" for the file. I have tested .zip and .tar archives (though .tar archives seem to get a wrong uncompressedSize value).</pre>
<br />
<p>- Denis</p>
<br />
<p>On April 1st, 2013, 4:14 p.m. UTC, Denis Steckelmacher wrote:</p>
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" style="background-image: url('http://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); background-position: left top; background-repeat: repeat-x; border: 1px black solid;">
<tr>
<td>
<div>Review request for Nepomuk.</div>
<div>By Denis Steckelmacher.</div>
<p style="color: grey;"><i>Updated April 1, 2013, 4:14 p.m.</i></p>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">This patch adds a file metadata extractor for archive files. The extractor handles any file that can be read using KArchive.
The extracted metadata are the uncompressed size of the whole archive (shown in Dolphin, though not formatted as a file size with KB or MB suffixes) and the list of files it contains. The extractor creates one Nepomuk resource per file or directory in the archive (root directory included). These resources have the type nfo:ArchiveItem, plus nfo:FileDataObject (for files) or nfo:Folder (for directories). Their nie:url property is set to a URL that can be used with the archive KIO slave (for instance, "zip:/home/me/archive.zip/one/file" or "tar:/usr/src/linux-3.7.2.tar.xz"). For files, nfo:fileSize is set to the uncompressed size of the file.
The files themselves are neither read nor decompressed. I haven't found a way to recursively extract the metadata of archived files (for instance, by running the PlainTextExtractor on every plain text file found in the archive).</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">nepomukindexer seems to work. nepomukshow displays meaningful information about the indexed files: the archive itself and the files it contains. For a test archive, nepomukshow displays the following:
$ nepomukshow test.zip
<nepomuk:/res/e5eddbdb-995b-472f-9ef1-3a4ba4c9999d> # Note this ID
rdf:type nfo:FileDataObject
rdf:type nfo:Archive
rdf:type nie:InformationElement
nao:created 2013-04-01T13:57:16.586Z
nao:lastModified 2013-04-01T13:57:17.414Z
nie:lastModified 2013-02-28T20:49:24Z
nie:url file:///home/steckdenis/test.zip
nie:mimeType application/zip
nie:created 2013-02-28T20:49:24Z
nfo:fileSize 3368744
nfo:uncompressedSize 4171547
nfo:fileName test.zip
kext:indexingLevel 2
To display the metadata of a file contained in the archive, pass its URL to nepomukshow:
$ nepomukshow 'zip:/home/steckdenis/test.zip/'
<nepomuk:/res/71458f55-898c-4374-ad00-6ac5b1d9c9e7> # Note this ID; it identifies the archive's root directory
rdf:type nfo:ArchiveItem
rdf:type nfo:Folder
rdf:type nfo:FileDataObject
rdf:type nfo:DataContainer
nao:created 2013-04-01T13:57:17.416Z
nao:lastModified 2013-04-01T13:57:17.416Z
nie:url <zip:/home/steckdenis/test.zip/>
nie:created 1970-01-01T00:00:00Z
nfo:belongsToContainer nepomuk:/res/e5eddbdb-995b-472f-9ef1-3a4ba4c9999d # ID of the archive file itself
$ nepomukshow 'zip:/home/steckdenis/test.zip/6 My account1.png'
<nepomuk:/res/ed73aabc-ce18-4ac7-9db7-f301ce07ffc5>
rdf:type nfo:ArchiveItem
rdf:type nfo:FileDataObject
nao:created 2013-04-01T13:57:17.417Z
nao:lastModified 2013-04-01T13:57:17.417Z
nie:url <zip:/home/steckdenis/test.zip/6%20My%20account1.png>
nie:created 2012-11-21T08:21:08Z
nfo:fileSize 330923 # Uncompressed size
nfo:belongsToContainer nepomuk:/res/71458f55-898c-4374-ad00-6ac5b1d9c9e7 # ID of the root directory
However, when I enter "6 My account1.png" in KRunner, the file is not found. I would like to see the file there and be able to click it to open it in Gwenview through its zip:/ URL.</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>services/fileindexer/indexer/CMakeLists.txt <span style="color: grey">(97bedfd)</span></li>
<li>services/fileindexer/indexer/archiveextractor.h <span style="color: grey">(PRE-CREATION)</span></li>
<li>services/fileindexer/indexer/archiveextractor.cpp <span style="color: grey">(PRE-CREATION)</span></li>
<li>services/fileindexer/indexer/nepomukarchiveextractor.desktop <span style="color: grey">(PRE-CREATION)</span></li>
</ul>
<p><a href="http://git.reviewboard.kde.org/r/109811/diff/" style="margin-left: 3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>