<html>

 <body>

  <div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">

   <table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 solid;">

    <tr>

     <td>

      This is an automatically generated e-mail. To reply, visit:

      <a href="http://git.reviewboard.kde.org/r/109811/">http://git.reviewboard.kde.org/r/109811/</a>

     </td>

    </tr>

   </table>

   <br />

 <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">My main problem with this patch is that this information is going to be hard to keep up to date. Say the zip file moves: then we need to update the URLs of all its ArchiveItems. We already do that for files in the filewatcher, but we would have to extend it to zip and other protocols.

Also, what happens when someone tries to open the URL in a non-KDE application that doesn't support the zip/tar protocols, say VLC or LibreOffice? It would just lead to an ugly user experience.

I'm not sure if we want this indexer or not.</pre>
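 <pre style="white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">A minimal sketch of the first concern, in Python rather than the patch's C++ purely for brevity. It assumes every entry URL embeds the archive's path (as in "zip:/home/me/archive.zip/one/file"), so moving the archive means rewriting every entry. rebase_entry_url and the target path /home/me/backup/archive.zip are hypothetical; this is not how the filewatcher is implemented.

```python
def rebase_entry_url(entry_url: str, old_archive: str, new_archive: str) -> str:
    """Rewrite one archive-entry URL after its archive moved (hypothetical helper)."""
    # Split "zip:/path/to/archive.zip/entry" into the protocol and the path.
    scheme, _, path = entry_url.partition(":")
    prefix = old_archive + "/"
    if not path.startswith(prefix):
        # Not an entry of the moved archive: leave it untouched.
        return entry_url
    # Keep the entry part, swap the archive part.
    return f"{scheme}:{new_archive}/{path[len(prefix):]}"


entries = [
    "zip:/home/me/archive.zip/",
    "zip:/home/me/archive.zip/one/file",
]
moved = [
    rebase_entry_url(url, "/home/me/archive.zip", "/home/me/backup/archive.zip")
    for url in entries
]
print(moved)
```

Every resource created for the archive needs this rewrite, so the cost of a move grows with the number of entries, not with the number of archives.</pre>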

 <br />

<div>

<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; border-collapse: collapse; margin: 2px; padding: 2px;">

 <thead>

  <tr>

   <th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; font-size: 9pt; padding: 4px 8px; text-align: left;">

    <a href="http://git.reviewboard.kde.org/r/109811/diff/3/?file=127299#file127299line92" style="color: black; font-weight: bold; text-decoration: underline;">services/fileindexer/indexer/archiveextractor.cpp</a>

    <span style="font-weight: normal;">

     (Diff revision 3)

    </span>

   </th>

  </tr>

 </thead>

 <tbody>

  <tr>

    <th bgcolor="#b1ebb0" style="border-right: 1px solid #C0C0C0;" align="right"><font size="2"></font></th>

    <td bgcolor="#c5ffc4" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>

    <th bgcolor="#b1ebb0" style="border-left: 1px solid #C0C0C0; border-right: 1px solid #C0C0C0;" align="right"><font size="2">92</font></th>

    <td bgcolor="#c5ffc4" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; ">    <span class="n">archiveRes</span><span class="p">.</span><span class="n">addProperty</span><span class="p">(</span><span class="n">NFO</span><span class="o">::</span><span class="n">uncompressedSize</span><span class="p">(),</span> <span class="n">uncompressedSize</span><span class="p">);</span></pre></td>

  </tr>

 </tbody>

</table>

<pre style="margin-left: 2em; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">This is great, but my first concern is: what is the use case for storing this information? Does anyone actually need it?

We have a history of storing information in Nepomuk without a clear use case, which just increases the size of the database without any benefit.</pre>

</div>

<br />

<div>

<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; border-collapse: collapse; margin: 2px; padding: 2px;">

 <thead>

  <tr>

   <th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; font-size: 9pt; padding: 4px 8px; text-align: left;">

    <a href="http://git.reviewboard.kde.org/r/109811/diff/3/?file=127299#file127299line121" style="color: black; font-weight: bold; text-decoration: underline;">services/fileindexer/indexer/archiveextractor.cpp</a>

    <span style="font-weight: normal;">

     (Diff revision 3)

    </span>

   </th>

  </tr>

 </thead>

 <tbody>

  <tr>

    <th bgcolor="#b1ebb0" style="border-right: 1px solid #C0C0C0;" align="right"><font size="2"></font></th>

    <td bgcolor="#c5ffc4" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>

    <th bgcolor="#b1ebb0" style="border-left: 1px solid #C0C0C0; border-right: 1px solid #C0C0C0;" align="right"><font size="2">121</font></th>

    <td bgcolor="#c5ffc4" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>

  </tr>

 </tbody>

</table>

<pre style="margin-left: 2em; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Maybe you could add the file type. Have a look at SimpleIndexer types for mimetype.

Though then it would come in the search results for music and I'm not sure everyone can handle them :/</pre>

</div>

<br />

<div>

<table width="100%" border="0" bgcolor="white" style="border: 1px solid #C0C0C0; border-collapse: collapse; margin: 2px; padding: 2px;">

 <thead>

  <tr>

   <th colspan="4" bgcolor="#F0F0F0" style="border-bottom: 1px solid #C0C0C0; font-size: 9pt; padding: 4px 8px; text-align: left;">

    <a href="http://git.reviewboard.kde.org/r/109811/diff/3/?file=127299#file127299line147" style="color: black; font-weight: bold; text-decoration: underline;">services/fileindexer/indexer/archiveextractor.cpp</a>

    <span style="font-weight: normal;">

     (Diff revision 3)

    </span>

   </th>

  </tr>

 </thead>

 <tbody>

  <tr>

    <th bgcolor="#b1ebb0" style="border-right: 1px solid #C0C0C0;" align="right"><font size="2"></font></th>

    <td bgcolor="#c5ffc4" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; "></pre></td>

    <th bgcolor="#b1ebb0" style="border-left: 1px solid #C0C0C0; border-right: 1px solid #C0C0C0;" align="right"><font size="2">147</font></th>

    <td bgcolor="#c5ffc4" width="50%"><pre style="font-size: 8pt; line-height: 140%; margin: 0; ">    <span class="n">Nepomuk2</span><span class="o">::</span><span class="n">clearIndexedData</span><span class="p">(</span><span class="n">url</span><span class="p">)</span><span class="o">-></span><span class="n">exec</span><span class="p">();</span></pre></td>

  </tr>

 </tbody>

</table>

<pre style="margin-left: 2em; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">I'm not sure if this is a good idea. 

Why would you want to clear the indexed data of the zip url? It typically will not have any data stored.</pre>

</div>

<br />

<p>- Vishesh</p>

<br />

<p>On April 1st, 2013, 5:36 p.m. UTC, Denis Steckelmacher wrote:</p>

<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" style="background-image: url('http://git.reviewboard.kde.org/static/rb/images/review_request_box_top_bg.ab6f3b1072c9.png'); background-position: left top; background-repeat: repeat-x; border: 1px black solid;">

 <tr>

  <td>

<div>Review request for Nepomuk.</div>

<div>By Denis Steckelmacher.</div>

<p style="color: grey;"><i>Updated April 1, 2013, 5:36 p.m.</i></p>

<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>

 <table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">

 <tr>

  <td>

   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">This patch adds a file metadata extractor for archive files. This extractor handles any file that can be read using KArchive.

The extracted metadata are the uncompressed size of the whole archive (shown in Dolphin, but not formatted like a file size with KB or MB suffixes) and the list of files it contains. The extractor creates one Nepomuk resource per file or directory in the archive (root directory included). These resources have the types ArchiveItem, and FileDataObject (for files) or Folder (for directories). They also have their nie:url property set to a URL that can be used with the Archive KIO (for instance, "zip:/home/me/archive.zip/one/file" or "tar:/usr/src/linux-3.7.2.tar.xz"). For files, fileSize is set to the uncompressed size of the file.

The files themselves are neither read nor uncompressed. I haven't found a way to recursively extract the metadata of archived files (for instance, by launching the PlainTextExtractor on any plain text file found in the archive).</pre>

  </td>

 </tr>

</table>
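<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">The URL form described above can be illustrated with a short sketch, written in Python rather than the patch's C++ purely for brevity. archive_entry_url is a hypothetical helper, not a function from the patch or from KIO. It only assumes that an entry URL is the protocol name, the archive path and the percent-encoded entry path joined together, as in the examples in this request.

```python
from urllib.parse import quote


def archive_entry_url(scheme: str, archive_path: str, entry_path: str) -> str:
    """Build a KIO-style URL for an entry inside an archive (hypothetical helper)."""
    # Percent-encode both parts but keep "/" so the directory structure survives.
    archive = quote(archive_path, safe="/")
    entry = quote(entry_path.lstrip("/"), safe="/")
    return f"{scheme}:{archive}/{entry}"


# A file whose name contains spaces, then the archive's root directory.
print(archive_entry_url("zip", "/home/steckdenis/test.zip", "6 My account1.png"))
print(archive_entry_url("zip", "/home/steckdenis/test.zip", ""))
```

The first call produces the same form that nepomukshow reports for that entry in the Testing section, with each space encoded as %20.</pre>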

<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>

<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">

 <tr>

  <td>

   <pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">nepomukindexer seems to work. nepomukshow displays meaningful information about the indexed files: the archive itself and the files contained in it. For a test archive, nepomukshow displays the following:

$ nepomukshow test.zip

&lt;nepomuk:/res/e5eddbdb-995b-472f-9ef1-3a4ba4c9999d&gt;  # Note this ID

        rdf:type              nfo:FileDataObject                                             

        rdf:type              nfo:Archive                                                    

        rdf:type              nie:InformationElement                                         

        nao:created           2013-04-01T13:57:16.586Z                                       

        nao:lastModified      2013-04-01T13:57:17.414Z                                       

        nie:lastModified      2013-02-28T20:49:24Z                                           

        nie:url               file:///home/steckdenis/test.zip  

        nie:mimeType          application/zip                                                

        nie:created           2013-02-28T20:49:24Z                                           

        nfo:fileSize          3368744                                                        

        nfo:uncompressedSize  4171547                                                        

        nfo:fileName          test.zip                                                     

        kext:indexingLevel    2

The metadata of a file contained in the archive can be displayed by passing a URL to nepomukshow:

$ nepomukshow 'zip:/home/steckdenis/test.zip/'

&lt;nepomuk:/res/71458f55-898c-4374-ad00-6ac5b1d9c9e7&gt;  # Note this ID: it belongs to the root directory of the archive

        rdf:type                nfo:ArchiveItem                                                

        rdf:type                nfo:Folder                                                     

        rdf:type                nfo:FileDataObject                                             

        rdf:type                nfo:DataContainer                                              

        nao:created             2013-04-01T13:57:17.416Z                                       

        nao:lastModified        2013-04-01T13:57:17.416Z                                       

        nie:url                 &lt;zip:/home/steckdenis/test.zip/&gt;

        nie:created             1970-01-01T00:00:00Z                                           

        nfo:belongsToContainer  nepomuk:/res/e5eddbdb-995b-472f-9ef1-3a4ba4c9999d # ID of the archive file itself

$ nepomukshow 'zip:/home/steckdenis/test.zip/6 My account1.png'

&lt;nepomuk:/res/ed73aabc-ce18-4ac7-9db7-f301ce07ffc5&gt;

        rdf:type                nfo:ArchiveItem                                                                     

        rdf:type                nfo:FileDataObject                                                                  

        nao:created             2013-04-01T13:57:17.417Z                                                            

        nao:lastModified        2013-04-01T13:57:17.417Z                                                            

        nie:url                 &lt;zip:/home/steckdenis/test.zip/6%20My%20account1.png&gt;

        nie:created             2012-11-21T08:21:08Z                                                                

        nfo:fileSize            330923                                            # Uncompressed size                         

        nfo:belongsToContainer  nepomuk:/res/71458f55-898c-4374-ad00-6ac5b1d9c9e7 # ID of the root directory

When entering "6 My account1.png" in KRunner, the file is shown as an "Archive entry". When clicking on it, Gwenview is launched and displays the image.</pre>

  </td>

 </tr>

</table>

<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs </h1>

<ul style="margin-left: 3em; padding-left: 0;">

 <li>services/fileindexer/indexer/CMakeLists.txt <span style="color: grey">(97bedfd)</span></li>

 <li>services/fileindexer/indexer/archiveextractor.h <span style="color: grey">(PRE-CREATION)</span></li>

 <li>services/fileindexer/indexer/archiveextractor.cpp <span style="color: grey">(PRE-CREATION)</span></li>

 <li>services/fileindexer/indexer/nepomukarchiveextractor.desktop <span style="color: grey">(PRE-CREATION)</span></li>

</ul>

<p><a href="http://git.reviewboard.kde.org/r/109811/diff/" style="margin-left: 3em;">View Diff</a></p>

  </td>

 </tr>

</table>

  </div>

 </body>

</html>