[Owncloud] ownCloud 5.0.10 - lucene fails to index .txt files?

Stefan Vollmar vollmar at nf.mpg.de
Fri Aug 23 10:32:31 UTC 2013


Dear Jörn,

On 22.08.2013, at 23:49, Jörn Friedrich Dreyer wrote:

> The warnings about pdf and word are from getid3 lib and can be ignored if you are using search lucene. It comes with special indexers for these filetypes.
> 
> The error about not being able to determine the file format for txt files is also from getid3 and might be caused by empty txt files.

Can we get rid of these error messages?

> Can you check if the reported txt file has 0 bytes? Can you search for a text in the pdf or word files and see if you get any results?
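To answer Jörn's first question directly, zero-byte .txt files can be listed straight from the data directory. The sketch below is a minimal Python example, assuming the user's files live under /owncloud/data/stefan/files (a path inferred from the log entries and log location in this thread, not confirmed for every installation). It only reads file sizes and does not touch ownCloud or its index.

```python
import os
import sys


def empty_txt_files(root):
    """Yield paths of .txt files under root whose size is exactly 0 bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".txt"):
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) == 0:
                    yield path


if __name__ == "__main__":
    # Assumed location of one user's files inside the ownCloud data directory;
    # pass a different directory as the first argument if yours differs.
    # os.walk() yields nothing for a missing directory, so this prints nothing then.
    root = sys.argv[1] if len(sys.argv) > 1 else "/owncloud/data/stefan/files"
    for path in empty_txt_files(root):
        print(path)
```

If this prints nothing for the directory containing z.txt, the "unable to determine file format" warning is not explained by an empty file.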

The text file is not empty. We have manually scheduled a re-scan of all files, and this might be why *some* search terms now yield results for that txt file; we also get hits inside the PDF file. So, in principle, search_lucene does seem to do something. Is there a way to monitor exactly what lucene is doing, and whether it has already indexed a particular file at all?
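Short of a dedicated monitoring tool, one low-effort way to follow what search_lucene reports is to filter owncloud.log for its entries. The sketch below is a minimal Python example, assuming the log holds one JSON object per line with "app", "level", "time" and "message" keys, as in the excerpt quoted further down in this thread. It only shows what search_lucene chose to log, not whether a given file is actually in the index.

```python
import json
import os
import sys


def lucene_messages(lines, app="search_lucene"):
    """Yield (time, level, message) for log entries written by the given app.

    Assumes one JSON object per line; blank lines, lines that are not valid
    JSON, and JSON values that are not objects are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and entry.get("app") == app:
            yield entry.get("time"), entry.get("level"), entry.get("message")


if __name__ == "__main__":
    # Log path taken from this thread; adjust for your installation.
    path = sys.argv[1] if len(sys.argv) > 1 else "/owncloud/data/owncloud.log"
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="replace") as log:
            for time, level, message in lucene_messages(log):
                print(time, level, message)
    else:
        print(f"log file not found: {path}", file=sys.stderr)
```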

However, simple matching of file names (which should be much simpler and is really helpful if you have a nested directory structure with many files) is not nearly as good as it could be: the full word "readme" must be typed before "readme.txt" is offered as a hit, and likewise all characters of "tourismus" before "tourismus.jpg" turns up as a potential hit.

Likewise, "Serverraum" finds "Serverraum" in a PDF, whereas "server" or "raum" finds nothing. I would not say that this is useless, but it does not compare favorably with either the Google or the Spotlight search engine. Is this perhaps configurable?
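The behaviour described above is consistent with an index that stores whole words and answers only exact-term lookups; that is an assumption, not a diagnosis of search_lucene. The toy inverted index below (plain Python, not Zend_Search_Lucene; all function names are hypothetical) contrasts three lookup strategies. Prefix matching would find "Serverraum" for "server" and "readme.txt" for "read", but only substring matching (or, in a real engine, decompounding or n-gram indexing) would find "Serverraum" for "raum".

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word token to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(name)
    return index


def exact(index, query):
    # Whole-term lookup: the query must equal an indexed token.
    return set(index.get(query.lower(), set()))


def prefix(index, query):
    # Prefix lookup: any token that starts with the query matches.
    q = query.lower()
    return {name for token, names in index.items() if token.startswith(q) for name in names}


def substring(index, query):
    # Substring lookup: any token containing the query matches,
    # which also catches the second half of compound words.
    q = query.lower()
    return {name for token, names in index.items() if q in token for name in names}


if __name__ == "__main__":
    idx = build_index({"x.pdf": "Serverraum", "readme.txt": "readme.txt"})
    for query in ("server", "raum", "read", "serverraum"):
        print(query, exact(idx, query), prefix(idx, query), substring(idx, query))
```

Scanning every term as prefix() and substring() do is linear in the vocabulary size; real engines use sorted term dictionaries or n-gram indexes to make such queries cheap.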

Many thanks in advance.
Warm regards,
 Stefan


> 
> So long
> 
> Jörn
> 
> 
> 
> Stefan Vollmar <vollmar at nf.mpg.de> schrieb:
> Hello,
> 
> we seem to have problems with indexing files - this apparently works well for some files and does not for others - so far we have not worked out a pattern.
> 
> uname -a
> Linux owncloud 3.5.0-39-generic #60~precise1-Ubuntu 
> 
> ownCloud 5.0.10
> 
> Error messages in /owncloud/data/owncloud.log (see below) seem to suggest that the file type of simple ".txt" files could not be determined. These days I would also expect PDF data to be indexed, but a failure to index ".txt" files definitely sounds like a bug, right?
> 
> Many thanks in advance.
> 
> Best regards,
> Stefan
> 
> {"app":"PHP","message":"iconv(): Detected an illegal character in input string at \/var\/www\/owncloud\/apps\/search_lucene\/3rdparty\/Zend\/Search\/Lucene\/Analysis\/Analyzer\/Common\/TextNum.php#58","level":2,"time":"2013-08-22T20:00:07+00:00"}
> {"app":"PHP","message":"Only variables should be passed by reference at \/var\/www\/owncloud\/apps\/search_lucene\/lib\/indexer.php#163","level":2,"time":"2013-08-22T20:02:33+00:00"}
> 
> {"app":"search_lucene","message":"failed to extract meta information for \/stefan\/files\/x.pdf: PDF parsing not enabled in this version of getID3() [1.9.3-20111213]","level":2,"time":"2013-08-22T20:02:34+00:00"}
> {"app":"search_lucene","message":"failed to extract meta information for \/stefan\/files\/y.doc: MS Office (.doc, .xls, etc) parsing not enabled in this version of getID3() [1.9.3-20111213]","level":2,"time":"2013-08-22T20:02:55+00:00"}
> {"app":"search_lucene","message":"failed to extract meta information for \/stefan\/files\/z.txt: unable to determine file format","level":2,"time":"2013-08-22T20:03:22+00:00"}
> {"app":"search_lucene","message":"failed to extract meta information for \/stefan\/files\/z (2).txt: unable to determine file format","level":2,"time":"2013-08-22T20:03:42+00:00"}
> 
> -- 
> Sent from my Android mobile phone with K-9 Mail.
> _______________________________________________
> Owncloud mailing list
> Owncloud at kde.org
> https://mail.kde.org/mailman/listinfo/owncloud

-- 
Dr. Stefan Vollmar, Dipl.-Phys.
Head of IT group
Max-Planck-Institut für neurologische Forschung
Gleuelerstr. 50, 50931 Köln, Germany
Tel.: +49-221-4726-213  FAX +49-221-4726-298
Tel.: +49-221-478-5713  Mobile: 0160-93874279
Email: vollmar at nf.mpg.de   http://www.nf.mpg.de
