Problems with mimetype recognition

David Faure faure at kde.org
Wed Oct 1 20:46:16 BST 2003


On Wednesday 01 October 2003 10:26, Andras Mantia wrote:
> Hi,
> 
> Although we have a method to determine if a file is text or not with the help
> of mimetypes and the [X-KDE-Text] property, I still got reports about users
> who say that their PHP files are not recognized as text ones. I got an
> example file from them and was surprised to see that the mimetype detection
> fails in KMimeType. In KMimeType::findFormatByFileContent first the mimetype
> is queried with findByFileContent() which calls
> KMimeMagic::self()->findFileType() on its own. This call returns
> "application/octet-stream" for the PHP file in question, and for another
> file that is recognized as text, surprise: "text/x-c++-src". So even for files that were
> recognized as text, it was only by chance as they were not treated as PHP.
> 
> This means that either the "magic code" is broken or the magic fields for PHP
> are broken. Can someone with more knowledge take a look? If needed, I can
> send the files in question.

I just added a rule that looks for <?php at the beginning of the file.
The problem is that we can only check at offset 0. The XDG mimetype standard
suggests checking between offsets 0 and 64, but this isn't possible in our
magic file currently, only in code.
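
To illustrate what "only in code" could look like, here is a minimal sketch of
a check that accepts the <?php tag starting anywhere between offsets 0 and 64.
The function name looksLikePhp and its maxOffset parameter are made up for this
example; this is not the KMimeMagic code, just plain standard C++.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of KMimeMagic: returns true if the PHP
// open tag starts at any offset in the range [0, maxOffset].
bool looksLikePhp(const std::string &data, std::size_t maxOffset = 64)
{
    static const std::string tag = "<?php";
    // Only look at the bytes a tag starting at maxOffset could still cover.
    const std::string head = data.substr(0, maxOffset + tag.size());
    return head.find(tag) != std::string::npos;
}
```

A caller would read the first 70 bytes or so of the file into a buffer and pass
it in. A file that starts with some HTML before the first <?php tag is then
still matched, which a rule fixed at offset 0 cannot do.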

> I think a workaround would be to use findByURL() instead of
> findByFileContent() as that one first tries a match by the extension, but
> this doesn't solve the problem that findByFileContent() returns the wrong
> mimetype.

Yes, you should definitely use findByURL().
The number of PHP files not named *.php must be very small, IMHO.
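
The lookup order described above (file name first, content only as a fallback)
can be sketched roughly like this. It is a simplified stand-in, not the
KMimeType implementation: guessMimeType is a hypothetical name, and the
"application/x-php" string is assumed here as the PHP mimetype.

```cpp
#include <string>

// Hypothetical stand-in for the lookup order, not the KMimeType code:
// try the file name first, and look at the content only if that fails.
std::string guessMimeType(const std::string &fileName, const std::string &content)
{
    const std::string ext = ".php";
    // Step 1: match by extension, which needs no file access at all.
    if (fileName.size() >= ext.size() &&
        fileName.compare(fileName.size() - ext.size(), ext.size(), ext) == 0)
        return "application/x-php"; // assumed mimetype name
    // Step 2: fall back to a magic check at offset 0.
    if (content.compare(0, 5, "<?php") == 0)
        return "application/x-php";
    return "application/octet-stream";
}
```

With this order a file named index.php is classified correctly even if its
content starts with plain HTML, which is the case the content check misses.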

-- 
David FAURE, faure at kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).




More information about the kde-core-devel mailing list