Problems with mimetype recognition

Andras Mantia amantia at kde.org
Wed Oct 1 21:04:22 BST 2003


On Wednesday 01 October 2003 22:46, David Faure wrote:
> On Wednesday 01 October 2003 10:26, Andras Mantia wrote:
> > Hi,
> > 
> > Although we have a method to determine whether a file is text or not with
> > the help of mimetypes and the [X-KDE-Text] property, I still get reports
> > from users who say that their PHP files are not recognized as text. I got
> > an example file from them and was surprised to see that the mimetype
> > detection fails in KMimeType. In KMimeType::findFormatByFileContent, the
> > mimetype is first queried with findByFileContent(), which in turn calls
> > KMimeMagic::self()->findFileType(). For the PHP file in question this call
> > returns "application/octet-stream", and for another file that is
> > recognized as text it returns, surprise, "text/x-c++-src". So even the
> > files that were recognized as text were recognized only by chance, as they
> > were not treated as PHP.
> >
> > This means that either the "magic code" is broken or the magic fields for
> > PHP are broken. Can someone with more knowledge take a look? If needed, I
> > can send the files in question.
> 
> I just added a rule that looks for <?php at the beginning of the file.
> Problem is, we can only check at offset 0. The XDG mimetype standard
> suggests checking between offsets 0 and 64, but this isn't possible in
> our magic file currently, only by code.
Not so good, but may be acceptable.
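
To make the difference concrete, here is a minimal sketch in plain C++ of a
check at offset 0 versus a check anywhere in the first 64 bytes. This is not
the KMimeMagic code; both helper names are made up for illustration, and the
64-byte limit is simply the figure mentioned above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of KMimeMagic: true only if `marker`
// starts exactly at byte 0 of `data`.
bool matchesAtOffsetZero(const std::string &data, const std::string &marker)
{
    return data.compare(0, marker.size(), marker) == 0;
}

// Hypothetical helper: true if the first occurrence of `marker` starts at
// any offset in [0, maxOffset].
bool matchesWithinRange(const std::string &data, const std::string &marker,
                        std::size_t maxOffset = 64)
{
    const std::size_t pos = data.find(marker);
    return pos != std::string::npos && pos <= maxOffset;
}
```

A PHP file that starts with a blank line or a short HTML preamble passes only
the second check, which is why the offset 0 rule is "not so good".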

> 
> > I think a workaround would be to use findByURL() instead of
> > findByFileContent() as that one first tries a match by the extension, but
> > this doesn't solve the problem that findByFileContent() returns the wrong
> > mimetype.
> 
> Yes, you should definitely use findByURL().
> The amount of PHP files not named *.php must be very very small, IMHO.

I use findByURL() in my code, but it returns application/x-php, and from
this I can't figure out whether it is a text file or not (well, I can if I
have a list of text mimetypes...). This was the reason why the [X-KDE-Text]
property and findFormatByFileContent() were introduced. And
findFormatByFileContent() fails because it doesn't use findByURL(). I think
I will add the findByURL() call there as well, although in that case
findFormatByFileContent() no longer looks for the format by content in
every case. ;-)

Andras

> 
> -- 
> David FAURE, faure at kde.org, sponsored by Trolltech to work on KDE,
> Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).

-- 
Quanta Plus developer - http://quanta.sourceforge.net
K Desktop Environment - http://www.kde.org




More information about the kde-core-devel mailing list