mimetype guessing is fooled by extension
Allan Sandfeld Jensen
kde at carewolf.com
Sun Jul 25 12:39:31 BST 2004
On Sunday 25 July 2004 13:14, Allan Sandfeld Jensen wrote:
> On Wednesday 21 July 2004 16:25, Luciano Montanaro wrote:
> > I created a very big file to test the file plugins (I noticed there were
> > problems earlier this year), and I have found that at least the C++ and
> > diff file plugins get stuck in a tight loop on it. I think plugins of this
> > kind should bail out on files of unreasonable length. Another issue,
> > however, is that the file was wrongly identified as a C++ file, while it
> > does not even qualify as a text file (I don't think '\0' is a valid
> > character in a text file).
> >
> > "file prova.cpp" correctly says the file is a "data" file.
> > Can't the mime identification be made smarter, using the file extension
> > as an additional hint instead of the only way to identify the file?
>
> Yes, by setting X-KDE-PatternAccuracy to <100.
> Notice that if you open the properties for the file, it will detect the
> content-mimetype more accurately.
>
> I will take a look at the issue.
>
Oops. One major problem. The magic (content) detection code can correctly
detect diff, C++ and C files. Diff will work fine by setting
X-KDE-PatternAccuracy as suggested above, but C and C++ are detected as
"text/x-c" and "text/x-c++", which do not exist as mimetypes in KDE (it has
"text/x-csrc" and "text/x-chdr"). What is worse is that the magic code
_cannot_ tell headers from source, so we end up in a situation where a
combination of patterns and magic is needed to do proper detection. There is
currently no way to do that.
A partial fix would be to add "text/x-c" and "text/x-c++" as valid mimetypes
and let the "text/x-chdr"-style mimetypes inherit from them. It would mean,
though, that a thorough mimetype detection (with magic) would lead to less
accurate results than a fast mimetype detection (with patterns only).
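
To make the idea concrete, here is a rough sketch in Python (not KDE code) of
how a pattern result and a magic result could be combined once such
inheritance exists. The PARENTS table and the function names are made up for
illustration, and the "text/x-c++src"/"text/x-c++hdr" entries are my
assumption about how the C++ types would be wired up. The rule is: if the
pattern type inherits from what magic found, keep the more specific pattern
type; if the two disagree, trust the content.

```python
# Hypothetical inheritance table: child mimetype -> parent mimetype.
# Only an illustration of the proposal, not an existing KDE structure.
PARENTS = {
    "text/x-csrc": "text/x-c",
    "text/x-chdr": "text/x-c",
    "text/x-c++src": "text/x-c++",   # assumed name
    "text/x-c++hdr": "text/x-c++",   # assumed name
    "text/x-c": "text/plain",
    "text/x-c++": "text/plain",
}


def inherits(mime, ancestor):
    """Return True if mime equals ancestor or descends from it."""
    while mime is not None:
        if mime == ancestor:
            return True
        mime = PARENTS.get(mime)
    return False


def combine(pattern_type, magic_type):
    """Pick a mimetype from a pattern guess and a magic (content) guess.

    Either argument may be None when that method found nothing.
    """
    if pattern_type is None:
        return magic_type
    if magic_type is None:
        return pattern_type
    # Magic confirms the pattern guess: keep the more specific pattern type.
    if inherits(pattern_type, magic_type):
        return pattern_type
    # Magic contradicts the extension: trust the content.
    return magic_type
```

With this, combine("text/x-chdr", "text/x-c") keeps "text/x-chdr", while a
"prova.cpp" whose content looks like binary data would end up with whatever
type magic reported for it instead of a C++ type.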
`Allan