make khtml/misc/decoder.* public

Ingo Klöcker kloecker at kde.org
Mon Mar 5 21:26:32 GMT 2007


On Monday 05 March 2007 17:02, Andreas Pakulat wrote:
> On 05.03.07 16:45:17, Anders Lund wrote:
> > On Monday 05 March 2007, Olaf Schmidt wrote:
> > > [ Anders Lund, Mo., 5. Mär. 2007 16:05 ]
> > >
> > > > (Any encoding claimed by the file only might be correct, even
> > > > if we look for it)
> > >
> > > If the encoding claimed by the file is invalid, then there should
> > > always be a warning. For example, if I open an HTML file that
> > > claims to be latin1 but actually uses utf8, then Kate should warn
> > > me (even if my system default is utf8).
> >
> > This would happen automatically I think.
>
> Uhm, I don't think the example is possible, after all a latin1
> encoded file may use any character from 0 to 255 and you can't find
> out if two adjacent bytes are one utf-8 encoded character or 2 latin1
> encoded characters, at least unless you "define" that only letters,
> numbers and a few special symbols are allowed in a file... The other
> way around is possible.

In latin1 the byte ranges 0x00-0x1F and 0x80-0x9F hold control
characters. With the exception of \t, \n and \r (and probably the form
feed \f, which is used in the source code of some projects) they are
unprintable and do not occur in latin1 text files. A file that claims
to be latin1 but contains such bytes is therefore suspect.
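
As a rough sketch of that check (Python, purely for illustration; the
function name and the exact set of allowed control bytes are my
assumptions, not anything that exists in Kate or khtml):

```python
# Control bytes assumed acceptable in text: tab, line feed,
# carriage return and form feed.
ALLOWED_CONTROLS = frozenset(b"\t\n\r\f")


def looks_like_latin1_text(data: bytes) -> bool:
    """Return False if data holds a control byte that latin1 text should not contain."""
    for byte in data:
        in_c0 = byte <= 0x1F            # 0x00-0x1F
        in_c1 = 0x80 <= byte <= 0x9F    # 0x80-0x9F
        if (in_c0 or in_c1) and byte not in ALLOWED_CONTROLS:
            return False
    return True
```

Note that this only catches some mislabelled files. The utf-8 encoding
of "ß" is 0xC3 0x9F and is flagged, but "é" is 0xC3 0xA9, and both of
those bytes are printable in latin1, so such a file would pass.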

I suggest allowing the user to specify a list of preferred encodings,
which are tried one after the other until a suitable one is found. This
is similar to what KMail does when it looks for a suitable encoding for
the composed message.

Regards,
Ingo