Hi, some comments about encoding detection (KEncodingDetector)

Andreas Hartmetz ahartmetz at gmail.com
Wed Jul 23 12:07:33 BST 2008



On Tuesday 22 July 2008 13:56:25 Mark Kretschmann wrote:
> On 7/22/08, wang kai <fearee at gmail.com> wrote:
> >  A composite approach to language/encoding detection :
> >  http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
> >
> >  Firefox has a great universal charset detector that mixes the above 3
> > methods. I've tested another charset/encoding detector, python-chardet,
> > and it's worse; Firefox's is the best detector by far, and it has already
> > been ported to Java. Its license (MPL) is now a triple license and
> > compatible with the GPL. I suggest you use it rather than re-invent the
> > wheel.
>
> FYI, we're now also using Mozilla's encoding detector in Amarok 2,
> since KDE's method didn't seem as effective.
>
> It would be nice to have Mozilla's detector in KDElibs though :)

KMail needs this too, to detect the encoding of attachments that are added
from a local file. For KMail/KDE 3 I backported KEncodingDetector, but it is
an ugly beast in more than one way (*) and apparently not the best at doing
its job anyway.
I am very much in favor of wrapping Mozilla's detector in a nice class and
adding that to kdelibs. Are the licenses compatible?

(*) The API is somewhat inconvenient, and it takes a const QByteArray, casts
away the constness with const_cast and then modifies the data in place. That
caused a nice bug in KMail which trashed attachments. UGH.
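
To illustrate the const_cast problem, here is a minimal sketch in plain C++.
It is not the KEncodingDetector code: std::string stands in for QByteArray,
and blankNulsInPlace, blankNulsOnCopy and demo are names made up for this
example. The first function quietly rewrites the caller's buffer even though
its signature promises not to; the second works on a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a detector that wants NUL-free input.
// BAD: promises const in the signature, then casts it away and edits in place.
// (Only well-defined if the caller's object was not itself declared const.)
std::size_t blankNulsInPlace(const std::string &data)
{
    std::string &writable = const_cast<std::string &>(data);
    std::size_t replaced = 0;
    for (char &c : writable) {
        if (c == '\0') {
            c = ' ';        // silently changes the caller's data
            ++replaced;
        }
    }
    return replaced;
}

// BETTER: leave the input alone and clean a private copy instead.
std::size_t blankNulsOnCopy(const std::string &data, std::string &cleaned)
{
    cleaned = data;
    std::size_t replaced = 0;
    for (char &c : cleaned) {
        if (c == '\0') {
            c = ' ';
            ++replaced;
        }
    }
    return replaced;
}

// Shows the difference on a small binary "attachment".
void demo()
{
    const std::string original("PK\0\0zip", 7);
    std::string attachment = original;   // the caller's (non-const) buffer
    std::string scratch;

    assert(blankNulsOnCopy(attachment, scratch) == 2);
    assert(attachment == original);      // input untouched

    assert(blankNulsInPlace(attachment) == 2);
    assert(attachment != original);      // attachment has been trashed
}
```

The copy costs some memory, but the caller can rely on the const in the
signature. QByteArray is implicitly shared, so depending on how the in-place
write is done, it may affect more than the one object that was passed in.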

-- 
- I love making a racket. It's one of my favorite parts of this job.
- I love discharging unregistered firearms within city limits.



