Hi, some comments about encoding detection (KEncodingDetector)
Andreas Hartmetz
ahartmetz at gmail.com
Sat Jul 26 21:32:32 BST 2008
On Wednesday 23 July 2008 08:32:11 wang kai wrote:
> I attach a big patch.
> Summary:
> 1. Port Mozilla's detection code
> 2. ChineseSimplified/ChineseTraditional encoding detection for KEncodingDetector:
> automaticDetectForChinese() can detect GB18030, Big5 and UTF-8 encodings
>
> The patch is too long to include in the mail (size: 529k), so
> get it from
> ftp://orafy:public@public.sjtu.edu.cn/encodingDetection.patch
>
The patch is perhaps somewhat too conservative: I assume that Mozilla's charset
detector is better than KDE's for all encodings, not just the Chinese ones. It
contains lots of big tables; they must be good for something :)
The API of KEncodingDetector is not nice anyway. What I'd like to see is a
KEncodingDetector2 (for lack of a better name) with a *very* simple API:
void reset();
void feed(const QByteArray &input); //or call it input() ?
<some enum> detectedEncoding() const;
int percentConfidence() const; //if possible, not very important
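As a rough illustration, the proposed interface could look like this in C++. This is a sketch under assumptions: std::string stands in for QByteArray so it compiles without Qt, and the Encoding enum, the EncodingDetector2 class and the AsciiOnlyDetector toy implementation are hypothetical names, not existing KDE API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for "<some enum>"; not an existing KDE type.
enum class Encoding { Unknown, Ascii, Latin1, Utf8 };

// The minimal API proposed above (hypothetical name), using std::string
// instead of QByteArray so the sketch builds without Qt.
class EncodingDetector2 {
public:
    virtual ~EncodingDetector2() = default;
    virtual void reset() = 0;
    virtual void feed(const std::string &input) = 0;
    virtual Encoding detectedEncoding() const = 0;
    virtual int percentConfidence() const = 0;
};

// Toy implementation: only tells pure 7-bit ASCII apart from "anything else".
class AsciiOnlyDetector : public EncodingDetector2 {
public:
    void reset() override {
        seenInput_ = false;
        sawHighByte_ = false;
    }

    // Can be called any number of times with consecutive chunks.
    void feed(const std::string &input) override {
        if (!input.empty())
            seenInput_ = true;
        for (unsigned char c : input)
            if (c >= 0x80)
                sawHighByte_ = true;
    }

    Encoding detectedEncoding() const override {
        if (!seenInput_ || sawHighByte_)
            return Encoding::Unknown;
        return Encoding::Ascii;
    }

    // Toy heuristic: full confidence for pure ASCII, none otherwise.
    int percentConfidence() const override {
        return detectedEncoding() == Encoding::Ascii ? 100 : 0;
    }

private:
    bool seenInput_ = false;
    bool sawHighByte_ = false;
};
```

A caller would simply feed() chunks as they arrive (for example from a network job) and query detectedEncoding() whenever it needs an answer, calling reset() before reusing the detector for another document.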
If the input passed to feed() ends with an incomplete Unicode (or other
multi-byte) character, there should be no need to tell the detector "watch
out, more blocks are coming". It should just cache the partial character and
complete it when more input arrives, ignoring it for the result in the meantime.
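A minimal sketch of that caching, assuming UTF-8 is the only multi-byte encoding involved: the hypothetical Utf8ChunkDetector keeps a trailing partial sequence in a buffer, prepends it to the next chunk, and ignores it for the result until it is complete. It uses std::string instead of QByteArray and does not reject overlong or surrogate forms, so it is not a full UTF-8 validator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result enum for this sketch.
enum class Encoding { Ascii, Utf8, Unknown };

class Utf8ChunkDetector {
public:
    void reset() {
        pending_.clear();
        multiByte_ = 0;
        invalid_ = false;
    }

    void feed(const std::string &input) {
        // Put the cached partial character back in front of the new data.
        const std::string buf = pending_ + input;
        pending_.clear();
        std::size_t i = 0;
        while (i < buf.size()) {
            const std::size_t len = sequenceLength(static_cast<unsigned char>(buf[i]));
            if (len == 0) {  // stray continuation byte or illegal lead byte
                invalid_ = true;
                ++i;
                continue;
            }
            std::size_t j = 1;
            while (j < len && i + j < buf.size() && isContinuation(buf[i + j]))
                ++j;
            if (j == len) {  // complete character
                if (len > 1)
                    ++multiByte_;
                i += len;
            } else if (i + j == buf.size()) {
                // Input ended in the middle of a character: cache it for the
                // next feed() and ignore it for the result until then.
                pending_ = buf.substr(i);
                return;
            } else {  // a non-continuation byte cut the sequence short
                invalid_ = true;
                i += j;
            }
        }
    }

    Encoding detectedEncoding() const {
        if (invalid_)
            return Encoding::Unknown;
        return multiByte_ > 0 ? Encoding::Utf8 : Encoding::Ascii;
    }

private:
    // Expected sequence length for a lead byte; 0 if it cannot start one.
    static std::size_t sequenceLength(unsigned char lead) {
        if (lead < 0x80) return 1;
        if (lead >= 0xC2 && lead <= 0xDF) return 2;
        if (lead >= 0xE0 && lead <= 0xEF) return 3;
        if (lead >= 0xF0 && lead <= 0xF4) return 4;
        return 0;
    }

    static bool isContinuation(char b) {
        return (static_cast<unsigned char>(b) & 0xC0) == 0x80;
    }

    std::string pending_;         // trailing bytes of an incomplete character
    std::size_t multiByte_ = 0;   // complete multi-byte characters seen
    bool invalid_ = false;        // any byte sequence that is not UTF-8
};
```

For example, feeding "caf\xC3" and then "\xA9" reports Ascii after the first call (the lone 0xC3 byte is held back) and Utf8 after the second, without the caller ever announcing that more data will follow.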
--
- This place reeks of adventure and excitement, Sam!
- I thought it was this tuna fish sandwich I found crawling with life in my
coat pocket.
More information about the kde-core-devel mailing list