[Bug 166222] New: universal charset encoding detection in katepart kencodingdetector
yoyocat
fearee at gmail.com
Thu Jul 10 11:39:57 BST 2008
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
http://bugs.kde.org/show_bug.cgi?id=166222
Summary: universal charset encoding detection in katepart
kencodingdetector
Product: kde
Version: unspecified
Platform: Compiled Sources
OS/Version: Linux
Status: UNCONFIRMED
Severity: wishlist
Priority: NOR
Component: general
AssignedTo: unassigned-bugs kde org
ReportedBy: fearee gmail com
Version: 4.0.85 (using Devel)
Installed from: Compiled sources
Compiler: gcc 4.1.2 --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran --disable-libgcj --with-cpu=generic --host=i686-pc-linux
OS: Linux
Hi, I have ported Firefox's charset detection and written some patches against KDE 4.0.85 so that universal charset autodetection works in KWrite (Kate) and Konqueror (and every other app that uses KEncodingDetector).
ftp://orafy:public@public.sjtu.edu.cn/mozilla-chardet-0.1.tar.bz2
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-cmake.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-katedialogs.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-kcodecaction.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-kencoding.patch
Screenshot: KWrite's config dialog, showing the newly added "Universal" option in the
charset detection combo box.
http://img61.imageshack.us/my.php?image=80961170fw7.png
To build, untar mozilla-chardet-0.1.tar.bz2, then run cmake && make && make install.
mozilla-chardet has no dependencies, so it would also be easy to include it in the source branch.
I'm a Chinese KDE user.
After applying these patches, I tested them by using KWrite to open Big5/GB18030/EUC-JP encoded documents; detection was correct nearly 100% of the time.
The encoding detection algorithm is considerably more complex than the existing code in kdecore/localization/*.
A paper describes the methods used:
http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
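As a rough illustration only: the paper combines three methods (coding scheme, character distribution, and 2-char sequence distribution). The sketch below shows just the first idea, ruling out encodings under which the bytes are not a legal sequence. It is not Mozilla's code and not part of the patches; it relies on Python's built-in codecs instead of Mozilla's state machines, and the candidate list and the function name valid_encodings are made up for the example.

```python
# Illustrative sketch only: approximates the "coding scheme" check with
# Python's built-in codecs. Mozilla's detector uses its own state machines.

# Hypothetical candidate list, chosen for this example.
CANDIDATES = ["utf-8", "big5", "gb18030", "euc_jp"]


def valid_encodings(data, candidates=CANDIDATES):
    """Return the candidates under which `data` is a legal byte sequence."""
    survivors = []
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on an illegal sequence
        except UnicodeDecodeError:
            continue
        survivors.append(name)
    return survivors


if __name__ == "__main__":
    sample = "日本語のテキスト".encode("euc_jp")
    # More than one multi-byte encoding may accept the same bytes.
    print(valid_encodings(sample))
```

A legality check alone often leaves several candidates standing, because many multi-byte encodings accept the same byte ranges. That is why the paper adds the two statistical methods to pick among the survivors.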
Mozilla's related source code:
http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/