[Bug 166222] New: universal charset encoding detection in katepart kencodingdetector

yoyocat fearee at gmail.com
Thu Jul 10 11:39:57 BST 2008


http://bugs.kde.org/show_bug.cgi?id=166222         
           Summary: universal charset encoding detection in katepart
                    kencodingdetector
           Product: kde
           Version: unspecified
          Platform: Compiled Sources
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: wishlist
          Priority: NOR
         Component: general
        AssignedTo: unassigned-bugs kde org
        ReportedBy: fearee gmail com


Version:           4.0.85 (using Devel)
Installed from:    Compiled sources
Compiler:          gcc 4.1.2 --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran --disable-libgcj --with-cpu=generic --host=i686-pc-linux
OS:                Linux

Hi, I have ported Firefox's charset detection and written some patches against KDE 4.0.85, so that universal charset autodetection works in kwrite (kate) and konqueror (and in every app that uses KEncodingDetector).

ftp://orafy:public@public.sjtu.edu.cn/mozilla-chardet-0.1.tar.bz2
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-cmake.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-katedialogs.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-kcodecaction.patch
ftp://orafy:public@public.sjtu.edu.cn/kdelibs-4.0.85-kencoding.patch

Screenshot: kwrite's config dialog, showing the newly added "Universal" option in
the charset detection combobox.
http://img61.imageshack.us/my.php?image=80961170fw7.png

To build it, untar mozilla-chardet-0.1.tar.bz2 and run cmake && make && make install.
mozilla-chardet has no dependencies, so it would also be easy to include it in the source tree.

I'm a Chinese KDE user.
After applying these patches, I tested them by opening Big5, GB18030 and EUC-JP encoded documents in kwrite; the detected encoding was correct nearly 100% of the time.
The encoding detection algorithm is much more complex than the one in kdecore/localization/*
A paper describing Mozilla's methods:
 http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html
Mozilla's related sourcecode:
http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/src/
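
To give a rough idea of one ingredient: as far as I understand the paper, one of the methods it combines is a "coding scheme" check, which rules out a candidate encoding as soon as the bytes are illegal in it. The sketch below does this for UTF-8 only. It is my own illustration, not code from mozilla-chardet or from the patches, and looksLikeUtf8 is a hypothetical name. The real detector also uses character distribution statistics, which are not shown here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of mozilla-chardet or KEncodingDetector):
// returns true if `data` is a legal UTF-8 byte sequence, false otherwise.
bool looksLikeUtf8(const std::string &data)
{
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        std::size_t extra = 0; // continuation bytes expected after the lead byte
        if (c < 0x80)
            extra = 0;                          // plain ASCII
        else if (c >= 0xC2 && c <= 0xDF)
            extra = 1;                          // 2-byte sequence
        else if (c >= 0xE0 && c <= 0xEF)
            extra = 2;                          // 3-byte sequence
        else if (c >= 0xF0 && c <= 0xF4)
            extra = 3;                          // 4-byte sequence
        else
            return false;                       // stray continuation or invalid lead byte

        if (n - i - 1 < extra)
            return false;                       // sequence is cut off at end of input

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;                   // not a 10xxxxxx continuation byte
        }

        if (extra >= 2) {
            // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
            const unsigned char b1 = static_cast<unsigned char>(data[i + 1]);
            if (c == 0xE0 && b1 < 0xA0) return false;
            if (c == 0xED && b1 > 0x9F) return false;
            if (c == 0xF0 && b1 < 0x90) return false;
            if (c == 0xF4 && b1 > 0x8F) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

For example, the character "中" is E4 B8 AD in UTF-8 and passes the check, while the same character in GB2312 (D6 D0) or Big5 (A4 A4) is rejected, so UTF-8 can be dropped from the candidate list for such files. A check like this only eliminates candidates; choosing between Big5, GB18030 and EUC-JP needs the statistical methods.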


