[WebKit-devel] [Bug 287690] New: KWebkitPart does not apply correct locale encoding settings on some pages with CJK characters.

moriramar at gmail.com moriramar at gmail.com
Sun Nov 27 16:11:54 UTC 2011


           Summary: KWebkitPart does not apply correct locale encoding
                    settings on some pages with CJK characters.
           Product: kwebkitpart
           Version: unspecified
          Platform: Gentoo Packages
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: general
        AssignedTo: webkit-devel at kde.org
        ReportedBy: moriramar at gmail.com

Version:           unspecified (using KDE 4.7.2) 
OS:                Linux

When I open some pages with both simplified Chinese characters and traditional
Chinese characters, some characters are not displayed correctly. Pages
containing both Chinese characters and Japanese characters might cause this
problem as well.

Personal guess:
These pages might be encoded in zh_CN.GBK or zh_CN.GB18030 (which cover more
characters), while KWebkitPart might apply zh_CN.GB2312 (which is generally
considered a subset of GBK).
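
The subset relationship guessed at above can be checked with a small Python
sketch. It relies only on the standard `gb2312`, `gbk` and `gb18030` codecs;
the helper name `encodable` and the two sample characters are illustrative
assumptions, not taken from the affected page.

```python
def encodable(ch: str, codec: str) -> bool:
    """Return True if `ch` can be represented in `codec`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Sample characters chosen for illustration (assumption, not from the page):
# a simplified character and its traditional counterpart.
simplified = "\u4f53"   # 体
traditional = "\u9ad4"  # 體

for codec in ("gb2312", "gbk", "gb18030"):
    print(codec, encodable(simplified, codec), encodable(traditional, codec))
```

If the page mixes simplified and traditional characters, text like the
traditional sample cannot be represented in GB2312 at all, which would fit the
guess that a GB2312 decoder is being applied to GBK or GB18030 data.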

Reproducible: Always

Steps to Reproduce:
1. Install a font covering CJK characters, such as Bitstream Cyberbit,
WenQuanYi Zen Hei, WenQuanYi Microhei or Droid.
2. Make sure the zh_CN.GBK, zh_CN.GB2312, zh_CN.GB18030 and zh_CN.UTF-8 locales
are available on the system.
3. Open Konqueror 4.7.2 and enable Webkit mode.
4. Go to http://www.acfun.tv/v/ac265957/ , which might be a little slow.

Actual Results:  
In the top bold title line of the page content, a black box with a white
question mark appears. In the next line, there are two black boxes separated by
a "W" character, followed by an "o" character.
Selecting any GB* encoding under "View >> Encoding >> Simplified Chinese" does
not solve the problem.
Opening pages of this kind sometimes crashes Konqueror.
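
A symptom of this shape (replacement boxes next to stray ASCII letters) can be
imitated in Python by decoding GBK bytes with the narrower GB2312 codec. This
is only a sketch of the suspected mechanism, not a confirmed diagnosis of
KWebkitPart; the helper name `misdecode` and the sample characters are
assumptions made for illustration.

```python
def misdecode(text: str, actual: str = "gbk", assumed: str = "gb2312") -> str:
    """Encode `text` as `actual`, then decode the bytes as `assumed`.

    Bytes the assumed codec cannot interpret become U+FFFD, the replacement
    character that browsers typically draw as a box with a question mark.
    """
    raw = text.encode(actual)
    return raw.decode(assumed, errors="replace")


# A character shared by GB2312 and GBK survives the round trip unchanged.
print(repr(misdecode("\u4f53")))  # 体

# A GBK-only character (illustrative sample) does not survive it.
print(repr(misdecode("\u9ad4")))  # 體
```

Many GBK-only characters use a second byte in the ASCII range, so a strict
GB2312 decoder may reject the first byte and then show the second byte as a
plain letter or symbol, which resembles the boxes and stray characters
described above.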

Expected Results:  
Neither the black boxes nor the stray "W" and "o" characters should appear in
these two lines.
KHTML displays this page correctly when the encoding is set to "Simplified
Chinese >> GBK" or "Simplified Chinese >> GB18030", so its behavior can serve
as a reference.

