Charset Detector screwups

Peter Zhou peterzhoulei at gmail.com
Wed Oct 28 05:30:44 CET 2009


Hi Jeff,

There is a trade-off.
The code in src/meta/file_h.f (line 234) will ONLY decode the string when the
detected charset is Chinese, Korean or Japanese (the condition below also
includes koi8-r). The scanner has a similar implementation.

if ( ( track_encoding.toUtf8() == "gb18030" ) || ( track_encoding.toUtf8() == "big5" )
     || ( track_encoding.toUtf8() == "euc-kr" ) || ( track_encoding.toUtf8() == "euc-jp" )
     || ( track_encoding.toUtf8() == "koi8-r" ) )
{
    // decode using the detected charset
}
else
{
    // leave it as UTF-8
}
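The whitelist condition above can be pulled out into a small predicate. This is a minimal standalone sketch using std::string instead of Qt types; the name `shouldDecodeTag` is hypothetical and not part of Amarok. It also compares case-insensitively (charset names are case-insensitive), which the original condition does not do.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>

// Hypothetical helper mirroring the condition above: true only for the
// detected charsets whose tags get re-decoded; anything else (including an
// empty detection result) is left as UTF-8.
bool shouldDecodeTag(std::string encoding)
{
    // Charset names are case-insensitive, so normalize to lowercase first.
    std::transform(encoding.begin(), encoding.end(), encoding.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    static const std::array<const char*, 5> kDecodable = {
        "gb18030", "big5", "euc-kr", "euc-jp", "koi8-r"};

    return std::find(kDecodable.begin(), kDecodable.end(), encoding) != kDecodable.end();
}
```

With this, a detector result of "gb18030" or "GB18030" triggers decoding, while "windows-1252" or "" falls through to the UTF-8 path.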

I assume there is little chance of UTF-8 tracks being detected as gb18030,
euc-kr or the like, but that does not mean 100% accuracy; I expect about 95%.
If you guys think most users only have UTF-8 tags, then just remove the
detector.
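One possible way to narrow the failure mode described in the quoted message (tags that are probably UTF-8 being detected as gb18030) would be to check whether the raw tag bytes already form well-formed UTF-8 and, if so, skip the detector. This is a sketch of a swapped-in technique, not what Amarok does; `isValidUtf8` and `chooseEncoding` are hypothetical names, and `detected` stands in for whatever the charset detector returns. It trades one error for another: a short legacy-encoded string can occasionally also be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Returns true if `bytes` is well-formed UTF-8 (RFC 3629): rejects
// overlong forms, UTF-16 surrogates and code points above U+10FFFF.
bool isValidUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    const std::size_t n = bytes.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        unsigned int cp;
        if (c < 0x80) { ++i; continue; }                  // plain ASCII
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;                                 // invalid lead byte
        if (i + len > n) return false;                     // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return false;         // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000)) return false; // overlong
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogate
        if (cp > 0x10FFFF) return false;                   // beyond Unicode range
        i += len;
    }
    return true;
}

// Hypothetical policy: trust bytes that are already valid UTF-8, otherwise
// fall back to the detector's guess (or UTF-8 if it returned nothing).
std::string chooseEncoding(const std::string& rawTag, const std::string& detected)
{
    if (isValidUtf8(rawTag)) return "utf-8";
    return detected.empty() ? std::string("utf-8") : detected;
}
```

For example, the UTF-8 bytes of a Cyrillic title would be kept as UTF-8 even if the detector said gb18030, while GB2312/GB18030 bytes such as 0xD6 0xD0 fail the UTF-8 check and still go to the detected codec.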



On Wed, Oct 28, 2009 at 3:15 AM, Jeff Mitchell <mitchell at kde.org> wrote:

> Peter,
>
> I have some tracks that were sent to me from some Russian guy.
>
> Two of the tracks are showing up as all ????????????? for every tag. I
> tracked this down to the changes made by the charset detector, which
> detects the charset as gb18030.
>
> Thing is, it actually detects the charset of each track in the album as
> random things -- some are "", some are gb18030, some are windows-1252.
> But I think they're all actually UTF-8 -- even when I tell eyeD3 to
> force-set the encoding to UTF-8 it still shows gb18030 on one of the
> problematic tracks.
>
> I've tried explicitly removing the ID3v1 tags to ensure that the ID3v2
> tags are being read. No dice. The only thing that has worked is to
> comment out the charset detector stuff, at which point it looks normal
> in Amarok.
>
> So, what to do here? There are comments in the file:
>
> // HACK: charset-detector disabled, so all tags assumed utf-8
> // TODO: fix charset-detector to detect encoding with higher accuracy
>
> But Dan thinks it got fixed (although apparently not entirely), and that
> the comment simply didn't get removed.
>
> Any advice is appreciated.
>
> --Jeff
>
>


-- 
Best Regards,
Peter Zhou
-------------------------------
http://www.peterzl.net/


More information about the Amarok-devel mailing list