Charset Detector screwups

Jeff Mitchell mitchell at kde.org
Tue Oct 27 20:15:02 CET 2009


Peter,

I have some tracks that were sent to me from some Russian guy.

Two of the tracks are showing up as all ????????????? for every tag. I
tracked this down to the changes made by the charset detector, which
detects the charset as gb18030.

Thing is, it detects a different charset for each track in the album,
seemingly at random -- some are "", some are gb18030, some are
windows-1252. But I think they're all actually UTF-8 -- even when I tell
eyeD3 to force-set the encoding to UTF-8 it still shows gb18030 on one
of the problematic tracks.
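
If the tags really are UTF-8, a gb18030 guess is at least plausible from
the detector's side: the UTF-8 bytes for Cyrillic text also form a valid
gb18030 byte sequence, so nothing structural rules that guess out. A
small Python sketch of this, where the sample string is made up and not
taken from the actual tracks:

```python
# Hypothetical sample tag value; an assumption for illustration only.
title = "Привет"
raw = title.encode("utf-8")

# Strict UTF-8 decoding round-trips the original text.
as_utf8 = raw.decode("utf-8")

# The very same bytes also decode as gb18030 without raising an error,
# but they come out as unrelated CJK characters.
as_gb18030 = raw.decode("gb18030")

print(repr(as_utf8))
print(repr(as_gb18030))
```

Whether those wrong characters then end up displayed as question marks
presumably depends on later conversion steps, which this sketch does not
model.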

I've tried explicitly removing the ID3v1 tags to ensure that the ID3v2
tags are being read. No dice. The only thing that has worked is to
comment out the charset detector stuff, at which point it looks normal
in Amarok.

So, what to do here? There are comments in the file:

// HACK: charset-detector disabled, so all tags assumed utf-8
// TODO: fix charset-detector to detect encoding with higher accuracy

But Dan thinks it got fixed (although apparently not entirely) and that
the comment simply didn't get removed.
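
One possible direction, sketched in Python rather than Amarok's C++:
check whether the bytes are strictly valid UTF-8 before asking any
detector, and only fall back to a guess when they are not. Everything
here is an assumption for illustration: `decode_tag` and the `detect`
callback are hypothetical names, not Amarok or eyeD3 API, and the
Latin-1 last resort is just one reasonable default.

```python
from typing import Callable, Optional


def decode_tag(
    raw: bytes,
    detect: Optional[Callable[[bytes], Optional[str]]] = None,
) -> str:
    """Decode tag bytes, trusting valid UTF-8 over any guessed charset.

    `detect` is a hypothetical stand-in for a charset detector: it takes
    the raw bytes and returns a codec name, or None if it has no guess.
    """
    # 1. Strict UTF-8 first; non-UTF-8 legacy text rarely passes this.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # 2. Only now consult the detector, and do not trust it blindly.
    guess = detect(raw) if detect is not None else None
    if guess:
        try:
            return raw.decode(guess)
        except (UnicodeDecodeError, LookupError):
            pass

    # 3. Last resort: Latin-1 maps every byte, so this cannot fail.
    return raw.decode("latin-1")


# A detector that wrongly says gb18030 is never consulted for UTF-8 input.
print(decode_tag("Привет".encode("utf-8"), detect=lambda b: "gb18030"))
```

The caveat is that a short string in a legacy encoding can occasionally
happen to be valid UTF-8 as well, so this trades one kind of
misdetection for a much rarer one.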

Any advice is appreciated.

--Jeff



More information about the Amarok-devel mailing list