[Fwd: [Bug 200596] [Patch] id3v1 japanese characters encoding]

Mon Nov 16 19:52:53 CET 2009

On 2009-11-15 1:58 PM, Jeff Mitchell wrote:
> Hello,
>
> We've had some trouble with character set detection that Cesar has
> managed to narrow down to TagLib stripping the Unicode BOM from its
> strings -- see the attached message, and if you need more background the
> last few comments of the associated bug report. Would it be possible to
> provide one of these two solutions?
>
> Thanks,
> Jeff
>
For reference, Songbird carries a local patch that does B): it exposes
whether the string was constructed from Latin1 or from one of the
Unicode types. That's an API change, of course, so it probably wouldn't
be acceptable upstream until 2.0.

TagLib::String::isLatin1() can be used for now, but it will still give
you false positives: it _is_ valid to have metadata that was stored as
Unicode yet happens to contain only Latin1-range characters, some of
them non-ASCII. Such a string passes isLatin1(), gets handed to the
charset detector anyway, and may be misdetected. At least, I don't
trust my charset detector that much :)
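
To make the false-positive case concrete, here is a rough standalone
sketch. It is not TagLib code and not the Songbird patch: TagText,
SourceEncoding, looksLatin1() and the two needsCharsetDetection*()
helpers are hypothetical names made up for illustration, and
looksLatin1() only approximates what I understand isLatin1() to check
(every character fits in one Latin1 byte).

```cpp
#include <string>

// Hypothetical: which kind of data the string was constructed from.
// This is the extra bit of information that option B) would expose.
enum class SourceEncoding { Latin1, Unicode };

// Hypothetical stand-in for a tag string: decoded text plus its origin.
struct TagText {
    std::u32string text;
    SourceEncoding source;
};

// Content-only check, assumed to approximate TagLib::String::isLatin1():
// true when every code point is in the range U+0000..U+00FF.
bool looksLatin1(const std::u32string &text) {
    for (char32_t c : text) {
        if (c > 0xFF) {
            return false;
        }
    }
    return true;
}

// Heuristic available today: guess from the content alone.
bool needsCharsetDetectionContentOnly(const TagText &t) {
    return looksLatin1(t.text);
}

// Heuristic possible with option B): trust the recorded origin.
bool needsCharsetDetectionWithSource(const TagText &t) {
    return t.source == SourceEncoding::Latin1;
}
```

With a tag such as U"caf\u00E9" that really was stored as Unicode, the
content-only helper still asks for charset detection, while the
source-aware one does not. A tag containing CJK characters is rejected
by both, so only the Latin1-range case is affected.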

(The patch is http://timeline.songbirdnest.com/vendor/changeset/10855 
which doesn't make much sense until you realize we had a previous patch 
in http://timeline.songbirdnest.com/vendor/changeset/10852 which felt 
suckier.)

-- 
Mook
mook at songbirdnest