news at onastick.clara.co.uk
Sun Oct 14 09:38:33 UTC 2012
In message <CAGUtLj99L=hn-enSMMfmegekPvgjorLv+rLgm2Ys5Qie8-fCNA at mail.gmail.com>,
Lukáš Lalinský <lalinsky at gmail.com> writes
>On Sat, Oct 13, 2012 at 6:08 PM, jon bird <news at onastick.clara.co.uk> wrote:
>> Livin’ On The Edge
>> The issue seems to be with the translation of the accent character "’".
>> In the ISO character set I believe this is 0xB4.
>> The text is stored in the tag in unicode, with the accent character
>> encoded as:
>U+2019 and U+00B4 are two different characters; both exist in Unicode.
>The one in the string is U+2019, which is not representable in ISO-8859-1.
OK, thanks for the clarification on that one.
>> As I understand it, the default text encoding is ISO-8859-1. I don't
>> change this, so I would expect this character to be converted to 0xB4 in
>> the return string. However, it isn't: what I end up with is 0x19, in
>> effect the lower byte of the original UTF-16 string.
>You are right that toCString will convert the string to ISO-8859-1,
>but it does so very simply, by truncating each Unicode code point
>to 8 bits. That does the trick for ISO-8859-1, but for characters
>outside of ISO-8859-1 it just returns the lower byte instead of
>either ignoring the character or returning '?'. This could be seen as a
>bug, but you would not get the original string anyway, as it's not
>possible to encode it in ISO-8859-1.
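A minimal C++ sketch of the narrowing described above, assuming the tag text is held as UTF-16 (`std::u16string` stands in for TagLib's own string type here, and `truncateToLatin1` is a hypothetical name, not TagLib API). Keeping only the low byte is correct for U+0000 to U+00FF, but it maps U+2019 to 0x19 and, less visibly, U+0141 (Ł) to 0x41 ('A').

```cpp
#include <string>

// Hypothetical helper (not TagLib API) reproducing the conversion described
// above: each UTF-16 code unit is narrowed by keeping only its low byte.
// Correct for U+0000..U+00FF; anything higher is silently mangled.
std::string truncateToLatin1(const std::u16string &s) {
    std::string out;
    out.reserve(s.size());
    for (char16_t c : s)
        out.push_back(static_cast<char>(c & 0xFF));  // U+2019 -> 0x19
    return out;
}
```

For example, `truncateToLatin1(u"Livin\u2019 On The Edge")` returns an 18-byte string whose sixth byte is 0x19, which matches the result reported above.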
I would still consider it a minor bug, since you can end up with
characters outside the ISO character set: 0x19 falls outside that
range. All I'd be looking for is those characters to be replaced (as
you suggest) with a '?' character, although you could go to town and make
the replacement customisable. For now, I'll need to perform this check on
the string returned from taglib before handing it over to the rest of my s/w.
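Because the narrowing is lossy, checking the 8-bit result afterwards cannot catch every case: Ł comes back as a perfectly valid 'A'. The sketch below applies the '?' substitution to the Unicode text instead, under the same assumptions (UTF-16 input as `std::u16string`; `toLatin1WithReplacement` is a hypothetical helper, not TagLib API). The replacement character is a parameter, which covers the customisable variant.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of TagLib): narrow a UTF-16 string to
// ISO-8859-1, substituting `replacement` for every character above U+00FF.
// A surrogate pair (a character beyond the BMP) becomes one replacement
// character rather than two.
std::string toLatin1WithReplacement(const std::u16string &s,
                                    char replacement = '?') {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c <= 0xFF) {
            // Representable in ISO-8859-1: copy the byte unchanged.
            out.push_back(static_cast<char>(c));
        } else {
            // High surrogate followed by low surrogate: skip the second unit.
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
                ++i;
            out.push_back(replacement);
        }
    }
    return out;
}
```

With this, `toLatin1WithReplacement(u"Livin\u2019 On The Edge")` gives "Livin? On The Edge", while characters already in ISO-8859-1, such as é, pass through unchanged.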
== jon bird - software engineer
== <reply to address _may_ be invalid, real mail below>
== <reduce rsi, stop using the shift key>
== posted as: news 'at' onastick 'dot' clara.co.uk
More information about the taglib-devel mailing list