Generic API useless with Unicode for ID3v2?
Vitali Lovich
vlovich at gmail.com
Wed Oct 17 11:11:13 CEST 2007
From the header documentation:
* \note This will not change the text encoding of the frame even if the
* strings passed in are not of the same encoding. Please use
* setEncoding(s.type()) if you wish to change the encoding of the frame
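To make the pitfall in that note concrete, here is a minimal toy model of it. This is not TagLib code: ToyTextFrame, isRepresentable() and setTextSafely() are hypothetical names invented for illustration, and the only behaviour taken from the note is that setting the text leaves the frame's encoding alone.

```cpp
#include <string>

// Toy stand-ins, NOT the TagLib API. All names here are hypothetical.
enum class Encoding { Latin1, UTF8 };

struct ToyTextFrame {
    Encoding encoding = Encoding::Latin1;  // assumed default for the toy
    std::u32string text;                   // text held as code points

    // Mirrors the documented behaviour: the encoding is left untouched.
    void setText(const std::u32string &t) { text = t; }
    void setEncoding(Encoding e) { encoding = e; }

    // True if every code point fits the frame's current encoding.
    // UTF-8 is treated as able to hold any code point; Latin-1 only
    // covers U+0000..U+00FF.
    bool isRepresentable() const
    {
        if (encoding == Encoding::UTF8)
            return true;
        for (char32_t cp : text)
            if (cp > 0xFF)
                return false;
        return true;
    }
};

// What the caller has to do by hand: set the text, then switch the
// encoding if the old one cannot hold the new text.
void setTextSafely(ToyTextFrame &frame, const std::u32string &t)
{
    frame.setText(t);
    if (!frame.isRepresentable())
        frame.setEncoding(Encoding::UTF8);
}
```

With this model, calling setText() with Japanese text on a Latin-1 frame leaves the frame in a state where the text cannot be written out faithfully, while setTextSafely() switches the frame to UTF-8 first.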
The better question is why UTF-8 isn't used as the default encoding
everywhere. It's the most compatible, since it leaves an ASCII byte
stream unchanged, and UTF-16 and UTF-32 can, as far as I'm aware,
represent exactly the same set of characters as UTF-8. That way, all
encodings could be converted to UTF-8, all functions of String would
return a UTF-8 string, and the host program wouldn't have to worry about
encoding. Additionally, when a frame is modified, its encoding would
automatically be set to UTF-8.
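The two properties claimed above (ASCII bytes pass through unchanged, and anything expressible in UTF-16 is expressible in UTF-8) can be sketched with a small standalone converter. This is only an illustration using the C++ standard library; appendUtf8() and utf16ToUtf8() are names made up here, not TagLib functions.

```cpp
#include <string>

// Append the UTF-8 encoding of one code point to `out`.
// Returns false for surrogates and for values above U+10FFFF,
// which are not valid Unicode scalar values.
bool appendUtf8(char32_t cp, std::string &out)
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;
    if (cp < 0x80) {
        // ASCII: one byte, identical to the input value.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Convert a UTF-16 string (host byte order) to UTF-8.
// Returns false if the input contains an unpaired surrogate, which is
// the only way this conversion can fail.
bool utf16ToUtf8(const std::u16string &in, std::string &out)
{
    out.clear();
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size())
                return false;
            const char32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return false;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        }
        if (!appendUtf8(cp, out))
            return false;
    }
    return true;
}
```

In this sketch the only failure case is malformed UTF-16 (an unpaired surrogate); every well-formed input has a UTF-8 form.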
The only exception would be a conversion failure (which shouldn't
happen, as far as I'm aware). In that case, a UTF-16 string would be
created with the same endianness as the host system (i.e. UTF16), and
only the "data" function call could retrieve the raw byte stream. The
encoding in the text identification frame would then be changed
whenever the text is set, by calling some kind of check routine to
determine whether the new string is UTF-16 or UTF-8.
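As a rough idea of what such a check routine might look like, here is a heuristic sketch: treat a leading byte order mark as UTF-16, otherwise accept the bytes as UTF-8 only if they are well-formed. The names Guess, isValidUtf8() and guessEncoding() are assumptions made up for this example, and the heuristic is deliberately simple (UTF-16 without a BOM is not detected).

```cpp
#include <string>

enum class Guess { UTF8, UTF16, Unknown };

// Strict UTF-8 well-formedness check: rejects stray continuation bytes,
// truncated sequences, overlong forms, surrogates and values above
// U+10FFFF.
bool isValidUtf8(const std::string &bytes)
{
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 < 0x80) {  // plain ASCII
            ++i;
            continue;
        }
        std::size_t extra = 0;  // number of continuation bytes expected
        char32_t cp = 0;        // code point being decoded
        char32_t min = 0;       // smallest value allowed for this length
        if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; min = 0x10000; }
        else return false;
        if (n - i <= extra)
            return false;  // sequence runs past the end of the input
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}

// Heuristic: a UTF-16 byte order mark wins, then well-formed UTF-8.
Guess guessEncoding(const std::string &bytes)
{
    if (bytes.size() >= 2) {
        const unsigned char a = static_cast<unsigned char>(bytes[0]);
        const unsigned char b = static_cast<unsigned char>(bytes[1]);
        if ((a == 0xFF && b == 0xFE) || (a == 0xFE && b == 0xFF))
            return Guess::UTF16;
    }
    return isValidUtf8(bytes) ? Guess::UTF8 : Guess::Unknown;
}
```

Note that plain ASCII is reported as UTF-8 here, which is consistent with the point that UTF-8 leaves ASCII untouched. A Latin-1 byte such as 0xE9 on its own comes back as Unknown.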
Given the programming and conceptual simplicity and the compatibility of
UTF-8, it seems to be the best encoding for multilingual support (or
just for general string encoding, IMHO).
"The *UTF-8* encoding defined in ISO 10646-1:2000 Annex D
<http://www.cl.cam.ac.uk/%7Emgk25/ucs/ISO-10646-UTF-8.html> and also
described in RFC 3629 <http://www.ietf.org/rfc/rfc3629.txt> as well as
section 3.9 of the Unicode 4.0 standard does not have these problems. It
is clearly the way to go for using *Unicode* under Unix-style operating
systems."
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 (Also talks about some
security considerations)
Also, on a side note: UTF-8 is the de facto and preferred string encoding
on the Mac operating system. Windows' code pages are based upon a
bastardization of Latin1 called Windows-1252, and I'm not sure what
issues there are in converting that to UTF-8, but given the platform
origin of this project, the names and concepts used for the encodings,
and the mp3 spec, I don't think it's an issue.
Vitali
On a side note - can the Reply-To address in emails be set to
taglib-devel because I always hit the reply button and almost never
bother to change the To address, thus the response only gets sent to the
Linus Walleij wrote:
> 2007/10/17, Andreas Klöckner <lists at informa.tiker.net>:
>
>
>> I know (or rather, found out painfully) that I also have to set the text
>> encoding when setting the text.
>>
>
> As far as I've seen all tag libraries have this problem, at least
> libid3tag and id3lib.
>
> In the end I wrote code that removed the old ID3v2 header
> altogether and rendered and attached an entirely new header,
> explicitly setting the encoding of each string field. This worked...
>
> I would also like taglib to do this, so it's +1.
>
> Linus