Generic API useless with Unicode for ID3v2?

Vitali Lovich vlovich at gmail.com
Wed Oct 17 11:11:13 CEST 2007


 From the header documentation:

* \note This will not change the text encoding of the frame even if the
* strings passed in are not of the same encoding.  Please use
* setEncoding(s.type()) if you wish to change the encoding of the frame

The better question is why UTF-8 isn't used as the default encoding 
everywhere.  It's the most compatible, since it leaves an ASCII byte 
stream unchanged, and UTF-16 and UTF-32 can, as far as I'm aware, 
represent exactly the same set of characters as UTF-8.  That way, all 
encodings could be converted to UTF-8, all functions from String would 
return a UTF-8 string, and the host program wouldn't have to worry about 
encoding.  Additionally, when a frame is modified, its encoding would 
automatically be set to UTF-8.
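
To illustrate the conversion idea above, here is a minimal sketch of a 
UTF-16 to UTF-8 conversion.  The names appendUtf8 and utf16ToUtf8 are 
hypothetical helpers made up for this example and are not part of 
TagLib; the input is assumed to be in host byte order, and unpaired 
surrogates are replaced with U+FFFD.  Note that plain ASCII comes out 
byte for byte unchanged.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not TagLib API): append one code point as UTF-8.
inline void appendUtf8(std::string &out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);                      // ASCII: one byte, unchanged
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper (not TagLib API): host-order UTF-16 to UTF-8.
inline std::string utf16ToUtf8(const std::u16string &in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high surrogate with a following low surrogate.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            char32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Anything still in the surrogate range is unpaired.
        if (cp >= 0xD800 && cp <= 0xDFFF)
            cp = 0xFFFD;
        appendUtf8(out, cp);
    }
    return out;
}
```

With this, utf16ToUtf8(u"abc") gives back the three ASCII bytes "abc", 
while a character such as U+00F6 becomes the two bytes C3 B6.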

The only exception would be a conversion failure (which shouldn't 
happen as far as I'm aware).  In this case, a UTF16 string would be 
created with the same endianness as the host system (i.e. UTF16), and 
only the "data" function call could retrieve the raw byte stream.  In 
this case, the encoding would be changed in the text identification 
frame when the text is set, by calling some kind of check routine to 
determine if the new string is UTF16 or UTF8.
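
One possible shape for such a check routine, as a rough sketch only: 
treat a leading byte order mark as UTF-16, and otherwise test whether 
the bytes are well-formed UTF-8.  The names TextKind, isValidUtf8 and 
classify are hypothetical and not part of TagLib, and relying on a BOM 
to spot UTF-16 is an assumption of this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not TagLib API): true if s is well-formed UTF-8.
inline bool isValidUtf8(const std::string &s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;   // number of continuation bytes expected
        std::uint32_t cp = 0;    // decoded code point
        std::uint32_t min = 0;   // smallest value allowed for this length
        if (b < 0x80) { ++i; continue; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;                       // stray continuation or invalid lead byte
        if (i + extra >= s.size()) return false; // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values past U+10FFFF.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}

enum class TextKind { Utf16WithBom, Utf8, Other };

// Hypothetical check routine: BOM first (an assumption), then UTF-8 validity.
inline TextKind classify(const std::string &bytes) {
    if (bytes.size() >= 2) {
        unsigned char b0 = static_cast<unsigned char>(bytes[0]);
        unsigned char b1 = static_cast<unsigned char>(bytes[1]);
        if ((b0 == 0xFF && b1 == 0xFE) || (b0 == 0xFE && b1 == 0xFF))
            return TextKind::Utf16WithBom;
    }
    return isValidUtf8(bytes) ? TextKind::Utf8 : TextKind::Other;
}
```

A caller could then pick the frame's encoding from the result before 
setting the text; anything classified as Other would need separate 
handling.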

Given the programming and conceptual simplicity and the compatibility of 
UTF-8, it seems to be the best encoding for multilingual support (or just 
general string encoding IMHO).

"The *UTF-8* encoding defined in ISO 10646-1:2000 Annex D 
<http://www.cl.cam.ac.uk/%7Emgk25/ucs/ISO-10646-UTF-8.html> and also 
described in RFC 3629 <http://www.ietf.org/rfc/rfc3629.txt> as well as 
section 3.9 of the Unicode 4.0 standard does not have these problems. It 
is clearly the way to go for using *Unicode* under Unix-style operating 
systems."

http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 (Also talks about some 
security considerations)

Also, on a side note: UTF-8 is the de facto, preferred string encoding 
on the Mac operating system.  The default Western Windows code page is 
a bastardization of Latin1 called Windows-1252, and I'm not sure of the 
issues of converting that to UTF-8, but given the platform origin of 
this project, the names and concepts for the encodings, and the mp3 
spec, I don't think it's an issue.
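
For plain Latin1 (ISO-8859-1) the conversion is mechanical, because each 
byte is its own Unicode code point.  A minimal sketch follows, with 
latin1ToUtf8 being a hypothetical helper and not part of TagLib.  
Windows-1252 differs from Latin1 in the 0x80-0x9F range, so real 
Windows-1252 input would additionally need a lookup table for those 
bytes, which this sketch does not include.

```cpp
#include <string>

// Hypothetical helper (not TagLib API): ISO-8859-1 bytes to UTF-8.
// Assumes the input really is Latin1; Windows-1252 bytes 0x80-0x9F
// would be mapped to C1 control characters, not to their glyphs.
inline std::string latin1ToUtf8(const std::string &in) {
    std::string out;
    for (unsigned char b : in) {
        if (b < 0x80) {
            out += static_cast<char>(b);               // ASCII stays as is
        } else {
            out += static_cast<char>(0xC0 | (b >> 6)); // lead byte: C2 or C3
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```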

Vitali
On a side note - can the Reply-To address in emails be set to 
taglib-devel because I always hit the reply button and almost never 
bother to change the To address, thus the response only gets sent to the

Linus Walleij wrote:
> 2007/10/17, Andreas Klöckner <lists at informa.tiker.net>:
>
>   
>> I know (or rather, found out painfully) that I also have to set the text
>> encoding when setting the text.
>>     
>
> As far as I've seen all tag libraries have this problem, at least
> libid3tag and id3lib.
>
> In the end I wrote code that removed the old ID3v2 header
> altogether and rendered and attached an entirely new header,
> explicitly setting the encoding of each string field. This worked...
>
> I would also like taglib to do this, so it's +1.
>
> Linus

