wide char patch

Yoshiki Yazawa yaz at cc.rim.or.jp
Fri Nov 24 07:33:30 CET 2006


Dear authors of taglib,

I am Yoshiki Yazawa, a developer of the Audacious media player.

Taglib is the primary tag library of our software. Thank you for the
great library.

I would like to propose a change to what taglib does when it is asked
to return metadata as latin1. The current implementation simply takes
the low byte of each internal UCS-2BE character and builds the result
string from those bytes. However, this irreversibly turns any wide
character (a code point above 0xff) into garbage.
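
To make the problem concrete, here is a minimal sketch of the low-byte
truncation. It is not taglib code: lowBytes is a hypothetical helper, and
std::u16string stands in for the internal wide string (taglib 1.4 itself
uses a wstring).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of taglib: mimics the current latin1 path
// by keeping only the low byte of every 16-bit character.
std::string lowBytes(const std::u16string &in)
{
  std::string out;
  out.reserve(in.size());
  for(char16_t c : in)
    out.push_back(static_cast<char>(c & 0xff)); // the high byte is discarded here
  assert(out.size() == in.size());              // always one output byte per character
  return out;
}
```

With this helper, U+3042 (HIRAGANA LETTER A) collapses to the single byte
0x42, which is indistinguishable from an ASCII "B", so the original
character cannot be recovered from the output.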

I think it would be reasonable and safe for taglib to check for wide
characters and, when the internal string contains any, return a UTF-8
string instead of a sequence of low bytes, even though latin1 was
requested.

Attached is a patch for this purpose. The patched taglib returns
exactly the same latin1 string as before if the internal string does
not contain any wide character, and returns a valid UTF-8 string
instead of a meaningless byte stream if it does.
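
The intended behavior can be sketched as a standalone function. This is
only an illustration under stated assumptions, not the patch itself:
toLatin1OrUtf8 is a hypothetical name, std::u16string stands in for the
internal string, and the input is assumed to be plain UCS-2 without
surrogate pairs, so every character needs at most three UTF-8 bytes.

```cpp
#include <string>

// Hypothetical stand-in for the patched latin1 conversion; not taglib's API.
// Assumes UCS-2 input (no surrogate pairs).
std::string toLatin1OrUtf8(const std::u16string &in)
{
  // Pre-scan: does any character fall outside the latin1 range?
  bool hasWide = false;
  for(char16_t c : in) {
    if(c > 0xff) {
      hasWide = true;
      break;
    }
  }

  std::string out;

  if(!hasWide) {
    // Same result as before: one latin1 byte per character.
    for(char16_t c : in)
      out.push_back(static_cast<char>(c));
    return out;
  }

  // Fallback: encode the whole string as UTF-8 (1 to 3 bytes per character).
  for(char16_t c : in) {
    if(c < 0x80) {
      out.push_back(static_cast<char>(c));
    }
    else if(c < 0x800) {
      out.push_back(static_cast<char>(0xc0 | (c >> 6)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3f)));
    }
    else {
      out.push_back(static_cast<char>(0xe0 | (c >> 12)));
      out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3f)));
      out.push_back(static_cast<char>(0x80 | (c & 0x3f)));
    }
  }
  return out;
}
```

Note that once any wide character is present, the whole string is encoded
as UTF-8, so characters in the 0x80 to 0xff range come out as two bytes
rather than one. Callers therefore have to be prepared for either encoding
in the result.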

I believe this behavior is much safer. It is also useful for users who
want raw metadata output, because latin1 mode then becomes a safe way
to obtain it.

Best regards,


---------------------
Yoshiki Yazawa



diff -ruN taglib-1.4.org/taglib/toolkit/tstring.cpp taglib-1.4/taglib/toolkit/tstring.cpp
--- taglib-1.4.org/taglib/toolkit/tstring.cpp	2005-07-26 06:31:15.000000000 +0900
+++ taglib-1.4/taglib/toolkit/tstring.cpp	2006-05-26 12:02:55.000000000 +0900
@@ -202,12 +202,22 @@
   s.resize(d->data.size());
 
   if(!unicode) {
-    std::string::iterator targetIt = s.begin();
-    for(wstring::const_iterator it = d->data.begin(); it != d->data.end(); it++) {
-      *targetIt = char(*it);
-      ++targetIt;
+    bool haswide = false;
+    //pre-scan: is there any wide character? if so, convert the string into utf-8.
+    for(unsigned int i=0; i< d->data.size(); i++){
+      if(d->data[i] > 0xff){
+        haswide = true;
+        break;
+      }
+    }
+    if(!haswide){
+      std::string::iterator targetIt = s.begin();
+      for(wstring::const_iterator it = d->data.begin(); it != d->data.end(); it++) {
+        *targetIt = char(*it);
+        ++targetIt;
+      }
+      return s;
     }
-    return s;
   }
 
   const int outputBufferSize = d->data.size() * 3 + 1;

