[patch] tbytevector::find() bug with unicode?

Yi-an Huang yianwillis at gmail.com
Sun Mar 26 20:26:40 CEST 2006


Hi,

I have been studying the source code for a personal project to add SYLT
support, and it seems to me that the find() routine in tbytevector has a
problem handling cases where byteAlign > 1, which is typically the case
when Unicode is used.

To be more precise, the code segment in tbytevector.cpp::template<>
vectorfind() may be problematic. When offset > 0, it tries to "start at the
next byte aligned block" and calls find() recursively; however, the actual
code always seems to start one byte before that block.
[toolkit/tbytevector.cpp:99]

      // start at the next byte aligned block

      Vector section = v.mid(offset + byteAlign - 1 - offset % byteAlign);
      int match = section.find(pattern, 0, byteAlign);
      return match >= 0 ? int(match + offset) : -1;

For example, when byteAlign==2 and offset is an even number, the section
always starts at offset + 1, so the search within "section" actually scans
for the pattern at odd offsets only. What is worse, when a match is found
(which is incorrect anyway), match + offset points one byte before the
match.
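
To make the arithmetic concrete, here is a small standalone sketch (not
TagLib code). The function name originalStart is made up for illustration;
it only evaluates the start-index expression from the quoted code, assuming
offset >= 0.

```cpp
// Hypothetical helper: the start index passed to v.mid() by the quoted code.
constexpr int originalStart(int offset, int byteAlign)
{
  return offset + byteAlign - 1 - offset % byteAlign;
}

// With byteAlign == 2 the section always starts at an odd index.
static_assert(originalStart(1, 2) == 1, "odd offset stays odd");
static_assert(originalStart(2, 2) == 3, "even offset is bumped to odd");
static_assert(originalStart(3, 2) == 3, "odd offset stays odd");
static_assert(originalStart(4, 2) == 5, "even offset is bumped to odd");

// With byteAlign == 1 the expression reduces to offset itself.
static_assert(originalStart(7, 1) == 7, "no shift when byteAlign == 1");
```

A match at index m of the section sits at originalStart(offset, byteAlign)
+ m in v, so returning m + offset is off by one whenever offset is even.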

The quick fix below should solve the problem. When byteAlign==1 nothing
changes, since no real "shifting" occurs. Beyond that, I have not tested
whether it breaks code anywhere else.

      // start at the next byte aligned block

-      Vector section = v.mid(offset + byteAlign - 1 - offset % byteAlign);
+      Vector section = v.mid(offset % byteAlign == 0 ? offset :
+          (offset += byteAlign - offset % byteAlign));
      int match = section.find(pattern, 0, byteAlign);
      return match >= 0 ? int(match + offset) : -1;
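
The rounding that the patched line performs can also be written as a
separate function. This is only a sketch under the assumption that
offset >= 0 and byteAlign >= 1; alignUp is a hypothetical name, not
something that exists in TagLib.

```cpp
// Hypothetical helper: round offset up to the next multiple of byteAlign,
// leaving offsets that are already aligned unchanged.
constexpr int alignUp(int offset, int byteAlign)
{
  return offset % byteAlign == 0
           ? offset
           : offset + (byteAlign - offset % byteAlign);
}

static_assert(alignUp(2, 2) == 2, "aligned offsets are kept");
static_assert(alignUp(3, 2) == 4, "unaligned offsets round up");
static_assert(alignUp(6, 4) == 8, "works for wider alignments too");
static_assert(alignUp(5, 1) == 5, "byteAlign == 1 never shifts");
```

With such a helper, the section would start at alignUp(offset, byteAlign),
and the same value would be added back to match, so the returned index
refers to v rather than to the section.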

-Willis