Slow MP3 tag parsing/ByteVector::replace() performance

Mathias Panzenböck grosser.meister.morti at gmx.net
Mon Aug 1 02:19:04 UTC 2011


I'm no official TagLib contributor, just submitted one patch that got merged so far. Still:

Someone else reported the same issue a while ago and even proposed a patch. However, out of interest 
(or boredom) I wrote a replacement for the replace function myself. Could you please compare the 
performance of the two patches and say which one is faster? For me the patch from Thomas Post did 
produce segmentation faults in my micro benchmarks, so I couldn't test it's performance at all. :/

I attached my patch as diff and as cpp source of the changed function. I also pushed the changes here:
https://github.com/panzi/taglib

	-panzi

On 08/01/2011 01:52 AM, Stephen F. Booth wrote:
> I've recently come across an MP3 file (13 MB) that TagLib takes approximately 4 seconds to parse on my machine:
>
> sbooth$ time ./tagreader slow.mp3
> <snip>
> real	0m4.312s
>
> My laptop is fairly quick- 2.3 GHz Core i7 with 8 GB of RAM- so the problem isn't computing power.
>
> I've done a bit of profiling, and the offending code is ByteVector::replace() by way of ID3v2::Tag::parse().  Specifically, line 518 of id3v2tag.cpp:
>
>    data = SynchData::decode(data);
>
> This basically calls through to ByteVector::replace(), which ends up spending 90% of its time in ::memcpy().
>
> Here is the source for replace():
>
> ByteVector&ByteVector::replace(const ByteVector&pattern, const ByteVector&with)
> {
>    if(pattern.size() == 0 || pattern.size()>  size())
>      return *this;
>
>    const int patternSize = pattern.size();
>    const int withSize = with.size();
>
>    int offset = find(pattern);
>
>    while(offset>= 0) {
>
>      const int originalSize = size();
>
>      if(withSize>  patternSize)
>        resize(originalSize + withSize - patternSize);
>
>      if(patternSize != withSize)
>        ::memcpy(data() + offset + withSize, mid(offset + patternSize).data(), originalSize - offset - patternSize);
>
>      if(withSize<  patternSize)
>        resize(originalSize + withSize - patternSize);
>
>      ::memcpy(data() + offset, with.data(), withSize);
>
>      offset = find(pattern, offset + withSize);
>    }
>
>    return *this;
> }
>
> I'm hoping there is a way to optimize this to speed things up.  find() seems quite fast, at least comparatively, so I was looking for a way to eliminate some of the temporaries, specifically the one generated by the call to mid().  However I'm not intimately familiar with this code and its relation to std::vector so I thought perhaps someone else would have more insight.
>
> Stephen
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taglib-replace-performance.cpp
Type: text/x-c++src
Size: 1528 bytes
Desc: not available
Url : http://mail.kde.org/pipermail/taglib-devel/attachments/20110801/4e0c7df2/attachment.cpp 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taglib-replace-performance.patch
Type: text/x-patch
Size: 2394 bytes
Desc: not available
Url : http://mail.kde.org/pipermail/taglib-devel/attachments/20110801/4e0c7df2/attachment.patch 


More information about the taglib-devel mailing list