Slow MP3 tag parsing/ByteVector::replace() performance

Stephen F. Booth me at sbooth.org
Mon Aug 1 01:52:05 CEST 2011


I've recently come across an MP3 file (13 MB) that TagLib takes approximately 4 seconds to parse on my machine:

sbooth$ time ./tagreader slow.mp3 
<snip>
real	0m4.312s

My laptop is fairly quick (2.3 GHz Core i7 with 8 GB of RAM), so the problem isn't computing power.

I've done a bit of profiling, and the offending code is ByteVector::replace() by way of ID3v2::Tag::parse().  Specifically, line 518 of id3v2tag.cpp:

  data = SynchData::decode(data);

This basically calls through to ByteVector::replace(), which ends up spending 90% of its time in ::memcpy().  
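For this particular call site, a dedicated loop might avoid replace() altogether. The sketch below assumes that SynchData::decode() only needs to reverse ID3v2 unsynchronization, i.e. turn every 0xFF 0x00 pair back into a single 0xFF; I haven't verified that this is all it does. The function name decodeUnsynchronized and the use of std::vector<unsigned char> instead of ByteVector are placeholders so the example compiles without TagLib.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical one-pass replacement for the decode step (not TagLib code).
// Assumption: decoding means dropping the 0x00 that follows each 0xFF.
std::vector<unsigned char> decodeUnsynchronized(const std::vector<unsigned char> &in)
{
  std::vector<unsigned char> out;
  out.reserve(in.size()); // the output is never longer than the input

  for(std::size_t i = 0; i < in.size(); ++i) {
    out.push_back(in[i]);

    // Skip the padding byte of a 0xFF 0x00 pair; it is not rescanned.
    if(in[i] == 0xFF && i + 1 < in.size() && in[i + 1] == 0x00)
      ++i;
  }

  return out;
}
```

Each input byte is read once and written at most once, so the cost should be linear in the tag size no matter how many 0xFF 0x00 pairs the tag contains.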

Here is the source for replace():

ByteVector &ByteVector::replace(const ByteVector &pattern, const ByteVector &with)
{
  if(pattern.size() == 0 || pattern.size() > size())
    return *this;

  const int patternSize = pattern.size();
  const int withSize = with.size();

  int offset = find(pattern);

  while(offset >= 0) {

    const int originalSize = size();

    if(withSize > patternSize)
      resize(originalSize + withSize - patternSize);

    if(patternSize != withSize)
      ::memcpy(data() + offset + withSize, mid(offset + patternSize).data(), originalSize - offset - patternSize);

    if(withSize < patternSize)
      resize(originalSize + withSize - patternSize);

    ::memcpy(data() + offset, with.data(), withSize);

    offset = find(pattern, offset + withSize);
  }

  return *this;
}

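I'm hoping there is a way to optimize this to speed things up. find() seems quite fast, at least comparatively, so I was looking for a way to eliminate some of the temporaries, specifically the one generated by the call to mid(). As far as I can tell, each match copies the whole tail of the buffer (once into the temporary returned by mid() and again in ::memcpy()), so the cost grows with the number of matches times the buffer size. However, I'm not intimately familiar with this code and its relation to std::vector, so I thought perhaps someone else would have more insight.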
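One possible direction is a single-pass replace that builds the result in a second buffer, so that every byte is copied once instead of once per match. This is only a sketch of the idea, not a patch against TagLib: replaceAll is a made-up name, and std::string stands in for ByteVector so the example is self-contained.

```cpp
#include <cstddef>
#include <string>

// Hypothetical single-pass stand-in for ByteVector::replace() (not TagLib code).
// std::string is used as a plain byte container; embedded NULs are fine as long
// as the strings are constructed with an explicit length.
std::string replaceAll(const std::string &input,
                       const std::string &pattern,
                       const std::string &with)
{
  // Same early exit as the original replace().
  if(pattern.empty() || pattern.size() > input.size())
    return input;

  std::string output;
  output.reserve(input.size()); // exact when 'with' is not longer than 'pattern'

  std::size_t pos = 0;
  std::size_t match = input.find(pattern, pos);

  while(match != std::string::npos) {
    output.append(input, pos, match - pos); // untouched bytes before the match
    output.append(with);                    // the replacement itself
    pos = match + pattern.size();           // resume after the match
    match = input.find(pattern, pos);
  }

  output.append(input, pos, std::string::npos); // remaining tail
  return output;
}
```

As in the original loop, the replacement text is never rescanned and matches are taken left to right without overlap, so the result should be the same; the difference is that the tail of the buffer is no longer moved on every match. The trade-off is one extra buffer of roughly the input size while the result is being built.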

Stephen



