Slow MP3 tag parsing/ByteVector::replace() performance
Stephen F. Booth
me at sbooth.org
Mon Aug 1 01:52:05 CEST 2011
I've recently come across an MP3 file (13 MB) that TagLib takes approximately 4 seconds to parse on my machine:
sbooth$ time ./tagreader slow.mp3
<snip>
real 0m4.312s
My laptop is fairly quick (2.3 GHz Core i7 with 8 GB of RAM), so the problem isn't computing power.
I've done a bit of profiling, and the offending code is ByteVector::replace() by way of ID3v2::Tag::parse(). Specifically, line 518 of id3v2tag.cpp:
data = SynchData::decode(data);
This basically calls through to ByteVector::replace(), which ends up spending 90% of its time in ::memcpy().
Here is the source for replace():
ByteVector &ByteVector::replace(const ByteVector &pattern, const ByteVector &with)
{
  if(pattern.size() == 0 || pattern.size() > size())
    return *this;

  const int patternSize = pattern.size();
  const int withSize = with.size();

  int offset = find(pattern);

  while(offset >= 0) {

    const int originalSize = size();

    if(withSize > patternSize)
      resize(originalSize + withSize - patternSize);

    if(patternSize != withSize)
      ::memcpy(data() + offset + withSize, mid(offset + patternSize).data(), originalSize - offset - patternSize);

    if(withSize < patternSize)
      resize(originalSize + withSize - patternSize);

    ::memcpy(data() + offset, with.data(), withSize);

    offset = find(pattern, offset + withSize);
  }

  return *this;
}
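A rough way to see why ::memcpy() dominates: whenever the pattern and its replacement differ in length, each pass through the loop copies everything after the match, and it appears to do so twice (once into the temporary returned by mid(), assuming mid() makes a copy, and once back via ::memcpy()). The sketch below only counts those bytes. It is not TagLib code; tailBytesCopied() is a hypothetical helper, and the buffer size and match offsets are inputs you would have to measure for a real file.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of TagLib. Estimates how many tail bytes the
// replace() loop above copies for a buffer of `size` bytes.
//
// `matchOffsets` holds the offset of each non-overlapping match, measured in
// the ORIGINAL buffer. The tail length for a match does not change as the
// buffer grows or shrinks, because the offset shifts by the same amount as
// the size, so original coordinates are sufficient.
std::size_t tailBytesCopied(std::size_t size,
                            const std::vector<std::size_t> &matchOffsets,
                            std::size_t patternSize,
                            std::size_t withSize)
{
  // The loop skips the tail copy entirely when the sizes are equal.
  if(patternSize == withSize)
    return 0;

  std::size_t total = 0;
  for(std::size_t offset : matchOffsets) {
    const std::size_t tail = size - offset - patternSize;
    // Assumption: one copy into the mid() temporary, one ::memcpy() back.
    total += 2 * tail;
  }
  return total;
}
```

With k matches spread through an n-byte buffer this grows roughly as n times k, so a large tag with many matches ends up copying far more data than the file contains.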
I'm hoping there is a way to optimize this to speed things up. find() seems quite fast, at least comparatively, so I was looking for a way to eliminate some of the temporaries, specifically the one generated by the call to mid(). However, I'm not intimately familiar with this code and its relation to std::vector, so I thought perhaps someone else would have more insight.
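One possible direction, as a minimal sketch only: build the result in a single left-to-right pass instead of shifting the tail after every match. That removes both the mid() temporary and the repeated tail copies. The code below works on std::vector<char> rather than ByteVector, and replaceAll() is a hypothetical name, so it would need adapting to TagLib's internals and has not been benchmarked against them.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Sketch only: a stand-in for ByteVector::replace() using std::vector<char>.
// Each input byte is examined once and each output byte is written once.
std::vector<char> replaceAll(const std::vector<char> &input,
                             const std::vector<char> &pattern,
                             const std::vector<char> &with)
{
  // Same early-out as the original replace().
  if(pattern.empty() || pattern.size() > input.size())
    return input;

  std::vector<char> output;
  output.reserve(input.size());  // exact when the result does not grow

  std::size_t pos = 0;
  while(pos < input.size()) {
    const bool match =
      pos + pattern.size() <= input.size() &&
      std::memcmp(input.data() + pos, pattern.data(), pattern.size()) == 0;

    if(match) {
      // Emit the replacement and skip past the pattern, so the scan never
      // re-examines replaced bytes (the original loop behaves the same way
      // by resuming its search at offset + withSize).
      output.insert(output.end(), with.begin(), with.end());
      pos += pattern.size();
    }
    else {
      output.push_back(input[pos]);
      ++pos;
    }
  }
  return output;
}
```

For the SynchData::decode() case the pattern would presumably be the two bytes 0xFF 0x00 and the replacement the single byte 0xFF, as in ID3v2 unsynchronisation, so the output never exceeds the input size and the reserve() call avoids any reallocation.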
Stephen