Slow MP3 tag parsing/ByteVector::replace() performance

Stephen F. Booth me at sbooth.org
Mon Aug 1 11:13:40 UTC 2011


I tested the performance of the two patches (not with anything official, just with the command-line time) and got the following results:

Original:			~4. sec
Yours: 			0.215 sec
Thomas Post:	0.119 sec (compilation warning about comparison between signed and unsigned)

Although Thomas Post's patch appears to be slightly faster, I think the coding style of yours matches the rest of TagLib a bit better.  It is definitely worth submitting one of the two as a pull request on git.  Perhaps Lukas or Scott prefer one patch over the other?

I found the code for Thomas' patch in http://mail.kde.org/pipermail/taglib-devel/2011-July/002014.html

Stephen

On Jul 31, 2011, at 10:19 PM, Mathias Panzenböck wrote:

> I'm no official TagLib contributor, just submitted one patch that got merged so far. Still:
> 
> Someone else reported the same issue a while ago and even proposed a patch. However, out of interest (or boredom) I wrote a replacement for the replace function myself. Could you please compare the performance of the two patches and say which one is faster? For me the patch from Thomas Post did produce segmentation faults in my micro benchmarks, so I couldn't test it's performance at all. :/
> 
> I attached my patch as diff and as cpp source of the changed function. I also pushed the changes here:
> https://github.com/panzi/taglib
> 
> 	-panzi
> 
> On 08/01/2011 01:52 AM, Stephen F. Booth wrote:
>> I've recently come across an MP3 file (13 MB) that TagLib takes approximately 4 seconds to parse on my machine:
>> 
>> sbooth$ time ./tagreader slow.mp3
>> <snip>
>> real	0m4.312s
>> 
>> My laptop is fairly quick- 2.3 GHz Core i7 with 8 GB of RAM- so the problem isn't computing power.
>> 
>> I've done a bit of profiling, and the offending code is ByteVector::replace() by way of ID3v2::Tag::parse().  Specifically, line 518 of id3v2tag.cpp:
>> 
>>   data = SynchData::decode(data);
>> 
>> This basically calls through to ByteVector::replace(), which ends up spending 90% of its time in ::memcpy().
>> 
>> Here is the source for replace():
>> 
>> ByteVector&ByteVector::replace(const ByteVector&pattern, const ByteVector&with)
>> {
>>   if(pattern.size() == 0 || pattern.size()>  size())
>>     return *this;
>> 
>>   const int patternSize = pattern.size();
>>   const int withSize = with.size();
>> 
>>   int offset = find(pattern);
>> 
>>   while(offset>= 0) {
>> 
>>     const int originalSize = size();
>> 
>>     if(withSize>  patternSize)
>>       resize(originalSize + withSize - patternSize);
>> 
>>     if(patternSize != withSize)
>>       ::memcpy(data() + offset + withSize, mid(offset + patternSize).data(), originalSize - offset - patternSize);
>> 
>>     if(withSize<  patternSize)
>>       resize(originalSize + withSize - patternSize);
>> 
>>     ::memcpy(data() + offset, with.data(), withSize);
>> 
>>     offset = find(pattern, offset + withSize);
>>   }
>> 
>>   return *this;
>> }
>> 
>> I'm hoping there is a way to optimize this to speed things up.  find() seems quite fast, at least comparatively, so I was looking for a way to eliminate some of the temporaries, specifically the one generated by the call to mid().  However I'm not intimately familiar with this code and its relation to std::vector so I thought perhaps someone else would have more insight.
>> 
>> Stephen
>> 
> <taglib-replace-performance.cpp><taglib-replace-performance.patch>_______________________________________________
> taglib-devel mailing list
> taglib-devel at kde.org
> https://mail.kde.org/mailman/listinfo/taglib-devel



More information about the taglib-devel mailing list