Slow MP3 tag parsing/ByteVector::replace() performance

Mathias Panzenböck grosser.meister.morti at gmx.net
Mon Aug 1 13:20:56 UTC 2011


Ah, I forgot a detach() call when I do in place replacement in the case where pattern.size() == 
with.size(). Also I optimized the case if the pattern does not occur in the ByteVector and I fixed 
indentation.

Could you test my patch again? Maybe this improves performance in your test. :)

	-panzi

On 08/01/2011 01:13 PM, Stephen F. Booth wrote:
> I tested the performance of the two patches (not with anything official, just with the command-line time) and got the following results:
>
> Original:			~4. sec
> Yours: 			0.215 sec
> Thomas Post:	0.119 sec (compilation warning about comparison between signed and unsigned)
>
> Although Thomas Post's patch appears to be slightly faster, I think the coding style of yours matches the rest of TagLib a bit better.  It is definitely worth submitting one of the two as a pull request on git.  Perhaps Lukas or Scott prefer one patch over the other?
>
> I found the code for Thomas' patch in http://mail.kde.org/pipermail/taglib-devel/2011-July/002014.html
>
> Stephen
>
> On Jul 31, 2011, at 10:19 PM, Mathias Panzenböck wrote:
>
>> I'm no official TagLib contributor, just submitted one patch that got merged so far. Still:
>>
>> Someone else reported the same issue a while ago and even proposed a patch. However, out of interest (or boredom) I wrote a replacement for the replace function myself. Could you please compare the performance of the two patches and say which one is faster? For me the patch from Thomas Post did produce segmentation faults in my micro benchmarks, so I couldn't test it's performance at all. :/
>>
>> I attached my patch as diff and as cpp source of the changed function. I also pushed the changes here:
>> https://github.com/panzi/taglib
>>
>> 	-panzi
>>
>> On 08/01/2011 01:52 AM, Stephen F. Booth wrote:
>>> I've recently come across an MP3 file (13 MB) that TagLib takes approximately 4 seconds to parse on my machine:
>>>
>>> sbooth$ time ./tagreader slow.mp3
>>> <snip>
>>> real	0m4.312s
>>>
>>> My laptop is fairly quick- 2.3 GHz Core i7 with 8 GB of RAM- so the problem isn't computing power.
>>>
>>> I've done a bit of profiling, and the offending code is ByteVector::replace() by way of ID3v2::Tag::parse().  Specifically, line 518 of id3v2tag.cpp:
>>>
>>>    data = SynchData::decode(data);
>>>
>>> This basically calls through to ByteVector::replace(), which ends up spending 90% of its time in ::memcpy().
>>>
>>> Here is the source for replace():
>>>
>>> ByteVector&ByteVector::replace(const ByteVector&pattern, const ByteVector&with)
>>> {
>>>    if(pattern.size() == 0 || pattern.size()>   size())
>>>      return *this;
>>>
>>>    const int patternSize = pattern.size();
>>>    const int withSize = with.size();
>>>
>>>    int offset = find(pattern);
>>>
>>>    while(offset>= 0) {
>>>
>>>      const int originalSize = size();
>>>
>>>      if(withSize>   patternSize)
>>>        resize(originalSize + withSize - patternSize);
>>>
>>>      if(patternSize != withSize)
>>>        ::memcpy(data() + offset + withSize, mid(offset + patternSize).data(), originalSize - offset - patternSize);
>>>
>>>      if(withSize<   patternSize)
>>>        resize(originalSize + withSize - patternSize);
>>>
>>>      ::memcpy(data() + offset, with.data(), withSize);
>>>
>>>      offset = find(pattern, offset + withSize);
>>>    }
>>>
>>>    return *this;
>>> }
>>>
>>> I'm hoping there is a way to optimize this to speed things up.  find() seems quite fast, at least comparatively, so I was looking for a way to eliminate some of the temporaries, specifically the one generated by the call to mid().  However I'm not intimately familiar with this code and its relation to std::vector so I thought perhaps someone else would have more insight.
>>>
>>> Stephen
>>>
>> <taglib-replace-performance.cpp><taglib-replace-performance.patch>_______________________________________________
>> taglib-devel mailing list
>> taglib-devel at kde.org
>> https://mail.kde.org/mailman/listinfo/taglib-devel
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taglib-replace-performance.cpp
Type: text/x-c++src
Size: 1588 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/taglib-devel/attachments/20110801/cc7f5f2d/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: taglib-replace-performance.patch
Type: text/x-patch
Size: 2458 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/taglib-devel/attachments/20110801/cc7f5f2d/attachment.patch>


More information about the taglib-devel mailing list