Review Request: speed up of rxx_allocator
Floris Ruijter
flo.ruijt at hotmail.com
Fri Mar 4 16:58:31 UTC 2011
> On Feb. 26, 2011, 12:14 p.m., David Nolden wrote:
> > The improvements you report from duchainify are far too huge, allociation has never played such a huge role during parsing. The problem is probably, that during the first (slow) run the duchain is built, then it is stored, and during the next run, it's only updated.
> >
> > Please make sure that:
> > A) You call duchainify twice for each case, and use the second timing
> > B) You always use the "--force-update-recursive" parameter, which should eliminate the duchain caching issues
> >
> > This issue also explains why you have observed the 50x difference in the number of calls to "new". From what I understand, you've changed the implementation of "new", so it shouldn't affect that number, or did I misunderstand that?
I reran the test with this script:
=====
#!/bin/bash
echo 'starting the test. first the old version.'
date
~/src/kdevplatform/build/util/duchainify/duchainify --force-update-recursive ~/src/kdevelop/languages/cpp/ > /dev/null
time ~/src/kdevplatform/build/util/duchainify/duchainify --force-update-recursive ~/src/kdevelop/languages/cpp/ > /dev/null
echo 'building the new version.'
cd ~/src/kdevelop/
git apply build/0001-rewrite-rxx_allocator-and-move-includes-from-the-hea.patch
cd build
make install > /dev/null
cd
echo 'new version:'
date
~/src/kdevplatform/build/util/duchainify/duchainify --force-update-recursive ~/src/kdevelop/languages/cpp/ > /dev/null
time ~/src/kdevplatform/build/util/duchainify/duchainify --force-update-recursive ~/src/kdevelop/languages/cpp/ > /dev/null
=====
the new version was about a minut faster(6:14 vs 5:18), i also used valgrind with the iostream including file again(this time running duchainify one time solo first) and here too i could see a very significant improvement in libkdev4cppparser.so .
the patch does NOT reimplement operator new. it just changes the way rxx_allocator works. rxx_allocator tries to get blocks from a static linked list first(which is tryLocked with a QMutex) if the chain is already locked or is empty it will instead create a new block and add that to it's own linked list of blocks. on destruction of a rxx_allocator, it will lock the static chain of unallocated blocks and add it's own chain to the chain of unallocated blocks.
- Floris
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/100730/#review1663
-----------------------------------------------------------
On Feb. 24, 2011, 1:47 a.m., Floris Ruijter wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/100730/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2011, 1:47 a.m.)
>
>
> Review request for KDevelop.
>
>
> Summary
> -------
>
> rxx_allocator was according to my measurements done with kcachegrind, valgrind, duchainify and iostream. The allocator had three basic defects:
> 1) all allocated memory was deallocated whilst we need a lot of rxx_allocators (1 per file i presume?), so these blocks can be reused
> 2) it cleared the memory on a per block basis, but if not all of the block is used, then that is a waste of effort
> 3) it used realloc to manage the list of blocks, this isn't too bad but could cause a move of the list which is totaly unnecessary
>
> i solved the problems mostly by making the blocks act as linked list nodes: a next pointer + a really long char array. deallocated blocks are kept in a static linked list, whilst actual rxx_allocators have their own(personal some would say)linked list of blocks. access to the deallocated blocks list is synchronized through a static QMutex.
>
> the access could be threadsafe by using a thread local linked list of deallocated items too, but i don't think that'd be practical, the global static list is probably more effective (eventhough it requires locking)
>
>
> Diffs
> -----
>
> languages/cpp/codecompletion/item.cpp b25d1ae
> languages/cpp/cppparsejob.cpp f4819f2
> languages/cpp/parser/ast.h 0281c6b
> languages/cpp/parser/control.h 0b06248
> languages/cpp/parser/listnode.h d1eda36
> languages/cpp/parser/parser.cpp 281ad8d
> languages/cpp/parser/rxx_allocator.h f0159e9
> languages/cpp/parser/tests/test_parser.cpp de5f804
>
> Diff: http://git.reviewboard.kde.org/r/100730/diff
>
>
> Testing
> -------
>
> as mentioned i ran a file which only included iostream through duchainify which i callgrinded.
>
> old: new:
> pool::allocate ~450 000 000 ~7 000 000
>
> all time spend in libkdev4cppparser:
> ~585 000 000 ~140 000 000
>
>
> the pool::allocate numbers are both the 'inclusive' numbers
>
> looking at the data for the amount of "operator new" calls I can see that the cost per call are pretty much the same but that the old implementation called it about 50x more.
>
>
> Thanks,
>
> Floris
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20110304/8c5e92b2/attachment.html>
More information about the KDevelop-devel
mailing list