Review Request: speed up of rxx_allocator

Fri Mar 4 16:58:31 UTC 2011

> On Feb. 26, 2011, 12:14 p.m., David Nolden wrote:
> > The improvements you report from duchainify are far too huge, allociation has never played such a huge role during parsing. The problem is probably, that during the first (slow) run the duchain is built, then it is stored, and during the next run, it's only updated.
> > 
> > Please make sure that:
> > A) You call duchainify twice for each case, and use the second timing
> > B) You always use the "--force-update-recursive" parameter, which should eliminate the duchain caching issues
> > 
> > This issue also explains why you have observed the 50x difference in the number of calls to "new". From what I understand, you've changed the implementation of "new", so it shouldn't affect that number, or did I misunderstand that?

I reran the test with this script:
=====
#!/bin/bash
echo 'starting the test. first the old version.'
date
~/src/kdevplatform/build/util/duchainify/duchainify --force-update-recursive ~/src/kdevelop/languages/cpp/ > /dev/null
time ~/src/kdevplatform/build/util/duchainify/duchainify --force-update-recursive ~/src/kdevelop/languages/cpp/ > /dev/null

echo 'building the new version.'
cd ~/src/kdevelop/
git apply build/0001-rewrite-rxx_allocator-and-move-includes-from-the-hea.patch
cd build
make install > /dev/null
cd

echo 'new version:'
date
~/src/kdevplatform/build/util/duchainify/duchainify --force-update-recursive ~/src/kdevelop/languages/cpp/ > /dev/null
time ~/src/kdevplatform/build/util/duchainify/duchainify --force-update-recursive ~/src/kdevelop/languages/cpp/ > /dev/null
=====
the new version was about a minut faster(6:14 vs 5:18), i also used valgrind with the iostream including file again(this time running duchainify one time solo first) and here too i could see a very significant improvement in libkdev4cppparser.so . 

the patch does NOT reimplement operator new. it just changes the way rxx_allocator works. rxx_allocator tries to get blocks from a static linked list first(which is tryLocked with a QMutex) if the chain is already locked or is empty it will instead create a new block and add that to it's own linked list of blocks. on destruction of a rxx_allocator, it will lock the static chain of unallocated blocks and add it's own chain to the chain of unallocated blocks. 

- Floris

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/100730/#review1663
-----------------------------------------------------------

On Feb. 24, 2011, 1:47 a.m., Floris Ruijter wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/100730/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2011, 1:47 a.m.)
> 
> 
> Review request for KDevelop.
> 
> 
> Summary
> -------
> 
> rxx_allocator was according to my measurements done with kcachegrind, valgrind, duchainify and iostream. The allocator had three basic defects:
> 1) all allocated memory was deallocated whilst we need a lot of rxx_allocators (1 per file i presume?), so these blocks can be reused
> 2) it cleared the memory on a per block basis, but if not all of the block is used, then that is a waste of effort
> 3) it used realloc to manage the list of blocks, this isn't too bad but could cause a move of the list which is totaly unnecessary
> 
> i solved the problems mostly by making the blocks act as linked list nodes: a next pointer + a really long char array. deallocated blocks are kept in a static linked list, whilst actual rxx_allocators have their own(personal some would say)linked list of blocks. access to the deallocated blocks list is synchronized through a static QMutex.
> 
> the access could be threadsafe by using a thread local linked list of deallocated items too, but i don't think that'd be practical, the global static list is probably more effective (eventhough it requires locking) 
> 
> 
> Diffs
> -----
> 
>   languages/cpp/codecompletion/item.cpp b25d1ae 
>   languages/cpp/cppparsejob.cpp f4819f2 
>   languages/cpp/parser/ast.h 0281c6b 
>   languages/cpp/parser/control.h 0b06248 
>   languages/cpp/parser/listnode.h d1eda36 
>   languages/cpp/parser/parser.cpp 281ad8d 
>   languages/cpp/parser/rxx_allocator.h f0159e9 
>   languages/cpp/parser/tests/test_parser.cpp de5f804 
> 
> Diff: http://git.reviewboard.kde.org/r/100730/diff
> 
> 
> Testing
> -------
> 
> as mentioned i ran a file which only included iostream through duchainify which i callgrinded.
> 
>                       old:              new: 
> pool::allocate        ~450 000 000      ~7 000 000
> 
> all time spend in libkdev4cppparser:
>                       ~585 000 000      ~140 000 000
> 
> 
> the pool::allocate numbers are both the 'inclusive' numbers
> 
> looking at the data for the amount of "operator new" calls I can see that the cost per call are pretty much the same but that the old implementation called it about 50x more.
> 
> 
> Thanks,
> 
> Floris
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20110304/8c5e92b2/attachment.html>