KDevelop-PG-Location-Table::positionAt updated with new combined relative/binary algorithm

Tue Oct 27 07:36:49 UTC 2009

On 27.10.09 01:10:41, Milian Wolff wrote:
> On Monday 26 October 2009 21:00:44 Andreas Pakulat wrote:
> > On 26.10.09 17:37:31, Milian Wolff wrote:
> > > On Monday, 26. October 2009 17:07:29 Andreas Pakulat wrote:
> > > > On 26.10.09 17:13:09, Milian Wolff wrote:
> > > > > Hey guys,
> > > > >
> > > > > just wanted to notify you that I updated the algorithm that is used
> > > > > in the KDevelop-PG-Qt LocationTable::positionAt method. The old
> > > > > linear algorithm was of course pretty bad for huge files (i.e. our
> > > > > phpfunctions.php with its 80k lines).
> > > >
> > > > Nice and btw, of course the linear search algorithm was simply
> > > > implemented because it was easiest and "fast enough" for the few
> > > > dummy-usecases we(I) had back then :)
> > >
> > > True, but why did you not use an existing container class?
> > 
> > Ssooooo, I've just checked the original code in kdev-pg and where it
> > "came from". Apparently Jakob "copied" it from the location table of C++
> > support.
> > 
> > If I understand that code correctly it works similar to what you've
> > implemented now. It first tries the 5 "closest" lines to the last offset
> > to check wether they contain the searched-for offset. If that fails it
> > uses QMap::lowerBound which is apparently fast enough. So maybe we
> > should re-use that idea and convert to a real map (unless you did so
> > already) to allow re-using the lowerBount(). I think the "close to
> > last-used offset" check is still worth it, as it vastly increases linear
> > access, which usually happens a lot during initial parsing (IIRC)...
> 
> Yes, I'll add a benchmark if I get to it. For now I'll use std::lower_bound as 
> suggested by Roberto. And the code you talked about was cpp/parser/rpp/pp-
> location.cpp, right?

Right.

> Roberto also told me on my blog that using a cache member var is a bad idea 
> since it would make the function non-reentrant. But isn't it non-reentrant 
> nontheless? I mean the lines array could be resized/reallocated when newline() 
> is called from a different thread? That would make things much worse than a 
> simple cache member variable?

In theory: Yes, in reality no, because the newline splitting is done
during the lexing phase - usually. In that phase you don't need position
information from the tokens. Hence the lines-member doesn't change
anymore once you're calculating positions. 

> But would using a QMap prevent from that? Oh and btw.: We don't even need a 
> QMap, the current code would perfectly fit into either 
> QList/QVector/QLinkedList - since David said QVector is the fastest, I would 
> use that one.

Same as above, as long as its guaranteed that the splitting-process is
done before fetching any positions there's no problem. A QMap wouldn't
change anything on that. Hmm, I wonder how the c++-locationtable is made
thread-safe as it is not right now..

Andreas

-- 
You may be infinitely smaller than some things, but you're infinitely
larger than others.