KDevelop-PG-Location-Table::positionAt updated with new combined relative/binary algorithm

Milian Wolff mail at milianw.de
Tue Oct 27 20:36:05 UTC 2009


On Tuesday, 27. October 2009 08:36:49 Andreas Pakulat wrote:
> On 27.10.09 01:10:41, Milian Wolff wrote:
> > On Monday 26 October 2009 21:00:44 Andreas Pakulat wrote:
> > > On 26.10.09 17:37:31, Milian Wolff wrote:
> > > > On Monday, 26. October 2009 17:07:29 Andreas Pakulat wrote:
> > > > > On 26.10.09 17:13:09, Milian Wolff wrote:
> > > > > > Hey guys,
> > > > > >
> > > > > > just wanted to notify you that I updated the algorithm that is
> > > > > > used in the KDevelop-PG-Qt LocationTable::positionAt method. The
> > > > > > old linear algorithm was of course pretty bad for huge files
> > > > > > (i.e. our phpfunctions.php with its 80k lines).
> > > > >
> > > > > Nice and btw, of course the linear search algorithm was simply
> > > > > implemented because it was easiest and "fast enough" for the few
> > > > > dummy-usecases we(I) had back then :)
> > > >
> > > > True, but why did you not use an existing container class?
> > >
> > > Ssooooo, I've just checked the original code in kdev-pg and where it
> > > "came from". Apparently Jakob "copied" it from the location table of
> > > C++ support.
> > >
> > > If I understand that code correctly it works similar to what you've
> > > implemented now. It first tries the 5 "closest" lines to the last
> > > offset to check wether they contain the searched-for offset. If that
> > > fails it uses QMap::lowerBound which is apparently fast enough. So
> > > maybe we should re-use that idea and convert to a real map (unless you
> > > did so already) to allow re-using the lowerBount(). I think the "close
> > > to last-used offset" check is still worth it, as it vastly increases
> > > linear access, which usually happens a lot during initial parsing
> > > (IIRC)...
> >
> > Yes, I'll add a benchmark if I get to it. For now I'll use
> > std::lower_bound as suggested by Roberto. And the code you talked about
> > was cpp/parser/rpp/pp- location.cpp, right?
> 
> Right.
> 
> > Roberto also told me on my blog that using a cache member var is a bad
> > idea since it would make the function non-reentrant. But isn't it
> > non-reentrant nontheless? I mean the lines array could be
> > resized/reallocated when newline() is called from a different thread?
> > That would make things much worse than a simple cache member variable?
> 
> In theory: Yes, in reality no, because the newline splitting is done
> during the lexing phase - usually. In that phase you don't need position
> information from the tokens. Hence the lines-member doesn't change
> anymore once you're calculating positions.
> 
> > But would using a QMap prevent from that? Oh and btw.: We don't even need
> > a QMap, the current code would perfectly fit into either
> > QList/QVector/QLinkedList - since David said QVector is the fastest, I
> > would use that one.
> 
> Same as above, as long as its guaranteed that the splitting-process is
> done before fetching any positions there's no problem. A QMap wouldn't
> change anything on that. Hmm, I wonder how the c++-locationtable is made
> thread-safe as it is not right now..

Roberto answered my questions via mail:

He suggests providing an overloaded (additional)

positionAt(offset, *line, *column, lastLine);

that applies the optimization. Of course you'll have to set lastLine in the 
visitor.

But I mean do we really have _multiple_ visitors on the _same_ file?

I mean in PHP we don't do that right now. Are there usecases for it? I mean 
you cannot really make any of the Visitors multithreaded or make them run in 
parallel as declarations on top of the file have to be available when parsing 
the bottom of the file...

Or would this be requried/nice to have for refactoring?

I'm still somewhat hesitant to jump through hoops just for theoretical 
purposes without any practical usecases...
-- 
Milian Wolff
mail at milianw.de
http://milianw.de
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20091027/7d001427/attachment.sig>


More information about the KDevelop-devel mailing list