Fwd: Clang

Olivier J. G. olivier.jg at gmail.com
Thu May 9 15:28:34 UTC 2013


And again I reply only to the last poster, instead of the kdevelop-devel
list. Original message follows...

---------- Forwarded message ----------
From: Olivier J. G. <olivier.jg at gmail.com>
Date: Thu, May 9, 2013 at 5:27 PM
Subject: Re: Clang
To: Milian Wolff <mail at milianw.de>


Once it was apparent that it could be done, I didn't spend much time
getting things working well, as I was focused on finding out whether it was
a reasonable thing to do. That said, it can build declarations, contexts
and uses.

Using libclang would make sense if you just want access to the AST. I
mainly didn't use it because I assumed I'd have more control over the
parsing process using the C++ API.

Using the current C++ support is, as Milian mentioned, not going to help.
Making the parser support C++11 is fairly easy (and it mostly does). Our
weakness is currently in the semantic analysis, especially support for
templates. The big win with Clang is reusing the semantic analysis. My
thinking here is especially to not go the TemplateDeclaration route, and
instead put all the template information in the type system, i.e.
UnsureType (I'd be interested to hear from David about his take on this
though). This allows us to better support writing code which uses template
parameters, at the very least.

It's important to note that the AST generated by Clang is fully 'annotated'
with everything you'd ever need to know to make a duchain. It's trivial to
go from a Clang AST to a duchain structure. After that, it's only a matter
of subclassing the duchain classes to provide more of the information
already collected in the Clang AST.

Our problem is that Clang creates ASTs for translation units, and we
want/need ASTs for files (let's ignore the preprocessor elephant in the
room, for now). Currently, using Clang, when a cpp file is edited, you have
to regenerate the entire translation unit for that file, which could
involve reparsing a lot of headers. This is why you often hear about pch
files when people talk about clang-complete plugins. We have the
TopDUContext, meant to represent a file, which is the solution to Clang
being unusable for update-as-you-type situations. In order to use that
though, we need Clang to be able to operate on individual files, and not
expand includes.

Now, to not ignore the preprocessor elephant, this is not an easy problem,
and may be as hard as the holy grail of C++ compilers: incremental
compilation. KDevelop solves this problem with the ParsingEnvironment,
which I'm not familiar enough with to say anything about. In short, the
same C++ file could be represented by any number of different DUChain
structures depending on the state of the preprocessor by the time it is
reached. From an IDE perspective, the only way to represent this concept is
to allow the end-user to select the current preprocessor state of a file
from within the IDE.
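
To make the "same file, different DUChain" point concrete, here is a
minimal sketch. It is a toy model that only understands non-nested
#ifdef / #else / #endif; MacroSet, visibleLines and fooHeader are
hypothetical names made up for illustration, not KDevelop or Clang API.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model: understands only non-nested #ifdef / #else / #endif.
// MacroSet and visibleLines are hypothetical names, not KDevelop or Clang API.
using MacroSet = std::set<std::string>;

// Returns the lines of `file` that survive preprocessing under `defined`.
std::vector<std::string> visibleLines(const std::vector<std::string>& file,
                                      const MacroSet& defined)
{
    std::vector<std::string> out;
    bool inConditional = false; // between #ifdef and #endif?
    bool keep = true;           // is the current branch active?
    for (const std::string& line : file) {
        if (line.rfind("#ifdef ", 0) == 0) {
            inConditional = true;
            keep = defined.count(line.substr(7)) > 0;
        } else if (line == "#else") {
            keep = !keep;
        } else if (line == "#endif") {
            inConditional = false;
            keep = true;
        } else if (keep) {
            out.push_back(line);
        }
    }
    assert(!inConditional); // the toy input must close every #ifdef
    return out;
}

// One header whose declarations depend on the incoming macro state.
const std::vector<std::string> fooHeader = {
    "#ifdef USE_WIDE",
    "typedef wchar_t Char;",
    "#else",
    "typedef char Char;",
    "#endif",
    "Char* name();",
};
```

Calling visibleLines(fooHeader, MacroSet{"USE_WIDE"}) and
visibleLines(fooHeader, MacroSet{}) yields two different typedefs for Char,
so anything built from this one file (declarations, types, uses) has to be
stored once per incoming preprocessor state.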

Obviously Clang cannot give us its perfect "AST" with everything known, and
all types and functions resolved and cross-referenced, while only working
with a single file. The solution to this from a Clang perspective might be
to have it create lower-level, per-file ASTs, each of which would have the
information: "given preprocessor state A and file B, the AST for this file
is X and the outgoing preprocessor state is Y". From there you could
hopefully generate semantic information for any subset of the entire TU
much more cheaply, on demand. Possibly I need to be hit with a clue bat,
and possibly this is incremental parsing, but in any case I can't imagine
that this wouldn't mean rewriting much of Clang.
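
The "given preprocessor state A and file B" idea can be sketched as a cache
keyed on the pair (file, incoming state). This is only an illustration
under heavy assumptions: PreprocessorState is reduced to a set of macro
names, the per-file "AST" is a list of strings, and PerFileCache, ParseFn
and fakeParse are hypothetical names. Nothing like this interface exists in
Clang as far as I know.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a real state would also carry macro bodies etc.
using PreprocessorState = std::set<std::string>;

struct FileResult {
    std::vector<std::string> ast; // "the AST for this file is X"
    PreprocessorState outgoing;   // "the outgoing preprocessor state is Y"
};

using ParseFn = FileResult (*)(const std::string& file,
                               const PreprocessorState& incoming);

class PerFileCache {
    using Key = std::pair<std::string, PreprocessorState>;

public:
    explicit PerFileCache(ParseFn parse) : m_parse(parse)
    {
        assert(m_parse != nullptr);
    }

    // Parses `file` only if this (file, incoming state) pair is new.
    const FileResult& get(const std::string& file,
                          const PreprocessorState& incoming)
    {
        const Key key(file, incoming);
        auto it = m_entries.find(key);
        if (it == m_entries.end()) {
            ++m_parses;
            it = m_entries.emplace(key, m_parse(file, incoming)).first;
        }
        return it->second;
    }

    // Drops every cached variant of an edited file.
    void invalidate(const std::string& file)
    {
        // The empty state sorts first, so this finds the file's first entry.
        auto it = m_entries.lower_bound(Key(file, PreprocessorState()));
        while (it != m_entries.end() && it->first.first == file)
            it = m_entries.erase(it);
    }

    int parses() const { return m_parses; }

private:
    ParseFn m_parse;
    std::map<Key, FileResult> m_entries;
    int m_parses = 0;
};

// Fake parser for demonstration: pretends each file defines an include guard.
FileResult fakeParse(const std::string& file, const PreprocessorState& incoming)
{
    FileResult result;
    result.ast.push_back("declarations of " + file);
    result.outgoing = incoming;
    result.outgoing.insert("SEEN_" + file);
    return result;
}
```

With PerFileCache cache(&fakeParse), asking twice for "foo.h" under the
same incoming state parses once, while asking under a different state
parses again. Editing foo.h only invalidates foo.h's own entries. The sketch
deliberately leaves out the hard part: entries for files parsed after foo.h
may depend on its outgoing state and would need invalidating too.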

Here's the real fun though... I just reinvented the DUChain and put it in
Clang. All those ASTs need to be cached, otherwise they're completely
useless, and basically all that KDevelop would do is copy the information
into the language agnostic DUChain, which would cache it again. The reason
for this apparent paradox is that creating a DUChain requires semantic
analysis, and the DUChain expects to shoulder the cost of keeping that
analysis in memory. Clang provides an interesting opportunity here to take
advantage of some incredibly complicated semantic analysis done for free,
where most language plugins have to do all that themselves starting with
just a basic AST (not in the Clang sense). The problem is that the semantic
analysis has to feed into itself. Once I've duchainified foo.h, I use that
information to duchainify foo.cpp. Therefore the semantic analyser must be
the cacher as well.

So, to summarize, there are two ways to use Clang:
1. Get into the C++ API, use it as a preprocessor/parser, and based on that
do semantic analysis and build the duchain
2. Write some manner of reasonable caching for Clang's completed "AST",
copy into duchain

I hope someone has identified a flaw in my analysis here, and hits me with
a clue bat, because this isn't very inspiring.

-Olivier JG

PS: What I mentioned is not the only way to cache Clang ASTs. There was a
discussion/proposal on the Clang ML a while back about a "Clang Daemon",
which would be "good enough" for our purposes. Unfortunately, as far as I
know, no lines of code have been written.

Footnote: The "semantic analysis" I keep going on about is knowing what
uses refer to what declarations.
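
As a toy illustration of that footnote, the sketch below maps each use to a
declaration. It assumes a single flat scope where a use refers to the
nearest earlier declaration of the same name, which is far simpler than
real C++ lookup (no scopes, overloads or templates). Declaration, Use and
resolveUses are hypothetical names, not the DUChain classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for duchain concepts.
struct Declaration { std::string name; int line; };
struct Use         { std::string name; int line; };

// For each use, returns the index into `decls` of the declaration it
// refers to, or -1 if no earlier declaration with that name exists.
std::vector<int> resolveUses(const std::vector<Declaration>& decls,
                             const std::vector<Use>& uses)
{
    std::vector<int> result;
    for (const Use& use : uses) {
        int best = -1;
        for (int i = 0; i < static_cast<int>(decls.size()); ++i) {
            const Declaration& d = decls[i];
            // Candidate: same name, declared before the use, and closer
            // to the use than the best candidate found so far.
            if (d.name == use.name && d.line < use.line
                && (best == -1 || d.line > decls[best].line)) {
                best = i;
            }
        }
        result.push_back(best);
    }
    assert(result.size() == uses.size());
    return result;
}
```

Even in this toy form, resolving a use needs every declaration that came
before it, including those from other files. That is why the analysis of
foo.cpp depends on the cached analysis of foo.h.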


On Sun, May 5, 2013 at 4:14 PM, Milian Wolff <mail at milianw.de> wrote:

> On Sunday 05 May 2013 21:52:59 Alexandre Courbot wrote:
> > On Sun, May 5, 2013 at 8:36 PM, Aleix Pol <aleixpol at kde.org> wrote:
> > > I'm unsure if it doesn't make sense, but has it been considered forking
> > > the
> > > current c++ implementation and just replacing the current parser (and
> even
> > > the AST) for Clang's?
> > >
> > > Maybe it's easier if we do a 1:1 port...
> >
> > That's what I did in my own fork. Actually I just replaced the
> > parser/AST part with CLang's (in a CLangParseJob class) and build the
> > DUChain from there, using the helper functions available in
> > cpplanguagesupport. That seems to work and I agree it's probably the
> > shortest path to CLang parsing.
>
> Parsing alone is not helpful. The most brittle part of our C++ support, and
> also the part that requires the most work, is the semantic analysis. So we
> really want to use the annotated/analyzed AST that clang provides and throw
> away as much of our code as possible (long term).
>
> Anyhow, I still didn't have time to look into it. All I know is from
> speaking
> with Olivier about it. I'll definitely look into this eventually though
> after
> this semester is finally done...
>
> Cheers
> --
> Milian Wolff
> mail at milianw.de
> http://milianw.de