More on Clang

Wed Nov 13 06:31:02 UTC 2013

Hello,

Just writing up my latest Clang experimentation before the memories fade. No
call to action, but suggestions are welcome.

Q: First, how can clang help KDevelop?

A: Clang provides a fully correct AST with complete semantic analysis,
which can be used for cross-referencing, refactoring, and even code
completion. It could be used either to build a DUChain or, with changes to
KDevelop's interfaces, to replace the DUChain outright.

Q: Why hasn't this happened already? Clang has had a stable, easy-to-use
API for a long time...

A: AST generation is really slow, and to be usable in KDevelop the AST has
to be regenerated constantly, as you type. This is particularly
problematic for code completion and quick-tip refactoring helpers, but
would also represent a regression for highlighting.

Q: What about PCH files?

A: It's a regression to require the user to manage PCH generation to have a
working project in KDevelop. A carefully managed project might be able to
get reasonable speed using PCH, but even PCH loading isn't free, and
currently clang has no support for in-memory PCH. Supporting only projects
that use PCH is a non-starter for KDevelop, in any case.

Q: Once you have an AST, it's not really much more work to save it to
disk. Couldn't KDevelop manage an AST/PCH cache transparently?

A: This seems like a reasonable starting point, if Clang is going to work
at all. If /something/ could be made to work with this, perhaps some
patches to clang could provide better support for this model.
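
To make "transparently" a bit more concrete, here is a minimal sketch of how
such a cache could name its entries. It assumes an entry is identified by the
header's contents plus the compiler arguments; the function names and the
".pch" suffix are made up for illustration, and nothing here talks to Clang.

```python
import hashlib
import os


def cache_key(header_path, compiler_args):
    """Hash the header's bytes together with the compiler arguments."""
    digest = hashlib.sha256()
    with open(header_path, "rb") as handle:
        digest.update(handle.read())
    for arg in compiler_args:
        # NUL-separate the arguments so ["-DA", "B"] differs from ["-DAB"].
        digest.update(b"\0" + arg.encode("utf-8"))
    return digest.hexdigest()


def cache_path(cache_dir, header_path, compiler_args):
    """Hypothetical location of the serialized AST/PCH for this header."""
    name = cache_key(header_path, compiler_args) + ".pch"
    return os.path.join(cache_dir, name)
```

Note that this key only covers the header's own text. Anything the header
itself #includes is not hashed, so a real cache would also need dependency
tracking.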

----------
There are several PCH schemes I've considered, which basically attempt to
make Clang do what KDevelop does now ("If you change the context of an
included file, screw you: undefined behavior"). Note that simply caching
whole translation units (generally the ASTs of .cpp files) from a project
doesn't help; only caching the #included files does.

Schemes:
1. Make it really easy to create a PCH for a header from the UI, and let
it be slow otherwise (something like "when the cursor is over an #include,
press a hotkey to generate and cache its AST").
2. Autogen PCH for all #included files, recursively (or at least until
project boundaries).
3. Autogen PCH for all #included files, non-recursively.
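
The difference between schemes 2 and 3 is mostly which headers get picked
up. A rough sketch, assuming a purely textual scan for quoted #includes
resolved relative to the including file (so no -I search paths, no
conditional compilation, and angle-bracket includes treated as outside the
project); both function names are made up for illustration:

```python
import os
import re

# Matches quoted includes such as: #include "foo.h"
# Angle-bracket includes are deliberately skipped in this sketch.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def direct_includes(path):
    """Quoted includes of `path` that exist relative to it (scheme 3)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    base = os.path.dirname(path)
    found = []
    for name in INCLUDE_RE.findall(text):
        candidate = os.path.normpath(os.path.join(base, name))
        if os.path.isfile(candidate):
            found.append(candidate)
    return found


def recursive_includes(path, project_root):
    """Follow includes transitively, stopping at the project boundary
    (scheme 2)."""
    root = os.path.abspath(project_root) + os.sep
    seen = []
    stack = [path]
    while stack:
        current = stack.pop()
        for header in direct_includes(current):
            inside = os.path.abspath(header).startswith(root)
            if inside and header not in seen:
                seen.append(header)
                stack.append(header)
    return seen
```

The `seen` list also keeps the walk from looping forever on headers that
include each other.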

Problems:
1. Clang doesn't currently support multiple -include-pch arguments; it
seems to silently ignore them. This is obviously a rather glaring issue
affecting all schemes.
2. Clang doesn't provide a (stable) API for just getting preprocessor
results, i.e. what "-E" commands produce, so for now dependency management
is a matter of reading stdout from clang. (You can actually make clang
spit it out with clang_parseTranslationUnit using the right args, but then
you'd have to read your own stdout.)
3. Not all inclusions are created equal. Even aside from preprocessor
configuration, some files just aren't meant to stand alone, and only make
sense when their contents are #included. Automatically detecting these
cases isn't straightforward, so for schemes 2 and 3 you might need a way
to blacklist them. The good news is that this situation should improve when
using clang (vs. currently in KDevelop), as you'd be able to choose your
"view" of these files from the perspective of the various ASTs which
include them.
4. This is all a big hack right now. Currently you have to detect #includes
before AST generation and then generate the AST with the correct
-include-pch (but see also problem #1). You have to do complete dependency
management from scratch, or clang will simply crash when loading ASTs.
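
For problems 2 and 4, the do-it-yourself dependency management could look
roughly like the sketch below. It assumes the dependency list has already
been captured as Makefile-style rule text (the kind the -M family of
compiler options prints) and that comparing modification times is a good
enough freshness test. Both function names are made up, and paths that
contain spaces or colons are not handled.

```python
import os


def parse_dependency_rule(rule_text):
    """Split one Makefile-style rule, 'target: dep dep ...', into parts."""
    # A backslash at the end of a line continues the rule on the next line.
    joined = rule_text.replace("\\\r\n", " ").replace("\\\n", " ")
    target, _, deps = joined.partition(":")
    return target.strip(), deps.split()


def is_stale(cache_file, dependencies):
    """True if the cache is missing or older than a file it was built from."""
    if not os.path.exists(cache_file):
        return True
    built = os.path.getmtime(cache_file)
    for dep in dependencies:
        # A dependency that has vanished also invalidates the cache.
        if not os.path.exists(dep) or os.path.getmtime(dep) > built:
            return True
    return False
```

The idea would be to call is_stale() before handing a cached file to clang
at all and to regenerate the cache whenever it returns True. Whether that is
enough to avoid the crashes is a guess on my part.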
----------

In summary: what we really want from clang is a callback for
clang_parseTranslationUnit that gives us the option to swap an AST in for
an #include, ideally with an additional signal that lets us know if the AST
needs an update. For speed it would be good if we can provide an AST
already residing in memory, or even a custom data stream, to avoid clang
hitting the filesystem.
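
To illustrate the shape of that wish (this is not an existing Clang API;
every name below is hypothetical), here is a toy model in which a parser
loop asks a callback once per #include. The callback hands back an in-memory
blob plus a "needs update" flag, and the flag is modelled here by comparing
a version stamp that the parser passes in:

```python
class IncludeCache:
    """Hypothetical provider of pre-built ASTs, held in memory as bytes."""

    def __init__(self):
        self._entries = {}  # header path -> (version stamp, serialized AST)

    def store(self, header, stamp, blob):
        self._entries[header] = (stamp, blob)

    def on_include(self, header, current_stamp):
        """The wished-for callback: return (blob, needs_update).

        blob is None when nothing is cached, in which case the parser
        would fall back to parsing the header normally.
        """
        entry = self._entries.get(header)
        if entry is None:
            return None, True
        stamp, blob = entry
        return blob, stamp != current_stamp


def parse_with_cache(includes, cache, stamps):
    """Stand-in for the parser loop: ask the callback once per #include."""
    reused, reparsed = [], []
    for header in includes:
        blob, needs_update = cache.on_include(header, stamps[header])
        if blob is not None and not needs_update:
            reused.append(header)
        else:
            reparsed.append(header)
    return reused, reparsed
```

Since the blobs live in a plain dict, nothing in this model touches the
filesystem, which is the point of the in-memory request.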
----------

Appendix:

There is also the option of forgetting about PCH and using the
Clang-provided TU to generate some other kind of cache, one with enough
information to provide the needed functionality in a pinch. You'd still
have to update the TU very regularly as the user types, which is quite
expensive in time and power, and then you'd have to implement code
completion and whatever else you want with no help from Clang (and
highlighting/diagnostics would still be slow). I've dismissed this option,
personally.
----------

Anyone have any other ideas? Is there something else in Clang that I should
know about that can be helpful here?

-Olivier JG