Definition-Use chain documentation
Hamish Rodda
rodda at kde.org
Wed Jan 3 14:43:00 UTC 2007
Hi,
As I have less time than usual for programming at the moment, I've taken a
moment to try to document the duchain code a bit better. Hopefully this will
reduce the barrier to entry for others who want to help.
The following documentation is maintained in svn, under
kdevelop/languages/cpp/duchain/Design.txt.
Definition-Use Chain (duchain) design document
==============================================
Hamish Rodda
rodda at kde.org
This document's purpose is to outline the design of the definition-use chain
as currently implemented for the c++ language support in KDevelop 4.x.
Overview
========
The duchain is a sequence of contexts in a code file, and the associated
definitions which occur in those contexts. A simplified way of thinking
about it is that for each set of brackets (curly {} or not ()), there is a
separate context. Each context is represented by a DUContext. Each context
will have one parent context (except in the case of the top level context
which has none), and any number of child contexts (including none).
Additionally, each context can import any number of other contexts. The
reason for this will become clear later. Thus, the DUContext structure
resembles a directed acyclic graph, for those familiar with the concept.
Parsing
=======
These DUContexts are created on the first pass after parsing the code to an
AST (abstract syntax tree), which is accomplished by Roberto's c++ parser.
Also, in this stage the data types are parsed, and any declarations which are
encountered are recorded against the context in which they are encountered
in. Each declaration is represented by a Declaration.
Parsing code is arranged into Builder classes, which subclass the AST visitor
pattern. They are designed to be able to subclass each other, thus achieving
multiple goals with each pass (as described in the above paragraph).
The first pass is accomplished by the ContextBuilder, which is subclassed by
the TypeBuilder, which is subclassed by the DeclarationBuilder. Thus, in the
first pass, the ContextBuilder creates the DUContext tree, the TypeBuilder
records which types are encountered, and the DeclarationBuilder creates
Declaration instances which are associated with the current type and context.
The second pass is accomplished by the ContextBuilder, which is subclassed by
the UseBuilder. On the second pass, we only iterate previously parsed
contexts (as they are already created). Then, as variable uses are
encountered, a Use is created for each. A Declaration is searched for in the
current context, and if one is found, they are associated with each other.
Classes and their purposes
==========================
DUChain - an object global to a language plugin which keeps track of all
loaded source files and the top level context of their definition-use chains.
DUContext - an object which represents a single context in a source file, and
stores information about parent and child DUContexts, and Declarations,
Definitions and Uses which occur in them. Also provides convenience methods
for searching the chain.
Declaration - an object which represents a single declaration. Has several
subclasses (ClassFunctionDeclaration, ClassMemberDeclaration, and
ForwardDeclaration) which store more information specific to the type of
declaration which is being represented. Uses of the Declaration are stored
here for convenience [note: perhaps this should be looked up on request
instead...? may be better performing and much less error prone, eg when
updating duchains].
Definition - an object which represents a definition corresponding to a
Declaration.
Use - an object which represents a use of a particular declaration.
SymbolTable - a hash which stores identifiers available in the top level
context of a source file and their respective Declarations.
*Builder - objects whose purpose is to iterate the parsed AST and produce
instances of the duchain objects.
Type* - a type system. Not particularly advanced at this point.
Definition-use chain searching
==============================
Because iterating a complete definition-use chain can become expensive when
they are large, when a search is being performed (eg. for a declaration
corresponding to a certain identifier) it is first performed up to the top
level context, then the symbol table is consulted. The symbol table is a
hash of all identifiers which are known to the entire duchain. All potential
matches are evaluated to see if they are visible from the location of the
use. This is a considerable performance gain.
Locking
=======
The duchain is designed to operate in a multithreaded environment. This means
that multiple parse jobs may be operating simultaneously, reading from and
writing to the duchain. Thus, locking is required.
A single read-write lock per DUChain (ie. per language plugin) is used to
serialise writes to the chain and allow concurrent reads. Thus, to call
non-const methods, you must hold a write lock, and for const methods, a read
lock. This can be fairly simply accomplished with QReadLocker and
QWriteLocker. Refer to function documentation for exceptions to this general
rule.
Also, when manipulating text editor ranges, the KTextEditor's smart interface
must be locked. See code in ContextBuilder::openContextInternal and
DUChainBase.
Interface for plugins
=====================
As plugins will be accessing the DUChain from the main thread, they will need
to hold a read lock. In order to be notified of changes to the duchain, an
observer interface is offered. See DUChainObserver.
Text editor integration
=======================
The main classes are subclasses of a base class, DUChainBase. This object
holds a reference to the text range. When the source file is opened in an
editor, the CppEditorIntegrator will create smart text ranges, which are
bound to the editor's copy of the document. From there, highlighting can be
applied to these ranges, as well as other advanced functions (see the
KTextEditor documentation for possibilities). The c++ language support will
convert these ranges to smart ranges when the corresponding document is
loaded into an editor.
Generalisation to other languages
=================================
In order to expand the utility of the definition-use chain, it would be
preferable to have a duchain available for each language with a common API.
This is potentially achievable.
In order to do this, the DUContext, Declaration, Definition, and Use classes
need to be evaluated for c++ specific code, and that code separated out into
a subclass of each of those classes. Then, the base classes can be moved to
the KDevelop platform interface code. Other classes may also be able to be
generalised, such as the SymbolTable, and Type* classes. Then, other
languages can subclass the interface classes if they need specific features.
At least then the basic structure will be available to any plugins which are
interested in the information.
An editor integrator will need to be created for each language (to adapt to
the parser), but this should be a trivial task (see the Cpp version for
code).
TODO list
=========
* Incremental parsing needs work. It is designed for it from the start, but
hasn't been implemented on the lower layers (parser in particular needs to
have the code for this written)
* Multithreaded testing
* Chain viewer plugin is broken - the port to an observer design is not
working properly yet
Future features - ideas
=======================
The completed duchain should allow for code refactoring, intelligent
navigation, improved automatic code generation (eg. "create switch
statement"), context-sensitive code completion, integration of documentation,
debugger integration, a code structure view, call graph, static code analysis
etc.
Comments welcome :)
Cheers,
Hamish.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20070104/12d91889/attachment.sig>
More information about the KDevelop-devel
mailing list