KDev-Clang: How should we handle caches of files with different environments?

Milian Wolff mail at milianw.de
Fri Aug 29 22:25:46 UTC 2014


Hey all,

Sergey noticed a big performance slow-down that I introduced along with some 
more bugs with this commit:

commit 5215ff8f78ba19bd5a3b8264b7bbe9449532b03f
Author: Milian Wolff <mail at milianw.de>
Date:   Thu Aug 7 19:10:54 2014 +0200

    Update DUChain data when the environment has changed.
    
    This combines the include paths, defines and pch-path into a
    hash which is stored on-disk and then later compared to the new
    environment. If the hash differs, we trigger a reparse.
    
    To prevent opened files from getting reparsed at startup when
    no data from the project could be obtained, we add some more code
    for this special purpose: We check whether we parsed before
    with a known project and whether the new environment data also
    comes from a project. If not then we rely only on the timestamp
    of the file on whether to trigger a reparse or not. Otherwise, the
    previous data (i.e. with known project) takes precedence.
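
As a rough illustration of what the commit message describes, here is a minimal standalone sketch. It assumes one possible reading of the "known project" rule (data from a known project is not overridden by data without a project). All names (Environment, CachedState, needsReparse) are hypothetical and not taken from kdev-clang:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the parse environment; not the kdev-clang type.
struct Environment {
    std::vector<std::string> includePaths;
    std::vector<std::string> defines;
    std::string pchPath;
    bool projectKnown = false; // true if the data was obtained from a project
};

// Boost-style hash combine of one string into a running seed.
inline void hashCombine(std::size_t& seed, const std::string& value)
{
    seed ^= std::hash<std::string>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Combine include paths, defines and the pch path into a single hash.
std::size_t environmentHash(const Environment& env)
{
    std::size_t seed = 0;
    for (const auto& path : env.includePaths) hashCombine(seed, path);
    for (const auto& define : env.defines) hashCombine(seed, define);
    hashCombine(seed, env.pchPath);
    return seed;
}

// What would be stored on disk after the previous parse (assumed layout).
struct CachedState {
    std::size_t hash = 0;
    bool projectKnown = false;
    long long fileTimestamp = 0;
};

bool needsReparse(const CachedState& cached, const Environment& env, long long fileTimestamp)
{
    if (fileTimestamp != cached.fileTimestamp)
        return true;  // the file itself changed
    if (cached.projectKnown && !env.projectKnown)
        return false; // no project data yet: keep the data parsed with a known project
    return environmentHash(env) != cached.hash;
}
```

With this sketch, two targets that differ only in one include path produce different hashes, which is exactly the situation discussed below.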

For some more input, if you didn't read this already, see: 
https://git.reviewboard.kde.org/r/119959/

Now I looked at the remaining issues that Sergey noticed and can confirm them. 
A simple project to reproduce this can be created like this:

~~~~~~~ CMakeLists.txt: ~~~~~~~~~~
cmake_minimum_required(VERSION 2.8.11)
project(test)

add_executable(fileA fileA.cpp)
set_property(TARGET fileA APPEND PROPERTY INCLUDE_DIRECTORIES "/tmp/foo")

add_executable(fileB fileB.cpp)
set_property(TARGET fileB APPEND PROPERTY INCLUDE_DIRECTORIES "/tmp/bar")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And both fileA.cpp and fileB.cpp just need to have something like this:

~~~~~~~~~~~~~~~~~~
#include <iostream>
int foo() { return 0; }
~~~~~~~~~~~~~~~~~~

Now run this in a kdev-clang KDevelop session and enable the corresponding 
debug area. Once both files are cached/highlighted, change one of them, wait 
for the update, then change the other. By "change" I mean e.g. adding an 
argument to the function or anything like that, while keeping the iostream 
include.

What you'll see is tons of output like this:
environment differs, require update: "/usr/include/c++/4.9.1/iostream" new 
hash: 4014444178 new project known: true old hash: 173631101 old project 
known: true

This is *valid*, since both files have different include paths. The include 
paths and the per-file defines get inherited by included files (oh C++ modules, 
where are you?). As such, from a compiler perspective, it's correct that the 
cache needs to be updated once the environment has changed.

From an IDE perspective this is unbearably slow, I agree with Sergey. But what 
should we do about this situation? I have three suggestions so far:

#1 skip update of duchain cache for system includes on environment changes
+ relatively easy to implement thanks to clang_Location_isInSystemHeader
- the cache ping-pong will still happen for non-system-includes though
note: the cache will still be updated when the timestamp of the file changes
note: forced recursive reparses will also still trigger an update
=> I think I'll add this as a first work-around.
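
A minimal sketch of how the check in #1 could look, assuming the system-header flag has already been obtained (e.g. via clang_Location_isInSystemHeader for a location inside the included file). The struct and function names are hypothetical and only illustrate the decision order:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one cached included file.
struct IncludedFile {
    bool inSystemHeader;         // e.g. the result of clang_Location_isInSystemHeader
    bool timestampChanged;       // file was modified since it was cached
    std::uint32_t cachedEnvHash; // environment hash stored with the cache entry
};

bool needsCacheUpdate(const IncludedFile& file, std::uint32_t newEnvHash, bool forcedRecursive)
{
    if (forcedRecursive || file.timestampChanged)
        return true;  // these still trigger an update, as noted above
    if (file.inSystemHeader)
        return false; // ignore environment changes for system includes
    return file.cachedEnvHash != newEnvHash;
}
```

For non-system includes the last comparison is still reached, so the ping-pong remains possible there.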

#2 combine Sergey's idea with my existing environment checking
Sergey tried to fix the problem by changing the environment that is serialized 
to only reference the include paths that were actually used by a given file.
This breaks the update mechanism though. But what one could do is store two 
hashes, one for the parse job to check whether a clang reparse is required, 
and one to check whether the duchain cache needs an update.
- a bit more involved to implement
+ should hopefully also guard against the cache ping-pong for non-system-
includes, as long as you don't do funny includes of different files based on 
the include path
- completely ignores the macro defines though, which must be handled similarly, 
otherwise you can get the same cache ping-pong effects. To also figure out 
which defines of the environment were used, we'd have to iterate over all 
cursors, find macro uses and check them against the ones in the 
environment, which is probably quite costly...
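
The two-hash idea from #2 could be sketched roughly like this. It only covers include paths and, like the proposal itself, ignores defines. The names FileHashes, hashPaths and computeHashes are made up for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Order-dependent hash over a list of paths (boost-style combine).
std::size_t hashPaths(const std::vector<std::string>& paths)
{
    std::size_t seed = 0;
    for (const auto& path : paths) {
        seed ^= std::hash<std::string>{}(path) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
}

struct FileHashes {
    std::size_t parseHash; // full environment: does clang need to reparse?
    std::size_t cacheHash; // only the include paths actually used: is the duchain cache stale?
};

FileHashes computeHashes(const std::vector<std::string>& allIncludePaths,
                         const std::vector<std::string>& usedIncludePaths)
{
    return {hashPaths(allIncludePaths), hashPaths(usedIncludePaths)};
}
```

For the test project above, an included file that only uses the system C++ include directory would get the same cacheHash from both targets, while the parseHash still differs because of /tmp/foo versus /tmp/bar.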

#3 do what oldcpp did
- I still have to understand what exactly it is doing
- I /think/ it's something like #2, but it also creates multiple cache entries 
per file, depending on the environment. This blows up the size of the duchain 
cache, the memory usage, etc. I'm not sure we want this.
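
To make the size concern in #3 concrete, here is a small sketch of a cache keyed by (file, environment hash). This is not taken from oldcpp; it only illustrates the general "multiple entries per file" idea, and PerEnvironmentCache and CacheEntry are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical cache payload; stands in for the real duchain data.
struct CacheEntry {
    std::string summary;
};

class PerEnvironmentCache {
public:
    // Store one entry per (file, environment hash) pair.
    void store(const std::string& file, std::size_t envHash, CacheEntry entry)
    {
        m_entries[std::make_pair(file, envHash)] = std::move(entry);
    }

    // Returns nullptr if the file was never cached for this environment.
    const CacheEntry* find(const std::string& file, std::size_t envHash) const
    {
        const auto it = m_entries.find(std::make_pair(file, envHash));
        return it == m_entries.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return m_entries.size(); }

private:
    std::map<std::pair<std::string, std::size_t>, CacheEntry> m_entries;
};
```

Every distinct environment adds another entry for the same header, so there is no ping-pong, but the cache grows with the number of environments a file is included from.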

Any other suggestions on how to handle this?

-- 
Milian Wolff
mail at milianw.de
http://milianw.de

