KDev-Clang: How should we handle caches of files with different environments?
Milian Wolff
mail at milianw.de
Fri Aug 29 22:25:46 UTC 2014
Hey all,
Sergey noticed a big performance slowdown, along with some more bugs, that I
introduced with this commit:
commit 5215ff8f78ba19bd5a3b8264b7bbe9449532b03f
Author: Milian Wolff <mail at milianw.de>
Date: Thu Aug 7 19:10:54 2014 +0200
Update DUChain data when the environment has changed.
This combines the include paths, defines and pch-path into a
hash which is stored on-disk and then later compared to the new
environment. If the hash differs, we trigger a reparse.
To prevent opened files from getting reparsed at startup when
no data from the project could be obtained, we add some more code
for this special purpose: We check whether we parsed before
with a known project and whether the new environment data also
comes from a project. If not, we rely only on the timestamp of the file
to decide whether to trigger a reparse. Otherwise, the
previous data (i.e. with known project) takes precedence.
For more background, if you haven't read this already, see:
https://git.reviewboard.kde.org/r/119959/
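For readers who haven't looked at the patch, here is a rough C++ sketch of the mechanism the commit message describes. It is not the kdev-clang code: `Environment`, `StoredData`, `hashCombine`, `environmentHash` and `needsUpdate` are hypothetical names, and the project-known special case follows one reading of the commit message (data parsed with a known project wins over new data without one, so only the timestamp counts). Also note that `std::hash` is not guaranteed to be stable across builds, so a hash that is really stored on disk would need a fixed algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-file environment: include paths,
// defines and the PCH path, plus whether the data came from a project.
struct Environment {
    std::vector<std::string> includes;
    std::vector<std::string> defines;
    std::string pchPath;
    bool projectKnown = false;
};

// Hypothetical data stored on disk after the last parse of a file.
struct StoredData {
    std::size_t envHash;
    bool projectKnown;
    long modificationTime;
};

// boost-style hash_combine
template <typename T>
void hashCombine(std::size_t& seed, const T& value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Combine include paths, defines and PCH path into a single hash.
std::size_t environmentHash(const Environment& env) {
    std::size_t seed = 0;
    hashCombine(seed, env.includes.size()); // list sizes keep the two lists apart
    for (const auto& path : env.includes) hashCombine(seed, path);
    hashCombine(seed, env.defines.size());
    for (const auto& define : env.defines) hashCombine(seed, define);
    hashCombine(seed, env.pchPath);
    return seed;
}

bool needsUpdate(const StoredData& stored, const Environment& env, long modificationTime) {
    // A file that changed on disk always triggers a reparse.
    if (modificationTime != stored.modificationTime)
        return true;
    // Parsed before with project data, but the new environment has none
    // (e.g. a file opened at startup): keep the old data, rely on the timestamp only.
    if (stored.projectKnown && !env.projectKnown)
        return false;
    // Otherwise a differing environment hash triggers a reparse.
    return environmentHash(env) != stored.envHash;
}
```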
I have now looked at the remaining issues that Sergey noticed and can confirm them.
A simple project to reproduce this can be created like this:
~~~~~~~ CMakeLists.txt: ~~~~~~~~~~
cmake_minimum_required(VERSION 2.8.11)
project(test)
add_executable(fileA fileA.cpp)
set_property(TARGET fileA APPEND PROPERTY INCLUDE_DIRECTORIES "/tmp/foo")
add_executable(fileB fileB.cpp)
set_property(TARGET fileB APPEND PROPERTY INCLUDE_DIRECTORIES "/tmp/bar")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
And both fileA.cpp and fileB.cpp just need to have something like this:
~~~~~~~~~~~~~~~~~~
#include <iostream>
int foo() { return 0; }
~~~~~~~~~~~~~~~~~~
Now open this project in a KDevelop session with kdev-clang and enable the
corresponding debug area. Once both files are cached/highlighted, change one of
them, wait for the update, then change the other. By "change" I mean e.g.
adding an argument to the function or anything like that, while keeping the
iostream include.
What you'll see is tons of output like this:
environment differs, require update: "/usr/include/c++/4.9.1/iostream" new hash: 4014444178 new project known: true old hash: 173631101 old project known: true
This is *valid*, since the two files have different include paths. These, as
well as the per-file defines, get inherited by included files (oh C++ modules,
where are you?). So from a compiler perspective it is correct that the cache
needs to be updated once the environment has changed.
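To make the ping-pong concrete, here is a toy model in C++ (hypothetical, not kdev-clang code): every file gets a single cache slot that remembers the environment hash it was last parsed with, and an included header inherits the hash of whichever file included it. The two hash values from the debug output above serve as stand-ins for the two includers' environments.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Toy model: one cache slot per file, storing the environment hash
// the file was last parsed with.
struct SingleSlotCache {
    std::map<std::string, std::size_t> envHashOf;
    int updates = 0;

    // Called when a file with environment hash `includerEnv` includes
    // `header`; the header inherits that environment.
    void visitInclude(std::size_t includerEnv, const std::string& header) {
        auto it = envHashOf.find(header);
        if (it == envHashOf.end() || it->second != includerEnv) {
            envHashOf[header] = includerEnv; // overwrites the other includer's variant
            ++updates;
        }
    }
};
```

Re-parsing one file with an unchanged environment costs nothing, but alternating between fileA and fileB updates the header (and, transitively, everything it includes) every single time.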
From an IDE perspective, though, this is unbearably slow; I agree with Sergey.
But what should we do about this situation? I have three suggestions so far:
#1 skip updating the DUChain cache for system includes on environment changes
+ relatively easy to implement thanks to clang_Location_isInSystemHeader
- the cache ping-pong will still happen for non-system includes, though
note: the cache will still be updated when the timestamp of the file changes
note: forced recursive reparses will also still trigger an update
=> I think I'll add this as a first workaround.
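Suggestion #1 boils down to a decision like the following sketch. The struct and function names are hypothetical; in real code the `inSystemHeader` flag would presumably come from `clang_Location_isInSystemHeader` (a libclang function that returns non-zero for source locations in system headers) applied to a location in the included file.

```cpp
#include <cassert>

// Hypothetical inputs for deciding whether the DUChain data of one
// included file must be updated.
struct UpdateCheck {
    bool inSystemHeader;         // e.g. from clang_Location_isInSystemHeader()
    bool environmentChanged;     // stored environment hash differs from the new one
    bool timestampChanged;       // file was modified since it was last parsed
    bool forcedRecursiveReparse; // user explicitly requested a full reparse
};

bool shouldUpdateDUChain(const UpdateCheck& check) {
    // These cases still trigger an update, as noted above.
    if (check.timestampChanged || check.forcedRecursiveReparse)
        return true;
    // Workaround: ignore environment changes for system headers.
    if (check.inSystemHeader)
        return false;
    // Non-system includes still react to environment changes,
    // so they can still ping-pong.
    return check.environmentChanged;
}
```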
#2 combine Sergey's idea with my existing environment checking
Sergey tried to fix the problem by changing the serialized environment to only
reference the include paths that were actually used by a given file. This
breaks the update mechanism, though. But what one could do is store two hashes:
one for the parse job to check whether a clang reparse is required, and one to
check whether the DUChain cache needs an update.
- a bit more involved to implement
+ should hopefully also guard against the cache ping-pong for non-system
includes, as long as you don't do funny things like including different files
depending on the include path
- completely ignores the macro defines though, which must be handled similarly,
otherwise we get the same cache ping-pong effects. To also figure out which
defines of the environment were used, we'd have to iterate over all cursors,
find macro uses and check those against the defines in the environment, which
is probably quite costly...
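A rough sketch of the two-hash idea from #2, restricted to include paths (like the proposal itself, it ignores defines). How the set of actually used include paths is collected is left open; `usedPaths` is simply assumed to be known, and all names, including the paths in the usage example, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// boost-style hash_combine
template <typename T>
void hashCombine(std::size_t& seed, const T& value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hash #1: all include paths. If it differs, clang has to reparse the file.
std::size_t parseHash(const std::vector<std::string>& includePaths) {
    std::size_t seed = 0;
    for (const auto& path : includePaths) hashCombine(seed, path);
    return seed;
}

// Hash #2: only the include paths the file actually resolved headers from.
// If it differs, the DUChain cache of the file needs an update.
std::size_t duchainHash(const std::vector<std::string>& includePaths,
                        const std::set<std::string>& usedPaths) {
    std::size_t seed = 0;
    for (const auto& path : includePaths)
        if (usedPaths.count(path) != 0) hashCombine(seed, path);
    return seed;
}
```

With environments like fileA's and fileB's, the full hashes differ while the DUChain hash for iostream stays the same, which is what should stop the ping-pong. Defines would need the same treatment, with the cost described above.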
#3 do what oldcpp did
- I still have to understand what exactly it is doing
- I /think/ it's something like #2, but it also creates multiple cache entries
per file, depending on the environment. This blows up the size of the DUChain
cache, the memory usage and so on. I'm not sure we want this.
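For comparison, a per-environment cache in the spirit of #3 could look roughly like this. It is a generic illustration, not a description of what oldcpp actually does: `MultiEnvCache` and `ParsedData` are hypothetical, and a real cache would store the parsed DUChain data for each variant. It shows both sides: alternating between environments no longer triggers updates, but each distinct environment adds another entry for the same file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Placeholder for the per-variant parse results (hypothetical).
struct ParsedData {};

// One cache entry per (file, environment hash) pair instead of one per file.
class MultiEnvCache {
public:
    // Returns true if the file had to be parsed for this environment,
    // false if an existing variant could be reused.
    bool lookupOrParse(const std::string& file, std::size_t envHash) {
        // emplace() leaves the map untouched if the key already exists
        return entries_.emplace(std::make_pair(file, envHash), ParsedData{}).second;
    }

    std::size_t entryCount() const { return entries_.size(); }

private:
    std::map<std::pair<std::string, std::size_t>, ParsedData> entries_;
};
```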
Any other suggestions how to handle this?
--
Milian Wolff
mail at milianw.de
http://milianw.de