KDev-Clang: How should we handle caches of files with different environments?

Olivier J. G. olivier.jg at gmail.com
Sat Aug 30 15:51:37 UTC 2014


I have a thought, maybe completely off-base since I haven't actually worked
on this (hard!) problem.

I think there are two issues masquerading as one, and it's easier to see if
you think in terms of "environments" rather than "contexts":
1. This environment is /outdated/ (user added a pch, added new defines,
etc). What we want is clear: we want all TopDUContexts using it to be
rebuilt.
2. This environment is /different/ from another environment, and both
environments may be valid at the same time for the same TopDUContext. The
solution here is more challenging.

I get the feeling that your "Update DUChain data when the environment has
changed" was targeting #1, and then we stumbled into #2.
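
To make the outdated/different distinction concrete, here is a rough sketch
in C++. It is not kdev-clang code: Environment, StoredEnvironment, the
"owner" field and classify() are all made-up names, and the idea that the
cache remembers whose settings a context was built with is my assumption.
The hash just folds include paths, defines and the pch path together, in the
spirit of the commit quoted below.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the data that makes up a parse environment.
struct Environment {
    std::vector<std::string> includePaths;
    std::vector<std::string> defines;
    std::string pchPath;
};

// Fold one string into a running hash (boost-style hash_combine).
inline void combine(std::size_t& seed, const std::string& s) {
    seed ^= std::hash<std::string>{}(s) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hash over include paths, defines and pch path.
std::size_t environmentHash(const Environment& env) {
    std::size_t seed = 0;
    for (const auto& path : env.includePaths) combine(seed, path);
    for (const auto& define : env.defines) combine(seed, define);
    combine(seed, env.pchPath);
    return seed;
}

// What the cache remembers about how a context was built (assumption:
// we also store the file or target whose settings were used).
struct StoredEnvironment {
    std::string owner;
    std::size_t hash;
};

enum class EnvChange { Unchanged, Outdated, Different };

// Same owner, new hash: the environment itself went stale (issue 1).
// Other owner, other hash: a second, equally valid environment (issue 2).
EnvChange classify(const StoredEnvironment& stored,
                   const std::string& requestingOwner,
                   std::size_t requestedHash) {
    if (stored.hash == requestedHash) return EnvChange::Unchanged;
    return stored.owner == requestingOwner ? EnvChange::Outdated
                                           : EnvChange::Different;
}
```

Only the Outdated case would need to trigger a rebuild under this scheme;
what to do with Different is the open question.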

Now, two assumptions:
1. We don't want ping-pong: Given context X, included in context Y and
created with env A, X should never be recreated while A (or at least Y?)
remains valid.
2. We don't want multiple context bloat: When context Z later includes X
with env B, X remains unchanged.

Given these assumptions, now two complementary solutions:
1. We need to be able to identify a stale environment and rebuild dependent
contexts. Can IADM help here? Not sure how hard this would be.
2. At some point in the future, I'd like to be able to click on a #include
and select "rebuild from this environment". This would be my answer to
problem and assumption #2.

For the time being, we at least shouldn't be worse off if we could just get
#1... but is that a realistic solution or is it deceptively complicated (or
wrong)?
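
The two assumptions and the two solutions could be combined into a cache
policy roughly like the one below. Again only a sketch with hypothetical
names (ContextCache, request, invalidate, rebuildFrom); environments are
reduced to plain hash values and "rebuilding" just bumps a counter.

```cpp
#include <cstddef>
#include <map>
#include <string>

struct CachedContext {
    std::size_t envHash;  // environment the context was built with
    int buildCount;       // number of (re)builds, for illustration only
};

class ContextCache {
public:
    // Some file includes `path` under environment `envHash`.
    // Returns true only if the context had to be built.
    bool request(const std::string& path, std::size_t envHash) {
        if (m_contexts.count(path)) {
            // Already built, possibly with another environment: keep it.
            // No ping-pong, no second copy.
            return false;
        }
        m_contexts[path] = CachedContext{envHash, 1};
        return true;
    }

    // Solution 1: an environment went stale and was replaced.
    // Rebuild every context that was built with it; returns how many.
    int invalidate(std::size_t staleHash, std::size_t freshHash) {
        int rebuilt = 0;
        for (auto& entry : m_contexts) {
            CachedContext& ctx = entry.second;
            if (ctx.envHash == staleHash) {
                ctx.envHash = freshHash;
                ++ctx.buildCount;
                ++rebuilt;
            }
        }
        return rebuilt;
    }

    // Solution 2: the user explicitly picks "rebuild from this environment".
    void rebuildFrom(const std::string& path, std::size_t envHash) {
        CachedContext& ctx = m_contexts[path];
        ctx.envHash = envHash;
        ++ctx.buildCount;
    }

    // Returns nullptr if nothing is cached for `path`.
    const CachedContext* find(const std::string& path) const {
        auto it = m_contexts.find(path);
        return it == m_contexts.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, CachedContext> m_contexts;
};
```

With this, X built under env A stays untouched when Z requests it under env
B, and only changes when A is invalidated or the user asks for a rebuild.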

Thoughts?

-Olivier JG



On Sat, Aug 30, 2014 at 12:25 AM, Milian Wolff <mail at milianw.de> wrote:

> Hey all,
>
> Sergey noticed a big performance slow-down that I introduced along with
> some more bugs with this commit:
>
> commit 5215ff8f78ba19bd5a3b8264b7bbe9449532b03f
> Author: Milian Wolff <mail at milianw.de>
> Date:   Thu Aug 7 19:10:54 2014 +0200
>
>     Update DUChain data when the environment has changed.
>
>     This combines the include paths, defines and pch-path into a
>     hash which is stored on-disk and then later compared to the new
>     environment. If the hash differs, we trigger a reparse.
>
>     To prevent opened files from getting reparsed at startup when
>     no data from the project could be obtained, we add some more code
>     for this special purpose: We check whether we parsed before
>     with a known project and whether the new environment data also
>     comes from a project. If not then we rely only on the timestamp
>     of the file on whether to trigger a reparse or not. Otherwise, the
>     previous data (i.e. with known project) takes precedence.
>
> For some more input, if you didn't read this already, see:
> https://git.reviewboard.kde.org/r/119959/
>
> Now I looked at the remaining issues that Sergey noticed and can confirm
> them.
> A simple project to reproduce this can be created like this:
>
> ~~~~~~~ CMakeLists.txt: ~~~~~~~~~~
> cmake_minimum_required(VERSION 2.8.11)
> project(test)
>
> add_executable(fileA fileA.cpp)
> set_property(TARGET fileA APPEND PROPERTY INCLUDE_DIRECTORIES "/tmp/foo")
>
> add_executable(fileB fileB.cpp)
> set_property(TARGET fileB APPEND PROPERTY INCLUDE_DIRECTORIES "/tmp/bar")
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> And both fileA.cpp and fileB.cpp just need to have something like this:
>
> ~~~~~~~~~~~~~~~~~~
> #include <iostream>
> int foo() {}
> ~~~~~~~~~~~~~~~~~~
>
> Now run this in a kdev-clang KDevelop session and enable the corresponding
> debug area. Once both files are cached/highlighted, change one of them,
> wait for the update, then change the other. By "change" I mean e.g. adding
> an argument to the function or anything like that, but keep the iostream
> include.
>
> What you'll see is tons of output like this:
> environment differs, require update: "/usr/include/c++/4.9.1/iostream" new
> hash: 4014444178 new project known: true old hash: 173631101 old project
> known: true
>
> This is *valid*, since both files have different include paths. These, and
> the per-file defines, get inherited by included files (oh C++ modules,
> where are you?). As such, from a compiler perspective, it's correct that
> the cache needs to be updated once the environment has changed.
>
> From an IDE perspective this is unbearably slow, I agree with Sergey. But
> what should we do about this situation? I have three suggestions so far:
>
> #1 skip update of duchain cache for system includes on environment changes
> + relatively easy to implement thanks to clang_Location_isInSystemHeader
> - the cache ping-pong will still happen for non-system-includes though
> note: the cache will still be updated when the timestamp of the file
> changes
> note: forced recursive reparses will also still trigger an update
> => I think I'll add this as a first work-around.
>
> #2 combine Sergey's idea with my existing environment checking
> Sergey tried to fix the problem by changing the environment that is
> serialized to only reference the include paths that were actually used by
> a given file.
> This breaks the update mechanism though. But what one could do is store two
> hashes, one for the parse job to check whether a clang reparse is required,
> and one to check whether the duchain cache needs an update.
> - a bit more involved to implement
> + should hopefully also guard against the cache ping-pong for
> non-system-includes, as long as you don't do funny includes of different
> files based on the include path
> - completely ignores the macro defines though, which must be handled
> similarly, otherwise you can get the same cache ping-pong effects. To also
> figure out which defines of the environment were used, we'd have to iterate
> over all cursors, find macro uses and check them against the ones in the
> environment, which is probably quite costly...
>
> #3 do what oldcpp did
> - I still have to understand what exactly it is doing
> - I /think/ it's something like #2, but it also creates multiple cache
> entries per file, depending on the environment. This blows up the size of
> the duchain cache, the memory usage etc. I'm not sure we want this
>
> Any other suggestions how to handle this?
>
> --
> Milian Wolff
> mail at milianw.de
> http://milianw.de