KDev-Clang: How should we handle caches of files with different environments?

Sun Aug 31 16:21:10 UTC 2014

On Saturday 30 August 2014 17:51:37 Olivier J. G. wrote:
> I have a thought, maybe completely off-base since I haven't actually worked
> on this (hard!) problem.

Interesting input, certainly not off-base.

> I think there are two issues masquerading as one, and it's easier to see if
> you think in terms of "enviroments" rather than "contexts":
> 1. This environment is /outdated/ (user added a pch, added new defines,
> etc). What we want is clear: we want all TopDUContexts using it to be
> rebuilt
> 2. This environment is /different/ from another environment, both
> environments may be valid at the same time for the same TopDUContext. The
> solution is more challenging.
> 
> I get the feeling that your "Update DUChain data when the environment has
> changed" was targeting #1, and then we stumbled into #2.

True, this is exactly what happened.

> Now, two assumptions:
> 1. We don't want ping-pong: Given context X, included in context Y and
> created with env A, X should never be recreated while A (or at least Y?)
> remains valid.
> 2. We don't want multiple context bloat: When context Z later includes X
> with env B, X remains unchanged.

I agree.

> Given these assumptions, now two complementary solutions:
> 1. We need to be able to identify a stale environment and rebuild dependant
> contexts. Can IADM help here? Not sure how hard this would be.

Not really. But when the user adds custom include paths or defines, IADM will 
trigger a reparse with the ForceRecursive flag set (afaik). So we'll update 
everything anyways.

Generally, I think the best option for now would be if we could reliably mark 
non-project files as system-includes and never update them then. I'll update 
my patch and try to get that done. This should workaround the biggest 
performance issues.

> 2. At some point in the future, I'd like to be able to click on a #include
> and select "rebuild from this environment". This would be my answer to
> problem and assumption #2.

Yeah that could be done eventually but won't be really user friendly I guess 
;-) Still, something to keep in mind.

> For the time being, we at least shouldn't be worse off if we could just get
> #1... but is that a realistic solution or is it deceptively complicated (or
> wrong)?

It's a complicated issue, really. But maybe we could do something like the 
following:

a) only check the environment hash in the ParseJob, but not in clanghelpers.
b) iterate over all imports, like we currently do, and ensure *some* cached 
data is available and it is up-to-date in respect to the file timestamp.
c) when we look for the declaration at a given position, and thanks to clang 
we now there _must_ be one there in the cache, we do the following:
- if cache contains a declaration at the given position, return that
- if nothing is found in the cache, re-cache the data for the file we tried to 
find (and ensure we only do this at most once, and did not already do it in 
b))
- check again for a declaration at the given position, and return that or 
nullptr and probably emit a developer warning that the cache is not properly 
filled

a) would solve the issue #1 you mentioned. 
b) is what we had before, which was fast and worked well enough mostly
c) ensures that the cache contains all declarations required for proper code 
navigation etc. pp. we could /still/ end up in cache ping-pong scenarios then 
for headers which include something like this

#ifdef FOO
int foo = 1;
#else
int foo = 2;
#endif

and these headers are included with either FOO defined or not. But I hope this 
actually happens rarely. If it's really a big issue we could even think about 
not re-caching the full file but instead just add a declaration for the cursor 
we try to find. This could actually also work quite reliably:

#ifdef FOO
class asdf
{
  void bar();
};
#else
class asdf
{
  void bar();
};
#endif

I first assumed that it would be extremly hard to ensure a valid cache when 
asdf::bar() is requested, as we must ensure the asdf context is created etc. 
and we cannot just just a bar class member in the global top context. But 
thanks to C++ we /always/ first have to use asdf *somewhere* before we could 
ever use bar(). Thus we'd first add the declaration for the asdf cursor to the 
cache, and recurse and create a cache entry for bar as well.

Thus, once this header is included in two files, one with FOO set and one 
without FOO set, we'd actually get /both/ versions in the cache - which would 
be extremely nice, no?

What do you think?
-- 
Milian Wolff
mail at milianw.de
http://milianw.de