D18551: clang: Create preamble only on second parse

Mon Feb 3 20:37:33 GMT 2020

mwolff added a comment.

  In D18551#604847 <https://phabricator.kde.org/D18551#604847>, @aaronpuchert wrote:

  > To keep this short I didn't reply to the parts I completely agree with.
  >
  > In D18551#604249 <https://phabricator.kde.org/D18551#604249>, @mwolff wrote:
  >
  > > > Now I actually have `/tmp` as tmpfs, and so these 400 MB don't come for free. This is perfectly fine when I'm actually working on these files, but so far I have only opened KDevelop.
  > >
  > > I see. The problem really is that the preamble files shouldn't be written to /tmp then - it would be perfectly fine to move them to a temp dir on the disk I think - it would still be faster than having to redo the work.
  >
  >
  > There are arguments for both in-memory and on-disk, it depends on how much memory there is. I have another more more generously sized machine where the files are perfectly fine in RAM.
  >
  > I think `/tmp` is correct by the FHS <https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s18.html> and `/var/tmp` wouldn't be appropriate <https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s15.html>. I could easily put this on disk if I wanted to. I just wanted to make the argument that resource usage is a concern, and that there is some benefit in limiting ourselves to the preambles that we need.

  Yes, I agree. Let's leave it at `/tmp` and try to reduce the amount of PCHs we put there (see below).

  >>>>>> Also note how we just reparsed the file solely for code completion purposes!
  >>>>> 
  >>>>> You mean the preamble? Since code completion is usually invoked after some initial changes, we have to reparse anyway. What you said would only be true if we invoke code completion without changing anything first, which seems rather weird.
  >>>> 
  >>>> I mean: We trigger a reparse when we open a document, to get hold of the clang TU to do code completion. If this *does not* create a preamble, then the following call to query code completion items - *after* editing - will create the preamble. The preamble only includes the headers of the file, not the contents of the `.cpp`.
  >>> 
  >>> Well, then why don't we just skip the initial reparse, and only reparse after the first edit?
  >> 
  >> Yes, I agree that we should try this out. Note that this will require changes in `clang/codecompletion/model.cpp` at least, potentially also in other places. I.e. right now this code relies on the fact that we trigger a reparse after opening a document. This then allows us to write code like this there:
  >> 
  >>   auto top = DUChainUtils::standardContextForUrl(m_url);
  >>   if (!top) {
  >>       qCWarning(KDEV_CLANG) << "No context found for" << m_url;
  >>       return;
  >>   }
  >>    
  >>   ParseSessionData::Ptr sessionData(ClangIntegration::DUChainUtils::findParseSessionData(top->url(), m_index->translationUnitForUrl(top->url())));
  >>    
  >>   if (!sessionData) {
  >>       // TODO: trigger reparse and re-request code completion
  >>       qCWarning(KDEV_CLANG) << "No parse session / AST attached to context for url" << m_url;
  >>       return;
  >>   }
  >>    
  >> 
  >> If we only attach the AST after the first edit, it won't be enough. We will also have to trigger a reparse and force attaching the AST/parse session data when code completion is explicitly requested via `ctrl + space` without editing.
  > 
  > How bad would it be if we don't do that and only provide KTextEditor-completions before the first edit? I think it's pretty rare that code completion is immediately requested, typically one starts with a line break, or by removing code, before new code is written.

  Very bad. I can still remember how bad it felt to use KDevelop before @brauch fixed some nasty issues that prevented us from using the PCH properly. It always felt like KDevelop was utterly broken since it took so long to give us completion answers. And I believe I also remember that quite a few of our users complained about it, and in turn praised the improvement after @brauch fixed it.

  I really think this is super important to have a snappy code completion behavior.

  >>>> You usually edit the `.cpp` file and then you want to get code completion results immediately.
  >>> 
  >>> We're talking about a one-time delay, all further edits will have a preamble.
  >> 
  >> Yes, but the patch proposed here in its current form just disables the preamble always, no? So I think we agree that this isn't good enough?
  > 
  > It shouldn't, unless I'm misunderstanding our usage of `libclang`. Without the flag <https://clang.llvm.org/doxygen/group__CINDEX__TRANSLATION__UNIT.html#ggab1e4965c1ebe8e41d71e90203a723fe9aa27840716b90a9346019c9f0914f9ab8>, the preamble is created not on the first parse, but on the first reparse of a file. Meaning there is no preamble when I just open a file, but as soon as I edit and the IDE triggers a reparse, the preamble is created and used for subsequent reparsing.

  True, you are right on that point.

  > Let's do some math to clarify this. Let
  > 
  > - //C_all// be the cost for parsing a source without preamble,
  > - //C_source// be the cost for parsing a source with preamble, and
  > - //C_preamble// be the cost for serializing the preamble. Obviously //C_source < C_all//. Then we have:
  > 
  >   | | Cost for opening | Cost after first reparse | Cost after //n//-th reparse | | Old | //C_all + C_preamble// | //C_all + C_source + C_preamble// | //C_all + n · C_source + C_preamble// | | New | //C_all// | //2 · C_all + C_preamble// | //2 · C_all + (n-1) · C_source + C_preamble// | | New - Old | //-C_preamble// | //C_all - C_source// | //C_all - C_source// |
  > 
  >   Now let //m_read// be the number of documents that are only being read and //m_write// the number of documents being read and written. Then the cost difference is //m_write · (C_all - C_source) - m_read · C_preamble//, which is negative iff //m_write / m_read <  C_preamble / (C_all - C_source)//. The right-hand side is a constant that depends on the nature of the project and the hardware one is working on:
  > - //C_all// and //C_source// include reading from disk or page cache and doing some heavy CPU work, the difference between them is the parsing of the header files,
  > - //C_preamble// includes a bit of CPU work and writing to a file (which might be on disk).
  > 
  >   The left-hand side depends on usage patterns. I use KDevelop a lot to read and browse code without doing any changes, so this would very likely benefit me. But I might be alone in that.
  > 
  >   Edit: replaced time in calculation by cost.

  :) I appreciate the thought put into this. And I totally agree on your analysis, and that for your usage pattern the proposal makes a lot of sense. But I and many others write a lot of code with KDevelop. And we tend to work in a completion-guided manner, i.e. I rarely write working code, I write some gibberish and KDevelop's auto completion inserts the right thing for me. If that suddenly takes significantly longer on the first time I start editing in a file, it means that KDevelop will look much less attractive. It may sound strange, but I assure you that this is a real problem. Even today we have many (valid, imo) concerns that code completion isn't fast enough. And this is pretty much the most important feature of KDevelop for me...

  So that's why I once again suggest we only attach the preamble PCH to files we open. This should already remove the cost for `C_preamble` from a lot of files, since the majority of a project won't be opened in the editor. To me this looks like an advantage to all of us, so that would be cool if you could implement that!

  Then, to make your use-case happy, I wouldn't mind adding a "hidden" feature that disables the PCH-on-first-parse for opened documents, based on an env var. That would allow us to test the impact of this feature and get a concrete feeling for it easily.

  Secondly, once that's in, I suggest we work on another feature that I've long wanted to write, but never got around doing: A LRU cache of TUs. I.e. instead of keeping the TU for all opened documents in memory, and thereby also keeping the preamble PCH alive, only do that for a user-defined number of documents based on last usage. I would really like to see this, but we also will have to ensure that navigating quickly through all documents doesn't trigger excessive reparsing to add the TUs (think CTRL + Tab or alt+arrow)... Definitely another interesting area to investigate, but also harder to get right probably.

REPOSITORY
  R32 KDevelop

REVISION DETAIL
  https://phabricator.kde.org/D18551

To: aaronpuchert, #kdevelop, mwolff, brauch, rjvbb
Cc: rjvbb, kdevelop-devel, hmitonneau, christiant, glebaccon, domson, antismap, iodelay, alexeymin, geetamc, Pilzschaf, akshaydeo, surgenight, arrowd
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20200203/c30f2093/attachment-0001.html>