Review Request: Add support for "language features"

Aleix Pol Gonzalez aleixpol at gmail.com
Mon Jul 16 10:37:26 UTC 2012



> On July 4, 2012, 8:02 a.m., Andreas Pakulat wrote:
> > Hmm, what does the stringlist contain? I don't like string's to transport type-like information and features of a language do rather sound like a flags-type. I'm also wondering what this thing does that needs it in the buildsystem manager and not in the language support itself? An implementation for both sides would probably help understanding the intent.
> > 
> > Oh and even if its decided to go into the project interface I'd say it should be in the base interface, not the buildsystem one so project managers which have no buildsystem support can still supply this feature.
> 
> Aleix Pol Gonzalez wrote:
>     I think he needs it to tell the (c/c++) language support whether it should be analyzed with a C or a C++ parser. The build system support can tell which one should be used.
>     
>     On the other hand, I agree that string is too generic.
> 
> Andreas Pakulat wrote:
>     Hmm, what information does the buildsystem have that the c++ support does not? I mean currently cmake just sees a bunch of C or C++ files. So this would require some special buildsystem plugin to work in the first place. I dislike that very very much, its the wrong approach IMHO. The parser mode for the C++ parser is completely orthogonal to the buildsystem being used. So the parsing mode shouldn't be something decided based on the buildsystem, but rather based on the actual file.
>     
>     Can C vs. C++ be determined for a file automatically? If not, I'd say a project config page supplied by the cpp-plugin that lets you choose between C and C++ parsing mode is completely sufficient. If there is sufficient demand for having more finegrained control over the type of parsing per folder or file, then the config page needs to get some support for wildcard matching or the like.
>     
>     If there is indeed a way to find this out for most of the files, then the c++ plugin should do exactly that.
> 
> Aleix Pol Gonzalez wrote:
>     well, in the system side, it's cmake who decides if gcc or g++ should be used, no? So it would make sense if we did likewise.
>     
>     Am I missing anything?
> 
> Alexandre Courbot wrote:
>     As Aleix pointed out, the build manager is ultimately the only entity who knows how a file should be compiled, by which compiler, and with which options. Thus it is the only one who can tell with precision whether a .h file should be interpreted as C or C++, whether a .c file should be parsed as C99 or ANSI C, etc. Language support is not enough.
>     
>     As for passing this information as a string, this is because language features are a contract between build manager and language support. The information passed is of type "this file is C99, so don't use the C++ features here". Since we are working at the platform level, concepts like C99 do not exist and thus flags would not be suitable either.
> 
> Andreas Pakulat wrote:
>     While the buildsystem may be able to know the compiler switches, kdevelop's plugins won't. Thats partially due to not being able to execute all the steps that are declared via the buildsystem - running external processes, writing files etc. are problematic to execute just for parsing and it can be that certain compiler switches are only used after such actions have been performed (un)successfully.
>     
>     In addition to find this level of information from the buildsystem requires already quite some work for CMake and even more for any other managers we have. I won't ever add support for that to the custom buildsystem manager for example.
>     
>     So, while such a fully automatic solution might be really nice to have, I think its ultimately not going to be fully working or only for the combination of 1 buildsystem manager and 1 language plugin. Hence My suggestion to instead support this via a config page with a default and some wildcard-matching for individual files. This is not automatic but should suffice for most usescases without causing too much extra work.
>     
>     Regarding the String I didn't see any arguments from you that would rule out having an enum or some other abstract type. Might even make sense to supply that from the language support, i.e. ask the language support which features it can support on the given file and how they match to the compiler and compiler flags, i.e. ANSI-C - enable when gcc is used with flag X, C99 enable with gcc and -c99 flag, C++ enable with g++, C++0x enable with g++ and flag Y. Heck, I'd put the decision into another API on ILanguageSupport which taks a list of compiler args, filename and compiler binary name and let the language support return an enum value...
> 
> Aleix Pol Gonzalez wrote:
>     Alexandre, maybe you can try to figure out if it's C it in the language support side?
>     
>     Also we'd have the same problem we have with the .h files, this should be decided by the language support (aka it's included from c++ or c). In case of the .c/.cpp files, you can try to discriminate by extension, for the moment?
>     
>     It can look like a hack, but it's what we'd to in the buildsystem side anyway. Also it would mean depending on a newer kdevplatform as well.
> 
> Alexandre Courbot wrote:
>     Andreas, the first part of your answer sounds like you think I plan to run "make -n" in order to extract compilation options. This is not the case. The idea is that the build manager, be it CMake or anything else, can pass extra information about a file that language support can not get otherwise. This information could be obtained by a dry-run of make, but I don't think this is a good idea neither - I had plenty of problems with the automatic include paths resolver because of the points you mentioned and do not want to go that way. But without going that far, there is still some information that would be valuable to obtain from the build manager.
>     
>     I agree with Aleix that in the case of C/C++ guessing from the file extension will work most of the time with the current parser. But this is still a hack. For .h files as Aleix pointed the best thing is to infer the language depending on which files included it, but once again this does not work if you open that file directly or if it is not included by any visible source file, and we end up with guesses and approximations again.
>     
>     Persisting with approximations and hacks instead of doing things so they can scale right will backdraft at us at some point in the future. This is already the case with define() and includeDirectories() which should have no place in a generic build manager interface and should be part of a more general solution. The present patch is a proposal for a more generic solution.
>     
>     Note that every build manager is not forced to implement the new method at all - those which don't will still continue to work as they did so far, and leave language support without hints as to how to parse the file - just as it is today. This change is not intrusive at all, and does not add any dependency (we already have links from language support to the build manager through define() and includeDirectories()). It is just a new interface that allows more collaboration to take place between language support and build manager. Thus the amount of resistance towards it takes me by surprise.
>     
>     Regarding the string as a vector for this information, you simply cannot use enums as they would be language-specific. For C you could have { ANSI, C99, CPP/CPP11, AllowXX, AllowYY }, and some of these terms could also be combined. For Python { Python2, Python3 } which would be exclusive. A string or list of strings is simple and efficient and I don't see the need to take things further - think of the user-agent used by web browsers for instance. It's really that simple - the build manager gives hints about how the file should be parsed, and language supports sees how much of it it can understand and tunes the parser accordingly. That's it.
> 
> Andreas Pakulat wrote:
>     Well, if the cmake code does a try-compile loop to find out wether the compiler understands certain features and then sets compiler flags accordingly and changes from C99 to ANSI C or whatever. Then the cmake manager needs to do the same thing (and I don't think try-compile is actually implemented right now) to find out the type of C dialect. If this is not done, then you'll never reach 100% coverage so why put in the effort in the first place (which has to be maintained and adapted to newer versions of the buildsystem, something thats a real issue with the few developers we have).
>     
>     Frankly, I don't see a problem with includeDirectories and define's, they're simply specialized versions of a too generic compileFlags() function which wouldn't make any sense with Python for example anyway. Also note your python examples towards the end don't fit, since python projects often don't have a buildsystem in the first place and hence simply use the projectmanager interface.
>     
>     Having an interface thats not used anywhere is simply not useful, if there is only 1 user, its still not very useful since its bound to bitrot over time. In fact, in 2 years or so, some new contributor may look at the c++ support and the cmake manager and notice that C++ uses something which cmake doesn't implement and simply remove it again since he sees no point.
>     
>     As I said before I don't think this is really needed, however if it goes in it needs to use a type-safe API and that means no strings in the API. As I outlined its very much possible for the language support to supply information how to match compiler flags+file to a certain enum and hence let the language support decide how to interpret the information. If we can think of more things that could influence the decision, pass them along as well.
>     
>     Oh and for header files you're doomed anyway, cmake at least cannot tell you wether a header is to be considered C or C++, it doesn't know itself either it only knows the source files. However the DUChain should be able to tell you for a header file which C++ files use it and hopefully this doesn't even need a reparse with full data.
> 
> Alexandre Courbot wrote:
>     Well, I am not going to fight over it if nobody else sees the point of this change. I thought it was an elegant way to solve the C/C++ discrimination problem (besides what can be done through extensions checking) by involving the build system.
>     
>     This patch seemed lonely but I also have adapted the C/C++ parser to accept C code and adapt accordingly. This was supposed to be on top of this change, so I wanted to check the API change first. And kdev-kernel was also using that, always reporting "C" as there is no C++ file in the kernel project.
>     
>     Unless there is more debate in favor of this change in the next few days I will abandon this review request and go ahead with the rest.
> 
> Andreas Pakulat wrote:
>     Oh, I do see the point of having the buildsystem telling the parser how a file is going to be compiled. I just don't see this ever implemented so that it works out-of-the-box in all cases - in particular not for any of the currently included buildsystem plugins. And API thats not used/implemented is just dead code.
>     
>     Anyway, the only technical objection I have against the patch is the use of strings, as I said its doable without too much effort (just a bit more indirection) to have type-safety with this and still get the same features.
> 
> Alexandre Courbot wrote:
>     Ah well, I don't expect this simple interface to solve the whole issue - just to offer opportunity to improve it. My point has never been that it would work out-of-the-box, but some build managers and language supports could take advantage of it (the idea was to selfishly use this for kdev-kernel in particular).
>     
>     But I think we can actually reach a satisfactory point where this feature would have users, especially if we suppress the use of strings as you suggested. If I got it correctly, you are suggesting that the interface returns a dummy class, which would be dynamically casted by the language support into something more meaningful. I.e. the build manager and language support would be contract-tied by some subclass. If we do so, then it totally makes sense for C support to move the defines and includepaths into this class. That way, instead of having these language-specific features into the generic classes, where they definitely don't belong, they would be encapsulated into the language features, and the language features would then have users (and I could snap these parser features into it at the same time). The instance returned by the language features for a C/C++ file would thus have the following methods:
>     
>     - defines() (taking over IBuildSystemManager::defines())
>     - includePaths() (taking over IBuildSystemManager::includeDirectories())
>     - languageFeatures() (returning a bit-field of the features the C parser should enable/disable)
>     
>     Then the whole thing would need to be named differently, "parser features" comes to mind as this is only used by the parser. How does that look? All in all, it should make everything more sound.
> 
> Andreas Pakulat wrote:
>     Hmm, I'm not sure we're talking about the same thing, what I had in mind was something like:
>     
>     QSet<ILanguageFeature> ILanguageSupport::determineLanguageFeatures( const KUrl& file, const QStringList& commandline, const QMap<QString,QString>& environment);
>     
>     And the buildsystemmanager calls that for each file after it determined the commandline during parsing and caches the result. It can be re-computed when necessary as the files change. The parser can then query this based on the file when it needs it...
>     
>     If we notice that additional things are necessary in the future to determine the language features we could add another overload which takes more parameters. Alternatively we could extend the buildsystem manager to simply expose the information:
>     
>     QStringList IBuildSystemManager::buildCommandForFile( const KUrl& file );
>     QMap<QString,QString> IBuildSystemManager::buildEnvironmentForFile( const KUrl& file );
>     
>     The latter would allow to drop the ILanguageFeature interface from the public API and let the language support have a simple enum for the different features internally.
> 
> Milian Wolff wrote:
>     Andreas, how would you figure out whether your ILanguageFeature is actually something like "enable C99 mode"? dynamic_cast<Cpp::C99LanguageFeature>(set.at(...))?
>     
>     But I agree that by making the buildCommand + build environment publically accessible to the parser, we could read it and enable/disable language-specific features on demand. That does sound like a scalable way to address this issue, no? The question of course remains how easy this is to implement... For the QMake manager e.g. I'd have no idea how to actually return a build command... The environment could be returned though, which might also include something like e.g. QMAKE_CXX_FLAGS but then the cpp plugin would need to handle that... And how one sets the compiler in QMake - I have no idea - apparently like this: http://stackoverflow.com/questions/4500720/qmake-extra-compilers-problems-with-dependencies-between-header-files ... So no idea how that should be handled in this new API...
>     
>     All in all I agree that this all seems to be too complicated. I'd rather have a simple configuration interface as proposed by Andreas, where a user could e.g. say "foo/*.c + foo/*.h" => this is C-code, while "bar/*.cpp" and "bar/*.h" is c++ code. Maybe this could be done in such a way that a project manager like kdev-kernel could automatically supplement information such that "path-to-kernel-project/*" is handled as C code. Of course that would be CPP-language-plugin specific, but thats enough. I have not yet heard of any language plugin that would require this in a generic way anyways, i.e. Sven does not plan to support both Python 2.x and Python 3.x.
> 
> Andreas Pakulat wrote:
>     Well as Alexandre said its not just C vs. C++, there are certain features for each language that oe might want to have enabled or disabled. Though most of the time I guess having the IDE parser accept a bit more code than the compiler is ok as long as language itself is correct.
>     
>     The python 2 vs. 3 example cannot be handled automatically anyway, since there is no buildsystem only the interpreter which is used to run the script and the intention of the author which version he wants to use. So for Python it wouldalso need to be a project config.
> 
> Alexandre Courbot wrote:
>     Mmm, no, actually I think it should be much simpler than that.
>     
>     The build system manager should have no need to query ILanguageSupport for features. Why should it do that, since it is the one that should give them (and that will, in the end, provide them back to the parser)? Passing the compilation command line to ILanguageSupport seems also totally wrong to me here. First, because ILanguageSupport is not supposed to know the details about the underlying compiler and its supported options - these details are build details, not language ones. Second, because as it has been pointed out already, the command line is hard to compute even from the build system manager. This is definitely *not* what I am trying to achieve here.
>     
>     Instead of going from a set of compilation features down to the command line that specifies them, then up again to the features from the command line in ILanguageSupport, why not just passing the abstract features directly from the build manager to the language support? In the end, the result would be the same, but it would be much less hassle.
>     
>     We are already doing that actually, more or less. cpplanguagesupport calls IBuildSystemManager::defines() and IBuildSystemManager::includeDirectories() while parsing a file. Actually cpplanguagesupport is the *only* consumer of these interface methods (as in, the framework has no use for them), which supports the idea that they are C++-specific and should be moved out of the interface. What I propose is to forget about the original patch, and rework it as a more general replacement for defines() and includeDirectories() that would return an instance that can be casted by the language support into something meaningful to the language. In the case of C++, this object would contain defines() and includeDirectories() methods, along with flags indicating whether the source file is C or C++, as well as any other information the build manager could provide. If it could not provide this extra information, then the parser will revert to defaults, as it already does today.
>     
>     So, to summarize, we would have:
>     - One single method into IBuildSystemManager that takes a file and returns an instance of ILanguageFeatures for it
>     - For C++, an ICPPLanguageFeatures interface with defines(), includeDirectories(), and cppVersion() abstract methods along with enums for language tags (C99, C++, C++11, ...). This class will serve as a contract between the build managers and CPPLanguageSupport.
>     - The build managers that support this would have an extra class that implements ICPPLanguageFeatures and its methods (the existing ones would mimic the current defines() and includeDirectories() behavior). Those that don't just return NULL and language support knows what this means.
>     
>     That way we can implement both current and future behaviors in a clean, abstract way. And my request for different language versions will not bloat the build managers interface. How does that sound?
> 
> Andreas Pakulat wrote:
>     This would require defining which languages are aupported in the framework. Thats totally wrong, since it requires changing the framework to add a new ixxxfeatures class. Linking the buildsystemmanager against the language plugins is also not better.
>     
>     I guess we simply disagree on a very basic level what each plugin is supposed to do. In my opinion and looking at cmake, the buildsystem cannot and should not know more than what its supposed to do with a given file. That means it kows which commandline to run to get another file out of it which is a dependency of a target that should be build. The buildsystem has no clue what the various commandline options are supposed to do, though it may have some special commands to add some of them automatically (link-libraries, include_directories etc.). The language support however knows its dealing with a c(pp) file and hence can know that the command its run through is a c or cpp compiler. Thus it can look at the commandline and decide wether any of the options should change its paraing behaviour dor that file. This is. Basically treating our languag plugin as a compiler, which I think is the right thing to do here.
>     
>     Another option would be to start thingking about a whole new framework api that sits between both buildsystem and language api and translates from commandline arguments into language features, so there would be separate implementations for each command that needs to be supported. I think though thus would be a bit too abstraction.
> 
> Milian Wolff wrote:
>     Andreas, from a practical view, could you please answer my question of how you think one could actually implement your two API functions in a sane way? Take QMake for example... Or CMake for what its worth, how would you return a buildCommand there? After all, that will need to be somewhat standardized such that the C++ language plugin can actually do something meaningful with it. If every build system just returns some hand crafted string of a guessed build line there, I'd rather not have that in our API at all, but instead some more typesafe API as proposed by Alexandre.
>     
>     Yet of course I see your reasoning that a build manager should also not have special code to parse some language specific features...
>     
>     All in all I would think that the cpp-specific configuration, where a user (or potentially also a project manager like kdev-kernel) can define whether a given file is C or C++ would be scalable and easy to implement. Shouldn't we really go that route instead of pitting two ideas against each other which both have their fallacies, just for the sake of abstracting it to multiple languages even though no other language will make use of it?
> 
> Andreas Pakulat wrote:
>     I'm with you that the config page should be more than enough for this case.
>     
>     For a generic solution, it more or less depends on what you want to achieve. If you want a (half-)automatic way for the language support to figure out language features it should use for parsing based on compiler commands from the buildsystem, then thats exactly what the API should provide. CMake itself also doesn't have the slightest clue about language features, it only knows that a special cmake-file is there which sets up a couple of CMake variables that have the language encoded. Those contain the compiler executable, the standard flags for compilation and linking and a few variations for different build types (there are actually a few more things cmake knows like include-dir switches, link-switches, but thats actually a downside causing problems for other languages like Java). Thats all the buildsystem can provide, without adding language-specific stuff into the implementation. So the language plugin would need to figure out what features to enable based on the list of compiler arguments. For that a simple string-list interface is completely sufficient, there's no way to make this more type-safe without adding knowledge about specific flags to the buildsystem.
>     
>     Or you just want to be able to build a special buildsystem implementation which can supply special C/C++ feature information since it knows it only deals with C99 or ANSI-C or the project uses C++0x. Then a simple enum with a few hardcoded values in the public API would suffice, but it would also be a generic API thats only suitable for a very special use-case.
> 
> Alexandre Courbot wrote:
>     I see it is difficult to reach consensus on this one. Maybe I will just let it go for a while - I can probably achieve what I need through other means for now (although it will just be a hack) and we can come back to this once we have more insight on the problem.
>     
>     I don't know how I can upload the C patches for the parser without a way to control its behavior though.

I'd suggest you to figure out a way to put the parser in a C or C++ state and put the logic that figures out if it's C or C++ inside the kdevcpplanguage plugin.

Once all this is settled we'll see if we can do better.


- Aleix


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/105299/#review15362
-----------------------------------------------------------


On June 20, 2012, 1:43 a.m., Alexandre Courbot wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/105299/
> -----------------------------------------------------------
> 
> (Updated June 20, 2012, 1:43 a.m.)
> 
> 
> Review request for KDevelop.
> 
> 
> Description
> -------
> 
> Add support for "language features"
> 
> Sometimes the same language can run using different variants - the most
> obvious example is the C++ language support, which may also support C
> and other variants and behave differently according to the type of file.
> 
> This patch adds a new method to IBuildSystemManager allowing it to get a
> list of features to pass to ILanguageSupport::createParseJob as an
> additional argument. ILanguageSupport can then adapt the behavior of its
> parser according to the features the build manager says the parsed file
> uses.
> 
> Corresponding support for kdevelop (since this patch breaks API compatibility) is there: https://git.reviewboard.kde.org/r/105300/
> 
> 
> Diffs
> -----
> 
>   language/backgroundparser/backgroundparser.cpp 417a8e4b7f38acfa959959895f186c11e3a76f93 
>   language/backgroundparser/tests/testlanguagesupport.h ed3864c9e8da8eed97d3d91500eec6c623fae41e 
>   language/backgroundparser/tests/testlanguagesupport.cpp 3f88894d728610ebd433bff46936f38dcd2138be 
>   language/interfaces/ilanguagesupport.h 22cedf09656aaf80275dd3a14d3752003fe9a912 
>   project/interfaces/ibuildsystemmanager.h c0813d8f781b0be29829b9278f191299af823b68 
>   project/interfaces/ibuildsystemmanager.cpp 74af0e76f8c8bc9276d79ff54be4d3d41927c298 
> 
> Diff: http://git.reviewboard.kde.org/r/105299/diff/
> 
> 
> Testing
> -------
> 
> Compiled kdevplatform & updated kdevelop, checked things were working.
> 
> 
> Thanks,
> 
> Alexandre Courbot
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20120716/1494096f/attachment.html>


More information about the KDevelop-devel mailing list