Supporting multiple languages in a single document

Nicolás Alvarez nicolas.alvarez at gmail.com
Sun Jun 6 19:04:47 UTC 2010


Milian Wolff wrote:
> I've now started working on it and at least the documents get parsed one
> after the other already. There are lots of problems though and the code is
> still a bit dirty. The biggest problem I see right now is that apparently
> most tools (outline, context browser, ...) use
> DUChainUtils::standardContextForUrl() (most functions in DUChainUtils use
> that) which will always return the "inner" language. In my test case that
> would be CSS. How should that be fixed...
>
> I mean now we have multiple TopDUContexts for each file, one for each
> contained language...

I don't really understand the minor technical details of what you're
doing, but here are some use cases for you to think about :)

In the case of PHP, the default language is actually HTML. It only
becomes PHP when you get to a <?php ?> tag. However, all the PHP
chunks together form a single program.

Another way to think about it is that the document language is PHP,
but everything between the beginning of the file and <?php, between
?> and <?php, and between ?> and the end of the file is really text
passed to an "echo" statement. This is how the internal PHP
implementation works, but I'm not sure if this model is useful for us
at all.

<?php
function foo() {
    $one = 42;
?>
hello world
<?php
    echo $one;
}
?>

In "echo $one", the parser has to understand that we are still inside
the function. This is important so it knows that $one is the same
local variable that was declared above.
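
To make the "echo" model concrete, here is a minimal sketch in Python
(not KDevelop code; flatten_php and php_quote are names I made up for
illustration). It assumes only the plain <?php and ?> tags and does not
handle tags that appear inside strings or comments:

```python
def php_quote(text):
    """Quote text as a PHP single-quoted string literal."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def flatten_php(source):
    """Join all PHP chunks into one program; inline text becomes echo."""
    pieces = []
    pos = 0
    in_php = False  # a PHP file starts in inline-text (HTML) mode
    while pos < len(source):
        tag = "?>" if in_php else "<?php"
        end = source.find(tag, pos)
        if end == -1:
            end = len(source)  # no further tag: the chunk runs to end of file
        chunk = source[pos:end]
        if in_php:
            pieces.append(chunk)  # PHP code is kept verbatim
        elif chunk:
            pieces.append("echo " + php_quote(chunk) + ";")
        pos = end + len(tag)
        in_php = not in_php
    return "".join(pieces)
```

Run on the example above, the text between the two PHP chunks ends up
as an echo statement inside foo(), so a parser working on the flattened
program sees one function body and one scope for $one.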


Another situation: JavaScript embedded in HTML. In this case, separate
<script> tags are executed separately; they don't form a single
program. If you split a function across two tags as in the PHP example
above, it's a syntax error. However, global variables *are* shared
between different script tags, and built-ins like "document" or
"window" are the same in the entire HTML document. In addition, if you
declare a function anywhere and call it from another function
anywhere, it will work. All this is important in the creation of the
JavaScript duchain.

There is a chance we can handle this situation very similarly to PHP,
actually. I haven't analyzed it carefully.
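
A rough sketch of that model in Python, again purely illustrative: each
<script> body is its own parse unit, while function names from all
units land in one shared global set. The regexes and the names
script_units, braces_balanced and shared_globals are my own
assumptions; a real implementation would use proper HTML and JavaScript
parsers instead of pattern matching:

```python
import re

# Naive patterns, good enough for a sketch only.
SCRIPT = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)
FUNCTION_DECL = re.compile(r"\bfunction\s+([A-Za-z_$][\w$]*)\s*\(")


def script_units(html):
    """Return the body of every <script> element, one parse unit each."""
    return [match.group(1) for match in SCRIPT.finditer(html)]


def braces_balanced(unit):
    """Crude stand-in for 'this unit parses on its own'."""
    return unit.count("{") == unit.count("}")


def shared_globals(html):
    """Collect names visible from every script in the document."""
    names = {"document", "window"}  # built-ins shared by the whole page
    for unit in script_units(html):
        names.update(FUNCTION_DECL.findall(unit))
    return names
```

A function split across two script tags gives two units that each fail
braces_balanced, while a function declared in one tag still shows up in
shared_globals for the other.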


But here is another hypothetical case of multiple languages that is
definitely different, with chunks of fully-independent programs
embedded in another:

MACRO(AC_STRUCT_TIMEZONE)
    MESSAGE(STATUS "Checking for struct tm.tm_zone")
    CHECK_C_SOURCE_COMPILES("
#include <time.h>

int main() {
    static struct tm obj;
    if (obj.tm_zone) {
        return 0;
    }
    return 0;
}
" struct_tm_check)
ENDMACRO(AC_STRUCT_TIMEZONE)

This is C embedded in CMake. It would be quite interesting if we could
have C++ semantic highlighting, code completion, etc. inside that
string. But note that each call to CHECK_C_SOURCE_COMPILES uses a
totally independent C program. If I declare a function in one and use
it in another, the use in the second should appear underlined in
yellow, because one program can't see the other's functions.
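
As a sketch of what "totally independent" means for the symbol tables,
here is a small Python illustration. It is not how any CMake or C++
support is implemented; embedded_programs and both regexes are made up
for this mail, and the patterns assume the C source contains no double
quotes:

```python
import re

# Naive patterns: the first finds each embedded program and its result
# variable, the second finds C function definitions like "int main() {".
CHECK_CALL = re.compile(
    r'CHECK_C_SOURCE_COMPILES\(\s*"(.*?)"\s*(\w+)\s*\)', re.DOTALL
)
FUNCTION_DEF = re.compile(r"\b\w+\s+(\w+)\s*\([^)]*\)\s*\{")


def embedded_programs(cmake_source):
    """Map each check's result variable to the functions it declares.

    Every embedded program gets its own set, so nothing is shared.
    """
    programs = {}
    for match in CHECK_CALL.finditer(cmake_source):
        c_source, result_var = match.group(1), match.group(2)
        programs[result_var] = set(FUNCTION_DEF.findall(c_source))
    return programs
```

Unlike the JavaScript case, there is no merged global set here: a
lookup for a name only consults the set belonging to the program the
cursor is in.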

Even if nobody implements this in the CMake language support, I think
it's a valid use case for multiple languages in a document, and the
parsing frameworks should support it.

-- 
Nicolas

(I read mailing lists through Gmane. Please don't Cc me on replies; it
makes me get one message in my newsreader and another by email.)




More information about the KDevelop-devel mailing list