Supporting multiple languages in a single document

Sun Jun 6 11:24:34 UTC 2010

Hey there,

During my GSoC I definitely have to solve the current problem of multiple
languages being used in a single document.

Right now one language "wins", since the background parser only creates a
parse job for the first language it gets. See:

            // TODO more thinking required here to support multiple parse jobs per url (where multiple language plugins want to parse)
            return job;

Now that we have CSS, PHP and XML language plugins, we need a solution for
this. My current idea is:

- weight languages: preprocessors (Ruby, PHP, ...) come first
- XML/HTML is next
- embedded JavaScript/CSS is last
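To make the ordering concrete, here is a minimal sketch assuming each language plugin could declare a weight. `LanguageWeight`, `LanguagePlugin` and `orderByWeight` are hypothetical names, not existing KDevelop API; the point is just that a stable sort by weight yields "preprocessor, then markup, then embedded".

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical weights: a lower value means the language is parsed earlier
// (it is the "outer" layer of the document).
enum class LanguageWeight { Preprocessor = 0, Markup = 1, Embedded = 2 };

// Hypothetical description of a plugin that wants to parse a document.
struct LanguagePlugin {
    std::string name;
    LanguageWeight weight;
};

// Orders the plugins claiming a document: preprocessors (PHP, Ruby, ...)
// first, then XML/HTML, then embedded JavaScript/CSS. stable_sort keeps the
// original order among plugins with equal weight.
std::vector<LanguagePlugin> orderByWeight(std::vector<LanguagePlugin> plugins) {
    std::stable_sort(plugins.begin(), plugins.end(),
                     [](const LanguagePlugin& a, const LanguagePlugin& b) {
                         return static_cast<int>(a.weight) < static_cast<int>(b.weight);
                     });
    return plugins;
}
```

Given CSS, XML and PHP in any order, this returns PHP, XML, CSS.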

Now assume we have this file:

foo.php

<!DOCTYPE html>
<html>
  <head>
    <title><?php echo $title; ?></title>
    <style type="text/css">
     #foo { color:red; }
    </style>
  </head>
  <body></body>
</html>

The preprocessor knows what it has to parse, i.e. only <?php ... ?>.

It _somehow_ (this really needs to be discussed!) triggers a parse job for XML
and tells it: you only have to care about the ranges from <!DOCTYPE to <title>
and from </title> to the end of the file.

XML does its magic and tells CSS (again, how?) to parse the code between
<style>...</style>.
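The HTML side would do the same one level deeper. A rough sketch, assuming the markup job just collects the contents of each <style> element as a range for a CSS job; `styleContentRanges` is a hypothetical helper, and the matching is case-sensitive and ignores comments and CDATA, which a real XML parser would handle from its own tree.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Half-open offset range [begin, end) in the document text.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Returns the ranges between <style ...> and </style>, i.e. the code the
// HTML job would hand on to a CSS parse job.
std::vector<Range> styleContentRanges(const std::string& html) {
    std::vector<Range> result;
    std::size_t pos = 0;
    while (true) {
        std::size_t open = html.find("<style", pos);
        if (open == std::string::npos) break;
        std::size_t tagEnd = html.find('>', open);   // end of the opening tag
        if (tagEnd == std::string::npos) break;
        std::size_t close = html.find("</style>", tagEnd + 1);
        if (close == std::string::npos) break;
        result.push_back({tagEnd + 1, close});      // contents only, no tags
        pos = close + 8;                            // 8 == length of "</style>"
    }
    return result;
}
```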

Problems / Questions:

- How do we trigger a parse job for only a given list of ranges in a document?
We could add a new method to the BackgroundParser for that...

- How do we know which language to use in the "deeper" parse jobs? PHP doesn't
know whether it's emitting HTML, XML, CSS or $whatever. HTML, on the other
hand, _does_ know what it embeds (you always pass the mimetype, e.g.
type="text/css", so we could use our current mechanism to find the language
for it).
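Both questions could meet in one small data structure. A sketch, assuming a hypothetical request type that a new BackgroundParser method might accept, plus a tiny mimetype table standing in for the existing language lookup; `RangedParseRequest`, `languageForMimeType` and `embeddedRequest` are all made-up names, not current API.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct Range {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical: what a parent job (PHP, HTML) would pass to the background
// parser to get a child job that only parses the given ranges.
struct RangedParseRequest {
    std::string url;
    std::string language;
    std::vector<Range> ranges;
};

// Stand-in for the existing mimetype -> language plugin lookup.
std::optional<std::string> languageForMimeType(const std::string& mimeType) {
    static const std::map<std::string, std::string> table = {
        {"text/css", "CSS"},
        {"text/javascript", "JavaScript"},
        {"text/html", "XML"},
    };
    auto it = table.find(mimeType);
    if (it == table.end())
        return std::nullopt;
    return it->second;
}

// HTML knows the mimetype of an embedded block, so it can build the child
// request; returns nothing if no plugin handles that mimetype.
std::optional<RangedParseRequest> embeddedRequest(const std::string& url,
                                                  const std::string& mimeType,
                                                  std::vector<Range> ranges) {
    auto language = languageForMimeType(mimeType);
    if (!language)
        return std::nullopt;
    return RangedParseRequest{url, *language, std::move(ranges)};
}
```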

I think for now it would suffice to hardcode HTML aka XML for PHP, since that's
what it's used for most often. Later on we could think about ways to let the
user configure what a given file outputs.
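The hardcoded default plus a later user override could look roughly like this. It is only a sketch: `outputMimeType` and the per-file override map are hypothetical, and the "text/plain" fallback for other preprocessors is an assumption meaning "no embedded language known".

```cpp
#include <map>
#include <string>

// Decides what a preprocessed file emits: a user setting for this file wins,
// otherwise PHP is hardcoded to HTML; anything else falls back to plain text.
std::string outputMimeType(const std::string& preprocessor,
                           const std::string& fileUrl,
                           const std::map<std::string, std::string>& userOverrides) {
    auto custom = userOverrides.find(fileUrl);
    if (custom != userOverrides.end())
        return custom->second;   // user configured what this file outputs
    if (preprocessor == "PHP")
        return "text/html";      // hardcoded default: PHP usually emits HTML
    return "text/plain";         // assumption: unknown output, no embedded language
}
```

An RSS feed written in PHP, for example, would just get an override to "application/xml".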

Note: If we ran _all_ languages at the same "weight", we'd have these
problems:

a) each language would need to know what to exclude where, i.e. it would need
at least limited knowledge of PHP, Ruby, ... => complex, error-prone and lots
of code duplication
b) a file has only one mimetype, so we'd end up with something like the current
situation, where CSS is registered for the HTML mimetypes. Ugly and not really
correct.

I think I'll try to write some code for my idea and see how it works...
-- 
Milian Wolff
mail at milianw.de
http://milianw.de