html / xml language support

Milian Wolff mail at milianw.de
Tue Jan 26 17:41:31 UTC 2010


On Tuesday, 26. January 2010 17:59:59 Ruan Strydom wrote:
> I have done a bit more reading etc on the language support stuff, still
> nowhere close to start coding yet... what I would like to say is: "YOU GUYS
> ARRRNT HUMAN!!" ..... man what did I get myself into.....

I can remember feeling the same way when I started contributing to 
KDevelop/PHP, though I had the advantage of only having to improve an existing 
solution, not coming up with a completely new language support ;-) Please, 
don't get lost or disappointed!
 
> anyway I have a couple more questions:
> 
> 1 I will need 3 token'izers and 3 parsers: (XSD), (DTD), (HTML/XML)?

This solely depends on you. I'm not really into XML, but isn't XSD just XML?
I.e. shouldn't an XML parser be able to handle XSD? DTD otoh looks different,
so it probably needs its own parser.

But I think I'd personally try to leverage as much as possible. What does 
WebKit use? What does KHTML use? Can't you at least borrow their lexer and
try to port their (bison?) grammar to KDevelop-PG-Qt? We could help there as 
well.

Also note: When you use KDevelop-PG-Qt, the tokenizer is created automatically 
for you (afaik). But you need a lexer.

2nd note: Do you intend to write an SGML-compliant HTML parser? I.e. one that
supports this kinda stuff like <ul><li>listitem without closing tag</ul>?
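
To make that concrete, here is a rough sketch of how a tree builder could cope
with omitted end tags. It is plain Python for brevity, not KDevelop-PG-Qt, and
every name in it (IMPLIED_END, build_tree) is made up for illustration. The
table covers only a tiny subset of what HTML allows, and text content is
ignored:

```python
import re

# Hypothetical table: opening one of these tags implicitly closes a
# still-open element listed in its set (a tiny subset of real HTML rules).
IMPLIED_END = {"li": {"li"}, "p": {"p"}, "td": {"td"}, "tr": {"tr"}}

# Matches start and end tags only; text between tags is skipped.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def build_tree(source):
    """Return a nested (name, children) tree, tolerating omitted end tags."""
    root = ("#root", [])
    stack = [root]
    for match in TAG_RE.finditer(source):
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if closing:
            open_names = [node[0] for node in stack]
            if name in open_names:
                # Elements above the match were left unclosed: drop them.
                while stack[-1][0] != name:
                    stack.pop()
                stack.pop()
            # A stray end tag without a matching start tag is ignored.
        else:
            if stack[-1][0] in IMPLIED_END.get(name, ()):
                stack.pop()  # e.g. a new <li> ends the previous <li>
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
    return root
```

With this, the two <li> elements in the example above end up as siblings
inside the <ul> instead of being nested in each other.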

> 2 XSD and DTD will have to be done in KDevelop-PG-QT and I am not so sure
> about HTML/XML?

As I said above: try to look around for existing solutions and take as much as
possible from them. HTML/XML is a "quite simple" format to parse, but as soon
as you get to stuff like the above-mentioned valid SGML parts, it might become
tricky. I know that, I once wrote an HTML parser in PHP ;-)

Also, what makes stuff kinda tricky is that in the editor you should be as
error-tolerant as possible, since the document _will_ contain invalid stuff
while it is being edited...
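
One way to picture error tolerance is a lexer that never gives up: instead of
aborting on malformed input, it emits an error token and carries on. The
following is only a sketch in plain Python with made-up token names (TAG, TEXT,
ERROR), not how any existing KDevelop lexer works:

```python
import re

TOKEN_RE = re.compile(
    r"(?P<TAG><[^<>]*>)"     # a complete tag: '<' ... '>'
    r"|(?P<TEXT>[^<]+)"      # character data up to the next '<'
    r"|(?P<ERROR><[^<>]*)"   # an unterminated '<', e.g. a tag still being typed
)


def tokenize(source):
    """Split source into (kind, offset, text) tokens without ever failing."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        # lastgroup is the name of the alternative that matched.
        tokens.append((match.lastgroup, match.start(), match.group()))
    return tokens
```

Every character of the input lands in some token, so a later stage can still
highlight or complete the well-formed parts and merely flag the ERROR ones.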

> 3 The DTD and XSD builders will build the same DUChain structure?

What do you mean here? The internal Declarations used to represent the
schemas? As far as I'm concerned, I'm pretty sure you'll have to create your
own Declarations, e.g. take a look at php/duchain/variabledeclaration.{h,cpp}

You can create internal data structures just the way you want them, and it
might be that the stuff that fits a DTD also fits an XSD element, but that's
nothing I know for sure (since I'm a layman when it comes to these topics).

> Am I more or less going in the right direction here?

Well yes, I think so. I really have to try out your XML plugin in the coming
days and maybe give you some feedback. It might also be possible that we
improve the XML plugin first, so we can show you what you could use where.
Then maybe you are acquainted enough with the KDevelop stuff that you can
write the other stuff on your own... But I don't have the time so far...

> And I am not sure about the rest, haven't gotten that far? I suppose when
>  the users types the HTML and XML I will iterate over the top-context of
>  the imported XSD/DTD to do validations etc? Haven't thought about it...

Well, you'd probably generate an AST from the XML or HTML document, iterate
over the tags and check for each one whether it is valid:

- in a parent tag that allows this child
- all required attributes
- only valid attributes
- attributes have correct values (where appropriate)
- tag is properly closed
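
The checks above could be sketched roughly like this. It is plain Python and
purely illustrative: ElementRule, Tag and validate are hypothetical names, and
the real thing would read its rules from the DTD/XSD representation rather than
from a hand-written dict:

```python
from dataclasses import dataclass, field


@dataclass
class ElementRule:
    """What a (hypothetical) schema says about one element type."""
    children: set = field(default_factory=set)   # allowed child elements
    required: set = field(default_factory=set)   # required attributes
    optional: set = field(default_factory=set)   # optional attributes
    values: dict = field(default_factory=dict)   # attribute -> allowed values


@dataclass
class Tag:
    """One node of the (hypothetical) document AST."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    closed: bool = True


def validate(tag, schema, parent=None):
    """Return a list of problem messages for tag and its descendants."""
    rule = schema.get(tag.name)
    if rule is None:
        return [f"<{tag.name}>: unknown element"]
    problems = []
    # Parent must allow this child.
    if parent is not None and tag.name not in schema[parent.name].children:
        problems.append(f"<{tag.name}>: not allowed inside <{parent.name}>")
    # All required attributes present.
    for attr in sorted(rule.required - set(tag.attrs)):
        problems.append(f"<{tag.name}>: missing required attribute '{attr}'")
    # Only valid attributes, with valid values where the schema restricts them.
    for attr, value in tag.attrs.items():
        if attr not in rule.required | rule.optional:
            problems.append(f"<{tag.name}>: unknown attribute '{attr}'")
        elif attr in rule.values and value not in rule.values[attr]:
            problems.append(f"<{tag.name}>: bad value '{value}' for '{attr}'")
    # Tag is properly closed.
    if not tag.closed:
        problems.append(f"<{tag.name}>: not closed")
    for child in tag.children:
        problems.extend(validate(child, schema, tag))
    return problems
```

Returning a list of messages instead of stopping at the first failure fits the
editor use case, where you want to report every problem in the document at
once.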

> I am glad I started working on the XML catalog though cause it will be
>  used.

> Thanks
> 
> On Saturday 23 January 2010 10:50:30 you wrote:
> > On Sat, Jan 23, 2010 at 09:13, Ruan Strydom <ruan at jcell.co.za> wrote:
> > > Sorry I just realized that I had a couple of random incorrect
> > > statements and questions thrown into one paragraph... here is a
> > > re-factored version, please ignore the other.
> >
> > hehe, "refactored" text :D
> >
> > > I will try to do DTD first since HTML use it. Schema's have inheritance
> > > and a more complex structure so I would have to keep it in mind while
> > > doing it. Perhaps I can pass a couple of diagrams past you guy's?
> > >
> > > Which leads to the question about the DUChain: are you certain that I
> > > should not use it, as  schema's follow a OO structure, there is
> > > inheritance, also schema define 'enums', etc which is similar to code
> > > (correction)?
> >
> > No, I was just saying you don't *have* to use it. If it fits your
> > needs - use it. Maybe David can comment on this.
> >
> > > About implementing a tree like structure in the DUChain: it may go a
> > > couple of levels deep, but I do not think that it would be majorly
> > > excessive since humans type it?
> >
> > The problem I was thinking about was not storing a tree structure in
> > DUChain (that should perfectly work), it was the Outline Quick open
> > that doesn't show a tree.
> >
> > > There is a lot of work that went into the DUChain and its
> > > synchronization (correction), making my own will just open a whole new
> > > can of worms. I suppose I could use aspects of the DUChain rather than
> > > using all of it?
> >
> > You probably could create a list that contains:
> > - what you need to store
> > - when you need to reparse it
> > - when you need to load it (ie. after kdevelop restart)
> > - what items you need to find in the structure
> > And then decide on the storage system to use.
> >
> > Niko
> 

-- 
Milian Wolff
mail at milianw.de
http://milianw.de



