html / xml language support

Milian Wolff mail at milianw.de
Tue Jan 26 19:49:30 UTC 2010


On Tuesday, 26. January 2010 20:07:11 Ruan Strydom wrote:
> > This solely depends on you. I'm not a really into XML, but isn't XSD just
> >  XML? I.e. shouldn't a XML parser be able to handle XSD? DTD otoh looks
> >  different, so probably needs it's own parser.
> 
> The reason for giving XSD its own lexer/parser is because its a language on
> its own, XML is just the medium:
> 
>   <xs:simpleType name="orderidtype">
>     <xs:restriction base="xs:string">
>       <xs:pattern value="[0-9]{6}"/>
>     </xs:restriction>
>   </xs:simpleType>
> 
> 
> so the lexer will break it up similar to html and xml but the parser will
>  have to handle simpleType with a special meaning, whereas <ns:order
>  orderType="" /> does not. In this case the orderType attribute is given
>  meaning by the simpleType declaration and by its inclusion.
> 
> I suppose its a bit like:
> 
> class Test {
> };
> 
> Test t;
> 
> The class declaration is treated differently to its usage? Also CSS is
>  treated differently to HTML.
> 
> Or am I wrong? This whole thing has me as confused as vomit in a
>  tumble-dryer.

Hahaha thanks for that metaphor :) As far as I can see, the difference lies less
in the parser (it simply gives you an AST without any deeper meaning, i.e. no
semantics). The real difference is what you run on that AST afterwards: for a
.xml document you'd have to run the UseBuilder, whereas for the .xsd you'd have
to run the DeclarationBuilder...
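
To illustrate, here is a rough sketch in Python rather than real KDevelop code.
It uses the standard xml.etree.ElementTree parser. The function names
collect_declarations and find_uses are made up for this mail and are not part
of any KDevelop API, and the sketch ignores target namespaces and QName
resolution entirely. The point is only that both documents go through the same
parser and differ in the pass that runs afterwards:

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

XSD_TEXT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="orderidtype">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{6}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="orderid" type="orderidtype"/>
</xs:schema>
"""

XML_TEXT = "<order><orderid>123456</orderid><note>hi</note></order>"


def collect_declarations(schema_root):
    """'Declaration builder' pass (hypothetical): element name -> declared type."""
    simple_types = {
        node.get("name") for node in schema_root.iter(f"{{{XS}}}simpleType")
    }
    declarations = {}
    for node in schema_root.iter(f"{{{XS}}}element"):
        if node.get("type") in simple_types:
            declarations[node.get("name")] = node.get("type")
    return declarations


def find_uses(document_root, declarations):
    """'Use builder' pass (hypothetical): (tag, type) for every declared element."""
    return [
        (node.tag, declarations[node.tag])
        for node in document_root.iter()
        if node.tag in declarations
    ]


# Both inputs go through the same generic XML parser.
schema_root = ET.fromstring(XSD_TEXT)
document_root = ET.fromstring(XML_TEXT)

declarations = collect_declarations(schema_root)
uses = find_uses(document_root, declarations)
```

Here simpleType only gets its special meaning in collect_declarations; the
parser itself treats it like any other tag.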

> > 2nd note: Do you intent to write a SGML compliant HTML parser? I.e. one
> >  that supports this kinda stuff like <ul><li>listitem without closing
> >  tag</ul>?
> 
> Yes that was the idea to an extent. For example "<ul><li>listitem" would be
> treated as "<ul/><li/>listitem".  I realize that this is not the SGML
>  standard though? I will have to see, since this would be incorrect.

Once you are able to parse the DTD, it shouldn't be too hard though, I think. If
you meet a close tag, just check whether it closes a parent and, if so, also
close the unclosed children... But well, this is just my hunch :) Not sure
whether it's really that "simple" (i.e. just a QStack of opened tags).
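
A minimal sketch of that hunch, in Python with a plain list standing in for
the QStack. The token format and the name close_implied_tags are assumptions
made up for this mail. It also ignores what the DTD says about which end tags
may actually be omitted, so treat it as an illustration of the stack idea and
not as an SGML-conforming algorithm:

```python
def close_implied_tags(tokens):
    """Turn a token stream with omitted close tags into a balanced one.

    tokens: list of ("open", name), ("close", name) or ("text", value).
    Simplification: an open tag never closes a sibling implicitly, so
    "<li>a<li>b" nests the second <li> inside the first one.
    """
    open_tags = []  # stack of currently open tag names
    balanced = []
    for kind, value in tokens:
        if kind == "open":
            open_tags.append(value)
            balanced.append((kind, value))
        elif kind == "close":
            if value not in open_tags:
                continue  # stray close tag with no matching parent: ignore it
            # Close the unclosed children first, then the matching parent.
            while open_tags:
                name = open_tags.pop()
                balanced.append(("close", name))
                if name == value:
                    break
        else:
            balanced.append((kind, value))
    # End of input: close whatever is still open, innermost first.
    while open_tags:
        balanced.append(("close", open_tags.pop()))
    return balanced


# <ul><li>listitem</ul>
tokens = [("open", "ul"), ("open", "li"), ("text", "listitem"), ("close", "ul")]
result = close_implied_tags(tokens)
```

For the example above, the </ul> finds <ul> on the stack, so the still-open
<li> gets its close tag emitted first.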

> > What do you mean here? The internal Declarations used to represent the
> > schemas? As far as I'm concerned, I'm pretty sure you'll have to create
> >  your own Declaration's, e.g. take a look at
> >  php/duchain/variabledeclaration.{h,cpp}
> 
> Yes that is what I meant. But the same Declarations etc. will be used by
>  both DTD and XSD?

If DTD and XSD declarations have the same info and you want to share them: 
sure! But well, this is up to you. As both are schema "declarations", it 
probably makes sense. Sadly I can't tell you for sure since I don't really 
know the difference between the two :)

> I am going to study now, I'll have a look at this tomorrow again. 30 and
>  still studying :(

Better learn than die dumb :P And again: You have all the time you need, don't
worry. I'm very grateful that I'm not the one who has to battle XML & HTML :)
But I'll try to assist you once I have more time.

> On Tuesday 26 January 2010 19:41:31 Milian Wolff wrote:
> > On Tuesday, 26. January 2010 17:59:59 Ruan Strydom wrote:
> > > I have done a bit more reading etc on the language support stuff, still
> > > nowhere close to start coding yet... what I would like to say is: "YOU
> > > GUYS ARRRNT HUMAN!!" ..... man what did I get myself into.....
> >
> > I can remember feeling the same way when I started contributing to
> > KDevelop/PHP, though I had the advantage of only having to improve an
> >  existing solution, not coming up with a completely new language support
> >  ;-) Please, don't get lost or disappointed!
> >
> > > anyway I have a couple more questions:
> > >
> > > 1 I will need 3 token'izers and 3 parsers: (XSD), (DTD), (HTML/XML)?
> >
> > This solely depends on you. I'm not a really into XML, but isn't XSD just
> >  XML? I.e. shouldn't a XML parser be able to handle XSD? DTD otoh looks
> >  different, so probably needs it's own parser.
> >
> > But I think I'd personally try to leverage as much as possible. What does
> > Webkit use? What does KHTML use? Can't you at least borrow their' lexer
> > and try to port their (bison?) grammar to KDevelop-PG-Qt? We could help
> > there as well.
> >
> > Also note: When you use KDevelop-PG-Qt, the tokenizer is created
> >  automatically for you (afaik). But you need a lexer.
> >
> > 2nd note: Do you intent to write a SGML compliant HTML parser? I.e. one
> >  that supports this kinda stuff like <ul><li>listitem without closing
> >  tag</ul>?
> >
> > > 2 XSD and DTD will have to be done in KDevelop-PG-QT and I am not so
> > > sure about HTML/XML?
> >
> > As I said above: try to look around for existing solutions and take as
> > much as possible from them. HTML/XML is a "quite simple" format to parse,
> > but as soon as you get to stuff like the above mentioned valid SGML
> > parts, it might become tricky. I know that, I once wrote a HTML parser in
> > PHP ;-)
> >
> > Also what makes stuff kinda tricky is that in the editor, you should be
> > as error-tolerant as possible, since while editing the document _will_
> > incorporate invalid stuff...
> >
> > > 3 The DTD and XSD builders will build the same DUChain structure?
> >
> > What do you mean here? The internal Declarations used to represent the
> > schemas? As far as I'm concerned, I'm pretty sure you'll have to create
> >  your own Declaration's, e.g. take a look at
> >  php/duchain/variabledeclaration.{h,cpp}
> >
> > You can create internal datastructures just the way you want them and it
> >  might be that the stuff that fits a DTD also fits a XSD element, but
> >  that's nothing I know for sure (since I'm a layman when it comes to
> > these topics).
> >
> > > Am I more or less going in the right direction here?
> >
> > Well yes I think so. I really have to try out the XML plugin of yours
> > these coming days and give you maybe some feedback. It might also be
> > possible that we could improve the XML plugin first so we can show you
> > what you could use where. Then maybe you are acquainted enough with the
> > KDevelop stuff so that you can write the other stuff on your own... But I
> > don't have the time so far...
> >
> > > And I am not sure about the rest, haven't gotten that far? I suppose
> > > when the users types the HTML and XML I will iterate over the
> > > top-context of the imported XSD/DTD to do validations etc? Haven't
> > > thought about it...
> >
> > Well you'd probably generate an AST from the XML or HTML document,
> > iterate over the Tag's and see whether they are valid:
> >
> > - in a parent tag that allows this child
> > - all required attributes
> > - only valid attributes
> > - attributes have correct values (where appropriate)
> > - tag is properly closed
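
The checklist above could be sketched roughly like this in Python. The SCHEMA
table, the dict layout of a tag, and the name validate are all hypothetical; in
a real plugin the rules would come from the imported DTD/XSD and the tags from
the AST:

```python
import re

# Hypothetical rule table; in practice it would be built from the DTD/XSD.
SCHEMA = {
    "order": {
        "children": {"item"},
        "required": {"id"},
        "attributes": {"id": r"[0-9]{6}"},
    },
    "item": {
        "children": set(),
        "required": set(),
        "attributes": {"qty": r"[0-9]+"},
    },
}


def validate(tag, parent=None):
    """Return a list of problems for one tag and, recursively, its children.

    A tag is {"name": str, "attrs": dict, "closed": bool, "children": list}.
    """
    name = tag["name"]
    rule = SCHEMA.get(name)
    if rule is None:
        return [f"<{name}>: unknown tag"]
    problems = []
    # In a parent tag that allows this child.
    if parent is not None and name not in SCHEMA[parent]["children"]:
        problems.append(f"<{name}>: not allowed inside <{parent}>")
    # All required attributes are present.
    for attr in sorted(rule["required"] - set(tag["attrs"])):
        problems.append(f"<{name}>: missing attribute '{attr}'")
    # Only valid attributes, and each one has an acceptable value.
    for attr, value in tag["attrs"].items():
        pattern = rule["attributes"].get(attr)
        if pattern is None:
            problems.append(f"<{name}>: unknown attribute '{attr}'")
        elif not re.fullmatch(pattern, value):
            problems.append(f"<{name}>: bad value for '{attr}'")
    # Tag is properly closed.
    if not tag["closed"]:
        problems.append(f"<{name}>: not closed")
    for child in tag["children"]:
        problems.extend(validate(child, parent=name))
    return problems


document = {
    "name": "order", "attrs": {"id": "12"}, "closed": True,
    "children": [
        {"name": "item", "attrs": {"qty": "3", "color": "red"},
         "closed": False, "children": []},
        {"name": "note", "attrs": {}, "closed": True, "children": []},
    ],
}
problems = validate(document)
```

Collecting the problems in a list, instead of stopping at the first one, fits
the error-tolerance point made earlier: the editor can keep going and report
everything it finds.
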
> >
> > > I am glad I started working on the XML catalog though cause it will be
> > >  used.
> > >
> > > Thanks
> > >
> > > On Saturday 23 January 2010 10:50:30 you wrote:
> > > > On Sat, Jan 23, 2010 at 09:13, Ruan Strydom <ruan at jcell.co.za> wrote:
> > > > > Sorry I just realized that I had a couple of random incorrect
> > > > > statements and questions thrown into one paragraph... here is a
> > > > > re-factored version, please ignore the other.
> > > >
> > > > hehe, "refactored" text :D
> > > >
> > > > > I will try to do DTD first since HTML use it. Schema's have
> > > > > inheritance and a more complex structure so I would have to keep it
> > > > > in mind while doing it. Perhaps I can pass a couple of diagrams
> > > > > past you guy's?
> > > > >
> > > > > Which leads to the question about the DUChain: are you certain that
> > > > > I should not use it, as  schema's follow a OO structure, there is
> > > > > inheritance, also schema define 'enums', etc which is similar to
> > > > > code (correction)?
> > > >
> > > > No, I was just saying you don't *have* to use it. If it fits your
> > > > needs - use it. Maybe
> > > > David can comment on this.
> > > >
> > > > > About implementing a tree like structure in the DUChain: it may go
> > > > > a couple of levels deep, but I do not think that it would be
> > > > > majorly excessive since humans type it?
> > > >
> > > > The problem I was thinking about was not storing a tree structure in
> > > > DUChain (that
> > > > should perfeclty work), it was the Outline Quick open that doesn't
> > > > show a tree.
> > > >
> > > > > There is a lot of work that went into the DUChain and its
> > > > > synchronization (correction), making my own will just open a whole
> > > > > new can of worms. I suppose I could use aspects of the DUChain
> > > > > rather than using all of it?
> > > >
> > > > You probably could create a list that contains:
> > > > - what you need to store
> > > > - when you need to reparse it
> > > > - when you need to load it (ie. after kdevelop restart)
> > > > - what items you need to find in the structure
> > > > And then decide on the storage system to use.
> > > >
> > > > Niko
> 

-- 
Milian Wolff
mail at milianw.de
http://milianw.de



