html / xml language support
Milian Wolff
mail at milianw.de
Tue Jan 26 19:49:30 UTC 2010
On Tuesday, 26. January 2010 20:07:11 Ruan Strydom wrote:
> > This solely depends on you. I'm not really into XML, but isn't XSD just
> > XML? I.e. shouldn't an XML parser be able to handle XSD? DTD otoh looks
> > different, so probably needs its own parser.
>
> The reason for giving XSD its own lexer/parser is because it's a language on
> its own, XML is just the medium:
>
> <xs:simpleType name="orderidtype">
> <xs:restriction base="xs:string">
> <xs:pattern value="[0-9]{6}"/>
> </xs:restriction>
> </xs:simpleType>
>
>
> so the lexer will break it up similar to html and xml but the parser will
> have to handle simpleType with a special meaning, whereas <ns:order
> orderType="" /> does not. In this case the orderType attribute is given
> meaning by the simpleType declaration and by its inclusion.
>
> I suppose it's a bit like:
>
> class Test {
> };
>
> Test t;
>
> The class declaration is treated differently to its usage? Also CSS is
> treated differently to HTML.
>
> Or am I wrong? This whole thing has me as confused as vomit in a
> tumble-dryer.
Hahaha thanks for that metaphor :) As far as I can see the difference is less
in the parser (that simply gives you an AST without any deep meaning, i.e. no
semantics). The real difference is that for a .xml document you'd have
to run the UseBuilder, whereas for the .xsd you'd have to run the
DeclarationBuilder...
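To make that a bit more concrete, here is a rough sketch of "same tree, two different passes". It is not the KDevelop API: Node, buildDeclarations and buildUses are made-up names, and I'm assuming the schema declares the attribute with something like <xs:attribute name="orderType" type="orderidtype"/>.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the AST the parser hands you: structure only, no meaning.
struct Node {
    std::string tag;
    std::map<std::string, std::string> attrs;
    std::vector<Node> children;
};

// Declaration pass, run on the .xsd tree: remember the type of each declared attribute.
void buildDeclarations(const Node& node, std::map<std::string, std::string>& declared)
{
    if (node.tag == "xs:attribute") {
        auto name = node.attrs.find("name");
        auto type = node.attrs.find("type");
        if (name != node.attrs.end() && type != node.attrs.end())
            declared[name->second] = type->second;
    }
    for (const Node& child : node.children)
        buildDeclarations(child, declared);
}

// Use pass, run on the .xml tree: link each attribute back to its declaration, if any.
void buildUses(const Node& node,
               const std::map<std::string, std::string>& declared,
               std::vector<std::string>& uses)
{
    for (const auto& attr : node.attrs) {
        auto decl = declared.find(attr.first);
        if (decl != declared.end())
            uses.push_back(attr.first + " -> " + decl->second);
    }
    for (const Node& child : node.children)
        buildUses(child, declared, uses);
}
```

Both functions walk the very same Node type; only what they do with the nodes differs, which is the point above.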
> > 2nd note: Do you intend to write an SGML compliant HTML parser? I.e. one
> > that supports this kinda stuff like <ul><li>listitem without closing
> > tag</ul>?
>
> Yes that was the idea to an extent. For example "<ul><li>listitem" would be
> treated as "<ul/><li/>listitem". I realize that this is not the SGML
> standard though? I will have to see, since this would be incorrect.
Once you are able to parse the DTD it shouldn't be too hard though, I think. If
you meet a close-tag, just see whether it closes a parent and then also close
the unclosed children... But well, this is just my hunch :) Not sure whether
it's really that "simple" (i.e. just a QStack of opened tags).
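A minimal sketch of that hunch, using std::vector as the stack instead of QStack so it compiles without Qt. TagEvent and closeOrder are hypothetical names, and this only covers the "close-tag closes a parent" case. It does not use the DTD at all, so it won't know that a second <li> implicitly closes the first one.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical lexer output: an open tag or a close tag (text is irrelevant here).
struct TagEvent {
    bool isClose;
    std::string name;
};

// Returns the tag names in the order they end up being closed,
// including the children that were closed implicitly.
std::vector<std::string> closeOrder(const std::vector<TagEvent>& events)
{
    std::vector<std::string> open;   // stack of currently open tags
    std::vector<std::string> closed; // result
    for (const TagEvent& ev : events) {
        if (!ev.isClose) {
            open.push_back(ev.name);
            continue;
        }
        // Does this close-tag match any tag that is still open?
        if (std::find(open.rbegin(), open.rend(), ev.name) == open.rend())
            continue; // stray close-tag: ignore it to stay error-tolerant
        // Close the unclosed children first, then the matching parent.
        for (;;) {
            assert(!open.empty()); // a match was found, so the stack cannot run dry
            const std::string top = open.back();
            open.pop_back();
            closed.push_back(top);
            if (top == ev.name)
                break;
        }
    }
    // End of input: whatever is still open gets closed, innermost first.
    while (!open.empty()) {
        closed.push_back(open.back());
        open.pop_back();
    }
    return closed;
}
```

For <ul><li>listitem</ul> the events are open ul, open li, close ul, and the li gets closed before the ul.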
> > What do you mean here? The internal Declarations used to represent the
> > schemas? As far as I'm concerned, I'm pretty sure you'll have to create
> > your own Declarations, e.g. take a look at
> > php/duchain/variabledeclaration.{h,cpp}
>
> Yes that is what I meant. But the same Declarations etc. will be used by
> both DTD and XSD?
If DTD and XSD declarations have the same info and you want to share them:
sure! But well, this is up to you. As both are schema "declarations", it
probably makes sense. Sadly I can't tell you for sure since I don't really
know the difference between the two :)
> I am going to study now, I'll have a look at this tomorrow again. 30 and
> still studying :(
Better learn than die dumb :P And again: You have all the time you need, don't
worry. I'm very grateful that I'm not the one who has to battle XML & HTML :)
But I'll try to assist you once I have more time.
> On Tuesday 26 January 2010 19:41:31 Milian Wolff wrote:
> > On Tuesday, 26. January 2010 17:59:59 Ruan Strydom wrote:
> > > I have done a bit more reading etc on the language support stuff, still
> > > nowhere close to start coding yet... what I would like to say is: "YOU
> > > GUYS ARRRNT HUMAN!!" ..... man what did I get myself into.....
> >
> > I can remember feeling the same way when I started contributing to
> > KDevelop/PHP, though I had the advantage of only having to improve an
> > existing solution, not coming up with a completely new language support
> > ;-) Please, don't get lost or disappointed!
> >
> > > anyway I have a couple more questions:
> > >
> > > 1 I will need 3 tokenizers and 3 parsers: (XSD), (DTD), (HTML/XML)?
> >
> > This solely depends on you. I'm not really into XML, but isn't XSD just
> > XML? I.e. shouldn't an XML parser be able to handle XSD? DTD otoh looks
> > different, so probably needs its own parser.
> >
> > But I think I'd personally try to leverage as much as possible. What does
> > Webkit use? What does KHTML use? Can't you at least borrow their lexer
> > and try to port their (bison?) grammar to KDevelop-PG-Qt? We could help
> > there as well.
> >
> > Also note: When you use KDevelop-PG-Qt, the tokenizer is created
> > automatically for you (afaik). But you need a lexer.
> >
> > 2nd note: Do you intend to write an SGML compliant HTML parser? I.e. one
> > that supports this kinda stuff like <ul><li>listitem without closing
> > tag</ul>?
> >
> > > 2 XSD and DTD will have to be done in KDevelop-PG-QT and I am not so
> > > sure about HTML/XML?
> >
> > As I said above: try to look around for existing solutions and take as
> > much as possible from them. HTML/XML is a "quite simple" format to parse,
> > but as soon as you get to stuff like the above mentioned valid SGML
> > parts, it might become tricky. I know that, I once wrote a HTML parser in
> > PHP ;-)
> >
> > Also what makes stuff kinda tricky is that in the editor, you should be
> > as error-tolerant as possible, since while editing the document _will_
> > incorporate invalid stuff...
> >
> > > 3 The DTD and XSD builders will build the same DUChain structure?
> >
> > What do you mean here? The internal Declarations used to represent the
> > schemas? As far as I'm concerned, I'm pretty sure you'll have to create
> > your own Declarations, e.g. take a look at
> > php/duchain/variabledeclaration.{h,cpp}
> >
> > You can create internal datastructures just the way you want them and it
> > might be that the stuff that fits a DTD also fits an XSD element, but
> > that's nothing I know for sure (since I'm a layman when it comes to
> > these topics).
> >
> > > Am I more or less going in the right direction here?
> >
> > Well yes I think so. I really have to try out the XML plugin of yours
> > these coming days and give you maybe some feedback. It might also be
> > possible that we could improve the XML plugin first so we can show you
> > what you could use where. Then maybe you are acquainted enough with the
> > KDevelop stuff so that you can write the other stuff on your own... But I
> > don't have the time so far...
> >
> > > And I am not sure about the rest, haven't gotten that far? I suppose
> > > when the user types the HTML and XML I will iterate over the
> > > top-context of the imported XSD/DTD to do validations etc? Haven't
> > > thought about it...
> >
> > Well you'd probably generate an AST from the XML or HTML document,
> > iterate over the Tag's and see whether they are valid:
> >
> > - in a parent tag that allows this child
> > - all required attributes
> > - only valid attributes
> > - attributes have correct values (where appropriate)
> > - tag is properly closed
> >
> > > I am glad I started working on the XML catalog though cause it will be
> > > used.
> > >
> > > Thanks
> > >
> > > On Saturday 23 January 2010 10:50:30 you wrote:
> > > > On Sat, Jan 23, 2010 at 09:13, Ruan Strydom <ruan at jcell.co.za> wrote:
> > > > > Sorry I just realized that I had a couple of random incorrect
> > > > > statements and questions thrown into one paragraph... here is a
> > > > > re-factored version, please ignore the other.
> > > >
> > > > hehe, "refactored" text :D
> > > >
> > > > > I will try to do DTD first since HTML uses it. Schemas have
> > > > > inheritance and a more complex structure so I would have to keep it
> > > > > in mind while doing it. Perhaps I can pass a couple of diagrams
> > > > > past you guys?
> > > > >
> > > > > Which leads to the question about the DUChain: are you certain that
> > > > > I should not use it, as schemas follow an OO structure, there is
> > > > > inheritance, also schemas define 'enums', etc which is similar to
> > > > > code (correction)?
> > > >
> > > > No, I was just saying you don't *have* to use it. If it fits your
> > > > needs - use it. Maybe
> > > > David can comment on this.
> > > >
> > > > > About implementing a tree like structure in the DUChain: it may go
> > > > > a couple of levels deep, but I do not think that it would be
> > > > > majorly excessive since humans type it?
> > > >
> > > > The problem I was thinking about was not storing a tree structure in
> > > > DUChain (that
> > > > should perfectly work), it was the Outline Quick open that doesn't
> > > > show a tree.
> > > >
> > > > > There is a lot of work that went into the DUChain and its
> > > > > synchronization (correction), making my own will just open a whole
> > > > > new can of worms. I suppose I could use aspects of the DUChain
> > > > > rather than using all of it?
> > > >
> > > > You probably could create a list that contains:
> > > > - what you need to store
> > > > - when you need to reparse it
> > > > - when you need to load it (ie. after kdevelop restart)
> > > > - what items you need to find in the structure
> > > > And then decide on the storage system to use.
> > > >
> > > > Niko
>
--
Milian Wolff
mail at milianw.de
http://milianw.de