html / xml language support
Ruan Strydom
ruan at jcell.co.za
Tue Jan 26 19:07:11 UTC 2010
> This solely depends on you. I'm not really into XML, but isn't XSD just
> XML? I.e. shouldn't an XML parser be able to handle XSD? DTD otoh looks
> different, so probably needs its own parser.
The reason for giving XSD its own lexer/parser is that it is a language in its
own right; XML is just the medium:
<xs:simpleType name="orderidtype">
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{6}"/>
</xs:restriction>
</xs:simpleType>
So the lexer will break it up the same way as HTML and XML, but the parser will
have to give simpleType a special meaning, whereas <ns:order orderType="" />
needs no special handling. In this case the orderType attribute is given its
meaning by the simpleType declaration and by its inclusion.
I suppose it's a bit like:
class Test {
};
Test t;
The class declaration is treated differently to its usage? Also CSS is treated
differently to HTML.
Or am I wrong? This whole thing has me as confused as vomit in a tumble-dryer.
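To make the simpleType point concrete: once the schema parser has recorded that orderidtype restricts xs:string to the pattern [0-9]{6}, checking an attribute value of that type comes down to a regex match. This is only a sketch; SimpleType, TypeTable, declareSimpleType and attributeValueIsValid are made-up names, not part of KDevelop or of any XSD library. It also leans on std::regex's default ECMAScript grammar, which agrees with the XSD regex dialect for a simple pattern like this one but not in general (XSD has its own escapes, such as \i and \c for XML name characters).

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical in-memory form of an <xs:simpleType> that restricts a base
// type with an <xs:pattern>. Illustrative only; not a KDevelop API.
struct SimpleType {
    std::string name;    // e.g. "orderidtype"
    std::string base;    // e.g. "xs:string"
    std::regex pattern;  // compiled from <xs:pattern value="..."/>
};

// Simple types collected while walking the schema, keyed by name.
using TypeTable = std::map<std::string, SimpleType>;

void declareSimpleType(TypeTable& table, const std::string& name,
                       const std::string& base, const std::string& pattern) {
    assert(!name.empty());
    table[name] = SimpleType{name, base, std::regex(pattern)};
}

// Checks an attribute value from the instance document against the simple
// type it is declared with. An XSD pattern must match the whole value,
// which is what std::regex_match does (std::regex_search would not).
bool attributeValueIsValid(const TypeTable& table, const std::string& typeName,
                           const std::string& value) {
    const auto it = table.find(typeName);
    if (it == table.end())
        return false;  // unknown type: let the caller report it
    return std::regex_match(value, it->second.pattern);
}
```

For example, after declareSimpleType(types, "orderidtype", "xs:string", "[0-9]{6}"), the value 123456 passes while 12345a and 1234567 are rejected.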
> 2nd note: Do you intend to write an SGML-compliant HTML parser? I.e. one
> that supports stuff like <ul><li>listitem without closing
> tag</ul>?
Yes, that was the idea, to an extent. For example, "<ul><li>listitem" would be
treated as "<ul/><li/>listitem". I realize that this is not what the SGML
standard specifies, though, so I will have to look into it, since this would be
incorrect.
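For comparison, an SGML parser reading the HTML 4 DTD would not produce "<ul/><li/>listitem": the end tag of LI is declared omissible, so </ul> implicitly closes the open <li> and the item stays inside the list. Below is a minimal sketch of that behaviour using a stack of open elements. The Token type and closeImpliedTags are hypothetical, and the li rule is hard-coded instead of being read from the DTD, which a real implementation would do.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified token from a hypothetical lexer: a start tag, an end tag or text.
struct Token {
    enum Kind { StartTag, EndTag, Text } kind;
    std::string value;  // tag name for tags, the characters for text
};

// Rebuilds well-formed markup, inserting the end tags the author omitted.
// Only <li> is treated as having an omissible end tag here; a real SGML
// parser would take that information from the DTD.
std::string closeImpliedTags(const std::vector<Token>& tokens) {
    std::vector<std::string> open;  // stack of currently open elements
    std::string out;
    auto closeTop = [&] {
        assert(!open.empty());
        out += "</" + open.back() + ">";
        open.pop_back();
    };
    for (const Token& t : tokens) {
        switch (t.kind) {
        case Token::StartTag:
            // <li>a<li>b: a new item implicitly ends the previous one.
            if (t.value == "li" && !open.empty() && open.back() == "li")
                closeTop();
            open.push_back(t.value);
            out += "<" + t.value + ">";
            break;
        case Token::EndTag: {
            // Ignore an end tag with no matching open element instead of
            // failing: while typing, the document is often incomplete.
            bool isOpen = false;
            for (const std::string& name : open)
                if (name == t.value) isOpen = true;
            if (!isOpen)
                break;
            // </ul> first closes everything opened inside the <ul>.
            while (open.back() != t.value)
                closeTop();
            closeTop();
            break;
        }
        case Token::Text:
            out += t.value;
            break;
        }
    }
    while (!open.empty())  // close whatever is still open at end of input
        closeTop();
    return out;
}
```

With this, "<ul><li>a<li>b</ul>" comes out as "<ul><li>a</li><li>b</li></ul>", and a stray end tag is dropped rather than treated as fatal, which also fits the error tolerance the editor needs.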
> What do you mean here? The internal Declarations used to represent the
> schemas? As far as I'm concerned, I'm pretty sure you'll have to create
> your own Declarations, e.g. take a look at
> php/duchain/variabledeclaration.{h,cpp}
Yes, that is what I meant. But will the same Declarations etc. be used by
both DTD and XSD?
I am going to study now; I'll have another look at this tomorrow. 30 and
still studying :(
....
Ruan
On Tuesday 26 January 2010 19:41:31 Milian Wolff wrote:
> On Tuesday, 26. January 2010 17:59:59 Ruan Strydom wrote:
> > I have done a bit more reading etc. on the language support stuff, but I am
> > still nowhere close to starting to code... what I would like to say is: "YOU
> > GUYS ARRRNT HUMAN!!" ..... man, what did I get myself into.....
>
> I can remember feeling the same way when I started contributing to
> KDevelop/PHP, though I had the advantage of only having to improve an
> existing solution, not coming up with a completely new language support
> ;-) Please, don't get lost or disappointed!
>
> > anyway I have a couple more questions:
> >
> > 1. Will I need 3 tokenizers and 3 parsers: (XSD), (DTD), (HTML/XML)?
>
> This solely depends on you. I'm not really into XML, but isn't XSD just
> XML? I.e. shouldn't an XML parser be able to handle XSD? DTD otoh looks
> different, so probably needs its own parser.
>
> But I think I'd personally try to leverage as much as possible. What does
> WebKit use? What does KHTML use? Can't you at least borrow their lexer and
> try to port their (bison?) grammar to KDevelop-PG-Qt? We could help there
> as well.
>
> Also note: When you use KDevelop-PG-Qt, the tokenizer is created
> automatically for you (afaik). But you need a lexer.
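As a rough idea of how small a hand-written lexer for the tag level can be, here is a sketch. It is not KDevelop-PG-Qt's lexer interface: the TokenKind names, the Token layout and lex() are invented for illustration. It records source offsets for each token because the editor needs ranges for highlighting and error reporting, and it lets an unterminated string run to the end of the input instead of failing, since half-typed documents are the normal case.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token set for a tag-level markup lexer.
enum class TokenKind {
    LessThan,          // <
    LessThanSlash,     // </
    GreaterThan,       // >
    SlashGreaterThan,  // />
    Equals,            // =
    Name,              // element or attribute name
    String,            // quoted attribute value, quotes included
    Text               // character data between tags
};

struct Token {
    TokenKind kind;
    std::size_t begin;  // offset of the first character in the source
    std::size_t end;    // offset one past the last character
};

// Splits markup into tokens. Comments, CDATA sections, processing
// instructions and entity references are not handled in this sketch.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    bool inTag = false;
    while (i < src.size()) {
        const std::size_t start = i;
        const char c = src[i];
        if (!inTag) {
            if (c == '<') {
                const bool slash = i + 1 < src.size() && src[i + 1] == '/';
                i += slash ? 2 : 1;
                tokens.push_back({slash ? TokenKind::LessThanSlash : TokenKind::LessThan, start, i});
                inTag = true;
            } else {
                while (i < src.size() && src[i] != '<') ++i;
                tokens.push_back({TokenKind::Text, start, i});
            }
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;  // whitespace between names and attributes is not a token
        } else if (c == '>') {
            ++i;
            tokens.push_back({TokenKind::GreaterThan, start, i});
            inTag = false;
        } else if (c == '/' && i + 1 < src.size() && src[i + 1] == '>') {
            i += 2;
            tokens.push_back({TokenKind::SlashGreaterThan, start, i});
            inTag = false;
        } else if (c == '=') {
            ++i;
            tokens.push_back({TokenKind::Equals, start, i});
        } else if (c == '"' || c == '\'') {
            // An unterminated string runs to the end of the input instead of
            // aborting, so a half-typed document still produces tokens.
            const std::size_t close = src.find(c, i + 1);
            i = (close == std::string::npos) ? src.size() : close + 1;
            tokens.push_back({TokenKind::String, start, i});
        } else {
            while (i < src.size() && !std::isspace(static_cast<unsigned char>(src[i])) &&
                   src[i] != '>' && src[i] != '/' && src[i] != '=')
                ++i;
            if (i == start) ++i;  // a lone '/' is kept as a one-character Name
            tokens.push_back({TokenKind::Name, start, i});
        }
        assert(i > start);  // every branch consumes input, so the loop ends
    }
    return tokens;
}
```

Lexing <a href="x">hi</a> this way yields ten tokens: LessThan, Name, Name, Equals, String, GreaterThan, Text, LessThanSlash, Name, GreaterThan.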
>
> 2nd note: Do you intend to write an SGML-compliant HTML parser? I.e. one
> that supports stuff like <ul><li>listitem without closing
> tag</ul>?
>
> > 2. XSD and DTD will have to be done in KDevelop-PG-Qt, but I am not so
> > sure about HTML/XML?
>
> As I said above: try to look around for existing solutions and take as much
> as possible from them. HTML/XML is a "quite simple" format to parse, but
> as soon as you get to stuff like the above-mentioned valid SGML parts, it
> might become tricky. I know that; I once wrote an HTML parser in PHP ;-)
>
> Also what makes stuff kinda tricky is that in the editor you should be as
> error-tolerant as possible, since while it is being edited the document
> _will_ contain invalid stuff...
>
> > 3. Will the DTD and XSD builders build the same DUChain structure?
>
> What do you mean here? The internal Declarations used to represent the
> schemas? As far as I'm concerned, I'm pretty sure you'll have to create
> your own Declarations, e.g. take a look at
> php/duchain/variabledeclaration.{h,cpp}
>
> You can create internal data structures just the way you want them, and it
> might be that the stuff that fits a DTD also fits an XSD element, but
> that's nothing I know for sure (since I'm a layman when it comes to these
> topics).
>
> > Am I more or less going in the right direction here?
>
> Well, yes, I think so. I really have to try out your XML plugin in the
> coming days and maybe give you some feedback. It might also be possible
> for us to improve the XML plugin first so we can show you what you
> could use where. Then maybe you'll be acquainted enough with the KDevelop
> stuff to write the other stuff on your own... But I haven't had the time
> so far...
>
> > And I am not sure about the rest, since I haven't gotten that far. I suppose
> > when the user types the HTML and XML I will iterate over the top-context of
> > the imported XSD/DTD to do validation etc.? Haven't thought about it...
>
> Well, you'd probably generate an AST from the XML or HTML document, iterate
> over the tags and see whether they are valid:
>
> - in a parent tag that allows this child
> - all required attributes
> - only valid attributes
> - attributes have correct values (where appropriate)
> - tag is properly closed
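The checklist above maps fairly directly onto a recursive walk over the AST. In the sketch below, ElementRule, Element, Schema and validate are hypothetical stand-ins for whatever the DTD/XSD builders and the XML/HTML parser end up producing; errors are collected rather than thrown so the editor can report all of them at once. It assumes C++17 (structured bindings, and std::vector of a not-yet-complete type).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical rule for one element, as a DTD or XSD builder might fill it in.
struct ElementRule {
    std::set<std::string> allowedChildren;
    std::set<std::string> requiredAttributes;
    // attribute name -> permitted values; an empty set means "any value"
    std::map<std::string, std::set<std::string>> attributes;
};

// Hypothetical AST node produced by the XML/HTML parser.
struct Element {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::vector<Element> children;
    bool closed = true;  // false if the parser had to recover a missing end tag
};

using Schema = std::map<std::string, ElementRule>;

// Walks the tree and appends one message per problem found.
void validate(const Element& e, const Schema& schema, std::vector<std::string>& errors) {
    const auto rule = schema.find(e.name);
    if (rule == schema.end()) {
        errors.push_back("unknown element <" + e.name + ">");
        return;
    }
    // Tag is properly closed.
    if (!e.closed)
        errors.push_back("<" + e.name + "> is not closed");
    // All required attributes are present.
    for (const auto& req : rule->second.requiredAttributes)
        if (!e.attributes.count(req))
            errors.push_back("<" + e.name + "> is missing required attribute " + req);
    // Only valid attributes, with correct values where the schema restricts them.
    for (const auto& [attr, value] : e.attributes) {
        const auto allowed = rule->second.attributes.find(attr);
        if (allowed == rule->second.attributes.end())
            errors.push_back("<" + e.name + "> does not allow attribute " + attr);
        else if (!allowed->second.empty() && !allowed->second.count(value))
            errors.push_back("invalid value '" + value + "' for " + attr);
    }
    // Each child is allowed in this parent, then the child is checked itself.
    for (const Element& child : e.children) {
        if (!rule->second.allowedChildren.count(child.name))
            errors.push_back("<" + child.name + "> is not allowed inside <" + e.name + ">");
        validate(child, schema, errors);
    }
}
```

Since validate only reads an ElementRule, the same checker would work whether the rules came from a DTD or from an XSD, provided both builders can fill in that structure.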
>
> > I am glad I started working on the XML catalog, though, because it will be
> > used.
> >
> > Thanks
> >
> > On Saturday 23 January 2010 10:50:30 you wrote:
> > > On Sat, Jan 23, 2010 at 09:13, Ruan Strydom <ruan at jcell.co.za> wrote:
> > > > Sorry I just realized that I had a couple of random incorrect
> > > > statements and questions thrown into one paragraph... here is a
> > > > re-factored version, please ignore the other.
> > >
> > > hehe, "refactored" text :D
> > >
> > > > I will try to do DTD first since HTML uses it. Schemas have
> > > > inheritance and a more complex structure, so I would have to keep that
> > > > in mind while doing it. Perhaps I can pass a couple of diagrams past
> > > > you guys?
> > > >
> > > > Which leads to the question about the DUChain: are you certain that I
> > > > should not use it? Schemas follow an OO structure: there is
> > > > inheritance, and schemas also define 'enums', etc., which is similar
> > > > to code (correction)?
> > >
> > > No, I was just saying you don't *have* to use it. If it fits your
> > > needs - use it. Maybe David can comment on this.
> > >
> > > > About implementing a tree like structure in the DUChain: it may go a
> > > > couple of levels deep, but I do not think that it would be majorly
> > > > excessive since humans type it?
> > >
> > > The problem I was thinking about was not storing a tree structure in
> > > DUChain (that should perfectly work); it was that Outline Quick Open
> > > doesn't show a tree.
> > >
> > > > There is a lot of work that went into the DUChain and its
> > > > synchronization (correction), making my own will just open a whole
> > > > new can of worms. I suppose I could use aspects of the DUChain rather
> > > > than using all of it?
> > >
> > > You probably could create a list that contains:
> > > - what you need to store
> > > - when you need to reparse it
> > > - when you need to load it (i.e. after a KDevelop restart)
> > > - what items you need to find in the structure
> > > And then decide on the storage system to use.
> > >
> > > Niko
>