html / xml language support

Ruan Strydom ruan at jcell.co.za
Tue Jan 26 19:07:11 UTC 2010


> This solely depends on you. I'm not really into XML, but isn't XSD just
>  XML? I.e. shouldn't an XML parser be able to handle XSD? DTD otoh looks
>  different, so probably needs its own parser.

The reason for giving XSD its own lexer/parser is that it is a language in
its own right; XML is just the medium:

  <xs:simpleType name="orderidtype">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{6}"/>
    </xs:restriction>
  </xs:simpleType>


So the lexer will break it up much as it would HTML or XML, but the parser
has to treat simpleType as having a special meaning, whereas
<ns:order orderType="" /> has none of its own. In this case the orderType
attribute is given meaning by the simpleType declaration and by its inclusion.
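
To make the "declaration gives meaning to the usage" point concrete, here is a
minimal sketch in Python. It is only an illustration, not how a KDevelop
plugin would do it. Assumptions: the fragment is wrapped in an xs:schema root
so the "xs" prefix is declared, the orderType attribute is taken to be of
type orderidtype (the mail does not show that binding), the helper names are
made up, and Python's re module stands in for the XSD regex dialect, which
only coincides for simple patterns like this one.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The simpleType from the mail, wrapped in an xs:schema root so that the
# "xs" prefix is declared (the wrapper is an assumption, not in the mail).
XSD_SOURCE = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="orderidtype">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{6}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""


def collect_simple_type_patterns(xsd_text):
    """Map each named simpleType to the pattern facets declared inside it."""
    root = ET.fromstring(xsd_text)
    patterns = {}
    for simple_type in root.iter(f"{{{XS}}}simpleType"):
        name = simple_type.get("name")
        if name is None:
            continue  # anonymous types are ignored in this sketch
        patterns[name] = [
            facet.get("value") for facet in simple_type.iter(f"{{{XS}}}pattern")
        ]
    return patterns


def value_matches_type(patterns, type_name, value):
    """Check an attribute value against the patterns of a declared type."""
    declared = patterns[type_name]
    # Several xs:pattern facets in one restriction are alternatives, so one
    # full match is enough; a type without patterns accepts any string here.
    return not declared or any(re.fullmatch(p, value) for p in declared)
```

With this, the XML side only has to look up the declared type of orderType
and call value_matches_type(patterns, "orderidtype", value); the schema side
is what knows that simpleType is special.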

I suppose it's a bit like:

class Test {
};

Test t;

The class declaration is treated differently to its usage? Also CSS is treated 
differently to HTML.

Or am I wrong? This whole thing has me as confused as vomit in a tumble-dryer.

> 2nd note: Do you intend to write an SGML-compliant HTML parser? I.e. one
>  that supports this kinda stuff like <ul><li>listitem without closing
>  tag</ul>?

Yes, that was the idea, to an extent. For example "<ul><li>listitem" would be
treated as "<ul/><li/>listitem". I realize that this is not what the SGML
standard specifies, though, so I will have to see, since this would be
incorrect.
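
For comparison, a rough sketch of the other behaviour, where the missing
</li> is implied and the text ends up inside the list item. It uses Python's
standard html.parser as the tokenizer, which is a stand-in and not anything
KDevelop provides. The IMPLIED_END table, the Node class and the parse helper
are hypothetical and far smaller than what a real HTML DTD would require.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately tiny table: an open element named by the key is
# implicitly closed when one of the listed start tags arrives.
IMPLIED_END = {"li": {"li"}, "p": {"p", "ul", "ol"}}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # Node instances or text strings


class TolerantTreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # Close the current element first if this start tag implies its end.
        if tag in IMPLIED_END.get(self.current.tag, ()):
            self.current = self.current.parent
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name; everything
        # still open below it is closed implicitly. End tags that match no
        # open element are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def outline(node):
    """Render the tree as nested lists, e.g. ['ul', ['li', 'text']]."""
    return [node.tag] + [
        child if isinstance(child, str) else outline(child)
        for child in node.children
    ]


def parse(html):
    builder = TolerantTreeBuilder()
    builder.feed(html)
    builder.close()
    return outline(builder.root)
```

Calling parse("<ul><li>listitem without closing tag</ul>") nests the text
under li and li under ul, instead of producing three siblings.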

> What do you mean here? The internal Declarations used to represent the
> schemas? As far as I'm concerned, I'm pretty sure you'll have to create
>  your own Declarations, e.g. take a look at
>  php/duchain/variabledeclaration.{h,cpp}

Yes, that is what I meant. But will the same Declarations etc. be used by both
DTD and XSD?

I am going to study now, I'll have a look at this tomorrow again. 30 and still 
studying :(

....
Ruan 

On Tuesday 26 January 2010 19:41:31 Milian Wolff wrote:
> On Tuesday, 26. January 2010 17:59:59 Ruan Strydom wrote:
> > I have done a bit more reading etc on the language support stuff, still
> > nowhere close to starting coding yet... what I would like to say is: "YOU
> > GUYS ARRRNT HUMAN!!" ..... man what did I get myself into.....
> 
> I can remember feeling the same way when I started contributing to
> KDevelop/PHP, though I had the advantage of only having to improve an
>  existing solution, not coming up with a completely new language support
>  ;-) Please, don't get lost or disappointed!
> 
> > anyway I have a couple more questions:
> >
> > 1 I will need 3 tokenizers and 3 parsers: (XSD), (DTD), (HTML/XML)?
> 
> This solely depends on you. I'm not really into XML, but isn't XSD just
>  XML? I.e. shouldn't an XML parser be able to handle XSD? DTD otoh looks
>  different, so probably needs its own parser.
> 
> But I think I'd personally try to leverage as much as possible. What does
> WebKit use? What does KHTML use? Can't you at least borrow their lexer and
> try to port their (bison?) grammar to KDevelop-PG-Qt? We could help there
>  as well.
> 
> Also note: When you use KDevelop-PG-Qt, the tokenizer is created
>  automatically for you (afaik). But you need a lexer.
> 
> 2nd note: Do you intend to write an SGML-compliant HTML parser? I.e. one
>  that supports this kinda stuff like <ul><li>listitem without closing
>  tag</ul>?
> 
> > 2 XSD and DTD will have to be done in KDevelop-PG-Qt and I am not so sure
> > about HTML/XML?
> 
> As I said above: try to look around for existing solutions and take as much
>  as possible from them. HTML/XML is a "quite simple" format to parse, but
>  as soon as you get to stuff like the above-mentioned valid SGML parts, it
>  might become tricky. I know that, I once wrote an HTML parser in PHP ;-)
> 
> Also what makes stuff kinda tricky is that in the editor, you should be as
> error-tolerant as possible, since while editing the document _will_
> incorporate invalid stuff...
> 
> > 3 The DTD and XSD builders will build the same DUChain structure?
> 
> What do you mean here? The internal Declarations used to represent the
> schemas? As far as I'm concerned, I'm pretty sure you'll have to create
>  your own Declarations, e.g. take a look at
>  php/duchain/variabledeclaration.{h,cpp}
> 
> You can create internal datastructures just the way you want them and it
>  might be that the stuff that fits a DTD also fits an XSD element, but
>  that's nothing I know for sure (since I'm a layman when it comes to these
>  topics).
> 
> > Am I more or less going in the right direction here?
> 
> Well yes I think so. I really have to try out the XML plugin of yours these
> coming days and give you maybe some feedback. It might also be possible
>  that we could improve the XML plugin first so we can show you what you
>  could use where. Then maybe you are acquainted enough with the KDevelop
>  stuff so that you can write the other stuff on your own... But I don't
>  have the time so far...
> 
> > And I am not sure about the rest, haven't gotten that far? I suppose when
> >  the users types the HTML and XML I will iterate over the top-context of
> >  the imported XSD/DTD to do validations etc? Haven't thought about it...
> 
> Well you'd probably generate an AST from the XML or HTML document, iterate
> over the tags and check whether each one is valid:
> 
> - it sits in a parent tag that allows this child
> - all required attributes are present
> - only valid attributes are used
> - attributes have correct values (where appropriate)
> - the tag is properly closed
> 
> > I am glad I started working on the XML catalog though cause it will be
> >  used.
> >
> > Thanks
> >
> > On Saturday 23 January 2010 10:50:30 you wrote:
> > > On Sat, Jan 23, 2010 at 09:13, Ruan Strydom <ruan at jcell.co.za> wrote:
> > > > Sorry I just realized that I had a couple of random incorrect
> > > > statements and questions thrown into one paragraph... here is a
> > > > re-factored version, please ignore the other.
> > >
> > > hehe, "refactored" text :D
> > >
> > > > I will try to do DTD first since HTML uses it. Schemas have
> > > > inheritance and a more complex structure so I would have to keep it
> > > > in mind while doing it. Perhaps I can pass a couple of diagrams past
> > > > you guys?
> > > >
> > > > Which leads to the question about the DUChain: are you certain that I
> > > > should not use it, as schemas follow an OO structure: there is
> > > > inheritance, and schemas also define 'enums', etc., which is similar
> > > > to code (correction)?
> > >
> > > No, I was just saying you don't *have* to use it. If it fits your
> > > needs - use it. Maybe David can comment on this.
> > >
> > > > About implementing a tree like structure in the DUChain: it may go a
> > > > couple of levels deep, but I do not think that it would be majorly
> > > > excessive since humans type it?
> > >
> > > The problem I was thinking about was not storing a tree structure in
> > > DUChain (that should work perfectly), it was the Outline Quick open
> > > that doesn't show a tree.
> > >
> > > > There is a lot of work that went into the DUChain and its
> > > > synchronization (correction), making my own will just open a whole
> > > > new can of worms. I suppose I could use aspects of the DUChain rather
> > > > than using all of it?
> > >
> > > You probably could create a list that contains:
> > > - what you need to store
> > > - when you need to reparse it
> > > - when you need to load it (ie. after kdevelop restart)
> > > - what items you need to find in the structure
> > > And then decide on the storage system to use.
> > >
> > > Niko
> 



