HTML XML DTD

Thu Mar 18 20:49:37 UTC 2010

On Thu, Mar 18, 2010 at 21:33, Ruan Strydom <ruan at jcell.co.za> wrote:
> I finally managed to get time to do some more hacking and write the
> lexer/parser for xml/html/dtd (one lexer and parser for all). (I really need
> to start documenting everything)
cool, great news!

> I'm not sure if my PG-QT grammar/structure is correct, but it seems to work
> like I intended it to (initially anyway), i.e. it parsed yahoo.com (1246-line
> HTML) and others with no visible problems, and of course the unit tests.
code, I want to see the code :D :D

> DTD is a bit dodgy still and not fully implemented; it works, but only on a
> limited number of predefined tests. Going to do more random complex testing now.
>
> The questions:
>
> 1) The parser only builds up a sequential array of elements (AST nodes),
> except for doctype, where the definitions are contained inside the doctype AST
> node (in a DTD file it is an array again; the definitions are not within a
> doctype element). Does this sound correct?
> (i.e. <tag> <othertag/> </tag> <tag/> will be 4 AST nodes in the array, not
> three, one of them with a child)
Usually it has to be a tree. I'd say like in a DOM implementation,
meaning two nodes, one of them with one child (the closing tag doesn't
need its own node).
But that highly depends on what you want to do with it...
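
To illustrate the difference, here is a minimal Python sketch (not the
actual PG-Qt parser code) that folds a flat node sequence into a tree
with a stack. The token kinds "open"/"close"/"empty", the Node class and
build_tree are made-up names for this example, and it assumes strictly
well-nested input, which real-world HTML often is not:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def build_tree(tokens):
    """Fold a flat list of (kind, name) pairs into a tree.

    kind is one of "open", "close", "empty" (hypothetical token kinds).
    Closing tags only pop the stack; they get no node of their own.
    """
    root = Node("#document")
    stack = [root]  # stack[-1] is the element currently being filled
    for kind, name in tokens:
        if kind == "open":
            node = Node(name)
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "empty":
            stack[-1].children.append(Node(name))
        elif kind == "close":
            if len(stack) == 1 or stack[-1].name != name:
                raise ValueError(f"unexpected closing tag </{name}>")
            stack.pop()
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    if len(stack) != 1:
        raise ValueError(f"unclosed tag <{stack[-1].name}>")
    return root


# <tag> <othertag/> </tag> <tag/> as a flat sequence of four entries
flat = [
    ("open", "tag"),
    ("empty", "othertag"),
    ("close", "tag"),
    ("empty", "tag"),
]
doc = build_tree(flat)
```

Here doc ends up with two children named "tag", and only the first one
has a child ("othertag"). Whether the tree is built in the parser itself
or in a later pass over the flat array is a design choice.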

> 2) I have 2 first/first conflicts and 1 first/follow conflict, but it appears
> to work fine. The PG-QT web page says these can be ignored in some cases?
Such conflicts should be examined carefully, and it should be documented
why they can be ignored. Especially when injecting C++ code into the
parser, some conflicts can remain that have to be ignored.
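
For reference, a FIRST/FIRST conflict means that two alternatives of the
same rule can start with the same token, so one token of lookahead is not
enough to choose between them. A rough Python sketch of that check on a
toy grammar follows; the rule "item" and the token names are invented for
the example and are neither taken from the grammar in question nor
written in PG-Qt syntax:

```python
EPS = ""  # marker meaning "this sequence can derive the empty string"


def first_of_sequence(seq, first, grammar):
    """FIRST set of a sequence of symbols, given the FIRST sets so far."""
    result = set()
    for sym in seq:
        if sym not in grammar:  # terminal: it is the only possible start
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)  # all symbols nullable, or the sequence is empty
    return result


def first_sets(grammar):
    """Fixed-point computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_sequence(alt, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def first_first_conflicts(grammar):
    """List (rule, alt index, alt index, shared tokens) for each overlap."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alts in grammar.items():
        sets = [first_of_sequence(a, first, grammar) for a in alts]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                overlap = (sets[i] & sets[j]) - {EPS}
                if overlap:
                    conflicts.append((nt, i, j, overlap))
    return conflicts


# Toy grammar: an open tag and an empty-element tag both start with LT.
toy = {
    "item": [
        ("LT", "NAME", "GT"),        # <name>
        ("LT", "NAME", "SLASH_GT"),  # <name/>
        ("TEXT",),
    ],
}
conflicts = first_first_conflicts(toy)
```

In this toy case the first two alternatives of "item" overlap on LT. Such
a conflict is usually resolved by factoring out the common prefix or by
adding explicit lookahead, and if it is left in, a comment in the grammar
explaining why it is harmless is a good idea.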

> 3) Can I commit it in the meantime so that someone can have a look at the
> grammar? I do not want to carry on if it is broken, since a lot of code
> (builders, as I understand it) will depend on it.
post it or commit it and we will have a look...

Niko