HTML XML DTD

Ruan Strydom ruan at jcell.co.za
Thu Mar 18 20:33:35 UTC 2010


I finally managed to find time to do some more hacking and wrote the 
lexer/parser for XML/HTML/DTD (one lexer and one parser for all three). I 
really need to start documenting everything.

I'm not sure whether my PG-QT grammar/structure is correct, but it seems to 
work as I intended (initially, anyway); i.e. it parsed yahoo.com (1246 lines 
of HTML) and other pages, and of course the unit tests, with no visible problems.

DTD support is still a bit dodgy and not fully implemented: it works, but only 
on a limited number of predefined tests. I'm going to do more random, complex 
testing now.

The questions:

1) The parser only builds up a sequential array of elements (AST nodes), 
except for doctype, where the definitions are contained inside the doctype AST 
node (in a DTD file it is an array again; the definitions are not within a 
doctype element). Does this sound correct? 
(i.e. <tag> <othertag/> </tag> <tag/> will be 4 AST nodes in the array, not 
three nodes, one of which has a child)

2) I have 2 first/first conflicts and 1 first/follow conflict, but it appears 
to work fine. The PG-QT web page says that these can be ignored in some cases?

3) Can I commit it in the meantime, and can someone have a look at the 
grammar? I do not want to carry on if it is broken, since a lot of code (the 
builders, as I understand it) will depend on it.
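
To make question 1 concrete, the two shapes can be sketched roughly as follows. 
This is only a minimal Python illustration, not the actual parser or its AST: 
the node kinds ("open", "empty", "close") and the names Node and build_tree 
are hypothetical. It shows the flat, document-order array and how a later pass 
could nest it with a stack.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str  # "open", "close" or "empty" (hypothetical node kinds)
    name: str
    children: list = field(default_factory=list)


def build_tree(flat):
    """Nest a flat, document-order list of tag nodes using a stack."""
    root = Node("root", "#document")
    stack = [root]
    for node in flat:
        if node.kind == "close":
            # Close the current element if the names match;
            # stray or mismatched close tags are simply ignored here.
            if len(stack) > 1 and stack[-1].name == node.name:
                stack.pop()
            continue
        # Copy the node so the flat list itself is left untouched.
        copy = Node(node.kind, node.name)
        stack[-1].children.append(copy)
        if node.kind == "open":
            stack.append(copy)
    return root


# <tag> <othertag/> </tag> <tag/> as the flat array from question 1
flat = [
    Node("open", "tag"),
    Node("empty", "othertag"),
    Node("close", "tag"),
    Node("empty", "tag"),
]
tree = build_tree(flat)
```

Here flat holds 4 nodes, while tree holds three element nodes, one of which 
has a child. Whether the nesting belongs in the parser or in a later pass 
(for example a builder) is exactly the open design question.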


Thanks a lot.

Ruan






