Java parser update

Jakob Petsovits jpetso at gmx.at
Tue Jan 3 01:53:05 UTC 2006


Hi list,

I'm nearly done with the kdevelop-pg Java parser.
The grammar seems to have few or no bugs left, and I could solve most
of the other difficulties (worked around the lack of LL(k) with custom 
lookahead functions, fixed the bugs in kdevelop-pg).

As a result, real Java files validate properly with the parser.
I still have to collect test cases to ensure wide applicability, but for now 
it seems that parsing works really well. As (f)lex don't handle Unicode,
16-bit characters in identifier names or character escapes are not
possible at the moment, but at least they get accepted in normal strings.
(Has anybody a good idea for a drop-in flex replacement?)

Approaching total correctness, I would be interested in how to proceed when 
the parser is done. What do I have to do to build the class store on the 
parser, and in which order? I would appreciate some hints here, as we
can't be too far away from having code completion for Java now, right?


kdevelop-pg in Subversion still lacks two patches that I don't want to commit 
without approval (and Roberto seems to be away at the moment).
I attach them for completeness and review:

Patch 1, kdev-pg-first-fix.diff:
Solves the last known bug that keeps kdevelop-pg from producing correct 
parsing code. Without this or a similar patch, the Java parser doesn't work 
correctly. The idea of the fix is correct and working, but I'm not sure if 
it's the most elegant implementation for it. Short explanation:

For a supposed rule (X | 0) Y, kdevelop-pg currently produces the
FIRST set {0,X,Y}, which is simply a combination of the set from
the alternation item {0,X} and the Y item {Y}. This is wrong, because
common parser generator knowledge says that epsilon (0) is only kept
in the FIRST set if all items can derive to epsilon. Which is not the case
if Y is required in any case, like in this example.
The current kdevelop-pg behaviour causes troubles, so we must get
epsilon out of the FIRST set if a definitely required item follows.

There are two possible approaches:
1) Deleting epsilon from the FIRST set after we know that the right side (Y) 
can't derive to epsilon. I think this should be less optimal than
2) Preventing that epsilon is added to the FIRST set at all.

This patch tries to accomplish 2. by introducing a zero (=epsilon) block.
When activated, it doesn't add zeros to any FIRST set.
It is used in visit_cons, where the sequence item (holding a left side,
e.g. (X | 0), and a right side, e.g. Y) is processed. The patch causes
the right side to be processed first, which can then be checked if it can 
derive to epsilon. This information is used to turn the zero block on or off:
If the right side doesn't derive to epsilon, no zeros are added to the set.


Patch 2, kdev-pg-omit-some-warnings.diff:
Suppresses most unnecessary warnings when generating the parser.
It affects rules like (IF statement THEN statement (ELSE statement | 0)),
where the dangling "else" would otherwise cause a first/follow conflict
warning (saying that the "else" could also occur in a higher-level "if"
at the same place).

Throughout the whole Java grammar, I didn't encounter one single rule where 
greedy behaviour was not wanted. This should be similar in most (all?) other 
languages, so I think these warnings unnecessarily confuse the grammar author 
and make it harder to sort out the really dangerous ones.
This patch doesn't change the behaviour of the generated parser in any way.


Happy new year (one day late, but anyways),
  Jakob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kdev-pg-first-fix.diff
Type: text/x-diff
Size: 2165 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20060103/e6020f35/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kdev-pg-omit-some-warnings.diff
Type: text/x-diff
Size: 368 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kdevelop-devel/attachments/20060103/e6020f35/attachment-0001.bin>


More information about the KDevelop-devel mailing list