Two html parsing patches
David Hyatt
hyatt at apple.com
Wed Sep 24 18:41:48 CEST 2003
The first patch fixes more linked list stupidity on my part in the
residual style code. I patched one of these lists and sent that out to
khtml-devel a while back, but I forgot to patch the other place in the
parser as well.
Index: WebCore/khtml/html/htmlparser.cpp
===================================================================
RCS file: /local/home/cvs/Labyrinth/WebCore/khtml/html/htmlparser.cpp,v
retrieving revision 1.56
diff -u -p -r1.56 WebCore/khtml/html/htmlparser.cpp
--- WebCore/khtml/html/htmlparser.cpp 2003/09/03 20:43:34 1.56
+++ WebCore/khtml/html/htmlparser.cpp 2003/09/24 23:34:10
@@ -1353,12 +1353,9 @@ void KHTMLParser::handleResidualStyleClo
// curr->id rather than the node that you should pop to when the element gets pulled off
// the stack.
popOneBlock(false);
- curr->next = 0;
curr->node = currNode;
- if (!residualStyleStack)
- residualStyleStack = curr;
- else
- residualStyleStack->next = curr;
+ curr->next = residualStyleStack;
+ residualStyleStack = curr;
}
else
popOneBlock();
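To illustrate why the old list handling was wrong, here is a minimal standalone sketch. It is not the WebCore code: StackElem, appendOld, pushFront and ids are hypothetical names, and only the id and next fields are modeled. The old logic always wrote into the head's next pointer, so a third element overwrote the second; the patched logic pushes each element onto the front of the list.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the parser's stack element; only the fields
// needed to show the list handling (id, next) are modeled.
struct StackElem {
    int id;
    StackElem* next;
};

// Old behaviour from the removed lines: store the new element in the
// head's next pointer, overwriting whatever was already linked there.
void appendOld(StackElem*& head, StackElem* curr) {
    curr->next = nullptr;
    if (!head)
        head = curr;
    else
        head->next = curr; // any element previously in head->next is lost
}

// Patched behaviour from the added lines: push onto the front of the list.
void pushFront(StackElem*& head, StackElem* curr) {
    curr->next = head;
    head = curr;
}

// Collect the ids reachable from head, in list order.
std::vector<int> ids(const StackElem* head) {
    std::vector<int> out;
    for (; head; head = head->next)
        out.push_back(head->id);
    return out;
}
```

Adding elements 1, 2, 3 with appendOld leaves a list holding only 1 and 3, while pushFront keeps all three in the order 3, 2, 1.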
This second patch fixes the tokenizer to deal with the following
malformed HTML:
<img src="foo"<img src="goo">
Other browsers treat that like two images and not one. The patch below
makes this work.
Index: WebCore/khtml/html/htmltokenizer.cpp
===================================================================
RCS file: /local/home/cvs/Labyrinth/WebCore/khtml/html/htmltokenizer.cpp,v
retrieving revision 1.41
diff -u -p -r1.41 WebCore/khtml/html/htmltokenizer.cpp
--- WebCore/khtml/html/htmltokenizer.cpp 2003/08/28 08:14:20 1.41
+++ WebCore/khtml/html/htmltokenizer.cpp 2003/09/24 23:34:10
@@ -975,7 +975,7 @@ void HTMLTokenizer::parseTag(DOMStringIt
while(src.length()) {
curchar = *src;
if(curchar > ' ') {
- if(curchar == '>')
+ if (curchar == '<' || curchar == '>')
tag = SearchEnd;
else if(atespace && (curchar == '\'' || curchar == '"'))
{
@@ -1215,7 +1215,7 @@ void HTMLTokenizer::parseTag(DOMStringIt
qDebug("SearchEnd");
#endif
while(src.length()) {
- if(*src == '>')
+ if (*src == '>' || *src == '<')
break;
if (*src == '/')
@@ -1223,12 +1223,14 @@ void HTMLTokenizer::parseTag(DOMStringIt
++src;
}
- if(!src.length() && *src != '>') break;
+ if (!src.length() && *src != '>' && *src != '<') break;
searchCount = 0; // Stop looking for '<!--' sequence
tag = NoTag;
tquote = NoQuote;
- ++src;
+
+ if (*src != '<')
+ ++src;
if ( !currToken.id ) //stop if tag is unknown
return;
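The idea behind the tokenizer change can be shown with a toy splitter. This is only a sketch under simplifying assumptions, not the HTMLTokenizer logic: splitTags is a hypothetical helper, it ignores quoting entirely, and it returns raw tag text rather than tokens. What it does share with the patch is the rule that a tag ends at either '>' or '<', and that a terminating '<' is left unconsumed so it can open the next tag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy splitter (hypothetical, not the KHTML tokenizer): returns the raw
// text of each tag. A tag ends at '>' (consumed) or at '<' (left in
// place so that it starts the next tag).
std::vector<std::string> splitTags(const std::string& html) {
    std::vector<std::string> tags;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') {       // skip text outside tags
            ++i;
            continue;
        }
        std::size_t start = i++;    // step past the opening '<'
        while (i < html.size() && html[i] != '>' && html[i] != '<')
            ++i;
        tags.push_back(html.substr(start, i - start));
        if (i < html.size() && html[i] == '>')
            ++i;                    // consume '>' but never '<'
    }
    return tags;
}
```

Given the malformed input from this mail, <img src="foo"<img src="goo">, the splitter returns two entries, one per image, which is the behavior the patch is after.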