Two html parsing patches

David Hyatt hyatt at apple.com
Wed Sep 24 18:41:48 CEST 2003


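The first patch fixes another linked-list mistake of mine in the 
residual style code.  I patched one of these lists and sent that out to 
khtml-devel a while back, but I forgot to patch the other place in the 
parser as well.  The old code hung each new entry off the head's next 
pointer, which drops whatever was already linked there; the patch 
pushes the entry onto the front of residualStyleStack instead.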

Index: WebCore/khtml/html/htmlparser.cpp
===================================================================
RCS file: /local/home/cvs/Labyrinth/WebCore/khtml/html/htmlparser.cpp,v
retrieving revision 1.56
diff -u -p -r1.56 WebCore/khtml/html/htmlparser.cpp
--- WebCore/khtml/html/htmlparser.cpp	2003/09/03 20:43:34	1.56
+++ WebCore/khtml/html/htmlparser.cpp	2003/09/24 23:34:10
@@ -1353,12 +1353,9 @@ void KHTMLParser::handleResidualStyleClo
              // curr->id rather than the node that you should pop to when the element gets pulled off
              // the stack.
              popOneBlock(false);
-            curr->next = 0;
              curr->node = currNode;
-            if (!residualStyleStack)
-                residualStyleStack = curr;
-            else
-                residualStyleStack->next = curr;
+            curr->next = residualStyleStack;
+            residualStyleStack = curr;
          }
          else
              popOneBlock();
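
To make the difference concrete, here is a minimal standalone sketch of 
the two linking patterns from the patch above.  Entry, linkOld, linkNew 
and length are hypothetical names made up for illustration; they are 
not the real KHTML types or functions.

```cpp
#include <cstddef>

// Hypothetical stand-in for the parser's stack entries (not the KHTML type).
struct Entry {
    int id;
    Entry* next;
};

// Old pattern: only correct while the list holds at most one entry.
// With two or more entries, head->next is overwritten and the entry
// that used to follow the head is lost.
void linkOld(Entry*& head, Entry* curr) {
    curr->next = nullptr;
    if (!head)
        head = curr;
    else
        head->next = curr;
}

// Patched pattern: push the new entry onto the front of the list.
void linkNew(Entry*& head, Entry* curr) {
    curr->next = head;
    head = curr;
}

// Counts the entries reachable from head.
std::size_t length(const Entry* head) {
    std::size_t n = 0;
    for (; head; head = head->next)
        ++n;
    return n;
}
```

Linking three entries with linkOld leaves only two reachable from the 
head (the second one is dropped when the third is attached), while 
linkNew keeps all three, most recent first.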


This second patch fixes the tokenizer to deal with the following 
malformed HTML:

<img src="foo"<img src="goo">

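Other browsers treat that as two images rather than one.  The patch 
below makes the tokenizer do the same: a '<' now ends the current tag 
just as '>' does, but it is not consumed, so it starts the next tag.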

Index: WebCore/khtml/html/htmltokenizer.cpp
===================================================================
RCS file: /local/home/cvs/Labyrinth/WebCore/khtml/html/htmltokenizer.cpp,v
retrieving revision 1.41
diff -u -p -r1.41 WebCore/khtml/html/htmltokenizer.cpp
--- WebCore/khtml/html/htmltokenizer.cpp	2003/08/28 08:14:20	1.41
+++ WebCore/khtml/html/htmltokenizer.cpp	2003/09/24 23:34:10
@@ -975,7 +975,7 @@ void HTMLTokenizer::parseTag(DOMStringIt
              while(src.length()) {
                  curchar = *src;
                  if(curchar > ' ') {
-                    if(curchar == '>')
+                    if (curchar == '<' || curchar == '>')
                          tag = SearchEnd;
                      else if(atespace && (curchar == '\'' || curchar == '"'))
                      {
@@ -1215,7 +1215,7 @@ void HTMLTokenizer::parseTag(DOMStringIt
                  qDebug("SearchEnd");
  #endif
              while(src.length()) {
-                if(*src == '>')
+                if (*src == '>' || *src == '<')
                      break;

                  if (*src == '/')
@@ -1223,12 +1223,14 @@ void HTMLTokenizer::parseTag(DOMStringIt

                  ++src;
              }
-            if(!src.length() && *src != '>') break;
+            if (!src.length() && *src != '>' && *src != '<') break;

              searchCount = 0; // Stop looking for '<!--' sequence
              tag = NoTag;
              tquote = NoQuote;
-            ++src;
+
+            if (*src != '<')
+                ++src;

              if ( !currToken.id ) //stop if tag is unknown
                  return;
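
The idea behind the tokenizer change can be shown in isolation with a 
much simplified sketch.  splitTags is a hypothetical helper written for 
this illustration, not part of HTMLTokenizer, and its quote handling is 
an assumption about how quoted attribute values should be skipped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not KHTML code): returns the raw text of each tag
// found in `in`. Outside quotes, '>' closes a tag and is consumed, while
// '<' closes a tag but is left in place so it opens the next one.
std::vector<std::string> splitTags(const std::string& in) {
    std::vector<std::string> tags;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '<') {          // skip text outside tags
            ++i;
            continue;
        }
        std::string tag(1, in[i++]); // consume the opening '<'
        char quote = 0;              // 0 means "not inside a quoted value"
        while (i < in.size()) {
            char c = in[i];
            if (quote) {
                if (c == quote)
                    quote = 0;       // closing quote
            } else if (c == '"' || c == '\'') {
                quote = c;           // opening quote
            } else if (c == '>') {
                tag += c;            // normal end of tag: consume '>'
                ++i;
                break;
            } else if (c == '<') {
                break;               // malformed end of tag: do not consume
            }
            tag += c;
            ++i;
        }
        tags.push_back(tag);
    }
    return tags;
}
```

Run on the malformed input from this message, the sketch yields two 
tags, the first one without a closing '>', which mirrors the "two 
images" behavior the patch is after.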


