[Kde-pim] Another linklocator patch

Mon Aug 24 21:30:32 BST 2009

On Sunday 23 August 2009, Martin Koller wrote:
> Hi,
>
> even after fixing a previous bug in the linklocator, we have another
> issue with it: https://bugs.kde.org/show_bug.cgi?id=202445
>
> So I decided to modify the linklocator algorithm (after consulting
> RFC3986) to the following concept:
>
> linklocator does not really parse an URL (and it never did), so it's
> not important if the URL in fact is really a completely valid url -
> this will be handled in the target application then.
> What is important is that the URL is discovered, even if it is
> enclosed with some sort of brackets and even interrupted by line
> breaks or white space. RFC3986 explains this in appendix C
>
> So the modified code now handles these cases and it does not check
> for explicit allowed characters in all the different parts of a URL,
> which also has the advantage of fixing the mentioned bug. As said
> above: linklocator is not about checking the validity of the URL
> syntax.

I like this approach.

> Attached is the patch including a modified unit test.
>
> P.S.:line breaks in URLs are still not working in kmail with this
> modification, because the LinkLocator::convertToHtml() has to be
> changed to not remove the line breaks before my new code gets a
> chance to use it. I'll look into this in the next days, if time
> permits.

Please submit your next patch to Review Board. It's a pain to comment on 
a patch via mail.

One thing you should not do is
+    url.reserve( mText.length() - mPos );  // avoid allocs

If mText is very large (e.g. a message with a large inline patch on the 
Linux Kernel mailing list) then you will waste lots of memory just to 
save a few CPU cycles for allocations. If anything then restrict the 
size you reserve to a sensible value, e.g. 
+    url.reserve( qMin( 1024, mText.length() - mPos ) );  // avoid 
allocs

OTOH, why don't you do it as before, i.e. look for the end and then do a 
simple
-    url = mText.mid( start, mPos - start );

Ah, okay, you skip whitespace. Hmm. Okay. In this case please use the 
above reserve with fixed upper limit. I think even 128 would be enough 
as sensible maximal value for the initially reserved size of url.

Your code assumes that a URL that is started by some bracket or quote is 
closed by the corresponding closing bracket/quote. I'm not sure how 
safe this assumption is. Hmm, I guess it's safe enough. OTOH, 
apparently there is a maxUrlLen(). So I suggest to abort the URL 
parsing when url.length() exceeds maxUrlLen(). And probably maxUrlLen() 
could also be used as upper limit for the reserve call.

Regards,
Ingo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part.
URL: <http://mail.kde.org/pipermail/kde-pim/attachments/20090824/d47adcf3/attachment.sig>
-------------- next part --------------
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/