D26123: Port QRegExp to QRegularExpression
Ahmad Samir
noreply at phabricator.kde.org
Sat Dec 21 12:46:28 GMT 2019
ahmadsamir added inline comments.
INLINE COMMENTS
> dfaure wrote in kemailaddress.cpp:631
> Message from Giuseppe D'Angelo (author of QRegularExpression) : if you can depend from Qt 5.12, there's QRegularExpression::anchoredPattern instead of manual wrapping in \A and \z
>
> We can't require 5.12 yet but this could be a TODO to simplify this code slightly at some point.
Yep, I added the Qt 5.12 TODO in some QRegExp porting diffs but not all (kind of got boring (lots of QRegExp::exactMatch()), but that's not an excuse to be lazy and make it harder for the next guy who modifies that code.... :)).
> dfaure wrote in kemailaddress.cpp:639
> `{,3}` would be `{0,3}`, not `{1,3}`
>
> Would this allow to minimize the amount of changes made here? I'm worried about regressions.
From the unit test, this is to match literal domains, [123.123.123.123]; but the old code would match:
[.123.123.123]
[..123.123]
which seems wrong to me; as proven by this snippet in qtcreator (best way to test regex's is to just literally test them, that is more idiot-proof for me :)):
const QString str = QStringLiteral("[1.1.1.1]");
const QString str2 = QStringLiteral("[.1.1.1]");
const QString str3 = QStringLiteral("[..1.1]");
const QString pattern = QStringLiteral("\\[[0-9]{,3}(\\.[0-9]{,3}){3}\\]");
QRegExp re(pattern);
qDebug() << re.exactMatch(str); // true
qDebug() << re.exactMatch(str2); // true
qDebug() << re.exactMatch(str3); // true
So I dare say even in the old code it should have been {1,3}.
And I found that:
- {,3} in QRegExp is equivalent to {0,3}
- {,3} in QRegularExpression is equivalent to {1,3}
const QString str = QStringLiteral("a");
const QString oldPattern = QStringLiteral("\\d{,3}"); // QRegExp _is_ old
QRegExp oldRe(oldPattern);
qDebug() << (oldRe.indexIn(str) != -1); // true, because it's {0,3}
const QString pattern = QStringLiteral("\\d{,3}");
QRegularExpression re(pattern);
qDebug() << re.match(str).hasMatch(); // false, because it's {1,3}
(^ candidate for QRegExp (its docs literally mention {,n}) to QRegularExpression porting notes in QRegularExpression docs, if you can pass that to Giuseppe D'Angelo :)).
> dfaure wrote in kemailaddress.cpp:639
> `{,3}` would be `{0,3}`, not `{1,3}`
>
> Would this allow to minimize the amount of changes made here? I'm worried about regressions.
I added two more rows to the literal domain test code and found that, my code is wrong too:
addrRx += QStringLiteral("\\[([0-9]{1,3}\\.){1,3}([0-9]{1,3})\\]");
is wrong, it should be:
addrRx += QStringLiteral("\\[([0-9]{1,3}\\.){3}([0-9]{1,3})\\]");
^^^
I'll upload a new diff.
> dfaure wrote in kemailaddress.cpp:639
> I'm confused, you still have `1,3` here, rather than `0,3`
I had replied to your comment but didn't "submit" my comments (stooopid, sorry).
> dfaure wrote in kemailaddress.cpp:1119
> Also from Giuseppe : the `\\x0080-\\xFFFF` requires changes to `\\x{0080}-\\x{FFFF}`
Thank him for me, and thank you for asking him to take a look at it :D; from https://perldoc.perl.org/perlre.html:
> Similarly, \xnn, where nn are hexadecimal digits, matches the character whose native ordinal is nn. Again, not using exactly two digits is a recipe for disaster, but you can use \x{...} to specify any number of hex digits.
so, disaster averted.
REPOSITORY
R270 KCodecs
REVISION DETAIL
https://phabricator.kde.org/D26123
To: ahmadsamir, #frameworks, dfaure, mlaurent, vkrause
Cc: kde-frameworks-devel, LeGast00n, GB_2, michaelh, ngraham, bruns
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-frameworks-devel/attachments/20191221/719b45b9/attachment-0001.html>
More information about the Kde-frameworks-devel
mailing list