D26123: Port QRegExp to QRegularExpression

Ahmad Samir noreply at phabricator.kde.org
Sat Dec 21 12:46:28 GMT 2019


ahmadsamir added inline comments.

INLINE COMMENTS

> dfaure wrote in kemailaddress.cpp:631
> Message from Giuseppe D'Angelo  (author of QRegularExpression) : if you can depend from Qt 5.12, there's QRegularExpression::anchoredPattern instead of manual wrapping in \A and \z
> 
> We can't require 5.12 yet but this could be a TODO to simplify this code slightly at some point.

Yep, I added the Qt 5.12 TODO in some QRegExp porting diffs but not all (kind of got boring (lots of QRegExp::exactMatch()), but that's not an excuse to be lazy and make it harder for the next guy who modifies that code.... :)).

> dfaure wrote in kemailaddress.cpp:639
> `{,3}` would be `{0,3}`, not `{1,3}`
> 
> Would this allow to minimize the amount of changes made here? I'm worried about regressions.

From the unit test, this is to match literal domains, [123.123.123.123]; but the old code would match:
[.123.123.123]
[..123.123]
which seems wrong to me; as proven by this snippet in qtcreator (best way to test regex's is to just literally test them, that is more idiot-proof for me :)):

  const QString str = QStringLiteral("[1.1.1.1]");
  const QString str2 = QStringLiteral("[.1.1.1]");
  const QString str3 = QStringLiteral("[..1.1]");
  
  const QString pattern = QStringLiteral("\\[[0-9]{,3}(\\.[0-9]{,3}){3}\\]");
  QRegExp re(pattern);
  
  qDebug() << re.exactMatch(str); // true
  qDebug() << re.exactMatch(str2); // true
  qDebug() << re.exactMatch(str3); // true

So I dare say even in the old code it should have been {1,3}.

And I found that:

- {,3} in QRegExp is equivalent to {0,3}
- {,3} in QRegularExpression is equivalent to {1,3}

  const QString str = QStringLiteral("a");
  const QString oldPattern = QStringLiteral("\\d{,3}"); // QRegExp _is_ old
  QRegExp oldRe(oldPattern);
  qDebug() << (oldRe.indexIn(str) != -1); // true, because it's {0,3}
  
  const QString pattern = QStringLiteral("\\d{,3}");
  QRegularExpression re(pattern);
  qDebug() << re.match(str).hasMatch(); // false, because it's {1,3}

(^ candidate for QRegExp (its docs literally mention {,n}) to QRegularExpression porting notes in QRegularExpression docs, if you can pass that to Giuseppe D'Angelo :)).

> dfaure wrote in kemailaddress.cpp:639
> `{,3}` would be `{0,3}`, not `{1,3}`
> 
> Would this allow to minimize the amount of changes made here? I'm worried about regressions.

I added two more rows to the literal domain test code and found that, my code is wrong too:

  addrRx += QStringLiteral("\\[([0-9]{1,3}\\.){1,3}([0-9]{1,3})\\]");
  is wrong, it should be:
  addrRx += QStringLiteral("\\[([0-9]{1,3}\\.){3}([0-9]{1,3})\\]");
                                              ^^^

I'll upload a new diff.

> dfaure wrote in kemailaddress.cpp:639
> I'm confused, you still have `1,3` here, rather than `0,3`

I had replied to your comment but didn't "submit" my comments (stooopid, sorry).

> dfaure wrote in kemailaddress.cpp:1119
> Also from Giuseppe : the `\\x0080-\\xFFFF` requires changes to `\\x{0080}-\\x{FFFF}`

Thank him for me, and thank you for asking him to take a look at it :D; from https://perldoc.perl.org/perlre.html:

> Similarly, \xnn, where nn are hexadecimal digits, matches the character whose native ordinal is nn. Again, not using exactly two digits is a recipe for disaster, but you can use \x{...} to specify any number of hex digits.

so, disaster averted.

REPOSITORY
  R270 KCodecs

REVISION DETAIL
  https://phabricator.kde.org/D26123

To: ahmadsamir, #frameworks, dfaure, mlaurent, vkrause
Cc: kde-frameworks-devel, LeGast00n, GB_2, michaelh, ngraham, bruns
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-frameworks-devel/attachments/20191221/719b45b9/attachment-0001.html>


More information about the Kde-frameworks-devel mailing list