[Nepomuk] Review Request 109991: Regexp cache optimization in Nepomuk fileindexer.

Lukasz Olender lukiasz at gmail.com
Mon Apr 22 22:18:46 UTC 2013



> On April 17, 2013, 1:50 a.m., Simeon Bird wrote:
> > common/regexpcache.cpp, line 199
> > <http://git.reviewboard.kde.org/r/109991/diff/1/?file=138503#file138503line199>
> >
> >     Why don't you use QString::split?
> 
> Lukasz Olender wrote:
>     I've added comment.
> 
> Simeon Bird wrote:
>     Right...but I think you should use this function:
>     
>     QStringList QString::split ( const QRegExp & rx, SplitBehavior behavior = KeepEmptyParts ) const
>     
>     with a regex that means "split on any character NOT []\" 
>     (I can't remember offhand what that looks like in regex, but you can look it up).

I'm not sure it's possible. Let's think just about "\x" rule, where 'x' is one of escaped characters. Split must be done after every char, except situation when last char was '\'. Also, split must be done between characters, not on some of them, because it will eat all characters we split on. This involves RegExp lookbehind rule and it's not supported by QRegExp class. Similar situation with "[...]". Here is few examples.

QString str = "nep\\omuk";
QStringList list;
list = str.split(QRegExp(""), QString::SkipEmptyParts);
// list: [ "n", "e", "p", "\", "o", "m", "u", "k" ]

list = str.split(QRegExp("."), QString::SkipEmptyParts);
// list: empty

list = str.split(QRegExp("[^o]"), QString::SkipEmptyParts);
// list: [ "o" ]

list = str.split(QRegExp(".{0}(?!\\\\)"), QString::SkipEmptyParts);
// list: [ "n", "e", "p\", "o", "m", "u", "k" ]

I might be wrong, maybe there is an another way to do it. If so, I'd be happy to solve it in more elegant way. AFAIK in Qt5 we have QRegularExpression with lookbehind included.


On April 17, 2013, 1:50 a.m., Lukasz Olender wrote:
> > Thanks for the patch! Please could you add a few comments on the design of the optimised regexp, why it is faster, how it works, and so on? It really helps us to review it faster. 
> > 
> > Also, could you please add the test cases you link to the nepomuk tests directory? If you like, submit a separate patch adding them, as a test for regexp speed and correctness would be useful in itself. 
> >
> 
> Lukasz Olender wrote:
>     Yes. I'll make an another patch when review of this will be finished. Thanks for response!
> 
> Simeon Bird wrote:
>     You're welcome - actually though it would be really good if you could post the testcase patch now - I wanted to run the test for review of this patch.

To create tests I'll need to add some minor changes into current code (I'll add a method selection in constructor). There is also a need for a number of filenames to show performance gain or just corectness of this solution. Should I put a text file with filenames into new patch? Or maybe code which will get filenames directly from user's home directory?

If you're interested in checking solution, please go tohttp://www.sendspace.com/file/mkihdp, I posted sources of simple application and filenames I've created to test it. Everything is ready for looking (except there is an old version of RegExpCache class without your reviews, but it's also working).


- Lukasz


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/109991/#review31192
-----------------------------------------------------------


On April 18, 2013, 4:27 p.m., Lukasz Olender wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/109991/
> -----------------------------------------------------------
> 
> (Updated April 18, 2013, 4:27 p.m.)
> 
> 
> Review request for Nepomuk and Vishesh Handa.
> 
> 
> Description
> -------
> 
> It's related with https://bugs.kde.org/show_bug.cgi?id=303654. 
> P.S. I accidentally deleted author's and license info in patch. Isolated performance tests are also uploaded to http://www.sendspace.com/file/mkihdp (previous link not always work). It's my first patch.
> 
> 
> This addresses bug 303654.
>     http://bugs.kde.org/show_bug.cgi?id=303654
> 
> 
> Diffs
> -----
> 
>   common/regexpcache.h d89f968 
>   common/regexpcache.cpp df45277 
> 
> Diff: http://git.reviewboard.kde.org/r/109991/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Lukasz Olender
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/nepomuk/attachments/20130422/fdb78706/attachment.html>


More information about the Nepomuk mailing list