fuzzy-matching in quickopen...

Alexander Neundorf neundorf at kde.org
Sun Sep 25 21:49:37 BST 2022


Hi,

On Saturday, 24 September 2022 00:06:34 CEST Waqar Ahmed wrote:
> I am against adding the old way, but if it's optional, ok sure as long as
> it is disabled by default.
> 
> Your approach is completely incorrect though and the only reason I will say
> ok to the patch is because Christoph already said ok. We can and should
> improve the algorithm instead rather than just bringing back the old way on
> the first complaint.

Here are 3 examples (in the kate source tree) where the calculated score is 
IMO not good:

I want to switch to "KateSearchCommand.cpp", which is already open.
filter "ese":
KateSearchCommand.cpp gets a score of 113
MultilineStartEndOfLineMatch.txt gets a higher score of 116, even though it 
does not contain the string "ese", only "eS" and then "E" with 4 characters 
in between.
I think a string which contains the filter exactly should get a higher score 
than a string which "just" contains the characters.
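
A minimal sketch of that idea, assuming the fuzzy matcher has already produced
a score and an extra bonus is added afterwards when the filter occurs as one
contiguous, case-insensitive substring. The helper names and the bonus value
are hypothetical and not taken from the Kate sources; plain std::string is
used instead of QString to keep the example self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Lowercase an ASCII string so that the comparison ignores case.
static std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical helper: true if `name` contains `filter` as one contiguous run.
bool containsExactly(const std::string &name, const std::string &filter)
{
    return toLowerAscii(name).find(toLowerAscii(filter)) != std::string::npos;
}

// Hypothetical helper: add `exactBonus` on top of the fuzzy score when the
// filter is contained exactly, otherwise return the fuzzy score unchanged.
int scoreWithExactBonus(int fuzzyScore, const std::string &name,
                        const std::string &filter, int exactBonus)
{
    assert(exactBonus >= 0);
    return containsExactly(name, filter) ? fuzzyScore + exactBonus : fuzzyScore;
}
```

With the scores from the "ese" example and an assumed bonus of 20,
KateSearchCommand.cpp would end up at 133 and MultilineStartEndOfLineMatch.txt
would stay at 116.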


filter "tes":
KateSearchCommand.cpp gets a score of 118 and comes in at place 23, i.e. 
not visible without scrolling.
tests.qrc gets a higher score of 159, probably because it starts with 
"tes", but it is not open yet. There are about 20 files which start with 
"test", and none of them are open.
I often leave out the start of the filename, because often this is the same for 
many files in a project (e.g. "kate" in kate, or "q" in Qt, or "algo" in some 
other project), so I start typing with something in the middle of the filename.
So I'd suggest that the "is open" bonus should be bigger than the "starts 
with" bonus.
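
To illustrate the suggestion, here is a rough sketch in which the "is open"
bonus outweighs the "starts with" bonus. The Candidate struct, the function
names, the base scores and both bonus values are assumptions made up for this
example; they are not the values used in quickopen.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical candidate entry: a file name, its fuzzy score before any
// bonus, and whether the document is already open.
struct Candidate {
    std::string name;
    int baseScore;
    bool isOpen;
};

// True if `name` begins with `filter`, ignoring ASCII case.
bool startsWithIgnoreCase(const std::string &name, const std::string &filter)
{
    return name.size() >= filter.size()
        && std::equal(filter.begin(), filter.end(), name.begin(),
                      [](unsigned char a, unsigned char b) {
                          return std::tolower(a) == std::tolower(b);
                      });
}

// Base score plus the two bonuses discussed above.
int finalScore(const Candidate &c, const std::string &filter,
               int openBonus, int prefixBonus)
{
    assert(openBonus >= 0 && prefixBonus >= 0);
    int score = c.baseScore;
    if (c.isOpen)
        score += openBonus;
    if (startsWithIgnoreCase(c.name, filter))
        score += prefixBonus;
    return score;
}

// Sort candidates by descending final score; ties keep their input order.
std::vector<Candidate> rankCandidates(std::vector<Candidate> items,
                                      const std::string &filter,
                                      int openBonus, int prefixBonus)
{
    std::stable_sort(items.begin(), items.end(),
                     [&](const Candidate &a, const Candidate &b) {
                         return finalScore(a, filter, openBonus, prefixBonus)
                              > finalScore(b, filter, openBonus, prefixBonus);
                     });
    return items;
}
```

With equal base scores and openBonus larger than prefixBonus, an open
KateSearchCommand.cpp would be listed before a tests.qrc that is not open when
the filter is "tes".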

Different example: I want to switch to "kfts_fuzzy_match.h"
filter "fts":
kfts_fuzzy_match.h gets a score of 100
filetree_model_test.cpp gets a higher score of 120. Again, I'd suggest that a 
string which contains the filter string exactly should get a higher score than 
a string which "just" contains the characters.

The following gives IMO better results:

bonus for "already open" = 15

if (matched) {
   int sequentialBonus = 25;
   int separatorBonus = 10; // bonus if match occurs after a separator
   int camelBonus = 10; // bonus if match is uppercase and prev is lower
   int firstLetterBonus = 10; // bonus if the first letter is matched
   int leadingLetterPenalty = 0; // penalty applied for every letter in str before the first match
   int maxLeadingLetterPenalty = 0; // maximum penalty for leading letters
   int unmatchedLetterPenalty = -1; // penalty for every letter that doesn't matter
   int nonBeginSequenceBonus = 20;


I'm not sure I understand the following code. Doesn't it mean that a long 
filename gets a big bonus?
            // extra points if file exists in project root
            // This gives priority to the files at the root
            // of the project over others. This is important
            // because otherwise getting to root files may
            // not be that easy
            if (!matchPath) {
                score += (sm->idxToFilePath(sourceRow) == name) * name.size();
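
As a rough sketch of what that expression appears to compute: the comparison
yields a bool, which is converted to 0 or 1 and multiplied by the length of the
name, so the bonus equals the filename length. The function names below are
hypothetical, std::string stands in for QString, and the flat-bonus variant is
only one possible alternative, not something from the Kate sources.

```cpp
#include <cassert>
#include <string>

// Mirrors the quoted expression: when the full path equals the bare file
// name (i.e. the file sits in the project root), the bonus is the length
// of the name; otherwise it is 0.
int rootFileBonus(const std::string &filePath, const std::string &name)
{
    return (filePath == name) * static_cast<int>(name.size());
}

// Hypothetical alternative: a fixed bonus that does not depend on how long
// the file name is.
int flatRootFileBonus(const std::string &filePath, const std::string &name, int bonus)
{
    assert(bonus >= 0);
    return filePath == name ? bonus : 0;
}
```

With the first variant a root file named "a.h" gets a bonus of 3, while
"CMakeLists.txt" gets 14, so longer names in the project root are favoured.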


Alex





More information about the KWrite-Devel mailing list