fuzzy-matching in quickopen...
Alexander Neundorf
neundorf at kde.org
Sun Sep 25 21:49:37 BST 2022
Hi,
On Saturday, 24 September 2022 00:06:34 CEST Waqar Ahmed wrote:
> I am against adding the old way, but if it's optional, ok sure as long as
> it is disabled by default.
>
> Your approach is completely incorrect though and the only reason I will say
> ok to the patch is because Christoph already said ok. We can and should
> improve the algorithm instead rather than just bringing back the old way on
> the first complaint.
Here are 3 examples (in the kate source tree) where the calculated score is
IMO not good:
I want to switch to "KateSearchCommand.cpp", which is already open.
filter "ese":
KateSearchCommand.cpp gets a score of 113
MultilineStartEndOfLineMatch.txt gets a higher score of 116, even though it
does not contain the string "ese", but only "eS" and "E" with 4 characters
in between.
I think a string which contains the filter exactly should get a higher score
than a string which "just" contains the characters.
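As a rough sketch of this idea (this is not the actual kfts_fuzzy_match
code; the function names and the bonus value of 25 are made up for
illustration), a case-insensitive substring check could add a fixed bonus on
top of the existing fuzzy score:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lower-case copy for a case-insensitive comparison (ASCII only).
static std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if `filter` occurs in `name` as one contiguous run, ignoring case.
bool containsExact(const std::string &name, const std::string &filter)
{
    return !filter.empty()
        && toLowerAscii(name).find(toLowerAscii(filter)) != std::string::npos;
}

// Hypothetical helper: add a fixed bonus to an already computed fuzzy score
// when the filter is contained exactly. The default of 25 is an assumption.
int withExactBonus(int fuzzyScore, const std::string &name,
                   const std::string &filter, int exactBonus = 25)
{
    return fuzzyScore + (containsExact(name, filter) ? exactBonus : 0);
}
```

With the scores from the example above, KateSearchCommand.cpp would go from
113 to 138 for "ese", while MultilineStartEndOfLineMatch.txt would stay at 116.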
filter "tes":
KateSearchCommand.cpp gets a score of 118 and comes in at place 23, i.e.
it is not visible without scrolling.
tests.qrc gets a higher score of 159, probably because it starts with
"tes", but it is not open yet. There are about 20 files which start with
"test", and none of them are open.
I often leave out the start of the filename, because often this is the same for
many files in a project (e.g. "kate" in kate, or "q" in Qt, or "algo" in some
other project), so I start typing with something in the middle of the filename.
So I'd suggest that the "is open" bonus should be bigger than the "starts
with" bonus.
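To illustrate with a small self-contained sketch (this is not the actual
quickopen code; the Candidate struct, the function names and all numbers are
invented for the example): with equal base scores, whichever of the two
bonuses is larger decides which file ends up on top.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical candidate entry; baseScore is the fuzzy score before the
// two bonuses discussed here are applied.
struct Candidate {
    std::string name;
    int baseScore;
    bool startsWithFilter;
    bool isOpen;
};

int finalScore(const Candidate &c, int startsWithBonus, int openBonus)
{
    return c.baseScore + (c.startsWithFilter ? startsWithBonus : 0)
                       + (c.isOpen ? openBonus : 0);
}

// Returns the candidate names ordered from best to worst final score.
std::vector<std::string> ranked(std::vector<Candidate> cs,
                                int startsWithBonus, int openBonus)
{
    std::stable_sort(cs.begin(), cs.end(),
                     [&](const Candidate &a, const Candidate &b) {
                         return finalScore(a, startsWithBonus, openBonus)
                              > finalScore(b, startsWithBonus, openBonus);
                     });
    std::vector<std::string> names;
    for (const auto &c : cs)
        names.push_back(c.name);
    return names;
}
```

For two candidates with the same base score, one that starts with the filter
but is not open and one that is open but matches in the middle, the open one
is listed first only if openBonus is larger than startsWithBonus.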
Different example: I want to switch to "kfts_fuzzy_match.h"
filter "fts":
kfts_fuzzy_match.h gets a score of 100
filetree_model_test.cpp gets a higher score of 120. Again, I'd suggest that a
string which contains the filter string exactly should get a higher score than
a string which "just" contains the characters.
The following gives IMO better results:
bonus for "already open" = 15
if (matched) {
int sequentialBonus = 25;
int separatorBonus = 10; // bonus if match occurs after a separator
int camelBonus = 10; // bonus if match is uppercase and prev is lower
int firstLetterBonus = 10; // bonus if the first letter is matched
int leadingLetterPenalty = 0; // penalty applied for every letter in str
                              // before the first match
int maxLeadingLetterPenalty = 0; // maximum penalty for leading letters
int unmatchedLetterPenalty = -1; // penalty for every letter that doesn't
                                 // matter
int nonBeginSequenceBonus = 20;
I'm not sure I understand the following code. Doesn't it mean that a long
filename gets a big bonus?
// extra points if file exists in project root
// This gives priority to the files at the root
// of the project over others. This is important
// because otherwise getting to root files may
// not be that easy
if (!matchPath) {
score += (sm->idxToFilePath(sourceRow) == name) * name.size();
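To spell out my reading of that expression, here is a stand-alone model of it
(rootFileBonus and its filePath parameter are hypothetical; filePath stands
in for the result of sm->idxToFilePath(sourceRow), and I am assuming it
equals the bare file name exactly when the file is in the project root):

```cpp
#include <string>

// The comparison yields a bool, which is converted to 0 or 1 before the
// multiplication, so the bonus is either 0 or the length of the file name.
int rootFileBonus(const std::string &filePath, const std::string &name)
{
    return static_cast<int>((filePath == name) * name.size());
}
```

Under that assumption a root file named "a.h" would get 3 extra points and
"CMakeLists.txt" would get 14, i.e. the bonus grows with the name length.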
Alex