Fuzzy searching against KDE repositories
Dimitris Kardarakos
dimkard at gmail.com
Sun May 6 16:04:24 BST 2018
Hello everyone!
Let me introduce you to a project that I am currently working on.
The goal of the project is to provide an easy way to search the KDE code
and translation repositories. I believe that this kind of
infrastructure would help newcomers easily obtain valuable
information about the work of the community. For example:
- which projects exist
- which ones are the most active
- how developers describe their work on them
- which developers currently work on them
To make a long story short, I was thinking that a Google-like search
engine would facilitate the onboarding of newcomers to KDE.
So, I ended up with a solution that:
1. Fetches git and svn commit messages from the kde-commits mailing list
2. Parses each message and creates a JSON file that contains the
following information:
- commit subject
- commit message
- author
- project
- commit date
- isrevision (does a related Phabricator task exist?)
- istranslation (is it a translation commit?)
- fixesbug (whether the commit is bug-related)
The related code can be found here:
https://github.com/dimkard/kde-commits-solr
3. Loads the JSON record set into an Apache Solr instance
4. On top of Apache Solr, Banana (a port of Kibana for Solr) has been
added. A custom search panel has been created to provide fuzzy
searches against KDE repositories.
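Steps 1 and 2 can be sketched roughly as follows. This is only a
minimal illustration, not the code from the repository linked above: the
sample mail, the "[project] subject" layout, the JSON key names and the
three heuristics (a "Differential Revision:" line, a "BUG:" line, a
project name starting with "l10n") are all assumptions made for the
example, and the real kde-commits messages may look different.

```python
import json
import re
from datetime import timezone
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime

# Hypothetical sample mail; the real kde-commits layout may differ.
SAMPLE_MAIL = """\
From: Jane Doe <jane@example.org>
Date: Sun, 29 Apr 2018 10:15:00 +0000
Subject: [kio] Show previews in the file dialog

Show previews for the same file types as the file manager.

BUG: 1
Differential Revision: https://phabricator.example.org/D1
"""


def parse_commit_mail(raw):
    """Turn one commit mail into a flat dict that can be indexed."""
    msg = message_from_string(raw)
    body = msg.get_payload().strip()

    # Assumption: the subject looks like "[project] commit subject".
    raw_subject = msg["Subject"] or ""
    match = re.match(r"\[([^\]]+)\]\s*(.*)", raw_subject)
    project, subject = match.groups() if match else ("", raw_subject)

    author, _address = parseaddr(msg["From"] or "")
    date = parsedate_to_datetime(msg["Date"]).astimezone(timezone.utc)

    return {
        "subject": subject,
        "message": body,
        "author": author,
        "project": project,
        # ISO 8601 in UTC, the form Solr date fields expect.
        "date": date.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Assumed heuristics for the three boolean flags.
        "isrevision": bool(re.search(r"^Differential Revision:", body, re.M)),
        "istranslation": project.startswith("l10n"),
        "fixesbug": bool(re.search(r"^BUG:\s*\d+", body, re.M)),
    }


if __name__ == "__main__":
    print(json.dumps(parse_commit_mail(SAMPLE_MAIL), indent=2))
```

A list of such dicts, dumped with json.dumps, gives the kind of record
set that step 3 loads into Solr.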
Moreover, it could also be useful for KDE writers/promoters who want a
clear view of current development, in code or translations: new
features, bug-fixing work, etc.
To better illustrate the tool, let's simulate the creation of a post
like
https://pointieststick.wordpress.com/2018/04/29/this-week-in-usability-productivity-part-16-everything-else/
using the functionality offered by this solution.
First, the promoter wants to get more info and add references to the
Open/Save dialog project improvements:
/Open/Save dialog project/
/The dialogs now display previews for the same assortment of file types
as Dolphin does (Alex Nemeth)/
/Grid Spacing in icons view has been tightened up to match Dolphin,
allowing more to be shown in the window (Alex Nemeth)/
If the writer remembers the name of the committer and knows that a
related bug report exists, the facets on the left can be used and the
relevant time period can be set (top left):
https://framapic.org/z4PtCZxEul5K/L3tJZ8visR4I.png
https://framapic.org/DnveENis7bEa/BsKkske4RVPz.png
The records returned are:
https://framapic.org/8fv0crGCijf6/cPsbeZWO1CJH.png
so the commit in question has been found.
If no committer name is available, the writer may search for
something like:
https://framapic.org/wYn063MH0jPY/4zEdV8ngIidS.png
Then, following the search suggestion
https://framapic.org/owrsQQ2a5HVW/gWUiTNz7xWTB.png
the relevant commit is returned as the top result:
https://framapic.org/bNDgq0cbR7J7/xVd3HJqQ40Kv.png
The same applies to the second search:
https://framapic.org/qpddu38zvRJF/ssQO4MBEWt6s.png
since the relevant commit is returned as well:
https://framapic.org/g31g6x5mIOxR/5PNUInPMNxYw.png
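For those who prefer to see what such a fuzzy search might look like
without the Banana panel, here is a small sketch. It relies on the
"term~N" fuzzy syntax of Solr's standard query parser (N is the maximum
edit distance, 0 to 2). The core URL, the field names "subject" and
"author" and both helper functions are hypothetical, and I am not
claiming that the custom panel builds its queries this way. The code
only builds the request URL; it does not contact any server.

```python
import re
from urllib.parse import urlencode

# Characters (and the && / || operators) that are special to the
# Solr standard query parser and must be backslash-escaped in a term.
SOLR_SPECIAL = re.compile(r'(&&|\|\||[+\-!(){}\[\]^"~*?:\\/])')


def escape_term(term):
    """Backslash-escape query-parser metacharacters in one term."""
    return SOLR_SPECIAL.sub(r"\\\1", term)


def fuzzy_query(text, field="message", distance=1):
    """Build a query where every word may differ by `distance` edits."""
    if distance not in (0, 1, 2):
        raise ValueError("Solr fuzzy edit distance must be 0, 1 or 2")
    terms = [f"{field}:{escape_term(word)}~{distance}" for word in text.split()]
    return " AND ".join(terms)


def select_url(base, query, author=None, rows=10):
    """Build a /select URL; an optional author becomes a filter query."""
    params = [("q", query), ("rows", str(rows))]
    if author:
        quoted = author.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'author:"{quoted}"'))
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core URL; "previws" is a deliberate typo.
    query = fuzzy_query("dialog previws", field="subject")
    print(select_url("http://localhost:8983/solr/kde-commits", query))
```

With an edit distance of 1, a misspelled word such as "previws" can
still match "previews", which is the kind of tolerance the search
suggestions above rely on.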
Moreover, although this is not its primary role, the solution provides
some useful interactive visualization tools. For example, searching for
work on projects like plasma-phone-components, plasma-settings,
plasma-mobile and kirigami would give a useful picture of the work
on Plasma Mobile. So, a related promo article could be accompanied by
some useful statistics and references to real Plasma Mobile commits,
like this:
https://framapic.org/2RW8LlxCjYkh/LbkUnVQZTyZV.png
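The statistics behind such a chart boil down to counting records per
field value, much like a facet. A minimal local sketch, assuming records
shaped like the JSON described earlier; the records and author names
below are made up for illustration and are not real commit data:

```python
from collections import Counter

# The four projects named above as making up the Plasma Mobile work.
PLASMA_MOBILE_PROJECTS = {
    "plasma-phone-components",
    "plasma-settings",
    "plasma-mobile",
    "kirigami",
}

# Made-up records, only to show the shape of the computation.
RECORDS = [
    {"project": "kirigami", "author": "dev-a"},
    {"project": "kirigami", "author": "dev-b"},
    {"project": "kirigami", "author": "dev-a"},
    {"project": "plasma-mobile", "author": "dev-a"},
    {"project": "plasma-mobile", "author": "dev-b"},
    {"project": "plasma-settings", "author": "dev-b"},
    {"project": "dolphin", "author": "dev-c"},
]


def commits_per(records, key, projects):
    """Count commits grouped by `key`, restricted to `projects`."""
    return Counter(r[key] for r in records if r["project"] in projects)


if __name__ == "__main__":
    per_project = commits_per(RECORDS, "project", PLASMA_MOBILE_PROJECTS)
    for project, count in per_project.most_common():
        print(f"{project}: {count}")
```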
In the future, such a solution could be extended to index
Bugzilla data as well. As a result, reports about possible duplicates
could be generated automatically and, why not, a fuzzy search engine
could be offered to bug reporters, improving the reporting
experience and reducing duplicates and frustration about irrelevant
results.
Nevertheless, there are some factors that should be considered as
well. First, the number of commits on a project is just one indicator,
among many others, of the activity of a project. A lot of work may
happen behind the scenes, in terms of communication, design, testing,
etc., and this work may land as a single commit or just a few. So,
considering all commits as equal is a trap. In addition, since the tool
counts the commits made by each developer, we should think twice about
the psychological effects that such a tool may have on contributors.
Do you think that such a tool could help the KDE community? I look
forward to hearing your thoughts, since I am still not convinced that
working on this would really help the KDE ecosystem.
PS: We may look at alternatives with regard to the technologies
involved. I’ve opted for the ones mentioned above since I have already
worked with them in the past.
PS1: If similar projects that I am not aware of currently exist in KDE
we may consider using them instead of this approach (or join efforts if
they are compatible). My intention is just to start a discussion about
how big data, indexing and fuzzy searching may improve onboarding and
"promotion" work.
Dimitris