Season of KDE 2015: Proposal

Boudhayan Gupta bgupta at kde.org
Sat Oct 24 04:47:32 UTC 2015


Hi Priya,

Welcome to the KDE Community!

You seem to be a little out of touch with the current Baloo
architecture, as is evident from your project proposal.

Baloo does not do semantic search anymore. In fact, the KDE desktop
does not have a semantic search component at all. Internally, Baloo is
just a metadata search and indexing solution, and a pretty kick-ass
one at that. We use a key-value database to store data, not an RDF
graph database (as Nepomuk used to do). For more information, read
this: http://vhanda.in/blog/2015/03/the-semantic-desktop-is-dead/
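For anyone coming from Nepomuk, the difference is easy to picture: instead of storing triples in a graph and traversing it, a metadata index maps each search term straight to the files that contain it. Here's a minimal sketch of that idea, assuming a plain in-memory dict stands in for the on-disk key-value database. The class name `MetadataIndex` and the key layout (bare `word` plus `property:word`) are hypothetical, chosen for illustration; this is not Baloo's actual schema.

```python
from collections import defaultdict
from typing import Dict, List, Set


class MetadataIndex:
    """Hypothetical term -> document-ID index; a dict stands in for a key-value DB."""

    def __init__(self) -> None:
        # Key: search term; value: IDs of documents containing that term.
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        # Key: document ID; value: file path.
        self.docs: Dict[int, str] = {}
        self._next_id = 1

    def add(self, path: str, metadata: Dict[str, str]) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = path
        for prop, value in metadata.items():
            for word in str(value).lower().split():
                # Index each word both bare and prefixed with its property name,
                # so "punk" and "artist:punk" are both searchable.
                self.postings[word].add(doc_id)
                self.postings[f"{prop.lower()}:{word}"].add(doc_id)
        return doc_id

    def search(self, *terms: str) -> List[str]:
        """Return paths of documents matching all terms (AND query)."""
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        hits = [self.postings.get(t.lower(), set()) for t in terms]
        return sorted(self.docs[i] for i in set.intersection(*hits))
```

After adding a file with `{"artist": "Daft Punk"}`, both `search("punk")` and `search("artist:daft")` find it. A multi-term query is just an intersection of posting sets, with no graph traversal involved.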

Do read the other posts on Vishesh's blog as well; they'll help you
understand where Baloo comes from and where it's going. Vishesh is
Baloo's current maintainer.

Now let's address your specific concerns:

1 - 100% CPU usage - these are bugs (getting stuck on a file,
libraries and extractors misbehaving, etc.), and we need all the help
we can get in rooting them out.

2 - Extend the API - You'll have to understand that KDE Frameworks 5
has strict rules about maintaining ABI as well as API compatibility,
and we don't muck around with the API unless there are strong use
cases. This doesn't mean a *NO* at all; it just means you need to
demonstrate that there's going to be a non-trivial use case for the
extensions to the API.

3 - Battery drain when indexing - Baloo recently switched to a
two-pass indexing algorithm. The "Basic Indexing" pass always runs
and collects basic metadata about each file, just enough to search;
deeper indexing occurs only when connected to AC. This has had a
positive impact on performance and also gives finer-grained control
over indexing. I'm not sure whether indexing is still done in batches
of 40 (probably not). Are you running the latest KF5-based version of
Baloo?
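Two ideas from this thread can be modelled together: the cheap basic pass plus an AC-only deep pass described above, and the batch bisection from the quoted proposal for tracking down a file that breaks extraction. This is a minimal Python sketch, not Baloo's code. `stat`, `extract` and `on_ac` are hypothetical callables standing in for a basic metadata reader, a content extractor that raises on a bad file, and a power-state check. A failed batch is treated as opaque (as with a crashed extractor process), which is why bisection is needed at all.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Index:
    basic: Dict[str, dict] = field(default_factory=dict)  # pass 1: cheap metadata
    deep: Dict[str, str] = field(default_factory=dict)    # pass 2: extracted content
    failed: List[str] = field(default_factory=list)       # files that broke extraction


def basic_pass(index: Index, paths: List[str], stat: Callable[[str], dict]) -> None:
    """Pass 1 always runs, on battery or AC: just enough metadata to search."""
    for path in paths:
        index.basic[path] = stat(path)


def run_batch(files: List[str], extract: Callable[[str], str]) -> Optional[Dict[str, str]]:
    """Extract a whole batch; any failure fails the batch without saying which file."""
    try:
        return {f: extract(f) for f in files}
    except Exception:
        return None


def find_bad_file(batch: List[str], fails: Callable[[List[str]], bool],
                  on_ac: Callable[[], bool]) -> Optional[str]:
    """Bisect a failing batch down to one offending file.

    Invariant: batch[lo:hi] fails. Returns None if AC power is lost mid-search.
    """
    lo, hi = 0, len(batch)
    while hi - lo > 1:
        if not on_ac():
            return None  # stop immediately once unplugged
        mid = (lo + hi) // 2
        if fails(batch[lo:mid]):
            hi = mid  # a bad file is in the first half
        else:
            lo = mid  # first half is clean, so a bad file is in the second half
    return batch[lo]


def deep_pass(index: Index, paths: List[str], extract: Callable[[str], str],
              on_ac: Callable[[], bool], batch_size: int = 40) -> bool:
    """Pass 2 runs only on AC. Returns False if interrupted by unplugging."""
    todo = [p for p in paths if p not in index.deep and p not in index.failed]
    for start in range(0, len(todo), batch_size):
        batch = todo[start:start + batch_size]
        while batch:
            if not on_ac():
                return False
            results = run_batch(batch, extract)
            if results is not None:
                index.deep.update(results)
                break
            bad = find_bad_file(batch, lambda fs: run_batch(fs, extract) is None, on_ac)
            if bad is None:
                return False
            index.failed.append(bad)  # skip it and retry the rest of the batch
            batch = [f for f in batch if f != bad]
    return True
```

Each bisection step re-runs extraction on one sub-batch, so isolating a bad file in a batch of 40 takes at most six steps. The `on_ac()` check inside the loop is what lets the search stop as soon as the cord is pulled.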

Of course, we'd still love for you to be a part of the KDE Community.
So here's what you can do:

1. Baloo does need work and optimisation (a lot of it, in fact), so
explore where the problems lie and figure out how to solve/improve
them, and then revise the specifics of this proposal.

2. The original project (Robust plugin infrastructure for
KFileMetaData) involves adding support for metadata extractors
written in languages other than C++, as well as for metadata writers.
This will be an exciting project, so if you want to work on it, you
should take a look at the KFileMetaData code. Be aware, though, that
there's a patch pending to add support for non-C++ extractor plugins
(https://git.reviewboard.kde.org/r/125762/), so if that patch is
accepted, this project could reduce to just adding support for metadata
writers. That in itself is a significant task (new API will have to be
created), and perfect for someone getting started with major C++
projects.

We love the enthusiasm in your proposal, though, and we're looking
forward to having you work with us this winter!

Yours,
Boudhayan Gupta

On 22 October 2015 at 23:49, Priya Satbhaya <priyasatbhaya64 at gmail.com> wrote:
>
> Robust plugin infrastructure for KFileMetaData
>
> - Priya Satbhaya
>
> Abstract: Metadata is a set of data that describes and gives information
> about other data. KFileMetaData is a library used for extracting text and
> metadata from many different file types. Baloo is a framework which
> uses semantic search for file searching, indexing and managing metadata.
>
> Name: Priya Satbhaya
> Email Address: priyasatbhaya64 at gmail.com
> Freenode IRC Nick: pri
> IM Service and Username: freenode:pri
> Location: Durgapur, India, UTC+5:30
>
> Proposal Title: Robust plugin infrastructure for KFileMetaData
>
> Motivation for Proposal / Goal:
> The main motivation for this proposal came from Baloo, the file indexing
> and file search framework which replaced Nepomuk. It is better than
> Nepomuk in many ways, but the main concern with Baloo is that it slows
> down the system, so I would like to contribute towards its search
> optimisation.
>
> Targets:
>
> 1. Baloo eats up 100% of CPU time, so optimise it further.
> 2. In KRunner, increase the number of search results displayed.
> 3. In Baloo, extend indexing to removable media as well.
> 4. Create a more user-friendly and interactive user interface.
>
> Details:
>
> 1. Semantic search optimisation of Baloo:
> Baloo indexes files in batches of about 40. When a batch contains a
> problematic file, Baloo has to find it by indexing the batch in parts
> (first half, second half), then indexing the problematic half in pieces
> again, until the file is found. This can take up to 30 minutes of heavy
> CPU usage. Unfortunately, while Baloo will not start to index a new batch
> of 40 files while on battery power, it continues trying to determine the
> broken file while on battery. This behavior has been fixed in KDE
> Applications 4.13.1 (it will stop indexing immediately when the power
> cord is unplugged), and the time this search can take has been reduced to
> about 10 minutes.
>
> We can improve it further for larger and more difficult files by changing
> the entire search process and proposing different graph models.
>
> Baloo search is integrated into KRunner and Dolphin.
>
> 2. Number of results displayed in KRunner:
> The number of results is limited. There is no paging support for results:
> runner plugins simply return their whole data set and we just hope that
> they don't return too many. In fact, the current KRunner UI drops
> everything after the first 50 (this is because it uses QGraphicsView and
> no model). But if we implement KRunner with Nepomuk, search results will
> be more specific by assigning a global shortcut to it.
>
> 3. Improving the API:
> Extend the API so that we can query not only plugins but also additional
> information about them.
>
>
> Other obligations from December to February
>
> My college's winter vacation starts on the 4th of December, so from that
> date onwards I will be able to give all of my time to coding for this
> project. Since I will be staying at home the entire time and have neither
> travel plans nor any other commitments for the winter, I will be able to
> give more than 40 hours per week to SoK. My next semester starts on the
> 1st of January 2016, but even then I'm confident that I will be able to
> devote 5-8 hours daily to the project (5 hours on weekdays from 7 pm to
> 12:30 am with a half-hour dinner break, 8 hours on Saturdays, and 7 hours
> on Sundays). Hence I can confidently say that I can commit fully to SoK
> from December to February this winter.
>
>
> About Me:
> I am an engineering student, currently in my 2nd year at the
> National Institute of Technology, Durgapur, India, pursuing a B.Tech in
> Information Technology. I have been enthusiastic about FOSS ever since I
> was introduced to open source through the free annual online training
> program by Kushal Das, which happens in the #dgplug channel on IRC every
> summer. I am an active advocate of open source in my college, and take
> part in almost every event related to FOSS in general. I hope to keep
> contributing in the years to come to this wonderful virtual globe, which
> I will love to use along with the other KDE software.
>
> My Work Experience:
> 1. I am working on a project in semantic web, titled "New graph model for
> optimised query processing in semantic web environment", under Prof. Animesh
> Dutta and Prof. Biswanath Dutta along with PhD scholars in our college.
>
> 2. I have completed a virtual training course on "Ethical Hacking" at
> Internshala.
> https://drive.google.com/file/d/0B92Bd31UZMqLNGpRdUo1Y2VIcE0/view?usp=sharing
>
> 3. This summer I attended a month-long training program at "Microsoft
> Technology Associate" for the "Core Java" course, and I also built a
> project, "Exam Suite", on the NetBeans platform under Rajan Chetri. There I
> also cleared the "Database Fundamentals" exam under "Microsoft Certification".
> https://drive.google.com/file/d/0B92Bd31UZMqLSkxXS3B2RUViQmM/view?usp=sharing
>
>


More information about the Kde-soc mailing list