The usage statistics [kactivities, baloo, ktp, plasma]
Ivan Čukić
ivan.cukic at kde.org
Mon Oct 13 16:59:17 UTC 2014
Hi all,
As promised, I'm starting a discussion on how we can use the usage statistics
gathered by kactivitymanagerd (kamd in the rest of the text), and on the design
of an API that covers the use-cases.
The plan is to discuss all of this here and collect the summaries on the
etherpad page at https://notes.kde.org/p/KActivities_Usage_Statistics
1. Use-cases
============
The main ideas I had while developing Lancelot (some of them overlap with
Eike's and David's):
- Automatically deduced favourite applications for users who haven't set any
up (it doesn't matter whether they actually end up in the favourites section
or are just used for sorting in krunner or similar).
- The same as above, but for documents (per application and globally),
contacts, and so on.
- Replacing 'recent documents' with something more meaningful (more or less a
subset of the previous item).
- The tasks applet and launchers could show a list of the important (or
recent) documents opened in a specific application.
- ** more advanced ** Deducing which things belong together, based on how
often they have been used together and similar signals.
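As a rough illustration of the first and last items, here is a Python sketch. It assumes we already have a score per (application, resource) pair and a list of "sessions" (sets of resources that were in use at the same time); that data layout and the names `top_favourites` and `co_usage` are hypothetical, not anything kamd provides today. Favourites are ranked by summed score, and "belongs together" is approximated with plain pairwise co-occurrence counting, which is only one simple option among many.

```python
from collections import Counter, defaultdict
from itertools import combinations

def top_favourites(scores, n=5):
    """Rank applications (agents) by the sum of their resources' scores.

    `scores` maps (agent, resource) -> score, a hypothetical layout.
    """
    per_agent = defaultdict(float)
    for (agent, _resource), score in scores.items():
        per_agent[agent] += score
    # Highest total first; ties broken by name so the order is stable.
    ranked = sorted(per_agent.items(), key=lambda kv: (-kv[1], kv[0]))
    return [agent for agent, _ in ranked[:n]]

def co_usage(sessions):
    """Count how often each pair of resources was used in the same session.

    `sessions` is a list of sets of resources used together (hypothetical).
    """
    pairs = Counter()
    for used in sessions:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for a, b in combinations(sorted(used), 2):
            pairs[(a, b)] += 1
    return pairs

scores = {
    ("okular", "file:///a.pdf"): 5.0,
    ("okular", "file:///b.pdf"): 1.0,
    ("kate", "file:///notes.txt"): 4.0,
    ("dolphin", "file:///home"): 0.5,
}
sessions = [
    {"file:///a.pdf", "file:///notes.txt"},
    {"file:///a.pdf", "file:///notes.txt", "file:///plot.png"},
    {"file:///plot.png"},
]
print(top_favourites(scores, n=2))  # ['okular', 'kate']
print(co_usage(sessions).most_common(1))
# [(('file:///a.pdf', 'file:///notes.txt'), 2)]
```

The same ranking works per application for documents (filter `scores` by agent first), which covers the second and fourth items as well.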
2. What is currently there
==========================
(mostly copied from the mail I sent Eike some time ago)
- It supports tracking open/close, focus-in/out, modified and accessed events
(on the API side, these are handled by the KActivities::ResourceInstance class
in a pretty RAII manner :) )
- Every event records the activity in which it occurred (the usedActivity
field), the application that triggered it (initiatingAgent), the timestamps,
and the URL of the thing itself (targettedResource: a document, a contact,
...). The names are a bit cumbersome because they are taken from the ontology
that was designed for this purpose; for the sake of brevity, you can read them
as Activity, Agent and Resource.
- Apart from that, it also keeps a score for each resource.
Vishesh asked for the scoring formula; see Appendix 1.
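To make the event model concrete, here is a Python sketch of an event record carrying the fields listed above, plus a helper that mimics the RAII style of KActivities::ResourceInstance: the "opened" event is emitted when the scope is entered and the "closed" event when it is left, even on error. Python has no destructors tied to scope, so a context manager stands in for RAII. `EventType`, `Event` and `resource_instance` are hypothetical stand-ins, not the real KActivities API.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from enum import Enum

class EventType(Enum):
    OPENED = "opened"
    CLOSED = "closed"
    FOCUSED_IN = "focused-in"
    FOCUSED_OUT = "focused-out"
    MODIFIED = "modified"
    ACCESSED = "accessed"

@dataclass(frozen=True)
class Event:
    used_activity: str       # Activity: where the event happened
    initiating_agent: str    # Agent: the application that triggered it
    targetted_resource: str  # Resource: URL of the document, contact, ...
    kind: EventType
    timestamp: float

@contextmanager
def resource_instance(log, activity, agent, url, clock=time.time):
    """RAII-style helper: OPENED on entry, CLOSED on exit (even on error)."""
    log.append(Event(activity, agent, url, EventType.OPENED, clock()))
    try:
        yield
    finally:
        log.append(Event(activity, agent, url, EventType.CLOSED, clock()))

log = []
with resource_instance(log, "work", "okular", "file:///tmp/paper.pdf"):
    pass  # the document is being viewed here
print([e.kind.value for e in log])  # ['opened', 'closed']
```

The point of the RAII shape is that an application cannot forget the closing event, which matters because event lengths feed directly into the score.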
Applications that supported this in 4.x (I'm probably missing a few) were:
Dolphin, Gwenview, Calligra (except Kexi), Okular, Kate, KWrite, and Vim in
Konsole. I have no idea whether the patches survived the Qt5 ports.
3. What will be needed
======================
Integration with baloo. Supporting all the use-cases without cross-queries
will require patches on both sides: baloo will need to make file types
accessible via sqlite, and kamd will need to store baloo identifiers (or
something similar).
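One way to read "without cross-queries": if both sides live in sqlite, a single SQL query can join kamd's scores with baloo's file types through `ATTACH DATABASE`, instead of querying each store separately and merging the results in the application. The Python sketch below assumes that setup; every table and column name (`ResourceScores`, `baloo_id`, `baloo.Files`, `mimetype`) is hypothetical, since neither schema is specified here, and both databases are in-memory stand-ins.

```python
import sqlite3

# kamd side (hypothetical schema): scores per resource, plus a baloo id.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE ResourceScores"
    " (resource TEXT, agent TEXT, score REAL, baloo_id INTEGER)"
)
# baloo side (hypothetical schema), attached as a second database.
con.execute("ATTACH DATABASE ':memory:' AS baloo")
con.execute("CREATE TABLE baloo.Files (id INTEGER PRIMARY KEY, mimetype TEXT)")

con.executemany("INSERT INTO ResourceScores VALUES (?, ?, ?, ?)", [
    ("file:///a.pdf", "okular", 5.0, 1),
    ("file:///notes.txt", "kate", 3.0, 2),
    ("file:///c.pdf", "okular", 1.0, 3),
])
con.executemany("INSERT INTO baloo.Files VALUES (?, ?)", [
    (1, "application/pdf"),
    (2, "text/plain"),
    (3, "application/pdf"),
])

def top_by_mimetype(con, mimetype, limit=10):
    """Best-scored resources of one file type, answered by one query."""
    rows = con.execute(
        """SELECT s.resource, s.score
           FROM ResourceScores AS s
           JOIN baloo.Files AS f ON f.id = s.baloo_id
           WHERE f.mimetype = ?
           ORDER BY s.score DESC
           LIMIT ?""",
        (mimetype, limit),
    )
    return rows.fetchall()

print(top_by_mimetype(con, "application/pdf"))
# [('file:///a.pdf', 5.0), ('file:///c.pdf', 1.0)]
```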
I also think applications will need to be able to store some kind of
additional payload alongside each resource event. We'll know more once we have
collected the use-cases.
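Purely as an assumption about what such a payload could look like: an opaque JSON text column next to each event, which kamd stores without interpreting and the owning application decodes itself. The Python sketch below uses a hypothetical `ResourceEvents` table and hypothetical `record`/`payloads_for` helpers; nothing here reflects an agreed design.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical event table; `payload` stays NULL when the app has nothing to add.
con.execute(
    "CREATE TABLE ResourceEvents (activity TEXT, agent TEXT, resource TEXT,"
    " event_type TEXT, event_time REAL, payload TEXT)"
)

def record(con, activity, agent, resource, event_type, event_time, payload=None):
    """Store an event; the payload is kept as opaque JSON text."""
    blob = None if payload is None else json.dumps(payload)
    con.execute(
        "INSERT INTO ResourceEvents VALUES (?, ?, ?, ?, ?, ?)",
        (activity, agent, resource, event_type, event_time, blob),
    )

def payloads_for(con, agent, resource):
    """Let the owning application read back its own payloads, oldest first."""
    rows = con.execute(
        "SELECT payload FROM ResourceEvents"
        " WHERE agent = ? AND resource = ? AND payload IS NOT NULL"
        " ORDER BY event_time",
        (agent, resource),
    )
    return [json.loads(p) for (p,) in rows]

record(con, "work", "okular", "file:///a.pdf", "opened", 50.0)
record(con, "work", "okular", "file:///a.pdf", "closed", 100.0, {"page": 42})
print(payloads_for(con, "okular", "file:///a.pdf"))  # [{'page': 42}]
```

Keeping the payload opaque means kamd does not need to know every application's needs in advance, at the cost of not being able to query inside it efficiently.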
4. Reading API
==============
This still needs to be designed. Since the set of use-cases looks quite
diverse, I would not be surprised if the API ends up similar to baloo's query
system. However, it should also provide a proper live data model for the
results.
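A rough Python sketch of what "a query plus a live model" might mean: a query is a filter plus a limit, and the model re-evaluates it whenever an updated score arrives, notifying listeners only when the visible result changes. `Query`, `LiveModel`, `on_event` and the record layout are all hypothetical; in Qt this role would presumably be played by a QAbstractItemModel subclass, which is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Record = Tuple[str, str, float]  # (agent, resource, score): an assumed layout

@dataclass
class Query:
    agent: Optional[str] = None  # None means "any application"
    limit: int = 10

    def matches(self, rec: Record) -> bool:
        return self.agent is None or rec[0] == self.agent

class LiveModel:
    """Keeps the result of a Query up to date as scores change."""

    def __init__(self, query: Query):
        self.query = query
        self.results: List[str] = []
        self._scores: Dict[Tuple[str, str], float] = {}
        self._listeners: List[Callable[[List[str]], None]] = []

    def subscribe(self, callback: Callable[[List[str]], None]) -> None:
        self._listeners.append(callback)

    def on_event(self, agent: str, resource: str, score: float) -> None:
        """Feed in an updated score, e.g. pushed by kamd (hypothetical)."""
        self._scores[(agent, resource)] = score
        matching = [(s, r) for (a, r), s in self._scores.items()
                    if self.query.matches((a, r, s))]
        matching.sort(key=lambda sr: (-sr[0], sr[1]))
        new_results = [r for _, r in matching[: self.query.limit]]
        # Only notify listeners when the visible result actually changes.
        if new_results != self.results:
            self.results = new_results
            for notify in self._listeners:
                notify(new_results)

model = LiveModel(Query(agent="okular", limit=2))
updates = []
model.subscribe(updates.append)
model.on_event("okular", "a.pdf", 1.0)
model.on_event("kate", "notes.txt", 9.0)  # filtered out: no notification
model.on_event("okular", "b.pdf", 3.0)
print(model.results)  # ['b.pdf', 'a.pdf']
```

This sketch rescans everything on each event for simplicity; a real model would update incrementally and emit row-level change signals instead of whole lists.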
Appendix 1: Formula for the resource scoring
============================================
LaTeX formatted:
S = \sum_{i=1}^{n} e^{-d_i} e^{k_i \log(l_i)}
Haskell-like (with d, k and l as functions of the event index), whichever you
find easier to read :)
    sum [ exp (-(d i)) * exp (k i * log (l i)) | i <- [1..n] ]
where d_i is the time that has passed since the i-th event, k_i is a
coefficient that depends on the type of the event, and l_i is the length of the
event (for example, the time between open and close, or between focus-in and
focus-out).
It could be rewritten to look prettier (exp . log = id and so on), but this
form conveys the meaning better by keeping the terms separated according to
what they represent.
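For reference, the prettier form follows from e^{k \log l} = (e^{\log l})^k = l^k, which holds for l_i > 0:

```latex
S = \sum_{i=1}^{n} e^{-d_i} \, e^{k_i \log(l_i)}
  = \sum_{i=1}^{n} e^{-d_i} \, l_i^{k_i}
```

In this form, the duration term l_i^{k_i} is increasing and concave in l_i exactly when 0 < k_i < 1, which is what makes longer events count with diminishing returns.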
The main ideas behind the formula are:
- the score decays over time, so a document that was kept open in Okular for
an hour yesterday will have a significantly higher score than a document that
was kept open for a whole day a year ago;
- different event types have different meanings (and hence different k_i);
- the event duration is measured on a logarithmic scale, so the difference
between 1 hour and 2 hours is greater than between 11 hours and 12 hours;
- the score can be updated quickly by processing only the events that occurred
since the last score update.
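The last property follows from the exponential decay: when a time Δ passes, every d_i grows by Δ, so every term, and therefore the whole sum, is multiplied by e^{-Δ}. A cached score only needs to be decayed and the new events added, i.e. S_new = e^{-Δ} S_old + (sum of the new terms). The Python sketch below checks this against a full recomputation. The per-type coefficients in `K` and the choice to measure d in days and l in hours are made-up assumptions for illustration, not kamd's actual parameters.

```python
import math
from dataclasses import dataclass

# Hypothetical k_i per event type; the real coefficients are not given here.
K = {"open": 0.5, "focus": 0.3, "modified": 0.8}

@dataclass
class Event:
    kind: str      # selects k_i
    time: float    # when the event happened, in days (assumed unit)
    length: float  # l_i, in hours (assumed unit); must be > 0 for log

def term(ev: Event, now: float) -> float:
    """One summand e^{-d_i} * e^{k_i log l_i}, where d_i = now - ev.time."""
    d = now - ev.time
    return math.exp(-d) * math.exp(K[ev.kind] * math.log(ev.length))

def full_score(events, now):
    """Recompute S from scratch over every event."""
    return sum(term(ev, now) for ev in events)

def update_score(old_score, old_now, new_events, now):
    """Decay the cached score by e^{-(now - old_now)}, then add new events only."""
    return math.exp(-(now - old_now)) * old_score + full_score(new_events, now)

history = [Event("open", 0.0, 2.0), Event("focus", 1.0, 0.5)]
cached = full_score(history, now=2.0)
new = [Event("modified", 2.5, 1.5)]
updated = update_score(cached, 2.0, new, now=3.0)
print(math.isclose(updated, full_score(history + new, now=3.0)))  # True
```

Only the cached score and the time of the last update need to be stored per resource, so the cost of an update is proportional to the number of new events, not to the whole history.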
--
Cheerio,
Ivan
KDE, ivan.cukic at kde.org, http://ivan.fomentgroup.org/
gpg key id: 850B6F76, keyserver.pgp.com
More information about the Plasma-devel mailing list