The usage statistics [kactivities, baloo, ktp, plasma]

Ivan Čukić ivan.cukic at kde.org
Mon Oct 13 16:59:17 UTC 2014


Hi all,

As promised, I am starting a discussion on how we can use the usage statistics
gathered by kactivitymanagerd (kamd in the rest of the text), and on the design
of the API to cover the use-cases.

The point is to discuss all of this and put the summaries on the etherpad page 
at https://notes.kde.org/p/KActivities_Usage_Statistics


1. Use-cases
============

The main ideas I had while developing Lancelot (some overlap with those that 
Eike and David have):

 - Automatically deduced favourite applications for users who didn't set them
up (it is not important whether they actually end up in the favourites section
or are just used for sorting in krunner or something).
 - The same as the above, but for documents (per-application, and global) or 
contacts or ...
 - Replacing the 'recent documents' with something more meaningful (kinda a 
subset of the previous item)
 - Tasks applet and launchers could show the list of important (or recent) 
documents opened in a specific application.
 - ** more advanced ** Deducing which things belong together, based on the
fact that they have often been used together, and on similar signals.


2. What is currently there
==========================

(mostly copied from the mail I sent Eike some time ago)

- It supports tracking for open/close, focus-in/out, modified and accessed 
events (from the API side, handled by KActivities::ResourceInstance class in a 
pretty RAII manner :) )
- Every event records the activity in which it occurred (usedActivity field),
the application that triggered it (initiatingAgent), the timestamps, and the
URL of the thing it refers to (targettedResource: a document, a contact, ...).
The names are a bit cumbersome; they are taken from the ontology that was
designed for this purpose. You can write Agent, Activity, Resource for the sake
of brevity.
- Apart from that, it also keeps the scores for the things.
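
To make the shape of an event concrete, here is a minimal sketch of such a
record. Only usedActivity, initiatingAgent and targettedResource come from the
description above; the EventType enum, the start/end fields, the time unit and
the sample values are assumptions made for illustration, not the actual kamd
schema.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    # Event kinds mentioned above; the enum itself is hypothetical.
    ACCESSED = "accessed"
    MODIFIED = "modified"
    OPENED = "opened"      # paired with a later close
    FOCUSSED = "focussed"  # paired with a later focus-out


@dataclass(frozen=True)
class ResourceEvent:
    used_activity: str       # usedActivity: activity in which the event occurred
    initiating_agent: str    # initiatingAgent: application that triggered it
    targetted_resource: str  # targettedResource: URL of the document, contact, ...
    event_type: EventType
    start: float             # timestamps in seconds (assumed unit)
    end: float

    @property
    def length(self) -> float:
        """Time distance between the start and the end of the event."""
        return self.end - self.start


# Hypothetical sample: a document kept open in Okular for one hour.
event = ResourceEvent(
    used_activity="work",
    initiating_agent="okular",
    targetted_resource="file:///home/user/report.pdf",
    event_type=EventType.OPENED,
    start=1000.0,
    end=4600.0,
)
```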

Vishesh asked for the formula for the scoring - see appendix 1.

Applications that supported this in 4.x were (I'm probably missing a few):
Dolphin, Gwenview, Calligra (modulo Kexi), Okular, Kate, KWrite and Vim in
Konsole. I have no idea whether the patches survived in the Qt5 ports.


3. What will be needed
======================

Integration with baloo. It will require patches on both sides if we are to
support all the use-cases without cross-queries. We will need file types
accessible via sqlite (on the baloo side) and baloo identifiers or something
similar on the kamd side.

One of the things that I think will be needed is some kind of additional 
payload that the applications will be able to store alongside the resource 
event. We'll see after we collect the use-cases.


4. Reading API
==============

This needs to be designed. I would not be surprised if the API ends up being
similar to baloo's querying system, since it seems we will have quite a diverse
set of use-cases. It should, however, provide a proper live data model for the
results.


Appendix 1: Formula for the resource scoring
============================================

LaTeX formatted:
    S = \sum _{i = 1} ^ n 
        e^{-d_i} e^{k_i \log(l_i)}

Haskell-like formatted, whichever you find easier to read :)
    sum [
        exp (-di) * exp ( ki * log li )     | i <- [1..n] 
    ]

where d_i is the time that passed since the i-th event, k_i is a coefficient
depending on the type of the event, and l_i is the length of the event (the
time distance between open and close for example, or focus in and out)

It can be rewritten to look prettier (exp log = id and so on), but this 
conveys the meaning in a nicer way by separating the terms according to their 
meaning.
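
For reference, the prettier form follows from e^{k \log l} = l^k, which holds
for any positive event length l_i:

```latex
S = \sum_{i = 1}^{n} e^{-d_i} \, l_i^{k_i}
```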

The main ideas behind the formula are:
 - score degrades with time, so if a document was kept open in okular for
an hour yesterday, it will have a significantly higher score than a document
that was kept open for a whole day a year ago;
 - different events have different meanings;
 - event time interval is measured on a logarithmic scale, so that there is a 
greater difference between 1hr and 2hrs, than between 11hrs and 12hrs;
 - can be calculated quickly by only processing new events since the last 
score update.
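
The last point can be sketched in a few lines of Python. This is only an
illustration of the formula, not the kamd implementation: the function names,
the (d, k, l) tuple layout and the sample numbers are made up, and the units of
d and l are not specified above. The incremental update works because every
old term ages by the same elapsed time t, so the whole old sum is just
multiplied by e^{-t}.

```python
import math


def event_term(d, k, l):
    """One term of the sum: e^{-d} * e^{k * log(l)}; requires l > 0."""
    return math.exp(-d) * math.exp(k * math.log(l))


def full_score(events):
    """Score computed from scratch; events is an iterable of (d, k, l)."""
    return sum(event_term(d, k, l) for d, k, l in events)


def updated_score(old_score, elapsed, new_events):
    """Incremental update of a previously computed score.

    The old score is decayed by e^{-elapsed}, then the terms of the new
    events are added. The d values of new_events are measured relative to
    the new 'now'.
    """
    return old_score * math.exp(-elapsed) + full_score(new_events)


# Hypothetical sample data: (d, k, l) per event.
old_events = [(1.0, 0.5, 4.0), (3.0, 1.0, 2.0)]
new_events = [(0.5, 0.5, 9.0)]
elapsed = 2.0

incremental = updated_score(full_score(old_events), elapsed, new_events)

# The same score recomputed from all events, with the old ones aged by
# the elapsed time.
aged = [(d + elapsed, k, l) for d, k, l in old_events]
from_scratch = full_score(aged + new_events)
```

Both values agree up to floating point rounding, so only the stored score and
the time of the last update need to be kept, not the old events themselves.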


-- 
Cheerio,
Ivan


KDE, ivan.cukic at kde.org, http://ivan.fomentgroup.org/
gpg key id: 850B6F76, keyserver.pgp.com

