The usage statistics [kactivities, baloo, ktp, plasma]

David Edmundson david at davidedmundson.co.uk
Mon Oct 20 10:02:34 UTC 2014


On Mon, Oct 13, 2014 at 6:59 PM, Ivan Čukić <ivan.cukic at kde.org> wrote:

> Hi all,
>
> As promised, starting a discussion on how we can use the usage statistics
> gathered by kactivitymanagerd (kamd in the rest of the text), and on the
> design of the API to cover the use-cases.
>
> The point is to discuss all of this and put the summaries on the etherpad
> page
> at https://notes.kde.org/p/KActivities_Usage_Statistics
>
>
> 1. Use-cases
> =========
>
> The main ideas I had while developing Lancelot (some overlap with those
> that
> Eike and David have):
>
>  - Automatically deduced favourite applications for the users that didn't
> set
> them up (not important whether they actually end up in the favourites
> section,
> or are used just for sorting in krunner or something).
>  - The same as the above, but for documents (per-application, and global)
> or
> contacts or ...

>  - Replacing the 'recent documents' with something more meaningful (kinda a
> subset of the previous item)
>  - Tasks applet and launchers could show the list of important (or recent)
> documents opened in a specific application.
>

One use case that we might want to consider is including email frequency
when sorting contacts.

What makes this very different from the rest is that the length of the event
doesn't really apply. I tend to write an email and fill in the address
last, so KMail couldn't give accurate stats if it tried, and the length of
time I spent typing might not have any impact anyway.

I'm a bit skeptical of time tracking rather than usage tracking in
general. If a user opens something 10 times and closes it quickly each time,
it's more important to list that in favourites than something the user only
opens once and leaves open. We might need an API that doesn't pass a wID
and just inserts an arbitrary time, and/or a way to not put any weight on
the time interval in the querying API.
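
To illustrate the second option: in the scoring formula from Appendix 1 the
length term is exp(k_i * log(l_i)), which equals l_i ** k_i, so a coefficient
of k_i = 0 makes the interval drop out and every event counts once, weighted
only by its age. A minimal Python sketch of that effect; the `score` helper,
the units and all numbers are made up for the example and are not kamd code:

```python
import math

def score(events):
    """Sum exp(-d) * exp(k * log(l)) over (d, k, l) tuples (Appendix 1 formula).

    d: time since the event, k: event-type coefficient, l: event length (> 0).
    Units and coefficient values are assumptions made for this example.
    """
    return sum(math.exp(-d) * math.exp(k * math.log(l)) for d, k, l in events)

# Ten short opens (1 minute each) versus a single 8-hour session, all one
# time unit old. Lengths are in minutes.
ten_short = [(1.0, 0.0, 1.0)] * 10
one_long = [(1.0, 0.0, 480.0)]

# With k = 0 the length is ignored, so the frequently opened item wins.
count_based_short = score(ten_short)
count_based_long = score(one_long)

# With k = 1 the single long session dominates instead.
time_based_short = score([(d, 1.0, l) for d, _, l in ten_short])
time_based_long = score([(d, 1.0, l) for d, _, l in one_long])
```

So a per-query (or per-event-type) way to force k to 0 might be enough to
cover the "opened often, closed quickly" case without a separate code path.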


>  - ** more advanced ** Deducing which things belong to each other based on
> the
> fact they have been often used together and similar.
>
>



>
> 2. What is currently there
> =================
>
> (mostly copied from the mail I sent Eike some time ago)
>
> - It supports tracking for open/close, focus-in/out, modified and accessed
> events (from the API side, handled by KActivities::ResourceInstance class
> in a
> pretty RAII manner :) )
> - Every event has the activity in which it occurred (usedActivity field),
> application that triggered the event (initiatingAgent) and the timestamps
> (and
> the URL of the thing - targettedResource - a document, a contact, ...). The
> names are a bit cumbersome, they are taken from the ontology that was
> designed
> for this purpose. You can write Agent, Activity, Resource for the sake of
> brevity.
> - Apart from that, it also keeps the scores for the things.
>
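
For readers who have not seen the data, the event record described above could
be pictured roughly like this. It is only a sketch in Python: the field names
mirror the ones mentioned in the mail, while the `EventType` enum, the
`ResourceEvent` class, the `event_length` helper and the epoch-seconds
timestamps are assumptions for illustration, not the actual kamd schema.

```python
from dataclasses import dataclass
from enum import Enum

class EventType(Enum):
    # Event kinds listed in the mail; the enum itself is illustrative.
    OPENED = "opened"
    CLOSED = "closed"
    FOCUSED_IN = "focused-in"
    FOCUSED_OUT = "focused-out"
    MODIFIED = "modified"
    ACCESSED = "accessed"

@dataclass(frozen=True)
class ResourceEvent:
    used_activity: str       # usedActivity: activity in which the event occurred
    initiating_agent: str    # initiatingAgent: application that triggered it
    targetted_resource: str  # targettedResource: URL of the document, contact, ...
    event_type: EventType
    timestamp: float         # seconds since the epoch (assumed representation)

def event_length(start: ResourceEvent, end: ResourceEvent) -> float:
    """Length of an event pair (e.g. open/close) on the same resource and agent."""
    if (start.targetted_resource, start.initiating_agent) != (
        end.targetted_resource,
        end.initiating_agent,
    ):
        raise ValueError("events belong to different resources or agents")
    return end.timestamp - start.timestamp

# Hypothetical example values.
opened = ResourceEvent("work", "okular", "file:///home/user/report.pdf",
                       EventType.OPENED, 1000.0)
closed = ResourceEvent("work", "okular", "file:///home/user/report.pdf",
                       EventType.CLOSED, 1600.0)
length = event_length(opened, closed)
```
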
> Vishesh asked for the formula for the scoring - see appendix 1.
>
> Applications that supported this in 4.x were (I'm probably missing a few):
> Dolphin, Gwenview, Calligra (modulo Kexi), Okular, Kate, KWrite and Vim in
> konsole. I have no idea whether the patches remained in Qt5 ports.
>

Gwenview code remained, though it's purely logging and not using any of it.


>
> 3. What will be needed
> ================
>
> Integration with baloo. It will require patches on both sides if we are to
> support all the use-cases without cross-queries. We will need accessible
> file
> types via sqlite (on baloo side) and baloo identifiers or something on kamd
> side.
>
> One of the things that I think will be needed is some kind of additional
> payload that the applications will be able to store alongside the resource
> event. We'll see after we collect the use-cases.
>
>
> 4. Reading API
> ===========
>
> This needs to be designed. I would not be surprised if the API ends up
> being
> similar to baloo's querying system since it seems we will have quite a
> diverse
> set of use-cases. Although, it should provide a proper live data model for
> the
> results.
>
>
> Appendix 1: Formula for the resource scoring:
> ===============================


> LaTeX formatted:
>     S = \sum _{i = 1} ^ n
>         e^{-d_i} e^{k_i \log(l_i)}
>
> Haskell-like formatted, whichever you find easier to read :)
>     sum [
>         exp (-di) * exp ( ki * log li )     | i <- [1..n]
>     ]
>
> where d_i is the time that passed since the i-th event, k_i coefficient
> depending on the type of the event, l_i length of the event (time distance
> between open and close for example, or focus in and out)
>
> It can be rewritten to look prettier (exp log = id and so on), but this
> conveys the meaning in a nicer way by separating the terms according to
> their
> meaning.
>
>
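
For reference, the "prettier" form follows from exp(k log l) = l^k, which
holds for any event length l_i > 0:

```latex
S = \sum_{i=1}^{n} e^{-d_i} \, e^{k_i \log(l_i)}
  = \sum_{i=1}^{n} e^{-d_i} \, l_i^{k_i}
```

Read this way, k_i is simply an exponent on the event length: k_i = 0 ignores
the length altogether, and 0 < k_i < 1 damps long events.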

> The main ideas behind the formula are:
>  - score degrades with the time, so if a document was kept open in okular
> for
> an hour yesterday, it will have a significantly higher score than a
> document
> that was kept open for a whole day a year ago;
>  - different events have different meanings;
>  - event time interval is measured on a logarithmic scale, so that there
> is a
> greater difference between 1hr and 2hrs, than between 11hrs and 12hrs;
>  - can be calculated quickly by only processing new events since the last
> score update.
>
>
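
The last point works because the decay factors out: once an extra time dt has
passed, every existing term exp(-d_i) becomes exp(-dt) * exp(-d_i), so the
cached score only needs one multiplication before the new events are added.
A small Python sketch under that reading; the function names, the event tuples
and the time unit are hypothetical, and this is not the kamd implementation:

```python
import math

def term(age, k, length):
    """One summand of the Appendix 1 formula: exp(-age) * exp(k * log(length))."""
    return math.exp(-age) * math.exp(k * math.log(length))

def full_score(events, now):
    """Recompute the score from scratch over (timestamp, k, length) events."""
    return sum(term(now - t, k, l) for t, k, l in events)

def update_score(old_score, last_update, new_events, now):
    """Decay the cached score by the elapsed time, then add only the new events."""
    decayed = old_score * math.exp(-(now - last_update))
    return decayed + full_score(new_events, now)

# Hypothetical events: (timestamp, k, length), all in the same arbitrary unit.
old_events = [(0.0, 1.0, 2.0), (0.5, 0.5, 4.0)]
new_events = [(1.5, 1.0, 1.0)]

cached = full_score(old_events, now=1.0)  # score as of t = 1.0
incremental = update_score(cached, 1.0, new_events, now=2.0)
from_scratch = full_score(old_events + new_events, now=2.0)
```

Both routes give the same value up to floating-point rounding, so only the
cached score and the time of the last update need to be stored per resource.
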
> --
> Cheerio,
> Ivan
>
>
> KDE, ivan.cukic at kde.org, http://ivan.fomentgroup.org/
> gpg key id: 850B6F76, keyserver.pgp.com
> _______________________________________________
> Plasma-devel mailing list
> Plasma-devel at kde.org
> https://mail.kde.org/mailman/listinfo/plasma-devel
>