The usage statistics [kactivities, baloo, ktp, plasma]
Ivan Čukić
ivan.cukic at kde.org
Tue Oct 21 09:22:14 UTC 2014
On Monday 20 October 2014 17:24:02 Vishesh Handa wrote:
> On Mon, Oct 13, 2014 at 6:59 PM, Ivan Čukić <ivan.cukic at kde.org> wrote:
> > 3. What will be needed
> > ================
> >
> > Integration with baloo. It will require patches on both sides if we are
> > to support all the use-cases without cross-queries. We will need
> > accessible file types via sqlite (on baloo side) and baloo identifiers
> > or something on kamd side.
>
> * The Baloo identifiers will only work for indexed files. Given that we're
> not forcing users to index everything (nor should we), we need a
> different approach.
An alternative approach: since we already let applications set the mimetype
through the API (used for SLC), we could save that as well. We could detect
the mimetype for files automatically, but not for other linkable things. I
guess this would also cover most of the use-cases from the original mail
that talked about the 'additional payload'.
One thing to ask here: if baloo skips indexing a file that is in an
indexable folder, does it store any info about the file (name, date, etc.)
or not?
Still, baloo is not off the hook. If we want to detect files being moved
and deleted (as suggested by the guy behind baloo :P), at least for the
indexed files, we need baloo to somehow tell us what was moved or deleted.
This could be implemented in a few different ways, each with some
drawbacks:
1 - baloo sending signals - the big drawback is that if a client misses an
event, its database becomes inconsistent;
2 - saving baloo ids alongside files in kamd - the drawback is that baloo
becomes tied to sqlite (as you mentioned);
3 - baloo saving information about what was moved/deleted, so that a client
can ask for all events that happened since some timestamp - the drawback is
that clients need to check for updates regularly, so they will most likely
have at least slightly out-of-date information;
4 - a combination of 1 and 3 - it would be the most complex implementation,
but it would work properly (distributed dbs are evil).
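To make option 4 a bit more concrete, here is a rough Python sketch of the idea: a journal that also emits signals, and a client that replays whatever it missed. All names (ChangeLog, Client, record, since) are made up for illustration and are not baloo or kamd API. It also uses a sequence number where the list above says timestamp, which is my assumption to avoid clock ties.

```python
import itertools


class ChangeLog:
    """Hypothetical indexer-side journal of move/delete events."""

    def __init__(self):
        self._events = []              # (seq, kind, old_path, new_path)
        self._listeners = []
        self._seq = itertools.count(1)

    def subscribe(self, callback):
        self._listeners.append(callback)

    def unsubscribe(self, callback):
        self._listeners.remove(callback)

    def record(self, kind, old_path, new_path=None):
        """Store the event (option 3) and signal it live (option 1)."""
        event = (next(self._seq), kind, old_path, new_path)
        self._events.append(event)
        for callback in list(self._listeners):
            callback(event)
        return event

    def since(self, seq):
        """All events newer than the given sequence number."""
        return [e for e in self._events if e[0] > seq]


class Client:
    """Hypothetical consumer that tracks a set of paths."""

    def __init__(self, log, paths):
        self.log = log
        self.paths = set(paths)
        self.last_seen = 0

    def apply(self, event):
        seq, kind, old, new = event
        if seq <= self.last_seen:
            return                     # already handled, e.g. via a signal
        if old in self.paths:
            self.paths.discard(old)
            if kind == "moved":
                self.paths.add(new)
        self.last_seen = seq

    def catch_up(self):
        """Replay everything recorded since the last event we saw."""
        for event in self.log.since(self.last_seen):
            self.apply(event)


log = ChangeLog()
client = Client(log, {"/home/u/a.txt", "/home/u/b.txt"})
log.subscribe(client.apply)
log.record("moved", "/home/u/a.txt", "/home/u/c.txt")  # delivered live
log.unsubscribe(client.apply)                          # client goes away
log.record("deleted", "/home/u/b.txt")                 # signal is missed
client.catch_up()                                      # replayed from journal
```

The journal is what keeps a missed signal from leaving the client inconsistent, at the cost of having to store and eventually prune the events.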
An additional problem is what to do about files that are not indexed by
baloo. Should it fail silently, or what? For the statistics, it would be
(imo) ok to fail silently, but for activity linking it might be nice to
show a warning message to the user.
> > sum [
> >
> > exp (-di) * exp ( ki * log li ) | i <- [1..n]
> >
> > ]
> >
> I don't understand all of the math, but this sounds quite ideal. Currently
> in KRunner we have a global run count which affects the score of all the
> results, and that score doesn't degrade over time. This seems like
> something nice to replace it with, if we can make it super fast that is.
The scores are cached, so speed is not a problem: it is just a query over a
table. Something like
select resource, date, score * exp(-(now - date)) as currentScore ...
which would live in the library, so the client would not need to think
about the difference between the cached score and the current score (the
cached score was current at the point of calculation, but needs to be
decayed at query time because time has passed since then).
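A small Python sketch of why decaying the cached score at query time is enough. It evaluates the sum from the formula directly and compares it with a cached value decayed by an sqlite query. The table layout, the function names and the sample events are all assumptions for illustration, not the actual kamd schema, and time is in whatever unit the decay is defined over (the mail does not say).

```python
import math
import sqlite3


def event_term(age, k, length):
    """One term of the sum: exp(-d_i) * exp(k_i * log(l_i))."""
    return math.exp(-age) * math.exp(k * math.log(length))


def score_at(events, now):
    """Full score at time `now`; events are (time, k, length) tuples."""
    return sum(event_term(now - t, k, length) for t, k, length in events)


def open_cache():
    """In-memory stand-in for the score cache (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Register exp() ourselves: SQLite's math functions are a build option.
    conn.create_function("exp", 1, math.exp)
    conn.execute(
        "CREATE TABLE scores (resource TEXT PRIMARY KEY, date REAL, score REAL)"
    )
    return conn


def current_scores(conn, now):
    """Decay every cached score from its calculation time to `now`."""
    return conn.execute(
        "SELECT resource, score * exp(-(? - date)) AS currentScore "
        "FROM scores ORDER BY currentScore DESC",
        (now,),
    ).fetchall()


events = [(9.0, 1.0, 2.0), (8.0, 0.5, 4.0)]  # made-up sample events
conn = open_cache()
# Cache the score as calculated at time 10, then query at time 12.
conn.execute(
    "INSERT INTO scores VALUES (?, ?, ?)",
    ("file:///a.pdf", 10.0, score_at(events, 10.0)),
)
rows = current_scores(conn, 12.0)
```

Because every term carries the same exp(-d_i) factor, multiplying the cached sum by exp(-(now - date)) gives the same value as recomputing the whole sum at the later time, so old events never need to be reprocessed.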
If you are interested, I can go into the math in a bit more detail - I
think I even have it written down somewhere.
Ch!
On 20 October 2014 17:24, Vishesh Handa <me at vhanda.in> wrote:
>
>
> On Mon, Oct 13, 2014 at 6:59 PM, Ivan Čukić <ivan.cukic at kde.org> wrote:
>>
>> 3. What will be needed
>> ================
>>
>> Integration with baloo. It will require patches on both sides if we are to
>> support all the use-cases without cross-queries. We will need accessible
>> file types via sqlite (on baloo side) and baloo identifiers or something
>> on kamd side.
>>
>
> * The Baloo identifiers will only work for indexed files. Given that we're
> not forcing users to index everything (nor should we), we need a
> different approach.
> * This would also require Baloo to stick with sqlite.
>
>> Appendix 1: Formula for the resource scoring:
>> ===============================
>>
>> LaTeX formatted:
>> S = \sum _{i = 1} ^ n
>> e^{-d_i} e^{k_i \log(l_i)}
>>
>> Haskell-like formatted, whichever you find easier to read :)
>> sum [
>> exp (-di) * exp ( ki * log li ) | i <- [1..n]
>> ]
>>
>> where d_i is the time that has passed since the i-th event, k_i is a
>> coefficient depending on the type of the event, and l_i is the length of
>> the event (the time between open and close, for example, or between
>> focus in and out)
>>
>> It can be rewritten to look prettier (exp log = id and so on), but this
>> conveys the meaning in a nicer way by separating the terms according to
>> their meaning.
>>
>> The main ideas behind the formula are:
>> - score degrades with the time, so if a document was kept open in okular
>> for an hour yesterday, it will have a significantly higher score than a
>> document that was kept open for a whole day a year ago;
>> - different events have different meanings;
>> - event time interval is measured on a logarithmic scale, so that there
>> is a greater difference between 1hr and 2hrs than between 11hrs and
>> 12hrs;
>> - can be calculated quickly by only processing new events since the last
>> score update.
>>
>
> I don't understand all of the math, but this sounds quite ideal. Currently
> in KRunner we have a global run count which affects the score of all the
> results, and that score doesn't degrade over time. This seems like something
> nice to replace it with, if we can make it super fast that is.
>
> --
> Vishesh Handa
--
Cheerio,
Ivan
--
While you were hanging yourself on someone else's words
Dying to believe in what you heard
I was staring straight into the shining sun