<div dir="ltr">On Monday 20 October 2014 17:24:02 Vishesh Handa wrote:<br>> On Mon, Oct 13, 2014 at 6:59 PM, Ivan Čukić <<a href="mailto:ivan.cukic@kde.org">ivan.cukic@kde.org</a>> wrote:<br>> > 3. What will be needed<br>> > ================<br>> > <br>> > Integration with baloo. It will require patches on both sides if we are to<br>> > support all the use-cases without cross-queries. We will need accessible<br>> > file<br>> > types via sqlite (on baloo side) and baloo identifiers or something on<br>> > kamd<br>> > side.<br>> <br>> * The Baloo identifiers will only work for indexed files. Given that we're<br>> not enforcing users to index everything (nor should we). We need a<br>> different approach.<br><br>An alternative approach: since we already let applications set the mimetype <br>through the API (used for SLC), we could save that as well. We could detect <br>the mimetype for files automatically, but not for other linkable things. I <br>guess that this would also cover most of the use-cases from the original mail <br>that talked about the 'additional payload'.<br><br>One thing to ask here: if baloo skips indexing a file that is in an <br>indexable folder, does it store any info about the file (name, date etc.) 
or <br>not?<br><br>Still, baloo is not off the hook - if we want to detect files being moved and <br>deleted (as suggested by the guy behind baloo :P), at least for the indexed <br>files, we need baloo to somehow tell us what was moved/deleted.<br><br>This could be implemented in a few different ways, each with some <br>drawbacks:<br>1 - baloo sending signals - the big drawback is that if a client misses <br>an event, the database becomes inconsistent;<br>2 - saving baloo ids along with files in kamd - the drawback is that baloo becomes tied <br>to sqlite (as you mentioned);<br>3 - baloo saving information about what has been moved/deleted so that a client can <br>ask for all events that happened since some timestamp - the drawback here is <br>that clients need to check for updates regularly, meaning they will <br>most likely have at least slightly out-of-date information;<br>4 - a combination of 1 and 3 - it would be the most complex implementation, but <br>it would work properly (distributed dbs are evil)<br><br>And an additional problem is what to do about files that are not indexed <br>by baloo. Should it silently fail or what? For the statistics, it would be <br>(imo) ok to fail silently, but for activity linking, it might be nice to show <br>a warning message to the user.<br><br><br>> >     sum [<br>> >     <br>> >         exp (-di) * exp ( ki * log li )     | i <- [1..n]<br>> >     <br>> >     ]<br>> > <br><br>> I don't understand all of the math, but this sounds quite ideal. Currently<br>> in KRunner we have a global run count which affects the score of all the<br>> result, and that score doesn't degrade over time. This seems like something<br>> nice to replace it with, if we can make it super fast that is.<br><br>The scores are cached, so there is no problem with the speed - it is just a <br>query over a table. 
Something like<br><br>select resource, date, score * exp(-(now - date)) as currentScore ...<br><br>This would live in the library, so the client would not need to think about the <br>difference between the cached score and the current score (the cached score was <br>current at the point of calculation, but needs to be degraded on query because <br>time has passed since the initial calculation).<br><br>If you are interested, I can go a bit more into the math details - I think I even <br>have it written somewhere.<br><br>Ch!<br></div><div class="gmail_extra"><br><div class="gmail_quote">On 20 October 2014 17:24, Vishesh Handa <span dir="ltr"><<a href="mailto:me@vhanda.in" target="_blank">me@vhanda.in</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Mon, Oct 13, 2014 at 6:59 PM, Ivan Čukić <span dir="ltr"><<a href="mailto:ivan.cukic@kde.org" target="_blank">ivan.cukic@kde.org</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

3. What will be needed<br>

================<br>

<br>

Integration with baloo. It will require patches on both sides if we are to<br>

support all the use-cases without cross-queries. We will need accessible file<br>

types via sqlite (on baloo side) and baloo identifiers or something on kamd<br>

side.<br></blockquote><div><br></div></span><div>* The Baloo identifiers will only work for indexed files. Given that we're not enforcing users to index everything (nor should we). We need a different approach.</div><div>* This would also require Baloo to stick with sqlite.</div><span class=""><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Appendix 1: Formula for the resource scoring:<br>

===============================<br>

<br>

LaTeX formatted:<br>

    S = \sum _{i = 1} ^ n<br>

        e^{-d_i} e^{k_i \log(l_i)}<br>

<br>

Haskell-like formatted, whichever you find easier to read :)<br>

    sum [<br>

        exp (-di) * exp ( ki * log li )     | i <- [1..n]<br>

    ]<br>

<br>

where d_i is the time that has passed since the i-th event, k_i is a coefficient<br>

depending on the type of the event, and l_i is the length of the event (time distance<br>

between open and close for example, or focus in and out)<br>

<br>
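A minimal sketch of the formula above in Python, only to make the terms concrete. The names (Event, score, decayed) and the sample numbers are hypothetical and not taken from kamd, and the time unit is left open because the mail does not fix one. It also illustrates why a cached score can be aged without revisiting the events: if every d_i grows by the same amount, the whole sum is multiplied by one exponential factor.<br>

```python
import math
from typing import Iterable, Tuple

# Each event is (d, k, l): time since the event, event-type coefficient,
# and event length. The units are an assumption of this sketch.
Event = Tuple[float, float, float]


def score(events: Iterable[Event]) -> float:
    """S = sum over events of exp(-d_i) * exp(k_i * log(l_i))."""
    return sum(math.exp(-d) * math.exp(k * math.log(l)) for d, k, l in events)


def decayed(cached_score: float, elapsed: float) -> float:
    """Age a cached score by `elapsed` time units.

    Every d_i grows by `elapsed`, so each term, and therefore the
    whole sum, is multiplied by exp(-elapsed).
    """
    return cached_score * math.exp(-elapsed)


# Hypothetical sample events, for illustration only.
events = [(1.0, 1.0, 2.0), (3.0, 0.5, 4.0)]
cached = score(events)

# Recomputing from scratch two time units later gives the same value
# as ageing the cached score.
later = score([(d + 2.0, k, l) for d, k, l in events])
```

Here decayed(cached, 2.0) and later agree up to floating-point rounding, which is the property a cached-score lookup would rely on.<br>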

It can be rewritten to look prettier (exp log = id and so on), but this<br>

conveys the meaning in a nicer way by separating the terms according to their<br>

meaning.<br>

<br>

The main ideas behind the formula are:<br>

 - the score degrades with time, so if a document was kept open in okular for<br>

an hour yesterday, it will have a significantly higher score than a document<br>

that was kept open for a whole day a year ago;<br>

 - different events have different meanings;<br>

 - event time interval is measured on a logarithmic scale, so that there is a<br>

greater difference between 1hr and 2hrs, than between 11hrs and 12hrs;<br>

 - can be calculated quickly by only processing new events since the last<br>

score update.<br></blockquote><div><br></div></span><div>I don't understand all of the math, but this sounds quite ideal. Currently in KRunner we have a global run count which affects the score of all the result, and that score doesn't degrade over time. This seems like something nice to replace it with, if we can make it super fast that is. </div><span class="HOEnZb"><font color="#888888"><div><br></div><div>-- <br></div></font></span></div><span class="HOEnZb"><font color="#888888"><span style="color:rgb(192,192,192)">Vishesh Handa</span><br>

</font></span></div></div>

<br>_______________________________________________<br>

Plasma-devel mailing list<br>

<a href="mailto:Plasma-devel@kde.org">Plasma-devel@kde.org</a><br>

<a href="https://mail.kde.org/mailman/listinfo/plasma-devel" target="_blank">https://mail.kde.org/mailman/listinfo/plasma-devel</a><br>

<br></blockquote></div><br><br clear="all"><br>-- <br>Cheerio,<br>Ivan<br><br>--<br>While you were hanging yourself on someone else's words<br>Dying to believe in what you heard<br>I was staring straight into the shining sun

</div>