[Kde-pim] Excessive amount of queries
Aaron J. Seigo
aseigo at kde.org
Sun Dec 7 18:16:19 GMT 2014
On Sunday, December 7, 2014 15.59:23 you wrote:
> I think you are missing the point, Aaron. The question should be: Why do we
> even need 500k query round trips per second for a mail application?
This was answered in my previous email:
> > no, we probably don't need 500k queries/s, but given that each query
> > takes on average 1/500,000th of a second, we can get away with querying
> > and not worrying so much about it because the latency is so low.
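Back-of-the-envelope, to make concrete what that latency buys (the queries/s
figure is the measurement quoted above; the folder count is a made-up example):

queries_per_second = 500_000
per_query = 1.0 / queries_per_second    # ~2 microseconds per round trip

folders = 300                           # an invented, largish folder tree
refresh = folders * per_query           # re-fetching statistics for every folder
print(f"{per_query * 1e6:.0f} us per query, {refresh * 1e3:.2f} ms per refresh")
# -> 2 us per query, 0.60 ms per full refresh: lost in the noise even if done
#    on every repaint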
... now the philosophy parts :)
> Independent of the actual storage mechanism, doing less will always be more
> performant than doing more.
Actually, and perhaps somewhat surprisingly, that's not universally true,
especially when it comes to information retrieval: not all operations are equal
in cost, and some operations vary in cost depending on where they are executed.
The storage mechanism is closer to the source of the data (obviously :), so if
it can e.g. compute the sum of items (even if doing so by counting them every
time!) or deliver a pre-threaded list of items (which requires the "doing
more" of keeping it incrementally updated), it can indeed be *faster* even
when both storage and client are "doing more" (e.g. in terms of instructions or
queries or ... ). Data locality, efficiency of access (e.g. cost of
serialization), etc. all have huge impacts on the total costs, making "doing
more" at times cheaper than "doing less".
Things like client-side caches can be unexpectedly treacherous ... they need to
be kept in sync with the store, often don't outlive the application (for
simplicity's sake), and require a locking strategy if you ever wish to go
multi-threaded ... all things you get "for free" by hitting storage directly.
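The staleness problem in miniature (two sqlite3 connections standing in for
"the store" and "a client with its own little cache"; all names invented):

import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

store = sqlite3.connect(path)
store.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, folder INTEGER)")
store.executemany("INSERT INTO items (folder) VALUES (?)", [(1,)] * 10)
store.commit()

client = sqlite3.connect(path)
cache = {}

def cached_count(folder):
    # naive client-side cache: compute once, reuse forever
    if folder not in cache:
        (cache[folder],) = client.execute(
            "SELECT COUNT(*) FROM items WHERE folder = ?", (folder,)).fetchone()
    return cache[folder]

print(cached_count(1))   # 10

# meanwhile some other writer (another process, an agent, ...) delivers mail
store.execute("INSERT INTO items (folder) VALUES (1)")
store.commit()

print(cached_count(1))   # still 10 -- stale until we invent invalidation
print(client.execute("SELECT COUNT(*) FROM items WHERE folder = ?",
                     (1,)).fetchone()[0])   # 11 -- the store is always right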
So if retrieval from storage is cheap and/or if storage has the ability to
perform tasks such as memoization (especially if that survives application
lifespans), it often will over time outperform a client-side "does less"
solution ... which btw tends to get more complex over time. I've seen such
"prevent ops" solutions evolve over time to cover the real-world requirements
of the product and, even though they started out faster, end up slower than
simply going back to "doing more", but doing it in a more appropriate place.
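One way storage can "do more" and memoize across application lifespans is to
keep a summary table up to date itself; a minimal sketch, again with sqlite3
and made-up names (a real store would live on disk, so the memo survives
restarts):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, folder INTEGER, unread INTEGER);
CREATE TABLE folder_stats (folder INTEGER PRIMARY KEY, total INTEGER, unread INTEGER);

-- the store "does more": every insert also maintains the per-folder summary
-- (update/delete triggers would be analogous)
CREATE TRIGGER items_ins AFTER INSERT ON items BEGIN
  INSERT OR IGNORE INTO folder_stats (folder, total, unread) VALUES (NEW.folder, 0, 0);
  UPDATE folder_stats
     SET total = total + 1, unread = unread + NEW.unread
   WHERE folder = NEW.folder;
END;
""")

conn.executemany("INSERT INTO items (folder, unread) VALUES (?, ?)",
                 [(n % 10, n % 2) for n in range(10_000)])

# the client also "does more" (it just asks again whenever it likes), but each
# ask is one indexed row lookup, not a scan and not a cache to keep in sync
print(conn.execute(
    "SELECT total, unread FROM folder_stats WHERE folder = ?", (3,)).fetchone())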
With info retrieval, I've found that "doing more" is less important than doing
things in the optimal place.
This is why taking the same algorithm out of a client and shoving it into a
stored procedure will often outperform the same client-side code, even if the
stored procedure language is quite a bit slower (and more clumsy) than the
natively-compiled client.
This is why "NoSQL" (hate that term tbh :) key/value stores outperform most
SQL-based systems when scaling out for many types of data ... even though they
are doing more work to keep nodes in sync and have more complexity in terms of
working around the CAP theorem and providing some path to ACID where
necessary.
> So thinking about the proper way to tackle this issue,
Agreed.
> to find a solution that does not require thousands of queries per
> second, would pay off way more than rewriting all of Akonadi. And would still
Both can be done, of course, but I'd be very wary about adding (and keeping)
more complexity to work around poor query performance. I'd also be wary about
adding caches outside the store when the storage could keep the desired caches
for the client; even if the client then does (lots) more queries, if they are
orders of magnitude faster and do not require store synchronization, it will
almost certainly end up better.
(... and am I the only one who finds having to add caches for what is supposed
to be a PIM data cache ironic? :)
> I'm not arguing against anything of that. Just keep in mind that using some
> faster storage mechanism won't free the developers from actually thinking
> about the performance implications of the code they are writing.
Agreed, and I don't think I ever said otherwise.
As I said in my first mail, it is great that this kind of optimization work is
happening ... people will benefit from it. I'm just trying to emphasize (after
someone mentioned Akonadi "next") that in future it would be better to
consider architectural improvements that can eliminate entire classes of
problems rather than patching the tower of code on top of Akonadi until we hit
the sky ... if Kontact hasn't already done so.
If it is easy / natural to repeatedly fetch folder statistics (it is,
obviously) because it keeps code loosely coupled (which is often the same
thing as saying "developed over time by different people"), then a system that
handles that kind of pattern well is generally preferable. It would allow
simpler client code, which means fewer bugs, less to maintain, and a greater
chance of new contributors because it is less difficult to develop decently
performing clients.
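To sketch what that loose coupling could look like in client code (hypothetical
names, assuming something like the store-maintained folder_stats summary from
the sketch above so that each call is a single cheap lookup):

def folder_stats(conn, folder):
    # one cheap lookup against the store-maintained summary; no shared cache
    # object, no change signals to wire up, no invalidation protocol
    return conn.execute(
        "SELECT total, unread FROM folder_stats WHERE folder = ?",
        (folder,)).fetchone()

# three unrelated pieces of UI, possibly written by different people at
# different times, each simply asking the store for what it needs
def paint_folder_tree(conn, folders):
    for f in folders:
        total, unread = folder_stats(conn, f)
        print(f"folder {f}: {unread} unread of {total}")

def update_window_title(conn, current_folder):
    _, unread = folder_stats(conn, current_folder)
    print(f"[{unread} unread]")

def update_tray_badge(conn, folders):
    print(sum(folder_stats(conn, f)[1] for f in folders))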
--
Aaron J. Seigo