[Kde-pim] Excessive amount of queries
Aaron J. Seigo
aseigo at kde.org
Sun Dec 7 18:16:19 GMT 2014
On Sunday, December 7, 2014 15.59:23 you wrote:
> I think you are missing the point, Aaron. The question should be: Why do we
> even need 500k query round trips per second for a mail application?
This was answered in my previous email:
> > no, we probably don't need 500k queries/s, but given that each query
> > takes on average 1/500,000th of a second, we can get away with querying
> > and not worrying so much about it because the latency is so low.
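Back-of-the-envelope, to make concrete what that latency buys (the queries/s
figure is the measurement quoted above; the folder count is a made-up example):

queries_per_second = 500_000
per_query = 1.0 / queries_per_second    # ~2 microseconds per round trip

folders = 300                           # an invented, largish folder tree
refresh = folders * per_query           # re-fetching statistics for every folder
print(f"{per_query * 1e6:.0f} us per query, {refresh * 1e3:.2f} ms per refresh")
# -> 2 us per query, 0.60 ms per full refresh: lost in the noise even if done
#    on every repaint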
... now the philosophy parts :)
> Independent of the actual storage mechanism, doing less will always be more
> performant than doing more.
Actually, and perhaps somewhat surprisingly, that's not universally true,
especially when it comes to information retrieval: not all operations are equal
in cost, and some operations vary in cost depending on where they are executed.
The storage mechanism is closer to the source of the data (obviously :), so if
it can e.g. compute the sum of items (even if doing so by counting them every
time!) or deliver a pre-threaded list of items (which requires the "doing
more" of keeping it incrementally updated), it can indeed be *faster* even
when both storage and client are "doing more" (e.g. in terms of instructions or
queries or ... ). Data locality, efficiency of access (e.g. cost of
serialization), etc. all have huge impacts on the total costs, making "doing
more" at times cheaper than "doing less".
Things like client-side caches can be unexpectedly treacherous ... they need to
be kept in sync with the store, often don't outlive the application (for
simplicity's sake), and require a locking strategy if you ever wish to go
multi-threaded ... all things you get "for free" by hitting storage directly.
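The staleness problem in miniature (two sqlite3 connections standing in for
"the store" and "a client with its own little cache"; all names invented):

import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

store = sqlite3.connect(path)
store.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, folder INTEGER)")
store.executemany("INSERT INTO items (folder) VALUES (?)", [(1,)] * 10)
store.commit()

client = sqlite3.connect(path)
cache = {}

def cached_count(folder):
    # naive client-side cache: compute once, reuse forever
    if folder not in cache:
        (cache[folder],) = client.execute(
            "SELECT COUNT(*) FROM items WHERE folder = ?", (folder,)).fetchone()
    return cache[folder]

print(cached_count(1))   # 10

# meanwhile some other writer (another process, an agent, ...) delivers mail
store.execute("INSERT INTO items (folder) VALUES (1)")
store.commit()

print(cached_count(1))   # still 10 -- stale until we invent invalidation
print(client.execute("SELECT COUNT(*) FROM items WHERE folder = ?",
                     (1,)).fetchone()[0])   # 11 -- the store is always right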
So if retrieval from storage is cheap and/or if storage has the ability to
perform tasks such as memoization (especially if that survives application
lifespans), it often will over time outperform a client-side "does less"
solution ... which btw tends to get more complex over time. I've seen such
"prevent ops" solutions evolve over time to cover the real-world requirements
of the product and, even though they started out faster, end up slower than
simply going back to "doing more", but doing it in a more appropriate place.
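One way storage can "do more" and memoize across application lifespans is to
keep a summary table up to date itself; a minimal sketch, again with sqlite3
and made-up names (a real store would live on disk, so the memo survives
restarts):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, folder INTEGER, unread INTEGER);
CREATE TABLE folder_stats (folder INTEGER PRIMARY KEY, total INTEGER, unread INTEGER);

-- the store "does more": every insert also maintains the per-folder summary
-- (update/delete triggers would be analogous)
CREATE TRIGGER items_ins AFTER INSERT ON items BEGIN
  INSERT OR IGNORE INTO folder_stats (folder, total, unread) VALUES (NEW.folder, 0, 0);
  UPDATE folder_stats
     SET total = total + 1, unread = unread + NEW.unread
   WHERE folder = NEW.folder;
END;
""")

conn.executemany("INSERT INTO items (folder, unread) VALUES (?, ?)",
                 [(n % 10, n % 2) for n in range(10_000)])

# the client also "does more" (it just asks again whenever it likes), but each
# ask is one indexed row lookup, not a scan and not a cache to keep in sync
print(conn.execute(
    "SELECT total, unread FROM folder_stats WHERE folder = ?", (3,)).fetchone())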
With info retrieval, I've found that "doing more" is less important than doing
things in the optimal place.
This is why taking the same algorithm out of a client and shoving it into a
stored procedure will often outperform the same client-side code, even if the
stored procedure language is quite a bit slower (and more clumsy) than the
natively-compiled client.
This is why "NoSQL" (hate that term tbh :) key/value stores outperform most
SQL-based systems when scaling out for many types of data ... even though they
are doing more work to keep nodes in sync and have more complexity in terms of
working around the CAP theorem and providing some path to ACID where
necessary.
> So thinking about the proper way to tackle this issue,
Agreed.
> to find a solution that does not require thousands of queries per
> second, would pay off way more than rewriting all of Akonadi. And would still
Both can be done, of course, but I'd be very wary about adding (and keeping)
more complexity to work around poor query performance. I'd also be wary about
adding caches outside the store when the storage could keep the desired caches
for the client; even if the client then does (lots) more queries, if they are
orders of magnitude faster and do not require store synchronization, it will
almost certainly end up better.
(... and am I the only one who finds having to add caches for what is supposed
to be a PIM data cache ironic? :)
> I'm not arguing against anything of that. Just keep in mind that using some
> faster storage mechanism won't free the developers from actually thinking
> about the performance implications of the code they are writing.
Agreed, and I don't think I ever said otherwise.
As I said in my first mail, it is great that this kind of optimization work is
happening ... people will benefit from it. I'm just trying to emphasize (after
someone mentioned Akonadi "next") that in future it would be better to
consider architectural improvements that can eliminate entire classes of
problems rather than patching the tower of code on top of Akonadi until we hit
the sky ... if Kontact hasn't already done so.
If it is easy / natural to repeatedly fetch folder statistics (it is,
obviously) because it keeps code loosely coupled (which is often the same
thing as saying "developed over time by different people"), then a system that
handles that kind of pattern well is generally preferable. It would allow
simpler client code, which means fewer bugs, less to maintain, and a greater
chance of new contributors because it is less difficult to develop decently
performing clients.
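To sketch what that loose coupling could look like in client code (hypothetical
names, assuming something like the store-maintained folder_stats summary from
the sketch above so that each call is a single cheap lookup):

def folder_stats(conn, folder):
    # one cheap lookup against the store-maintained summary; no shared cache
    # object, no change signals to wire up, no invalidation protocol
    return conn.execute(
        "SELECT total, unread FROM folder_stats WHERE folder = ?",
        (folder,)).fetchone()

# three unrelated pieces of UI, possibly written by different people at
# different times, each simply asking the store for what it needs
def paint_folder_tree(conn, folders):
    for f in folders:
        total, unread = folder_stats(conn, f)
        print(f"folder {f}: {unread} unread of {total}")

def update_window_title(conn, current_folder):
    _, unread = folder_stats(conn, current_folder)
    print(f"[{unread} unread]")

def update_tray_badge(conn, folders):
    print(sum(folder_stats(conn, f)[1] for f in folders))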
--
Aaron J. Seigo