[kdepim-users] Tbird versus Kmail, performance

Fri Nov 14 15:30:20 GMT 2014

[ Comments below, in-line ]

On 11/14/2014 09:37 AM, Daniel Vrátil wrote:

Hi Daniel,

If you'd like to move this offline, you have my email address.  I'm a
firm proponent of Open Source and I have a skill-set that I'm willing
to donate.  Standard caveat applies, work/family comes first but I
will /eventually/ answer!  :)

> IIRC this provides collections statistics (total messages, unread messages, 
> ...) - it's calculated quite often, 

Yah, I saw multiple connections issuing those queries.

> as almost any Item operation can affect the statistics, and
> unfortunately it's not cached, so when something changes, all
> clients will request update on the collection stats, which means
> it's actually calculated again for each client. Definitely something
> worth looking into.

Ah, I think I understand, each client is sent a message saying, `hey,
we have more items', then each client issues its own query to get the
latest count.

It may be worth exploring to use the Innodb parameters
`query_cache_size' and `query_cache_type' to mimic creating a cache
from within Innodb.

In my test, I didn't realize I had to set `query_cache_type=1' (it was
set to `0') but I did set `query_cache_size=128M'

@Rene:  for your test, you'll want to try `query_cache_type=1'

>>    CREATE INDEX idx01 ON PimItemTable (collectionId, id, size);
> 
> Do you have any numbers? 

No I don't ... I was looking for something /big/ ... like, add these
indexes and bam! instant performance boost (which I've seen at client
sites) but not in this case.

The proper way to measure the performance would be to capture the
number of Logical I/O's generated by the query.  If there's a sizeable
reduction for representative datasets, it's a win.

For example, if we reduced from 1,000 to 990, probably not worth
adding an index (which of course comes at a cost:  the DBMS has to
maintain it).

On the other hand, if we reduced the logical I/O's to 500, then of
course it's worth it.

By using logical I/O's as our metric, we don't have to run a full
scale benchmark with tons of external factors possibly influencing our
numbers.

BTW, I'm defining a logical I/O as an I/O that is tallied regardless
of whether it's a physical I/O or not.

>> I circled back to the queries and realized they're somewhat
>> unbounded.  I'd have expected the query to use a `date/time' column
>> to further restrict it.
> 
> Well, they have to be, as stats are always computed from the entire
> collection, not just part of it :-)

:)

What I'm suggesting is the db-client knows its counts from the last
time it ran its query therefore it only needing the deltas.

Here's what I'm thinking ... for simplicity, I'll use some pseudocode:

   last_time_I_checked = now;
   now = // my current date/time //

   -- Get my current count
   SELECT count(DISTINCT PimItemTable.id)
   INTO   current_count
   WHERE  timestamp BETWEEN last_time_I_checked AND now;

   // code
   total_count += current_count;
   last_time_I_checked = now;

> We only use the datetime limit for Baloo indexer, as the timestamp
> stored in the Akonadi database does not represent the actual
> timestamp of when the item was retrieved from remote backend - due
> to all the async stuff,
>
> [ trimmed ]

I'll need your help here because you're the Subject Matter Expert
(SME) but can't we use a timestamp column to track when the row
entered the database?  The creation time of the row.  If we have such
a row, then we can implement what I'm talking about without data
loss.

Generally speaking, that's what I'm suggesting:  limit the required DB
bandwidth to improve scalability.

Cheers,
--
Pablo Sanchez - Blueoak Database Engineering, Inc
Ph:    819.459.1926         Blog:  http://pablo-blog.blueoakdb.com
iNum:  883.5100.0990.1054

_______________________________________________
KDE PIM users mailing list
Subscription management: https://mail.kde.org/mailman/listinfo/kdepim-users