[kdepim-users] Tbird versus Kmail, performance
Daniel Vrátil
dvratil at redhat.com
Fri Nov 14 14:37:41 GMT 2014
On Friday 14 of November 2014 09:12:12 Pablo Sanchez wrote:
> [ Comments below, in-line ]
>
> On 11/14/2014 06:36 AM, René J.V. Bertin wrote:
> > [ trimmed ]
> >
> > Which configures there isn't much IO going on during these lengthy
> > operations.
>
> Yes, in my (VM) tests, after ensuring MySQL's logging was disabled,
> and increase MySQL's memory to effectively cache the /akonadi/ tables,
> I found I was CPU bound with one processor.
>
> My host O/S has eight (logical) CPU's so I gave the VM an additional
> four CPU's. MySQL ate those CPU's too. Baloo was eating some CPU but
> it is nice'd so not that big of a deal.
>
> I dug in to MySQL to see what it was doing. I didn't do anything very
>
> complicated to measure what SQL was running. I repeatedly ran:
> > show full processlist;
>
> This shows the database connections and if they're running SQL, the
> SQL they're running.
>
> From multiple connections, I'd see the following queries being run:
>
> SELECT count(DISTINCT PimItemTable.id)
> FROM PimItemTable
> INNER JOIN PimItemFlagRelation
> ON ( PimItemTable.id = PimItemFlagRelation.PimItem_id )
> WHERE ( PimItemTable.collectionId = ?
> AND ( PimItemFlagRelation.Flag_id = ?
> OR
> PimItemFlagRelation.Flag_id = ? ) )
>
> and
>
> SELECT count(id), sum(size)
> FROM PimItemTable
> WHERE collectionId = ?;
IIRC this provides collections statistics (total messages, unread messages,
...) - it's calculated quite often, as almost any Item operation can affect
the statistics, and unfortunately it's not cached, so when something changes,
all clients will request update on the collection stats, which means it's
actually calculated again for each client. Definitely something worth looking
into.
> I looked at the data distribution for both /collectionId/ and
> /Flag_id/ and there weren't many distinct values.
>
> I tried to /cheat/ and create some /covering/ indexes to get slightly
> better performance but it was marginal. This index helps the second
> query:
>
> CREATE INDEX idx01 ON PimItemTable (collectionId, id, size);
Do you have any numbers? We could probably add the index temporarily, and
remove it once the caching and sharing of collection statistics is
implemented, which should reduce the amount of queries to the point where the
index does not actually matter that much.
>
> I circled back to the queries and realized they're somewhat
> unbounded. I'd have expected the query to use a `date/time' column to
> further restrict it.
Well, they have to be, as stats are always computed from the entire
collection, not just part of it :-)
> For example, if we're checking for new data between `time x' and `time
> x + 1', we should ask the DBMS to only return the difference.
> Otherwise we're forcing the DBMS to do a lot of work, which in this
> case, translates itself to high CPU utilization (since all the data is
> in cache).
>
> The rule is to limit the amount of data pulled from the DB. This way
> the DB has less `data to traverse' .. which will likely translate to
> better scalability.
We only use the datetime limit for Baloo indexer, as the timestamp stored in
the Akonadi database does not represent the actual timestamp of when the item
was retrieved from remote backend - due to all the async stuff, queueing and
delayed processing, the timestamp stored in database is usually "newer" than
the actual revision of the remote server that we checked out. Using the
timestamp would cause us to miss data on next sync that arrived between
"actual last sync timestamp" and "timestamp of first item synced in last
sync".
Most remote backends use different methods for incremental sync, on IMAP
that's CONDSTORE/MODSEQ (RFC4551), which is much more reliable than
timestamps.
As of Akonadi 1.13 and KDE PIM 4.14 we don't even fetch all Items from Akonadi
into ItemSync (we used to do that so that ItemSync could compare the new Items
with the existing ones and decide which Items are new, which are changed and
which were removed from the remote server) - as of 4.14 the logic has moved to
Akonadi Server, which can perform this much more effectively without having to
checkout the entire database first.
> Carrying the above through the first example, we might do something
>
> like this in the WHERE-clause:
> ::: Time 0 :::
> --
> -- DB Client sets /min_timestamp/ to some low value (e.g.
> -- 1970 00:00:00)
> --
> WHERE ...
> AND some-timestamp BETWEEN min_timestamp AND curr_timestamp
>
> ::: Time 1 :::
> --
> -- The DB client knows the last time it checked (curr_timestamp)
> -- and that becomes /last_time_we_checked/ and it re-computes
> -- curr_timestamp.
> --
> WHERE ...
> AND some-timestamp BETWEEN last_time_we_checked AND curr_timestamp
>
> etc.
>
> Cheers,
> --
> Pablo Sanchez - Blueoak Database Engineering, Inc
> Ph: 819.459.1926 Blog: http://pablo-blog.blueoakdb.com
> iNum: 883.5100.0990.1054
>
> _______________________________________________
> KDE PIM users mailing list
> Subscription management: https://mail.kde.org/mailman/listinfo/kdepim-users
--
Daniel Vrátil | dvratil at redhat.com | dvratil on #kde-devel, #kontact, #akonadi
Software Engineer - KDE Desktop Team, Red Hat Inc.
GPG Key: 0xC59D614F6F4AE348
Fingerprint: 4EC1 86E3 C54E 0B39 5FDD B5FB C59D 614F 6F4A E348
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: This is a digitally signed message part.
URL: <http://mail.kde.org/pipermail/kdepim-users/attachments/20141114/5a602ce9/attachment.sig>
-------------- next part --------------
_______________________________________________
KDE PIM users mailing list
Subscription management: https://mail.kde.org/mailman/listinfo/kdepim-users
More information about the kdepim-users
mailing list