[Akonadi] [Bug 351475] New: Optimization of an SQL query
Marc Cousin
cousinmarc at gmail.com
Wed Aug 19 11:07:59 BST 2015
https://bugs.kde.org/show_bug.cgi?id=351475
Bug ID: 351475
Summary: Optimization of an SQL query
Product: Akonadi
Version: 1.13.0
Platform: Archlinux Packages
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: NOR
Component: general
Assignee: kdepim-bugs at kde.org
Reporter: cousinmarc at gmail.com
I gave a look at the queries run by akonadi (on PostgreSQL, as it is the
database with which I am most proficient), as I see a load that is quite heavy
on my laptop (not that powerful, but very large number of emails).
I found out that 90-95% of the CPU time used on my database is running this
query (thanks to pg_stat_statements):
SELECT count(DISTINCT PimItemTable.id) FROM PimItemTable INNER JOIN
PimItemFlagRelation ON ( PimItemTable.id = PimItemFlagRelation.PimItem_id )
WHERE ( PimItemTable.collectionId = $1 AND ( PimItemFlagRelation.Flag_id = $2
OR PimItemFlagRelation.Flag_id = $3 ) );
Here is an example:
akonadi=# explain analyze SELECT count(DISTINCT PimItemTable.id) FROM
PimItemTable INNER JOIN PimItemFlagRelation ON ( PimItemTable.id =
PimItemFlagRelation.PimItem_id ) WHERE ( PimItemTable.collectionId = 54 AND (
PimItemFlagRelation.Flag_id = 1 OR PimItemFlagRelation.Flag_id = -1 ) )
;
QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11821.98..11821.99 rows=1 width=4) (actual
time=92.960..92.960 rows=1 loops=1)
-> Hash Join (cost=3862.72..11817.67 rows=1725 width=4) (actual
time=73.820..92.745 rows=1719 loops=1)
Hash Cond: (pimitemflagrelation.pimitem_id = pimitemtable.id)
-> Seq Scan on pimitemflagrelation (cost=0.00..6717.29 rows=244082
width=8) (actual time=0.014..59.284 rows=243780 loops=1)
Filter: ((flag_id = 1) OR (flag_id = (-1)))
Rows Removed by Filter: 84707
-> Hash (cost=3841.10..3841.10 rows=1730 width=4) (actual
time=1.217..1.217 rows=1719 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 61kB
-> Bitmap Heap Scan on pimitemtable (cost=69.83..3841.10
rows=1730 width=4) (actual time=0.254..0.813 rows=1719 loops=1)
Recheck Cond: (collectionid = 54)
Heap Blocks: exact=91
-> Bitmap Index Scan on pimitemtable_collectionindex
(cost=0.00..69.39 rows=1730 width=0) (actual time=0.231..0.231 rows=1719
loops=1)
Index Cond: (collectionid = 54)
Planning time: 0.758 ms
Execution time: 93.024 ms
Notice that this takes 93ms. This query wants to count each PimItemTable record
from the collection 54 (that's one of my IMAP folders) that has a flag to -1 or
1. As we are joining, the join may create several records, hence the DISTINCT.
I guess you already know that, sorry for the long introduction.
The thing is, what we want, is just to check if, for each record of
PimItemTable, there is a record in PimItemFlagRelation with a flag to -1 or 1
that matches. It's a semi-join, not a full-fledged join. It's less costly to
perform, and doesn't create duplicates.
This can be written through an EXISTS
akonadi=# explain analyze SELECT count(PimItemTable.id) FROM PimItemTable where
PimItemTable.collectionId = 54 and EXISTS (SELECT 1 FROM PimItemFlagRelation
WHERE pimItemTable.id = PimItemFlagRelation.PimItem_id AND
(PimItemFlagRelation.Flag_id = 1 OR PimItemFlagRelation.Flag_id = -1));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=13935.92..13935.93 rows=1 width=4) (actual
time=11.151..11.151 rows=1 loops=1)
-> Nested Loop Semi Join (cost=70.25..13932.13 rows=1515 width=4) (actual
time=0.406..10.689 rows=1719 loops=1)
-> Bitmap Heap Scan on pimitemtable (cost=69.83..3841.10 rows=1730
width=4) (actual time=0.381..1.174 rows=1719 loops=1)
Recheck Cond: (collectionid = 54)
Heap Blocks: exact=91
-> Bitmap Index Scan on pimitemtable_collectionindex
(cost=0.00..69.39 rows=1730 width=0) (actual time=0.347..0.347 rows=1719
loops=1)
Index Cond: (collectionid = 54)
-> Index Only Scan using pimitemflagrelation_pkey on
pimitemflagrelation (cost=0.42..6.15 rows=1 width=8) (actual time=0.005..0.005
rows=1 loops=1719)
Index Cond: (pimitem_id = pimitemtable.id)
Filter: ((flag_id = 1) OR (flag_id = (-1)))
Heap Fetches: 1719
Planning time: 0.768 ms
Execution time: 11.240 ms
Same result (1719). Much less work for the database (semi-join, no need for
deduplication afterwards).
This query should work on the 3 databases engines, as EXISTS is supported
everywhere.
Reproducible: Always
Steps to Reproduce:
1. Use akonadi
2. Profile the database
--
You are receiving this mail because:
You are the assignee for the bug.
More information about the Kdepim-bugs
mailing list