[Akonadi] [Bug 351475] New: Optimization of an SQL query

Wed Aug 19 11:07:59 BST 2015

https://bugs.kde.org/show_bug.cgi?id=351475

            Bug ID: 351475
           Summary: Optimization of an SQL query
           Product: Akonadi
           Version: 1.13.0
          Platform: Archlinux Packages
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: general
          Assignee: kdepim-bugs at kde.org
          Reporter: cousinmarc at gmail.com

I gave a look at the queries run by akonadi (on PostgreSQL, as it is the
database with which I am most proficient), as I see a load that is quite heavy
on my laptop (not that powerful, but very large number of emails).

I found out that 90-95% of the CPU time used on my database is running this
query (thanks to pg_stat_statements):

SELECT count(DISTINCT PimItemTable.id) FROM PimItemTable INNER JOIN
PimItemFlagRelation ON ( PimItemTable.id = PimItemFlagRelation.PimItem_id )
WHERE ( PimItemTable.collectionId = $1 AND ( PimItemFlagRelation.Flag_id = $2
OR PimItemFlagRelation.Flag_id = $3 ) );

Here is an example:

akonadi=# explain analyze SELECT count(DISTINCT PimItemTable.id) FROM
PimItemTable INNER JOIN PimItemFlagRelation ON ( PimItemTable.id =
PimItemFlagRelation.PimItem_id ) WHERE ( PimItemTable.collectionId = 54 AND (
PimItemFlagRelation.Flag_id = 1 OR PimItemFlagRelation.Flag_id = -1 ) )
;
                                                                          QUERY
PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=11821.98..11821.99 rows=1 width=4) (actual
time=92.960..92.960 rows=1 loops=1)
   ->  Hash Join  (cost=3862.72..11817.67 rows=1725 width=4) (actual
time=73.820..92.745 rows=1719 loops=1)
         Hash Cond: (pimitemflagrelation.pimitem_id = pimitemtable.id)
         ->  Seq Scan on pimitemflagrelation  (cost=0.00..6717.29 rows=244082
width=8) (actual time=0.014..59.284 rows=243780 loops=1)
               Filter: ((flag_id = 1) OR (flag_id = (-1)))
               Rows Removed by Filter: 84707
         ->  Hash  (cost=3841.10..3841.10 rows=1730 width=4) (actual
time=1.217..1.217 rows=1719 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 61kB
               ->  Bitmap Heap Scan on pimitemtable  (cost=69.83..3841.10
rows=1730 width=4) (actual time=0.254..0.813 rows=1719 loops=1)
                     Recheck Cond: (collectionid = 54)
                     Heap Blocks: exact=91
                     ->  Bitmap Index Scan on pimitemtable_collectionindex 
(cost=0.00..69.39 rows=1730 width=0) (actual time=0.231..0.231 rows=1719
loops=1)
                           Index Cond: (collectionid = 54)
 Planning time: 0.758 ms
 Execution time: 93.024 ms

Notice that this takes 93ms. This query wants to count each PimItemTable record
from the collection 54 (that's one of my IMAP folders) that has a flag to -1 or
1. As we are joining, the join may create several records, hence the DISTINCT.

I guess you already know that, sorry for the long introduction.

The thing is, what we want, is just to check if, for each record of 
PimItemTable, there is a record in PimItemFlagRelation with a flag to -1 or 1
that matches. It's a semi-join, not a full-fledged join. It's less costly to
perform, and doesn't create duplicates.

This can be written through an EXISTS

akonadi=# explain analyze SELECT count(PimItemTable.id) FROM PimItemTable where
PimItemTable.collectionId = 54 and EXISTS (SELECT 1 FROM PimItemFlagRelation
WHERE pimItemTable.id = PimItemFlagRelation.PimItem_id AND
(PimItemFlagRelation.Flag_id = 1 OR PimItemFlagRelation.Flag_id = -1));

QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=13935.92..13935.93 rows=1 width=4) (actual
time=11.151..11.151 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=70.25..13932.13 rows=1515 width=4) (actual
time=0.406..10.689 rows=1719 loops=1)
         ->  Bitmap Heap Scan on pimitemtable  (cost=69.83..3841.10 rows=1730
width=4) (actual time=0.381..1.174 rows=1719 loops=1)
               Recheck Cond: (collectionid = 54)
               Heap Blocks: exact=91
               ->  Bitmap Index Scan on pimitemtable_collectionindex 
(cost=0.00..69.39 rows=1730 width=0) (actual time=0.347..0.347 rows=1719
loops=1)
                     Index Cond: (collectionid = 54)
         ->  Index Only Scan using pimitemflagrelation_pkey on
pimitemflagrelation  (cost=0.42..6.15 rows=1 width=8) (actual time=0.005..0.005
rows=1 loops=1719)
               Index Cond: (pimitem_id = pimitemtable.id)
               Filter: ((flag_id = 1) OR (flag_id = (-1)))
               Heap Fetches: 1719
 Planning time: 0.768 ms
 Execution time: 11.240 ms

Same result (1719). Much less work for the database (semi-join, no need for
deduplication afterwards).

This query should work on the 3 databases engines, as EXISTS is supported
everywhere.

Reproducible: Always

Steps to Reproduce:
1. Use akonadi
2. Profile the database

-- 
You are receiving this mail because:
You are the assignee for the bug.