[Kde-pim] akonadinext update: entity processing pipelines in resources

Martin Steigerwald Martin at lichtvoll.de
Thu Dec 18 11:52:28 GMT 2014


On Thursday, 18 December 2014, 09:55:30, Aaron J. Seigo wrote:
> On Wednesday, December 17, 2014 14:49:53 you wrote:
> > On Wednesday 17 December 2014 11:39:10 Aaron J. Seigo wrote:
> > > currently, pipelines are just a simple one-after-the-other processing
> > > afair. It is set up already for asynchronous processing, however.
> > > Eventually I would like to allow filters to note that they can be
> > > parallelized, should be run in a separate thread, ... mostly so that
> > > we can increase throughput.
[…]
> > What other, _common_ usecase do you think of that would benefit from the
> > additional design overhead?
> 
> The point of having pipelines is to ensure all post-delivery processing is
> done before clients start showing (wrong) data. Filters that move an email
> between folders, for instance, should be run *before* showing the email in
> the wrong folder in the client.
> 
> So, real world use cases:
> 
> 1. a mail filter that moves an email to a folder
> 2. a scam detector (currently this lives in libmessageviewer!)
> 3. full text indexer
> 4. threading agent (relies on knowing which folder it is in)
> 5. a mail filter that flags mails from your boss as important
> 6. an event checker that flags conflicts between incoming events and
> existing ones
> 
> 1, 2, 3, 4 and 6 do not modify the entity itself. They touch indexes, but
> not the entity itself. Number 5 does.
> 
> Number 1 needs to be run before numbers 3 and 4, but can be run in parallel

Why?

Doesn't the full text indexer reference the mail by some index in the
database? If so, it wouldn't care about the folder it's stored in and could
look that up on demand.
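
To make the question concrete, here is a minimal sketch. It assumes, hypothetically, that the indexer keys its terms by a stable entity id and that folder membership lives in a separate store that is consulted only at query time. Under those assumptions, a move performed by a filter after indexing does not invalidate the index. `FolderStore`, `FullTextIndex` and the ids are illustrative names, not akonadinext APIs.

```python
from collections import defaultdict

class FolderStore:
    """Hypothetical store mapping entity ids to their current folder."""
    def __init__(self):
        self._folder_of = {}

    def put(self, entity_id, folder):
        self._folder_of[entity_id] = folder

    def move(self, entity_id, folder):
        self._folder_of[entity_id] = folder

    def folder_of(self, entity_id):
        return self._folder_of[entity_id]

class FullTextIndex:
    """Hypothetical indexer: terms point at entity ids, never at folders."""
    def __init__(self):
        self._postings = defaultdict(set)

    def index(self, entity_id, text):
        for term in text.lower().split():
            self._postings[term].add(entity_id)

    def search(self, term, folders):
        # The folder is resolved on demand, so it reflects any later move.
        return {(eid, folders.folder_of(eid)) for eid in self._postings[term.lower()]}

folders = FolderStore()
index = FullTextIndex()
folders.put(1, "inbox")
index.index(1, "Quarterly report attached")  # indexed before the move filter ran
folders.move(1, "work")                      # the mail filter moves it afterwards
print(index.search("report", folders))       # {(1, 'work')}
```

The cost of this design is one extra folder lookup per hit at query time. It does not help filters that need the folder while they run, such as the threading agent mentioned above.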

> as for why to parallelize, that's simple: throughput.
> 
> as you note, we should be able to parallelize processing of individual
> emails, but even then only to an extent. the threading agent is much
> simpler if it is only ever processing one email at a time, so maybe we
> never want it to be running in parallel, while the scam detector perhaps
> ought to be running in as many individual pipelines as possible at once.
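
The ordering and concurrency constraints in the quoted text could be captured roughly as follows. This is a minimal sketch. It assumes, hypothetically, that each filter declares which filters must finish before it runs and how many emails it may process at once. Filters are grouped into stages by topological layering (Kahn's algorithm). Filters within one stage do not depend on each other, so they could run in parallel. The filter names follow Aaron's list; the `Filter` type and its fields are illustrative, not akonadinext's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Filter:
    name: str
    after: set = field(default_factory=set)  # filters that must finish first
    max_parallel: int = 1  # emails this filter may process at once (for a scheduler; unused here)

def stages(filters):
    """Group filters into stages; each filter's dependencies lie in earlier stages."""
    remaining = {f.name: set(f.after) for f in filters}
    result = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle or unknown dependency among filters")
        result.append(ready)
        for n in ready:
            del remaining[n]
        for deps in remaining.values():
            deps.difference_update(ready)
    return result

pipeline = [
    Filter("move_to_folder"),
    Filter("scam_detector", max_parallel=8),
    Filter("full_text_indexer", after={"move_to_folder"}),
    Filter("threading_agent", after={"move_to_folder"}, max_parallel=1),
    Filter("flag_boss_mail"),
    Filter("event_conflicts"),
]
print(stages(pipeline))
# [['event_conflicts', 'flag_boss_mail', 'move_to_folder', 'scam_detector'],
#  ['full_text_indexer', 'threading_agent']]
```

A scheduler could then run each stage's filters concurrently, giving each filter `max_parallel` workers (1 for the threading agent, more for the scam detector). This sketch does not model how an entity-modifying filter such as number 5 must be ordered relative to the filters that read the entity.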

I wonder whether it's possible to parallelize in another way:

When I download new mail to a POP3 account, say about 1000 mails (I easily
have that many after a day's absence), and I want them filtered into folders:
how about using multiple filtering threads to sort the mail, so as to use all
available CPU cores and have it finished quickly?
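
The suggestion to sort a large batch of freshly downloaded mails with several workers can be sketched with a worker pool. This assumes that each mail's filter decision is independent of the others. `classify` is a hypothetical stand-in for real filter rules. One caveat: in CPython, `ThreadPoolExecutor` gives real CPU parallelism only when the work releases the GIL. For pure-Python rules, `ProcessPoolExecutor` (or native threads in C++) would be the closer analogue. `os.cpu_count()` is used to pick roughly one worker per core.

```python
import os
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def classify(mail):
    """Hypothetical filter rule: return the target folder for one mail."""
    sender, subject = mail
    if "spam" in subject.lower():
        return "Junk"
    if sender.endswith("@example.org"):
        return "Work"
    return "Inbox"

def sort_batch(mails, workers=None):
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so mail i goes to folders[i].
        folders = list(pool.map(classify, mails))
    return list(zip(mails, folders))

mails = [("boss@example.org", "Report")] * 500 + [("x@spam.test", "SPAM offer")] * 500
sorted_mails = sort_batch(mails)
print(Counter(folder for _, folder in sorted_mails))
```

Spam filtering fits the same pattern, as long as its per-mail verdicts do not depend on each other.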

Also spam filtering could be done multi-threaded.

Or just checking all folders of an IMAP account: the client could open 2-4
folders at once to synchronize them. It might be good if that could be
configured to what the IMAP client can handle. Or can Akonadi do this already?
I usually see it checking one folder after another while Dovecot on the server
side sits idle.
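
Syncing several folders at once with a configurable cap could look roughly like this. The sketch assumes an asyncio-style resource in which `sync_folder` is a hypothetical coroutine standing in for the real IMAP work, simulated here with `asyncio.sleep`. An `asyncio.Semaphore` limits how many folders are in flight, so the cap (2 to 4 in the example above) can be tuned to what the setup tolerates.

```python
import asyncio

async def sync_folder(name):
    """Hypothetical stand-in for synchronizing one IMAP folder."""
    await asyncio.sleep(0.01)  # simulated network round trips
    return name

async def sync_account(folders, max_concurrent=3):
    limit = asyncio.Semaphore(max_concurrent)
    active = peak = 0

    async def guarded(name):
        nonlocal active, peak
        async with limit:  # at most max_concurrent folders in flight
            active += 1
            peak = max(peak, active)
            try:
                return await sync_folder(name)
            finally:
                active -= 1

    # gather() returns results in the order the folders were given.
    synced = await asyncio.gather(*(guarded(f) for f in folders))
    return synced, peak

folders = ["INBOX", "Sent", "Lists/kde-pim", "Lists/lkml", "Archive", "Drafts"]
synced, peak = asyncio.run(sync_account(folders, max_concurrent=3))
print(synced, peak)
```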

In any case, I would like to see one important design goal for Akonadi Next:

*Never* block the client. Never ever block the client GUI.

Current Akonadi still has issues with this. If Akonadi is busy with itself,
KMail can still become quite unusable, whereas I think Akonadi should postpone
background jobs to serve current user requests *quickly*. If I click on a
mail, I want to see it. *Now*. The only excuse would be that the IMAP server
doesn't respond in time. But with a POP3 account whose maildirs are stored
locally on an SSD-based BTRFS RAID 1, there is zero excuse for not serving the
mail *now*. The same goes for switching folders and so on.
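
One way to read "postpone background jobs to serve current user requests quickly" is as a single work queue with two priority levels. The following is a sketch, assuming a hypothetical `Scheduler` in which interactive requests (opening a mail, switching folders) always jump ahead of queued background jobs (indexing, filtering). It uses `heapq` with a sequence counter so that jobs of equal priority stay first-in, first-out.

```python
import heapq
import itertools

INTERACTIVE, BACKGROUND = 0, 1  # lower value is served first

class Scheduler:
    """Hypothetical single-worker scheduler that favours user-facing requests."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per level

    def submit(self, priority, job):
        heapq.heappush(self._heap, (priority, next(self._seq), job))

    def run_next(self):
        _, _, job = heapq.heappop(self._heap)
        return job()

sched = Scheduler()
for i in range(3):
    sched.submit(BACKGROUND, lambda i=i: f"index mail {i}")
sched.submit(INTERACTIVE, lambda: "show clicked mail")
order = [sched.run_next() for _ in range(4)]
print(order)
# ['show clicked mail', 'index mail 0', 'index mail 1', 'index mail 2']
```

This only helps between jobs. A long background job that is already running still delays the user, so background work would also need to be split into small pieces.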

So throughput is one thing, but I think from a user's point of view there is
something even more important: latency!

From my work as a trainer and consultant in performance analysis & tuning on
Linux, I know that it can be challenging to have both. But for the best user
experience, I would, if need be, reduce the throughput of background jobs to
decrease latency.
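
The throughput-for-latency trade can be illustrated with a toy simulation. Assume, hypothetically, a single worker that processes background items in slices and serves any waiting user request between slices. Each slice has a fixed overhead. The costs below (2 ms per item, 1 ms per slice, 5 ms per request) are made-up illustrative numbers, not Akonadi measurements.

```python
from collections import deque

def simulate(n_items, slice_size, arrivals_ms,
             item_ms=2.0, slice_overhead_ms=1.0, request_ms=5.0):
    """Single worker: background items in slices, user requests between slices.

    Returns (worst user wait in ms, time until everything is done in ms).
    """
    pending = deque(sorted(arrivals_ms))
    t, done, waits = 0.0, 0, []
    while done < n_items:
        # Serve every user request that arrived while the last slice ran.
        while pending and pending[0] <= t:
            waits.append(t - pending.popleft())
            t += request_ms
        batch = min(slice_size, n_items - done)
        t += slice_overhead_ms + batch * item_ms
        done += batch
    for arrival in pending:  # requests arriving after background work ends
        start = max(t, arrival)
        waits.append(start - arrival)
        t = start + request_ms
    return max(waits, default=0.0), t

# One user click at t = 100 ms while 1000 mails are being processed.
print(simulate(1000, slice_size=500, arrivals_ms=[100]))  # (901.0, 2007.0)
print(simulate(1000, slice_size=10, arrivals_ms=[100]))   # (5.0, 2105.0)
```

In this toy model, shrinking the slice cuts the worst-case wait from about 900 ms to 5 ms, while the total time grows by roughly 5%. That is the kind of trade described above.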

Please keep this in mind. In my eyes, latency is key. As I understood it,
this was one of the promises of Akonadi that it never quite fulfilled: the
client would never have to block, because processing happens in the
background. Yet current Akonadi can block. It can block badly, for half a
minute and more, up to the point that KMail seems to lose its connection to
Akonadi, and then I have KMail sitting there doing nothing anymore, and
Akonadi also sitting there doing nothing.

So please keep latency in mind. And robustness: the client should never lose
its connection to the background store.
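
Part of that robustness is on the client side: it should retry rather than give up and sit idle. The following is a sketch, assuming a hypothetical `connect` callable that raises `ConnectionError` when the background store is unreachable. It retries with capped exponential backoff and re-raises only after the last attempt. This does not fix a store that stops responding. In a GUI client, the retry loop would also have to run off the main thread so that it does not block the interface itself.

```python
import time

def backoff_delays(base=0.1, factor=2.0, cap=5.0, attempts=6):
    """Delays in seconds between reconnection attempts: exponential, capped."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor

def connect_with_retry(connect, delays, sleep=time.sleep):
    """Call the hypothetical `connect` until it succeeds or the delays run out."""
    for delay in [*delays, None]:
        try:
            return connect()
        except ConnectionError:
            if delay is None:  # last attempt failed: give up loudly
                raise
            sleep(delay)

attempts = []
def flaky_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("socket not ready")
    return "connected"

slept = []
result = connect_with_retry(flaky_connect, backoff_delays(), sleep=slept.append)
print(result, slept)  # connected [0.1, 0.2]
```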

I will happily test this with my "monster" account (one million mails and 
counting).

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/

