[Kde-pim] akonadinext update: entity processing pipelines in resources

Aaron J. Seigo aseigo at kde.org
Thu Dec 18 08:55:30 GMT 2014


On Wednesday, December 17, 2014 14.49:53 you wrote:
> On Wednesday 17 December 2014 11:39:10 Aaron J. Seigo wrote:
> > currently, pipelines are just a simple one-after-the-other processing
> > afair. It is set up already for asynchronous processing, however.
> > Eventually I would like to allow filters to note that they can be
> > parallelized, should be run in a separate thread, ... mostly so that
> > we can increase throughput.
> 
> This, imo, will kill user-configuration. You do not want to burden the user
> with a GUI where he can define dependencies etc. pp.

Yes, that would make no sense. So that's probably not what I'm proposing :)

> Also, I cannot think of any common use-case of mail filtering that could be
> parallelized for a single mail:

Christian already pointed to it, but:

	https://community.kde.org/KDE_PIM/Akonadi_Next/Terminology

Mail filtering is a specific use case, but the abstract concept is "processing 
an entity for content". Evidently the word "filter" is causing confusion, and 
that's perhaps understandable since the word has meaning in the scope of 
email. (.. and of course, Akonadi is not, strictly speaking, even an email 
system; it's a system that can be used to manage email stores ..)

Better suggestions for the word "filter" are welcome. We are early enough in 
the process that we can still change these terms.

> What other, _common_ usecase do you think of that would benefit from the
> additional design overhead?

The point of having pipelines is to ensure all post-delivery processing is 
done before clients start showing (wrong) data. Filters that move an email 
between folders, for instance, should be run *before* showing the email in the 
wrong folder in the client.
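
A minimal sketch of that ordering guarantee, in Python for brevity. The `Store` class, the `placement` dict and the example rule are hypothetical names made up for illustration, not Akonadi Next API; the only point is that an entity becomes visible to clients after every pipeline step has run, and that the move touches the folder index rather than the entity.

```python
class Store:
    """Toy store: an entity becomes visible only after the pipeline has run."""

    def __init__(self, pipeline):
        self._pipeline = pipeline   # callables taking (entity, placement)
        self._folders = {}          # folder name -> entities visible to clients

    def deliver(self, entity):
        placement = {"folder": "inbox"}   # default target, may be changed by a step
        for step in self._pipeline:
            step(entity, placement)
        # Only now does the entity show up in a client-facing folder listing.
        self._folders.setdefault(placement["folder"], []).append(entity)

    def folder(self, name):
        return list(self._folders.get(name, []))


def move_list_mail(mail, placement):
    # Hypothetical user rule: mailing list traffic goes to the "lists" folder.
    if "kde-pim" in mail["to"]:
        placement["folder"] = "lists"


store = Store([move_list_mail])
store.deliver({"to": "kde-pim at kde.org", "subject": "pipelines"})
print(store.folder("inbox"), len(store.folder("lists")))  # [] 1
```

A client listing "inbox" never sees the mail there, not even briefly, because the move happened before the entity was published.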

So, real world use cases:

1. a mail filter that moves an email to a folder
2. a scam detector (currently this lives in libmessageviewer!)
3. a full text indexer
4. a threading agent (relies on knowing which folder the email is in)
5. a mail filter that flags mails from your boss as important
6. an event checker that flags conflicts between incoming events and existing 
ones

1, 2, 3, 4 and 6 do not modify the entity itself. They touch indexes, but not 
the entity itself. Number 5 does.

Number 1 needs to be run before numbers 3 and 4, but can be run in parallel 
with 2 and 5 (which also needs to be run before 3 and 4). 3 and 4 can be run 
in parallel. 6 may run on emails and on calendar events, does not touch the 
entities, and nothing depends on its output.

the graph that comes from that is self-evident once all the information is 
known .. but that's the trick: making sure each element can provide enough 
useful, machine-processable information to know what the graph should be.
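
As an illustration of what such machine-processable information could look like, here is a small Python sketch. `FilterInfo`, `modifies_entity` and `run_after` are hypothetical names, not anything that exists in Akonadi Next. The dependency table follows the six examples above and assumes the reading that 1 and 5 must finish before 3 and 4. The ordering itself is done by the standard library's `graphlib` (Python 3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class FilterInfo:
    """Hypothetical self-description a filter hands to the pipeline."""
    name: str
    modifies_entity: bool = False
    run_after: tuple = ()   # numbers of the filters that must finish first


# The six examples from the list above, keyed by their number.
FILTERS = {
    1: FilterInfo("move to folder"),
    2: FilterInfo("scam detector"),
    3: FilterInfo("full text indexer", run_after=(1, 5)),
    4: FilterInfo("threading agent", run_after=(1, 5)),
    5: FilterInfo("flag mail from boss", modifies_entity=True),
    6: FilterInfo("event conflict checker"),
}


def stages(filters):
    """Group filters into stages; filters within one stage may run in parallel."""
    sorter = TopologicalSorter(
        {num: set(info.run_after) & set(filters) for num, info in filters.items()}
    )
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


print(stages(FILTERS))  # [[1, 2, 5, 6], [3, 4]]
```

Nothing in the sketch hard-codes the graph; it falls out of what each filter declares about itself.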

as for user configuration, they may wish to not have scam detection on (e.g.). 
with that off, then the set of filters that are run changes (in this case #2 is 
just not run at all) and the graph changes as a result as well.
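
A rough Python sketch of that effect: the graph is rebuilt from whatever is currently enabled. The `RUN_AFTER` table and the `build_graph` helper are assumptions made for this example (numbers as in the list above, with 1 and 5 assumed to precede 3 and 4), not an existing API.

```python
# Hypothetical dependency table: filter number -> filters it must run after.
RUN_AFTER = {1: set(), 2: set(), 3: {1, 5}, 4: {1, 5}, 5: set(), 6: set()}


def build_graph(enabled):
    """Keep only enabled filters and drop edges that point at disabled ones."""
    return {
        num: deps & enabled
        for num, deps in RUN_AFTER.items()
        if num in enabled
    }


everything = set(RUN_AFTER)
no_scam_detection = build_graph(everything - {2})
no_boss_rule = build_graph(everything - {5})

print(sorted(no_scam_detection))  # [1, 3, 4, 5, 6]
print(no_boss_rule[3])            # {1}
```

Turning off the scam detector only removes a node; removing the rule behind number 5 also removes an edge, so 3 and 4 then wait for 1 alone.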

additionally, 1 and 5 are obviously generated from user configuration. the 
user won't know that, but that is what will be happening: their filters will 
be creating nodes in the pipeline.
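
To make that concrete, a minimal Python sketch of a user rule turning into a pipeline node. The rule format, the `Node` type and `node_from_rule` are invented for this example; the address is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    """Hypothetical pipeline node as the resource would see it."""
    name: str
    modifies_entity: bool
    run: Callable[[dict], None]


def node_from_rule(rule):
    """Turn one user-configured rule (made-up format) into a pipeline node."""
    def run(mail):
        if mail["from"] == rule["sender"]:
            mail.setdefault("flags", set()).add(rule["flag"])

    return Node(
        name=f"flag {rule['flag']} for {rule['sender']}",
        modifies_entity=True,   # flagging changes the entity, like number 5
        run=run,
    )


rule = {"sender": "boss@example.org", "flag": "important"}
node = node_from_rule(rule)

mail = {"from": "boss@example.org"}
node.run(mail)
print(node.name, mail["flags"])  # flag important for boss@example.org {'important'}
```

The user only ever edits the rule; the node and its metadata are derived from it.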

as for why to parallelize, that's simple: throughput.

as you note, we should be able to parallelize processing of individual emails, 
but even then only to an extent. the threading agent is much simpler if it is 
only ever processing one email at a time, so maybe we never want it to be 
running in parallel, while the scam detector perhaps ought to be running in as 
many individual pipelines as possible at once.
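
One way to express such per-filter limits, sketched in Python with `asyncio`. The `MAX_PARALLEL` table and its numbers are assumptions for illustration only; the sleep stands in for real work.

```python
import asyncio

# Hypothetical limits on how many entities one filter may process at once.
MAX_PARALLEL = {"threading agent": 1, "scam detector": 8}


async def run_filter(name, entity, semaphores, active, peak):
    async with semaphores[name]:          # wait for a free slot for this filter
        active[name] += 1
        peak[name] = max(peak[name], active[name])
        await asyncio.sleep(0)            # stand-in for processing the entity
        active[name] -= 1


async def process(entities):
    semaphores = {n: asyncio.Semaphore(k) for n, k in MAX_PARALLEL.items()}
    active = dict.fromkeys(MAX_PARALLEL, 0)
    peak = dict.fromkeys(MAX_PARALLEL, 0)
    await asyncio.gather(*(
        run_filter(name, entity, semaphores, active, peak)
        for entity in entities
        for name in MAX_PARALLEL
    ))
    return peak   # highest number of simultaneous runs seen per filter


peak = asyncio.run(process(range(20)))
print(peak)
```

The threading agent never sees more than one email at a time, while the scam detector is allowed to fan out up to its limit.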

additionally, some processes take more time than others and block yet others. 
running 1, 2, 5 and 6 in parallel will get us to 3 and 4 that much faster. 
throughput, plain and simple.
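
A back-of-the-envelope Python sketch of the gain for a single entity. The costs are invented numbers in arbitrary time units, not measurements; the dependency table assumes that 1 and 5 must finish before 3 and 4, and the parallel figure assumes there is always a free worker.

```python
from functools import lru_cache

# Hypothetical dependencies: filter number -> filters it must run after.
RUN_AFTER = {1: (), 2: (), 3: (1, 5), 4: (1, 5), 5: (), 6: ()}

# Invented per-filter costs, purely for illustration.
COST = {1: 2, 2: 10, 3: 8, 4: 3, 5: 1, 6: 4}


@lru_cache(maxsize=None)
def finish_time(num):
    """Earliest finish time of a filter when everything possible runs in parallel."""
    start = max((finish_time(dep) for dep in RUN_AFTER[num]), default=0)
    return start + COST[num]


sequential = sum(COST.values())
parallel = max(finish_time(num) for num in RUN_AFTER)
print(sequential, parallel)  # 28 10
```

With these made-up costs the slow scam detector no longer holds up indexing and threading; the total is bounded by the longest dependency chain rather than by the sum of all steps.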

we are thinking about all of these issues with data sets of hundreds of thousands 
of folders / emails in a single collection in mind. Kolab Systems has clients 
with exactly such data sets, in fact.

hope that helps clear up some things. if not, keep asking :)

-- 
Aaron J. Seigo