[Kde-pim] Trying to understand mail filtering handling with maildir resource

Daniel Vrátil dvratil at redhat.com
Mon Jul 14 10:03:11 BST 2014


On Sunday 13 of July 2014 15:14:09 Martin Steigerwald wrote:
> Hi!
> 
> In order to finally break^H^H^H^H^H fix up performance of maildir resource I
> try to understand maildir filtering handling with maildir resource.
> Especially I want to ask the question:

Hi,

thanks for looking into this. I'm not a maildir expert, but I'll try to answer 
what I know.


> Why does filtering mails into a folder trigger a full folder sync including
> comparing all files each time?

By folder sync, do you mean a sync triggered by Akonadi (that is, 
MaildirResource::retrieveItems() is called, which is a folder sync in Akonadi 
terms), or does the resource just trigger some updates of its internal cache? 
This is not clear to me from your email.

> I started from libmaildir and the only case it compares full directory
> contents seem to be in
> 
> void Maildir::refreshKeyCache()
> {
>   KeyCache::self()->refreshKeys( d->path );
> }

Wouldn't the proper question be "what calls this method"? If I understand 
correctly, your concern is that filtering a mail into a maildir folder triggers 
a full sync (i.e. calls refreshKeys), so you should try to find out what 
triggers the refreshKeys method and optimize that.

> 
> which calls:
> 
> void KeyCache::addKeys( const QString& dir )
> {
>   if ( !mNewKeys.contains( dir ) ) {
>     mNewKeys.insert( dir, listNew( dir ) );
>     //kDebug() << "Added new keys for: " << dir;
>   }
>
>   if ( !mCurKeys.contains( dir ) ) {
>     mCurKeys.insert( dir, listCurrent( dir ) );
>     //kDebug() << "Added cur keys for: " << dir;
>   }
> }
>
> void KeyCache::refreshKeys( const QString& dir )
> {
>     mNewKeys.remove( dir );
>     mCurKeys.remove( dir );
>     addKeys( dir );
> }
> 
> which essentially calls the routines I switched to not sort the filenames:
> the routines which list all the filenames of a mail folder into a buffer.
> 
> 
> Maildir resource calls this in:
> 
> 711 void MaildirResource::slotDirChanged(const QString& dir)
> 712 {
> 713   QFileInfo fileInfo( dir );
> 714   if ( fileInfo.isFile() ) {
> 715     slotFileChanged( fileInfo );
> 716     return;
> 717   }
> 718
> 719   if ( dir == mSettings->path() ) {
> 720     synchronizeCollectionTree();
> 721     synchronizeCollection( Collection::root().id() );
> 722     return;
> 723   }
> 
> TODO: Figure out what these exactly do.
> 
> Documentation
> 
> void ResourceBase::synchronizeCollection 	( 	qint64  	id	)
> 	protected
> 
> This method is called whenever the collection with the given id shall be
> synchronized.
> 
> Definition at line 1071 of file resourcebase.cpp.
> 
> isn't exactly helpful. What does synchronize mean?

It will schedule a collection resync in ResourceScheduler, which will result in 
ResourceBase::retrieveItems() being called, i.e. the purpose of 
synchronizeCollection() is to bring the Akonadi cache up to date with the 
remote storage (the maildir folder in this case). This is a uni-directional 
sync, i.e. changes stored in Akonadi are not synced to the maildir.
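
To illustrate what "uni-directional" means here, a minimal standalone sketch 
follows. It is not Akonadi code: Keys, SyncResult and syncCacheFromStorage() 
are hypothetical names made up for this example, and a real retrieveItems() 
implementation works on Akonadi items rather than plain strings. It only 
shows the direction of the sync: the storage listing is read, the cache is 
written.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in types; this is not the Akonadi API.
using Keys = std::set<std::string>;

struct SyncResult {
    Keys added;    // present in storage, missing from the cache
    Keys removed;  // present in the cache, gone from storage
};

// One-way sync: the storage listing is authoritative and is only read.
// The cache is changed until it matches the listing.
SyncResult syncCacheFromStorage(const Keys &storage, Keys &cache)
{
    SyncResult result;
    for (const std::string &key : storage) {
        if (cache.count(key) == 0) {
            result.added.insert(key);
        }
    }
    for (const std::string &key : cache) {
        if (storage.count(key) == 0) {
            result.removed.insert(key);
        }
    }
    // Apply the differences to the cache only.
    for (const std::string &key : result.added) {
        cache.insert(key);
    }
    for (const std::string &key : result.removed) {
        cache.erase(key);
    }
    assert(cache == storage);  // postcondition of a full one-way sync
    return result;
}
```

Note that both loops walk complete listings, which is why a full sync of a 
large folder is expensive compared to handling a single changed file.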

> 
> Hmmm, okay it calls a collection fetch job, so retrieves all items from it

No, CollectionFetchJob fetches the actual collection (Akonadi::Collection), 
not items in that collection (ItemFetchJob does that).
> 
> 724
> 725   if ( dir.endsWith( QLatin1String( ".directory" ) ) ) {
> 726     synchronizeCollectionTree(); //might be too much, but this is not a common case anyway
> 727     return;
> 728   }
> 729
> 730   QDir d( dir );
> 731   if ( !d.cdUp() )
> 732     return;
> 733
> 734   Maildir md( d.path() );
> 735   if ( !md.isValid() )
> 736     return;
> 737
> 738   md.refreshKeyCache();
> 
> And this probably is in there to make sure KeyCache of libmaildir also has
> the most recent view on things… but why? Shouldn't it always have it?
> 
> 
> So every time the dir is changed this is called.
> 
> 
> Yet I do not understand why it is called on filter. Thus I looked in
> filteragent in kdepim… there is
> 
> bool FilterManager::processContextItem( ItemContext context )
> {
> […]
>         if ( context.moveTargetCollection().isValid() &&
>              context.item().storageCollectionId() != context.moveTargetCollection().id() ) {
>             if ( itemCanDelete  ) {
>                 Akonadi::ItemMoveJob *moveJob = new Akonadi::ItemMoveJob(
>                     context.item(), context.moveTargetCollection(), this );
>                 connect( moveJob, SIGNAL(result(KJob*)), SLOT(moveJobResult(KJob*)) );
>             } else {
>                 return false;
>             }
>         }
> 
> 
> which is called from
> 
> bool FilterManager::process( const Akonadi::Item& item, bool needsFullPayload,
>                              const MailFilter* filter )
> 
> 
> ItemMoveJob is in kdepim-runtime/filestore.

You are looking at the wrong ItemMoveJob :). The ItemMoveJob called from 
FilterManager is the one in kdepimlibs/akonadi.

However, the ItemMoveJob in kdepim-runtime is called by Resource 
implementation. If the Maildir resource implements item(s)Moved(), then the 
ItemMoveJob in kdepim-runtime will be called by the resource.

> 
> 
> I looked there as well… but didn't yet get how these mail folder
> synchronisations on filtering get called.
> 
> 
> I am tempted to remove
> 
> 738   md.refreshKeyCache();
> 
> for testing from the slotDirChanged method, but I do not really completely
> understand the implications of it.
> 
> 
> Is there a flow chart of mail filtering handling?
> 
> Anyone willing to explain to me how it works? Can make an appointment on IRC
> and I will post the results here.

It's pretty simple. The following description assumes that a new email arrives 
in a maildir folder "INBOX", matches a filter and is moved to the folder "Spam":

* a new email is downloaded by KMail/POP3, which then runs an ItemCreateJob to 
add the item into Akonadi
* Akonadi stores the email in the cache and emits an itemAdded notification
* the Maildir resource and the MailFilter agent pick up the notification
	* MaildirResource::itemAdded() is called by AgentBase::Observer
		* the resource stores the email in the maildir storage
	* MailFilterAgent::itemAdded() is called by AgentBase::Observer
		* a matching filter is found, Akonadi::ItemMoveJob(item, 
destinationCollection) is started
		* Akonadi moves the item in its internal cache and emits an itemMoved 
notification
		* the Maildir resource picks up the notification
			* MaildirResource::itemMoved() is called by AgentBase::Observer
			* the resource moves the email in the maildir storage
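
The sequence above can be sketched as a small standalone model. This is only 
an illustration under assumptions: ServerModel, Item and runFlow() are 
hypothetical names, plain callbacks stand in for the notification system, and 
the order in which the resource and the filter agent receive the itemAdded 
notification is fixed here only to keep the log deterministic.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical miniature of the flow; none of these types are Akonadi classes.
struct Item {
    std::string id;
    std::string folder;
    bool looksLikeSpam;
};

class ServerModel {
public:
    std::vector<std::function<void(const Item &)>> onAdded;
    std::vector<std::function<void(const Item &, const std::string &)>> onMoved;

    // Stands in for an ItemCreateJob: notify every observer of the new item.
    void createItem(const Item &item)
    {
        for (auto &callback : onAdded) {
            callback(item);
        }
    }

    // Stands in for an ItemMoveJob: update the item, then notify observers.
    void moveItem(Item item, const std::string &destination)
    {
        assert(item.folder != destination);
        const std::string source = item.folder;
        item.folder = destination;
        for (auto &callback : onMoved) {
            callback(item, source);
        }
    }
};

std::vector<std::string> runFlow(bool looksLikeSpam)
{
    ServerModel server;
    std::vector<std::string> log;

    // "Resource": replays every change to the backing storage.
    server.onAdded.push_back([&log](const Item &item) {
        log.push_back("resource: store " + item.id + " in " + item.folder);
    });
    server.onMoved.push_back([&log](const Item &item, const std::string &source) {
        log.push_back("resource: move " + item.id + " " + source + " -> " + item.folder);
    });

    // "Filter agent": reacts to new items and asks the server to move matches.
    server.onAdded.push_back([&log, &server](const Item &item) {
        if (item.looksLikeSpam) {
            log.push_back("filter: request move of " + item.id);
            server.moveItem(item, "Spam");
        }
    });

    server.createItem({"mail1", "INBOX", looksLikeSpam});
    return log;
}
```

Calling runFlow(true) yields three log entries (store, move request, move in 
storage), while runFlow(false) yields only the store entry. The point is that 
the filter never touches the storage itself; it only asks the server, and the 
resource replays the resulting notification.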


> Any high level overview of how all this Akonadi stuff works, how the
> individual parts interconnect with each other?

Generically, it works like this:

* ItemAdded notification is generated by Akonadi server, because something (a 
	client, or a resource) invoked ItemCreateJob and added a new item into 
	Akonadi.
* MailFilter agent picks up the notification, decides whether it's interested 
	in it and drops it if not
* If the notification is relevant, the item is matched against all filters 
	until a match is found or all filters are tested
* Implementation of matching filter is invoked
	* This usually involves calling ItemMoveJob (filter into folder), 
		ItemModifyJob (assign/remove flags), ItemDeleteJob (delete email) etc.
		* Akonadi processes the request and emits a new change notification, 
			which is picked up by the owning resource this time to replay the 
			change

In other words, there is currently no chain processing (I want to have it at 
some point); MailFilter just picks up change notifications from Akonadi 
and tells Akonadi what to do with the item (move/remove/modify/...).
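
A minimal sketch of the "first matching filter wins" step, assuming C++17. 
Mail, Filter, Action and firstMatchingAction() are hypothetical names for 
illustration only; the real filter agent has its own filter classes and 
starts Akonadi jobs instead of returning a value.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the MailFilter agent's classes.
struct Mail {
    std::string subject;
    std::string from;
};

enum class ActionKind { Move, Modify, Remove };

struct Action {
    ActionKind kind;
    std::string argument;  // e.g. target folder or flag name
};

struct Filter {
    std::function<bool(const Mail &)> matches;
    Action action;
};

// Test filters in order; stop at the first match. An empty result means no
// filter matched and the notification is simply dropped.
std::optional<Action> firstMatchingAction(const std::vector<Filter> &filters,
                                          const Mail &mail)
{
    for (const Filter &filter : filters) {
        assert(filter.matches && "every filter needs a predicate");
        if (filter.matches(mail)) {
            return filter.action;
        }
    }
    return std::nullopt;
}
```

The returned action is what would then be turned into a move, modify or 
delete request towards Akonadi.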

> 
> I also looked at the method reference earlier already, but it didn't help me
> to see the forest instead of the single trees.
> 
> 
> I will probably try to dig deeper anyway. But for now this is where my
> research is. All I assume is that a ton of needless work is involved. I
> would expect mail filtering to just involve the
>
> void KeyCache::addNewKey( const QString& dir, const QString& key )
> {
>     mNewKeys[dir].insert( key );
>   // kDebug() << "Added new key for : " << dir << " key: " << key;
> }
>
> void KeyCache::addCurKey( const QString& dir, const QString& key )
> {
>     mCurKeys[dir].insert( key );
>   // kDebug() << "Added cur key for : " << dir << " key:" << key;
> }
>
> void KeyCache::removeKey( const QString& dir, const QString& key )
> {
>   //kDebug() << "Removed new and cur key for: " << dir << " key:" << key;
>     mNewKeys[dir].remove( key );
>     mCurKeys[dir].remove( key );
> }
> 
> methods of maildir instead of hovering over all files in the folder over and
> over and over again. These methods would just remove and add the one file
> of the mail that is moved.

I would expect the current implementation of MaildirResource::itemMoved() to 
look roughly like this (pseudocode):

disableWatchingFoldersForChanges();
removeKey(emailKey);
newKey = generateNewKeyForEmail(email);
mv srcDir/emailKey destDir/newKey
addKey(newKey);
enableWatchingFoldersForChanges();
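
The cache part of such a move can be sketched in standalone C++. This is a 
model under assumptions, not libmaildir code: KeyCacheModel and moveKey() are 
hypothetical names, the real KeyCache keeps separate sets for new/ and cur/, 
and the actual file rename and the directory watcher are left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical model of a per-folder key cache; not the libmaildir KeyCache.
class KeyCacheModel {
public:
    void addKey(const std::string &dir, const std::string &key)
    {
        keys_[dir].insert(key);
    }

    void removeKey(const std::string &dir, const std::string &key)
    {
        auto it = keys_.find(dir);
        if (it != keys_.end()) {
            it->second.erase(key);
        }
    }

    bool contains(const std::string &dir, const std::string &key) const
    {
        auto it = keys_.find(dir);
        return it != keys_.end() && it->second.count(key) != 0;
    }

    // Update the cache for one moved mail. Only the two affected entries are
    // touched; no directory is listed again. The file rename on disk would
    // happen between removeKey() and addKey() and is omitted here.
    bool moveKey(const std::string &srcDir, const std::string &destDir,
                 const std::string &oldKey, const std::string &newKey)
    {
        assert(!newKey.empty());
        if (!contains(srcDir, oldKey)) {
            return false;  // unknown mail; a caller could fall back to a refresh
        }
        removeKey(srcDir, oldKey);
        addKey(destDir, newKey);
        return true;
    }

private:
    std::unordered_map<std::string, std::unordered_set<std::string>> keys_;
};
```

With hash-based containers the cost of a move is independent of the number of 
mails in either folder, in contrast to re-listing the whole directory.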

> 
> Making this change could superboost mail filtering with large maildirs big
> time, I bet.

I think the biggest bottleneck currently is that the MailFilter agent does not 
do batch processing. Every morning when I come to the office and have my inbox 
synced, about 500 new emails are pulled from the IMAP server, and then I 
see 500 ItemMoveJobs being enqueued in Akonadi, and the IMAP resource 
then has to issue 500 MOVE commands to move the items one by one.
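
What batch processing could look like, as a rough standalone sketch: group 
the pending moves by destination so that each destination needs one request 
instead of one request per mail. MoveRequest and batchByDestination() are 
hypothetical names; how the filter agent would actually collect and submit 
such batches is not something this sketch claims to know.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical request type for illustration only.
struct MoveRequest {
    std::string itemId;
    std::string destination;
};

// Group item ids by destination folder, keeping arrival order per folder.
std::map<std::string, std::vector<std::string>>
batchByDestination(const std::vector<MoveRequest> &requests)
{
    std::map<std::string, std::vector<std::string>> batches;
    for (const MoveRequest &request : requests) {
        batches[request.destination].push_back(request.itemId);
    }

    // Sanity check: batching must neither drop nor duplicate items.
    std::size_t total = 0;
    for (const auto &batch : batches) {
        total += batch.second.size();
    }
    assert(total == requests.size());
    return batches;
}
```

For 500 mails filtered into two folders this gives two batches instead of 500 
individual moves.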

> Also… as the name suggests this is a cache. So we have a cache in libmaildir
> and a cache in Akonadi and a cache on the operating system level. Why?

The KeyCache in maildir is AFAIK just a hash for fast lookup between folder 
and maildir IDs (so a collection.remoteId() => item.remoteId() lookup).

The Akonadi cache is the general cache.

The filesystem cache does not hold this kind of mapping, AFAIK.


Dan

> If you have any hint that may help me to fit my single pieces of research
> together to some grand enlightenment on how Akonadi works, I am all ears. I am
> holding a training next week again, I may be able to show up on IRC.
> Otherwise it's next weekend or after that.
>
> I am willing to document things I learn.
> 
> Actually I think I am not missing all that much anymore. Just can't connect
> the single parts spread over several git repos together yet.
> 
> Ciao,

-- 
Daniel Vrátil | dvratil at redhat.com | dvratil on #kde-devel, #kontact, #akonadi
KDE Desktop Team
Associate Software Engineer, Red Hat

GPG Key: 0xC59D614F6F4AE348
Fingerprint: 4EC1 86E3 C54E 0B39 5FDD B5FB C59D 614F 6F4A E348