[Kde-pim] Trying to understand mail filtering handling with maildir resource
Daniel Vrátil
dvratil at redhat.com
Mon Jul 14 10:03:11 BST 2014
On Sunday 13 of July 2014 15:14:09 Martin Steigerwald wrote:
> Hi!
>
> In order to finally break^H^H^H^H^H fix up performance of maildir resource I
> try to understand maildir filtering handling with maildir resource.
> Especially I want to ask the question:
Hi,
thanks for looking into this. I'm not a maildir expert, but I'll try to answer
what I know.
> Why does filtering mails into a folder trigger a full folder sync including
> comparing all files each time?
By folder sync, do you mean a sync triggered by Akonadi (that is,
MaildirResource::retrieveItems() is called, which is what "folder sync" means
in Akonadi terms), or does the resource just trigger some updates of its
internal cache? This is not clear to me from your email.
> I started from libmaildir and the only case it compares full directory
> contents seem to be in
>
> void Maildir::refreshKeyCache()
> {
>   KeyCache::self()->refreshKeys( d->path );
> }
Wouldn't the proper question be "what calls this method"? If I understand it
correctly, your concern is that filtering a mail into a maildir folder triggers
a full sync ( == calls refreshKeys?), so you should try to find out what
triggers the refreshKeys method and optimize that.
>
> which calls:
>
> void KeyCache::addKeys( const QString& dir )
> {
>   if ( !mNewKeys.contains( dir ) ) {
>     mNewKeys.insert( dir, listNew( dir ) );
>     //kDebug() << "Added new keys for: " << dir;
>   }
>
>   if ( !mCurKeys.contains( dir ) ) {
>     mCurKeys.insert( dir, listCurrent( dir ) );
>     //kDebug() << "Added cur keys for: " << dir;
>   }
> }
>
> void KeyCache::refreshKeys( const QString& dir )
> {
>   mNewKeys.remove( dir );
>   mCurKeys.remove( dir );
>   addKeys( dir );
> }
>
> which essentially calls the routines I switched not to sort the filenames. The
> routines which list all the filenames of a mail folder into a buffer.
>
>
> Maildir resource calls this in:
>
> void MaildirResource::slotDirChanged(const QString& dir)
> {
>   QFileInfo fileInfo( dir );
>   if ( fileInfo.isFile() ) {
>     slotFileChanged( fileInfo );
>     return;
>   }
>
>   if ( dir == mSettings->path() ) {
>     synchronizeCollectionTree();
>     synchronizeCollection( Collection::root().id() );
>     return;
>   }
>
> TODO: Figure out what these exactly do.
>
> Documentation
>
> void ResourceBase::synchronizeCollection ( qint64 id )
> protected
>
> This method is called whenever the collection with the given id shall be
> synchronized.
>
> Definition at line 1071 of file resourcebase.cpp.
>
> isn´t exactly helpful. What does synchronize mean?
It will schedule a collection resync in ResourceScheduler, which will result in
ResourceBase::retrieveItems() being called, i.e. the purpose of
synchronizeCollection() is to bring the Akonadi cache up to date with the
remote storage (the maildir folder in this case). This is a uni-directional
sync, i.e. changes stored in Akonadi are not synced to the maildir.
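To make "uni-directional" concrete, here is a minimal sketch in plain C++
(no Akonadi or Qt types). SyncDelta and computeOneWaySync are hypothetical
names made up for this illustration, not part of ResourceBase. The sketch only
models the idea: the backend listing is compared against the cache, and only
the cache is changed.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical result type: what has to change in the cache so that it
// matches the backend. Nothing is ever written back to the backend.
struct SyncDelta {
    std::vector<std::string> toAdd;     // in the backend, missing from the cache
    std::vector<std::string> toRemove;  // in the cache, gone from the backend
};

// Compare the full backend listing with the full cache listing.
// Note that the cost grows with the size of the folder, not with the
// number of mails that actually changed.
SyncDelta computeOneWaySync(const std::set<std::string> &backendIds,
                            const std::set<std::string> &cachedIds)
{
    SyncDelta delta;
    for (const std::string &id : backendIds) {
        if (cachedIds.count(id) == 0) {
            delta.toAdd.push_back(id);
        }
    }
    for (const std::string &id : cachedIds) {
        if (backendIds.count(id) == 0) {
            delta.toRemove.push_back(id);
        }
    }
    return delta;
}
```

Because both listings are walked completely, a sync of this kind is expensive
for large folders even when only a single mail changed.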
>
> Hmmm, okay it calls a collection fetch job, so retrieves all items from it
No, CollectionFetchJob fetches the actual collection (Akonadi::Collection),
not items in that collection (ItemFetchJob does that).
>
>
>   if ( dir.endsWith( QLatin1String( ".directory" ) ) ) {
>     synchronizeCollectionTree(); //might be too much, but this is not a common case anyway
>     return;
>   }
>
>   QDir d( dir );
>   if ( !d.cdUp() )
>     return;
>
>   Maildir md( d.path() );
>   if ( !md.isValid() )
>     return;
>
>   md.refreshKeyCache();
>
> And this probably is in there to make sure KeyCache of libmaildir also has
> the most recent view on things… but why? Shouldn´t it always have it.
>
>
> So every time the dir is changed this is called.
>
>
> Yet I do not understand why it is called on filter. Thus I looked in
> filteragent in kdepim… there is
>
> bool FilterManager::processContextItem( ItemContext context )
> {
> […]
>   if ( context.moveTargetCollection().isValid() &&
>        context.item().storageCollectionId() != context.moveTargetCollection().id() ) {
>     if ( itemCanDelete ) {
>       Akonadi::ItemMoveJob *moveJob =
>         new Akonadi::ItemMoveJob( context.item(), context.moveTargetCollection(), this );
>       connect( moveJob, SIGNAL(result(KJob*)), SLOT(moveJobResult(KJob*)) );
>     } else {
>       return false;
>     }
>   }
>
>
> which is called from
>
> bool FilterManager::process( const Akonadi::Item& item, bool needsFullPayload,
>                              const MailFilter* filter )
>
>
> ItemMoveJob is in kdepim-runtime/filestore.
You are looking at the wrong ItemMoveJob :). The ItemMoveJob called from
FilterManager is the one in kdepimlibs/akonadi.
The ItemMoveJob in kdepim-runtime, however, is called by the resource
implementation: if the Maildir resource implements item(s)Moved(), then it is
the resource that calls the ItemMoveJob in kdepim-runtime.
>
>
> I looked there as well… but didn´t yet get, how these mail folder
> synchronisations on filtering get called.
>
>
> I am tempted to remove
>
>   md.refreshKeyCache();
>
> for testing from the slotDirChanged method, but I do not really completely
> understand the implications of it.
>
>
> Is there a flow chart of mail filtering handling?
>
> Anyone willing to explain to me how it works? Can make an appointment on IRC
> and I will post the results here.
It's pretty simple. The following description assumes that a new email arrives
in a maildir folder "INBOX", matches a filter and is moved to folder "Spam":
* new email is downloaded by KMail/POP3, which then calls ItemCreateJob() to
  add the item into Akonadi
* Akonadi stores the email in the cache and emits an itemAdded notification
* Maildir resource and MailFilter agent pick up the notification
  * MaildirResource::itemAdded() is called by AgentBase::Observer
    * resource stores the email in the maildir storage
  * MailFilterAgent::itemAdded() is called by AgentBase::Observer
    * matching filter is found, Akonadi::ItemMoveJob(item,
      destinationCollection) is started
    * Akonadi moves the item in its internal cache and emits an itemMoved
      notification
    * Maildir resource picks up the notification
      * MaildirResource::itemMoved() is called by AgentBase::Observer
        * resource moves the email in the maildir storage
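The flow above can be sketched as a tiny toy model in plain C++. Everything
here (ToyServer, simulateSpamDelivery, the "spam" substring test) is
hypothetical and only illustrates who reacts to which notification. The real
notifications are delivered asynchronously between separate processes, while
this sketch simply calls the handlers in order.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Akonadi server: it notifies every
// subscriber about a change and does nothing else.
class ToyServer {
public:
    using AddedHandler =
        std::function<void(const std::string &item, const std::string &folder)>;
    using MovedHandler =
        std::function<void(const std::string &item, const std::string &from,
                           const std::string &to)>;

    void onItemAdded(AddedHandler h) { addedHandlers.push_back(std::move(h)); }
    void onItemMoved(MovedHandler h) { movedHandlers.push_back(std::move(h)); }

    void createItem(const std::string &item, const std::string &folder) {
        for (const auto &h : addedHandlers) h(item, folder);
    }
    void moveItem(const std::string &item, const std::string &from,
                  const std::string &to) {
        for (const auto &h : movedHandlers) h(item, from, to);
    }

private:
    std::vector<AddedHandler> addedHandlers;
    std::vector<MovedHandler> movedHandlers;
};

// Runs the INBOX -> Spam scenario and returns a log of what happened.
std::vector<std::string> simulateSpamDelivery()
{
    ToyServer server;
    std::vector<std::string> log;

    // Resource: replays every change onto the on-disk storage.
    server.onItemAdded([&](const std::string &item, const std::string &folder) {
        log.push_back("resource: store " + item + " in " + folder);
    });
    server.onItemMoved([&](const std::string &item, const std::string &from,
                           const std::string &to) {
        log.push_back("resource: move " + item + " from " + from + " to " + to);
    });

    // Filter agent: sees the same itemAdded notification and asks the
    // server to move the item; it never touches the storage itself.
    server.onItemAdded([&](const std::string &item, const std::string &folder) {
        if (item.find("spam") != std::string::npos) {
            log.push_back("filter: move " + item + " to Spam");
            server.moveItem(item, folder, "Spam");
        }
    });

    server.createItem("spam-offer", "INBOX");
    return log;
}
```

The point of the sketch is that the filter agent and the resource never talk
to each other directly; both only react to notifications from the server.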
> Any high level overview of how all this Akonadi stuff works, how the
> individual parts interconnect with each other?
Generically, it works like this:
* ItemAdded notification is generated by Akonadi server, because something (a
  client, or a resource) invoked ItemCreateJob and added a new item into
  Akonadi.
* MailFilter agent picks up the notification, decides whether it's interested
  in it and drops it if not
* If the notification is relevant, the item is matched against all filters
  until a match is found or all filters are tested
* Implementation of the matching filter is invoked
  * This usually involves calling ItemMoveJob (filter into folder),
    ItemModifyJob (assign/remove flags), ItemRemoveJob (delete email) etc.
  * Akonadi processes the request and emits a new change notification,
    which is picked up by the owning resource this time to replay the
    change
In other words, there is no chain processing (currently; I want to have it at
some point). MailFilter just picks up change notifications from Akonadi
and tells Akonadi what to do with the item (move/remove/modify/...).
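The "match against all filters until a match is found" step can be sketched
like this. It is a minimal illustration in plain C++, not the MailFilter agent
code: ActionKind, Action, Filter and firstMatchingAction are hypothetical
names, and matching on the subject string alone is an assumption made to keep
the example short.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical kinds of requests a filter can send back to the server.
enum class ActionKind { Move, Modify, Remove };

struct Action {
    ActionKind kind;
    std::string argument;  // e.g. target folder for Move, flag for Modify
};

struct Filter {
    std::function<bool(const std::string &subject)> matches;
    Action action;
};

// Walk the filters in order and stop at the first one that matches.
// Returns nullptr when no filter matched, i.e. nothing is requested.
const Action *firstMatchingAction(const std::vector<Filter> &filters,
                                  const std::string &subject)
{
    for (const Filter &filter : filters) {
        if (filter.matches(subject)) {
            return &filter.action;
        }
    }
    return nullptr;
}
```

Since there is no chain processing, a mail that matches the first filter is
never tested against the remaining ones in this model.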
>
> I also looked at the method reference earlier already, but it didn´t help me
> to see the forest instead of the single trees.
>
>
> I will probably try to dig deeper anyway. But for now this is where my
> research is. All I assume is that a ton of needless work is involved. I
> would expect mail filtering to just involve the
>
> void KeyCache::addNewKey( const QString& dir, const QString& key )
> {
>   mNewKeys[dir].insert( key );
>   // kDebug() << "Added new key for : " << dir << " key: " << key;
> }
>
> void KeyCache::addCurKey( const QString& dir, const QString& key )
> {
>   mCurKeys[dir].insert( key );
>   // kDebug() << "Added cur key for : " << dir << " key:" << key;
> }
>
> void KeyCache::removeKey( const QString& dir, const QString& key )
> {
>   //kDebug() << "Removed new and cur key for: " << dir << " key:" << key;
>   mNewKeys[dir].remove( key );
>   mCurKeys[dir].remove( key );
> }
>
> methods of maildir instead of hovering over all files in the folder over and
> over and over again. These methods would just remove and add the one file
> of the mail that is moved.
I would expect the current implementation of MaildirResource::itemMoved to be
roughly this (pseudocode):
    disableWatchingFoldersForChanges();
    removeKey(emailKey);
    newKey = generateNewKeyForEmail(email);
    mv srcDir/emailKey destDir/newKey
    addKey(newKey);
    enableWatchingFoldersForChanges();
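The cache part of that pseudocode can be sketched in plain C++ as follows.
ToyKeyCache and moveKey are hypothetical names, not the libmaildir KeyCache
API. The file rename and the disabling of the directory watcher are left out
on purpose; the sketch only shows that a move needs to touch two cache entries
and does not need to re-list either directory.

```cpp
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical, simplified per-directory key cache: directory -> set of keys.
class ToyKeyCache {
public:
    void addKey(const std::string &dir, const std::string &key) {
        keys[dir].insert(key);
    }

    void removeKey(const std::string &dir, const std::string &key) {
        auto it = keys.find(dir);
        if (it != keys.end()) {
            it->second.erase(key);
        }
    }

    bool contains(const std::string &dir, const std::string &key) const {
        auto it = keys.find(dir);
        return it != keys.end() && it->second.count(key) > 0;
    }

    // Incremental update for a single moved mail: one removal and one
    // insertion, independent of how many mails the folders contain.
    void moveKey(const std::string &srcDir, const std::string &oldKey,
                 const std::string &destDir, const std::string &newKey) {
        removeKey(srcDir, oldKey);
        addKey(destDir, newKey);
    }

private:
    std::unordered_map<std::string, std::set<std::string>> keys;
};
```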
>
> Making this change could superboost mail filtering with large maildirs big
> time, I bet.
I think the biggest bottleneck currently is that the MailFilter agent does not
do batch processing. Every morning I come to the office and have my inbox
synced, which means about 500 new emails are pulled from the IMAP server. Then
I see 500 ItemMoveJobs being enqueued in Akonadi, and the IMAP resource has to
issue 500 MOVE commands to move the items one by one.
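What batch processing could look like, as a minimal sketch in plain C++:
pending moves are grouped by destination folder, so that each destination
needs one request instead of one request per mail. groupMovesByDestination is
a hypothetical helper, and the sketch assumes that the backend can move
several items with a single request.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Each pending move is a pair (item id, destination folder).
// Returns destination folder -> ids of all items that go there,
// keeping the original order of the items within each folder.
std::map<std::string, std::vector<std::string>>
groupMovesByDestination(
    const std::vector<std::pair<std::string, std::string>> &pendingMoves)
{
    std::map<std::string, std::vector<std::string>> batches;
    for (const auto &move : pendingMoves) {
        batches[move.second].push_back(move.first);
    }
    return batches;
}
```

With 500 new mails filtered into, say, ten folders, this would mean ten move
requests rather than 500.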
> Also… as the name suggests this is a cache. So we have a cache in libmaildir
> and a cache in Akonadi and a cache on the operating system level. Why?
The KeyCache in maildir is AFAIK just a hash for fast lookup between folder
and maildir IDs (so collection.remoteId() => item.remoteId() lookup).
The Akonadi cache is the general cache.
The filesystem cache does not hold this kind of mapping AFAIK.
Dan
> If you have any hint that may help me to fit my single pieces of research
> together to some grand enlightment on how Akonadi works, I am all ears. I am
> holding a training next week again, I may be able to show up on IRC.
> Otherwise its next weekend or after that.
>
> I am willing to document things I learn.
>
> Actually I think I am not missing all that much anymore. Just can´t connect
> the single parts spread over several git repos together yet.
>
> Ciao,
--
Daniel Vrátil | dvratil at redhat.com | dvratil on #kde-devel, #kontact, #akonadi
KDE Desktop Team
Associate Software Engineer, Red Hat
GPG Key: 0xC59D614F6F4AE348
Fingerprint: 4EC1 86E3 C54E 0B39 5FDD B5FB C59D 614F 6F4A E348