kmail sigh
Colin Close
itchka at compuserve.com
Mon Jan 29 23:16:16 GMT 2018
Hi All,
This is a repost (with slight changes to make things clearer) of some work I
did on the akonadi database tools.
I'm reposting it because I hope it will give a clue to the duplicates issue
with the kmail/akonadi setup. I work exclusively with POP mail.
The creation of duplicate emails is sometimes hard to avoid, but the problem I
describe, where new headers are created which contain no content, should be
avoidable. The fsck process does not appear to be checking whether the
duplicate that is being deleted is the empty ghost or the entry that actually
has content.
Ok I decided to take matters into my own hands and I have found out some
stuff.
I won't bore you with all the details but there are three things I have
discovered so far.
Probably the most important initially is that 'akonadictl fsck' simply fscks
things up worse.
Here's what I have observed and checked out by using queries on the akonadi
database.
1: Certain processes (which I won't detail at the moment) create duplicate but
ghost database entries in one of the primary tables.
These 'ghosts' have all the same attributes as the original with the exception
of their size. This is easily confirmed by checking for the original file in
.local/share/local-mail/<collection folder>: you will only find a single file,
with no trace of the ghost. Thus it must have been introduced by the process
of building the database.
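
For anyone who wants to check their own database, this is roughly the sort of
query I have been running, wrapped in a little python script. The table and
column names (pimitemtable, collectionId, remoteId, size) are from memory and
may well differ between Akonadi versions, and the socket path, user and
database name are just examples from my machine, so treat it as a sketch
rather than a recipe:

import os
import mysql.connector  # the default Mariadb/mysql backend; psycopg2 for postgres

# Connection details are examples only -- adjust the socket path, user and
# database name to whatever your akonadiserverrc actually points at.
db = mysql.connector.connect(
    unix_socket=os.path.expanduser(
        "~/.local/share/akonadi/socket-localhost/mysql.socket"),
    user="colin",
    database="akonadi")
cur = db.cursor()

# Ghost candidates: the same remoteId stored more than once in the same
# collection.  The ghosts carry a different size to the original, so the
# sizes are listed as well.
cur.execute("""
    SELECT collectionId, remoteId, COUNT(*) AS copies,
           GROUP_CONCAT(size) AS sizes
    FROM pimitemtable
    GROUP BY collectionId, remoteId
    HAVING copies > 1
""")
for collection_id, remote_id, copies, sizes in cur.fetchall():
    print(collection_id, remote_id, copies, sizes)

As far as I can tell the remoteId for the maildir resource is simply the file
name under local-mail, which is how I cross-check the query results against
what is actually on disk.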
2: If akonadictl fsck finds one of these duplicates it does not delete the
ghost; it instead deletes the valid entry. At this point there is a message in
the database which has no content, just the header section.
Once the fsck process is complete and the mail is reloaded by opening the
entry, the entry for the original mail that still exists in the local-mail
folder is recreated and suddenly you have a duplicate again. This makes it
impossible to clean the database once a duplicate has been introduced.
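
You can also count these content-less leftovers directly. Reusing the
connection from the snippet above, something along these lines lists items
that have no RFC822 payload part at all -- again, parttable and parttypetable
are the names as I remember them, so check your own schema first:

# Items with a header row in pimitemtable but no stored RFC822 payload.
cur.execute("""
    SELECT i.id, i.remoteId
    FROM pimitemtable i
    LEFT JOIN parttable p
           ON p.pimItemId = i.id
          AND p.partTypeId = (SELECT id FROM parttypetable
                              WHERE ns = 'PLD' AND name = 'RFC822')
    WHERE p.id IS NULL
""")
for item_id, remote_id in cur.fetchall():
    print("no payload:", item_id, remote_id)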
Running "remove duplicates" from the "folders menu" does not remove the
duplicates either in these circustances since you have two identical header
sections but only one reference has content. Since the remove duplicates
algorithm works by comparing the md5 sums of the mails it sees each of the
mails as unique and thus does not remove the duplicate.
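
To illustrate why the md5 comparison never matches: the ghost is just the
header section, so it hashes differently from the complete mail even though
both refer to the same message. A tiny illustration (not KMail's actual code,
just the idea):

import hashlib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "kmail sigh"
msg["Message-ID"] = "<example@localhost>"
msg.set_content("The body that the ghost entry has lost.")

full = msg.as_bytes()
# The ghost: header section only, nothing after the blank line.
ghost = full.split(b"\n\n", 1)[0] + b"\n\n"

print(hashlib.md5(full).hexdigest())
print(hashlib.md5(ghost).hexdigest())   # different, so both "duplicates" are kept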
There seem to be a number of actions that create ghost mails. The first seems
to be the building of the db in the first place. If a genuine duplicate is
found then some kind of action should be taken to alert the user to the
problem, or the duplicate should simply be moved to a lost&found folder or
similar.
I haven't yet managed to establish whether ghost duplicates are created while
the tables are being filled from the maildir folders, though I have a strong
suspicion that they are.
Any actions that involve filters almost always create duplicates in the target
folder.
Some of these situations arise when a database has been rebuilt and mail is
downloaded again to make sure nothing has been lost. In these circumstances
it's entirely possible that duplicates can be created by filters filtering a
mail into a folder that had already had the same mail downloaded previously.
I have found genuine duplicates where the same file occurs in the maildir
subfolders cur, new and tmp; in these cases they are genuine files. I have
studied the db tables but I cannot find an entry that would give a clue to the
absolute location (whether in cur, new or tmp) of an individual mail. The
implication of this is that the db has no way of detecting this condition and
thus no amount of 'akonadictl fscking' is going to fix it.
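
For the record, this is how I have been spotting the cur/new/tmp copies from
outside the database -- just walking one collection folder and hashing the
files (the folder name here is only an example, substitute your own):

import hashlib
import os

# Example path only -- point it at the collection folder you want to check.
maildir = os.path.expanduser("~/.local/share/local-mail/inbox")

seen = {}
for sub in ("cur", "new", "tmp"):
    subdir = os.path.join(maildir, sub)
    if not os.path.isdir(subdir):
        continue
    for name in os.listdir(subdir):
        path = os.path.join(subdir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        seen.setdefault(digest, []).append(path)

for digest, paths in seen.items():
    if len(paths) > 1:
        print("same mail in more than one subfolder:")
        for p in paths:
            print("   ", p)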
That said, this condition may have been created by my efforts to get kmail2
working again with my archived mail, which I have done numerous times; each
time my OS updates the package, the db breaks again.
In the early days of kmail2 I switched to postgresql as the preferred db. I
intend to do this again to see whether I can observe the same issues as I have
with Mariadb, the default backend shipped with my OS.
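
For anyone who wants to try the same experiment: as far as I remember the
switch is made in ~/.config/akonadi/akonadiserverrc before starting akonadi,
roughly like this (the exact keys may differ with your Akonadi version, so
check first):

[General]
Driver=QPSQL

followed by 'akonadictl stop' and 'akonadictl start'. This does not migrate
the existing data; if I understand the process correctly the tables are then
rebuilt from the maildir folders, which is exactly the step I want to watch.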
Best,
Colin Close
itchka at compuserve.com
On Monday, 29 January 2018 19:46:17 GMT Pablo Sanchez wrote:
> [ Comments below, in-line ]
>
> On Mon, 29 Jan 2018 18:30:16 +0100, Martin Steigerwald wrote:
> > Dear Pablo.
>
> Hi Martin,
>
> > Pablo Sanchez - 29.01.18, 17:17:
> >> [ Comments below, in-line ]
> >>
> >> On Mon, 29 Jan 2018 17:12:41 +0100, Sandro Knauß wrote:
> >>> Hi,
> >>>
> >>>> If in fact mysql is slower compared to pg, we should be able to
> >>>> instrument and literally see why that's the case.
> >>>>
> >>>> On my TODO is to set up a kmail development sandbox and work on
> >>>> the DB side of things .... but this has a lot of pre-work. While
> >>>> I'm a database weenie, it's been eons since I've done any C/C++.
> >>>> Because of this, I've been slacking and doing other hacking. ;)
> >>>
> >>> we already provide such an sandbox [0] - it is a Docker image with
> >>> a complete working kmail (master). Where you could play with very
> >>> easily. I use this docker setup to develop for kdepim. Until now
> >>> I just use the mysql backend, so some deps may be missing for
> >>> running a postgres backend.
> >>>
> >>> If you have any question regaring the current docker solution - do
> >>> not hesitate to ask.
> >>
> >> Well, well, well! Isn't that great news! Now, if there was only a
> >> docker image to awaken my old C/C++ brain-bits. :)
> >
> > Well, I think your database skills already would be helpful on its
> > own!
>
> Right on.
>
> > [ db tuning tips trimmed ]
>
> +1
>
> > I think this is called team work.
>
> I always wondered what it was called! *grin*
>
> > You do not have to do it all on your own. You don´t need to be a
> > coder to contribute to the project.
>
> I'm game. Which developer(s) would like to work with me?
>
> We probably should move to email and/or IRC.
>
> Cheers,
> --
> Pablo Sanchez - Blueoak Database Engineering, Inc
> Ph: 819.459.1926 iNum: 883.5100.0990.1054