kmail sigh
Colin Close
itchka at compuserve.com
Mon Jan 29 23:16:16 GMT 2018
Hi All,
This is a repost (with slight changes to make things clearer) of some work I
did on the akonadi database tools.
I'm reposting it because I hope it will give a clue to the duplicates issue
with the kmail/akonadi setup. I work exclusively with POP mail.
The creation of duplicate emails is sometimes hard to avoid, but the problem I
describe, where new headers are created which contain no content, should be
avoidable. The fsck process does not appear to be checking whether the
duplicate that is being deleted is the empty ghost or the entry that actually
has content.
Ok I decided to take matters into my own hands and I have found out some
stuff.
I won't bore you with all the details but there are three things I have
discovered so far.
Probably the most important initially is that 'akonadictl fsck' simply fscks
things up worse.
Here's what I have observed and checked out by using queries on the akonadi
database.
1: Certain processes (which I won't detail at the moment) create duplicate but
ghost database entries in one of the primary tables.
These 'ghosts' have all the same attributes as the original with the exception
of their size. This is easily confirmed by checking for the original file in
.local/share/local-mail/<collection folder>: you will only find a single file,
with no trace of the ghost. Thus it must have been introduced by the process
of building the database.
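
For anyone who wants to check their own database, this is roughly the sort of
query I have been running, wrapped in a little python script. The table and
column names (pimitemtable, collectionId, remoteId, size) are from memory and
may well differ between Akonadi versions, and the socket path, user and
database name are just examples from my machine, so treat it as a sketch
rather than a recipe:

import os
import mysql.connector  # the default Mariadb/mysql backend; psycopg2 for postgres

# Connection details are examples only -- adjust the socket path, user and
# database name to whatever your akonadiserverrc actually points at.
db = mysql.connector.connect(
    unix_socket=os.path.expanduser(
        "~/.local/share/akonadi/socket-localhost/mysql.socket"),
    user="colin",
    database="akonadi")
cur = db.cursor()

# Ghost candidates: the same remoteId stored more than once in the same
# collection.  The ghosts carry a different size to the original, so the
# sizes are listed as well.
cur.execute("""
    SELECT collectionId, remoteId, COUNT(*) AS copies,
           GROUP_CONCAT(size) AS sizes
    FROM pimitemtable
    GROUP BY collectionId, remoteId
    HAVING copies > 1
""")
for collection_id, remote_id, copies, sizes in cur.fetchall():
    print(collection_id, remote_id, copies, sizes)

As far as I can tell the remoteId for the maildir resource is simply the file
name under local-mail, which is how I cross-check the query results against
what is actually on disk.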
2: If akonadictl fsck finds one of these duplicates it does not delete the
ghost; it instead deletes the valid entry. At this point there is a message in
the database which has no content, just the header section.
Once the fsck process is complete and the mail is reloaded by opening the
entry, the entry for the original mail that still exists in the local-mail
folder is recreated and suddenly you have a duplicate again. This makes it
impossible to clean the database once a duplicate has been introduced.
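
You can also count these content-less leftovers directly. Reusing the
connection from the snippet above, something along these lines lists items
that have no RFC822 payload part at all -- again, parttable and parttypetable
are the names as I remember them, so check your own schema first:

# Items with a header row in pimitemtable but no stored RFC822 payload.
cur.execute("""
    SELECT i.id, i.remoteId
    FROM pimitemtable i
    LEFT JOIN parttable p
           ON p.pimItemId = i.id
          AND p.partTypeId = (SELECT id FROM parttypetable
                              WHERE ns = 'PLD' AND name = 'RFC822')
    WHERE p.id IS NULL
""")
for item_id, remote_id in cur.fetchall():
    print("no payload:", item_id, remote_id)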
Running "remove duplicates" from the "folders menu" does not remove the
duplicates either in these circustances since you have two identical header
sections but only one reference has content. Since the remove duplicates
algorithm works by comparing the md5 sums of the mails it sees each of the
mails as unique and thus does not remove the duplicate.
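
To illustrate why the md5 comparison never matches: the ghost is just the
header section, so it hashes differently from the complete mail even though
both refer to the same message. A tiny illustration (not KMail's actual code,
just the idea):

import hashlib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "kmail sigh"
msg["Message-ID"] = "<example@localhost>"
msg.set_content("The body that the ghost entry has lost.")

full = msg.as_bytes()
# The ghost: header section only, nothing after the blank line.
ghost = full.split(b"\n\n", 1)[0] + b"\n\n"

print(hashlib.md5(full).hexdigest())
print(hashlib.md5(ghost).hexdigest())   # different, so both "duplicates" are kept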
There seem to be a number of actions that create ghost mails. The first seems
to be the building of the db in the first place. If a genuine duplicate is
found then some kind of action should be taken to alert the user to the
problem, or the duplicate should simply be moved to a lost&found folder or
similar.
I haven't yet managed to establish whether ghost duplicates are created while
the tables are being filled from the maildir folders, though I have a strong
suspicion that they are.
Any actions that involve filters almost always create duplicates in the target
folder.
Some of these situations arise when a database has been rebuilt and mail is
downloaded again to make sure nothing has been lost. In these circumstances
it's entirely possible that duplicates can be created by filters filtering a
mail into a folder that had already had the same mail downloaded previously.
I have found genuine duplicates where the same file occurs in the maildir
subfolders cur, new and tmp; in these cases they are genuine files. I have
studied the db tables but I cannot find an entry that would give a clue to the
absolute location (whether in cur, new or tmp) of an individual mail. The
implication of this is that the db has no way of detecting this condition and
thus no amount of 'akonadictl fscking' is going to fix it.
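
For the record, this is how I have been spotting the cur/new/tmp copies from
outside the database -- just walking one collection folder and hashing the
files (the folder name here is only an example, substitute your own):

import hashlib
import os

# Example path only -- point it at the collection folder you want to check.
maildir = os.path.expanduser("~/.local/share/local-mail/inbox")

seen = {}
for sub in ("cur", "new", "tmp"):
    subdir = os.path.join(maildir, sub)
    if not os.path.isdir(subdir):
        continue
    for name in os.listdir(subdir):
        path = os.path.join(subdir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        seen.setdefault(digest, []).append(path)

for digest, paths in seen.items():
    if len(paths) > 1:
        print("same mail in more than one subfolder:")
        for p in paths:
            print("   ", p)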
That said, this condition may have been created by my efforts to get kmail2
working again with my archived mail, which I have done numerous times; each
time my OS updates the package, the db breaks again.
In the early days of kmail2 I switched to postgresql as the preferred db. I
intend to do this again to see whether I can observe the same issues as I have
with Mariadb, the default backend shipped with my OS.
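
For anyone who wants to try the same experiment: as far as I remember the
switch is made in ~/.config/akonadi/akonadiserverrc before starting akonadi,
roughly like this (the exact keys may differ with your Akonadi version, so
check first):

[General]
Driver=QPSQL

followed by 'akonadictl stop' and 'akonadictl start'. This does not migrate
the existing data; if I understand the process correctly the tables are then
rebuilt from the maildir folders, which is exactly the step I want to watch.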
Best,
Colin Close
itchka at compuserve.com
On Monday, 29 January 2018 19:46:17 GMT Pablo Sanchez wrote:
> [ Comments below, in-line ]
>
> On Mon, 29 Jan 2018 18:30:16 +0100, Martin Steigerwald wrote:
> > Dear Pablo.
>
> Hi Martin,
>
> > Pablo Sanchez - 29.01.18, 17:17:
> >> [ Comments below, in-line ]
> >>
> >> On Mon, 29 Jan 2018 17:12:41 +0100, Sandro Knauß wrote:
> >>> Hi,
> >>>
> >>>> If in fact mysql is slower compared to pg, we should be able to
> >>>> instrument and literally see why that's the case.
> >>>>
> >>>> On my TODO is to set up a kmail development sandbox and work on
> >>>> the DB side of things .... but this has a lot of pre-work. While
> >>>> I'm a database weenie, it's been eons since I've done any C/C++.
> >>>> Because of this, I've been slacking and doing other hacking. ;)
> >>>
> >>> we already provide such an sandbox [0] - it is a Docker image with
> >>> a complete working kmail (master). Where you could play with very
> >>> easily. I use this docker setup to develop for kdepim. Until now
> >>> I just use the mysql backend, so some deps may be missing for
> >>> running a postgres backend.
> >>>
> >>> If you have any question regaring the current docker solution - do
> >>> not hesitate to ask.
> >>
> >> Well, well, well! Isn't that great news! Now, if there was only a
> >> docker image to awaken my old C/C++ brain-bits. :)
> >
> > Well, I think your database skills already would be helpful on its
> > own!
>
> Right on.
>
> > [ db tuning tips trimmed ]
>
> +1
>
> > I think this is called team work.
>
> I always wondered what it was called! *grin*
>
> > You do not have to do it all on your own. You don´t need to be a
> > coder to contribute to the project.
>
> I'm game. Which developer(s) would like to work with me?
>
> We probably should move to email and/or IRC.
>
> Cheers,
> --
> Pablo Sanchez - Blueoak Database Engineering, Inc
> Ph: 819.459.1926 iNum: 883.5100.0990.1054