[Kde-pim] Akonadi: single database design mistake?

Fri Dec 2 18:08:26 GMT 2011

On Fri, Dec 02, 2011 at 09:40:29AM +0100, Martin Steigerwald wrote:
> On Thursday, 1 December 2011, Dmitry Torokhov wrote:
> > Hi Martin,
> 
> Hi Dmitry,
> 
> > On Thu, Dec 01, 2011 at 06:50:23PM +0100, Martin Steigerwald wrote:
> > > Hi Dmitry!
> > > 
> > > I rarely post here as a user of KDEPIM, but this I like to comment:
> > > 
> > > On Tuesday, 29 November 2011, Dmitry Torokhov wrote:
> > > > > Another thing: If we ever come to the point where MySQL is the
> > > > > bottleneck, the current architecture should make it rather
> > > > > simple to come up with an alternative, optimized architecture.
> > > > > Personally, I just doubt that we are able to design a relational
> > > > > database from scratch that will outperform MySQL so easily...
> > > > 
> > > > That is the main question: do we really want a monolithic database
> > > > here?
> > > 
> > > At work we use a Zimbra Collaboration Suite server as our groupware
> > > solution. It uses MySQL to store mail metadata, Lucene to provide a
> > > search index and files to store the actual mail. It is serving about
> > > 30 users all day via Zimbra webclient, outlook and various IMAP
> > > clients - I partly use KMail with it.
> > > 
> > > Now consider this:
> > > 
> > > - about 100 folders including subfolders
> > > - hundreds of thousands of mails
> > > - several gigabytes of mail easily (do not see it in the webclient
> > > and do not want to open up a VPN to look in the administrative
> > > interface for the size right now)
> > > - folders with tens of thousands of mails
> > 
> > Here at work we like to eat our dogfood and so we also converted almost
> > entirely to Zimbra. So it is several thousand employees with so many
> > mails, server farm running Zimbra that is partitioned properly, etc,
> > etc. And it all scaled and partitioned nicely and if our infrastructure
> > guys see that the load on one of the nodes gets too big they can bring
> > in another node and so forth. So yes, MySQL apparently can handle that.
> > I never said that MySQL is not suitable for large amounts of data.
> 
> You work at VMware?

Guilty.

> 
> Well you said:
> 
> > That is the main question: do we really want a monolithic database
> > here?
> 
> And in your initial post:
> 
> > And this brings me to the question: is stuffing everything into a single
> > database such a good idea?
> 
> And also hinted at the slow speed of KMail during import.

Yes, I did say this. And I still wonder whether using a single set of MySQL
tables to store all mail data from all accounts makes sense. Note that
I am not arguing against having Akonadi serve the data through a unified
API. I am only questioning the use of a monolithic database as the storage
backend. I do not mean a single database server instance, I mean a single
_database_.
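
To make the distinction concrete, here is a minimal sketch of the "one database per account" alternative, using SQLite from the Python standard library. It is only an illustration and does not reflect how Akonadi actually stores data; the file layout (`<account>.db`), the `mail` table, and the helper names are all assumptions made up for this example.

```python
import os
import sqlite3
import tempfile

# Hypothetical minimal schema; a real mail store would need far more.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS mail "
    "(id INTEGER PRIMARY KEY, folder TEXT, subject TEXT)"
)


def open_account_db(root, account):
    """Open (or create) a separate SQLite file for one account."""
    conn = sqlite3.connect(os.path.join(root, f"{account}.db"))
    conn.execute(SCHEMA)
    return conn


def add_mail(conn, folder, subject):
    """Insert one message row and commit."""
    with conn:
        conn.execute(
            "INSERT INTO mail (folder, subject) VALUES (?, ?)",
            (folder, subject),
        )


def count_mail(conn, folder):
    """Count messages in one folder of one account."""
    row = conn.execute(
        "SELECT COUNT(*) FROM mail WHERE folder = ?", (folder,)
    ).fetchone()
    return row[0]


def demo():
    """Create two independent account databases and query each one."""
    root = tempfile.mkdtemp()
    work = open_account_db(root, "work")
    home = open_account_db(root, "home")
    add_mail(work, "INBOX", "status report")
    add_mail(work, "INBOX", "meeting")
    add_mail(home, "INBOX", "hello")
    counts = (count_mail(work, "INBOX"), count_mail(home, "INBOX"))
    files = sorted(f for f in os.listdir(root) if f.endswith(".db"))
    work.close()
    home.close()
    return counts, files
```

In this layout a query against one account never touches the data or indices of another, and an account can be dropped by deleting its file. The trade-off is that any cross-account query has to open every file.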

>  
> > But the point I was trying to make is that I do not want to replicate
> > that setup on my tiny laptop. There is a reason I have IMAP - I
> > _offload_ tasks from laptop to other boxes, such as receiving and
> > sorting mail, anti-spam and anti-virus checks, etc, etc, so that the
> > laptop only does fraction of work required. I do not want to fine-tune
> > MySQL on laptop to make sure indices fit into memory, that the log size
> > is appropriate, and so forth. And my question was - given that there
> > is normally a single user (as in person) working with a single folder
> > at a given time, would it not be more effective to restrict the size of
> > the data we are working with to that single folder instead of trying to
> > handle the data as a whole?
> 
> But then this is a completely different argument IMHO.
> 
> Apart from the question of whether it can work fast when we stuff 
> everything into a database, this is the argument: I have that 
> intelligence on the server, why replicate it on the client?

No, that was not my argument. My argument was that the server and laptop
cases are different. A good server setup - normally dedicated to a
certain task, with lots of memory, fast disks, a multitude of concurrent
users, and lots of requests for data that are not localized (localized
meaning that all requests are for mails in a single folder) - is not
necessarily the proper setup for the laptop case.

> 
> Well now I use KMail with 5 POP accounts - some freemail and my main POP 
> account. Intelligence on the server is restricted to dovecot, Postfix and 
> policyd-weight capabilities. And I do not plan to use Zimbra for my 
> private mail. I have been thinking about converting to IMAP, but then: 
> What for? My current setup does what I need, except for fast fulltext 
> search and having all the stuff run in the background, and there I have 
> high hopes for Akonadi.

That would work if you only access your mail from one box. If you prefer
accessing it from different boxes then you really want to put most of
your logic on the server.

> 
> KMail is not only for corporate users, but also for personal users. And 
> while most of them might use IMAP already - although I read about quite 
> a few POP3-based setups on kdepim-users - not everyone has that amount of 
> intelligence on the server that Zimbra features.
> 
> But even then I see benefits for Akonadi such as better disconnected IMAP 
> with fast fulltext search.

Yes, it can do fast fulltext search. But does _everyone_ need it? It
would be nice if people who are not using it or use it sporadically
would not have to pay the price of indexing everything so that data is
there just in case one might need it.

I really appreciate that strigi can be disabled (no, I really do not
want to index all my git trees, thank you very much); being able to skip
indexing of email in the same way would be nice to have too.
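
One way to picture "not paying the price of indexing unless you search" is an index that is off by default and built lazily on the first query. The sketch below is a toy in-memory word index, not how strigi, Nepomuk, or Akonadi work; the `MailStore` class and its `index_enabled` flag are hypothetical names invented for this illustration.

```python
from collections import defaultdict


class MailStore:
    """Toy message store with an optional, lazily built word index."""

    def __init__(self, index_enabled=False):
        # Hypothetical switch: when False, adding mail does no indexing work.
        self.index_enabled = index_enabled
        self.messages = []
        self.index = defaultdict(set)
        self.indexed_count = 0  # number of messages already indexed

    def add(self, text):
        """Store a message; index it immediately only if indexing is on."""
        self.messages.append(text)
        if self.index_enabled:
            self._bring_index_up_to_date()

    def _bring_index_up_to_date(self):
        """Index every message that has not been indexed yet."""
        for msg_id in range(self.indexed_count, len(self.messages)):
            for word in self.messages[msg_id].lower().split():
                self.index[word].add(msg_id)
        self.indexed_count = len(self.messages)

    def search(self, word):
        """Return ids of messages containing the word.

        The indexing cost is paid here, on demand, if it was deferred.
        """
        self._bring_index_up_to_date()
        return sorted(self.index.get(word.lower(), set()))
```

With the default settings, a user who never searches never triggers any indexing; the first search is slower because it has to catch up on everything stored so far.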

> Basically the same way the Zimbra Desktop is 
> doing as well. The Zimbra Desktop client is basically a complete Zimbra 
> Server with web gui

And thus is slower than web client... Unless you really need offline
access I do not see the point.

> - but on the desktop that synchronizes all mail on the 
> server to the desktop. Quite handy to access mail when you are offline, I 
> would think.

Frankly, when I travel and I know I'll be stuck on a plane for a few
hours without network I just use offlineimap to sync my inbox/kernel
lists and use mutt ;) And once I'm done I can simply sync and blow away
all that data switching back to online IMAP. I admit I don't travel that
much.

> 
> > To be fair, after the pain of initial import and after running for a
> > couple days, the system has settled down and is now usable.
> 
> And that's important. From what I read, I actually believe that there is 
> *much* room for optimization. But I do not see why a database-backed 
> Akonadi shouldn't be fast and lean.
> 
> Everybody and his dog is using a database these days: Digikam (by default 
> SQLite3), Amarok (by default MySQL embedded), Firefox for bookmarks 
> (SQLite3). And these are just three examples on the client side. And as 
> long as you do not put Firefox onto a BTRFS - where a co-worker indeed had 
> performance issues, maybe due to problems with a workload with many 
> fsync(), as far as I recall there have been optimizations regarding that 
> in BTRFS recently - these three applications perform well.

Yes, they perform well because they have their own databases. Imagine if
they all stuffed their data into a single database (no, I'm not talking
about having a single mysqld instance; come up with a schema that holds
all the Akonadi data and these 3 applications' data together) and see how
well it works.

> 
> Also Nepomuk from KDE 4.7.2 seems to behave much saner than before. It 
> slowed even this new ThinkPad T520 almost to a halt at times - I am 
> exaggerating this a bit I admit -, but now it is barely noticeable. In 
> exchange it takes more time to index my home folder - more than it likely 
> would have taken, had it not crashed on some file before -, but searching 
> for stuff that's already indexed is really fast. And it's a 956 MB 
> Virtuoso index already.
> 

Unfortunately the speed-of-searches argument does not work for me. I
understand that searching might be important for some people but you (as
in KDE/Akonadi/Nepomuk developers) need to understand that it is not
_the_ killer feature for everyone. I.e. if a person (me) never uses it
then any amount of CPU cycles spent on indexing and being ready to search
is an utter waste in the eyes of such a person. As long as the cycles
burnt are not really noticeable, that's OK, but when you start wondering
what is going on with your box then you start to complain.

Thanks.

-- 
Dmitry
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/