KIO directory listing - CPU slows down SSD

Mark Gaiser markg85 at gmail.com
Sun May 11 19:57:58 UTC 2014


Hi,

I've been playing with KIO speed improvements for quite a while now
and found some very interesting issues with KIO in combination with my
SSD drive: "Samsung SSD 840 PRO Series".

My test case is one directory filled with 500.000 files to test
directory listing speed. Please don't comment on this big number; I'm
well aware that it's insane! However, it exposes bottlenecks that are
there but don't become visible with small folders of, say, 1000
entries.

Some numbers. Listing a directory using just C++ and Qt (so QT_STATBUF,
QT_READDIR and QT_LSTAT -- those are just the platform defines from
qplatformdefs.h; nothing custom is done there):

500.000 files: ~700ms
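
Roughly, that raw baseline looks like this (a sketch, not the exact
benchmark code; the path is a placeholder):

    #include <qplatformdefs.h>
    #include <QByteArray>
    #include <QElapsedTimer>
    #include <QDebug>

    // List one directory with the qplatformdefs.h wrappers and time it.
    static void listRaw(const QByteArray &path)
    {
        QElapsedTimer timer;
        timer.start();
        QT_DIR *dir = QT_OPENDIR(path.constData());
        if (!dir)
            return;
        int count = 0;
        while (QT_DIRENT *entry = QT_READDIR(dir)) {
            QT_STATBUF buf;
            const QByteArray full = path + '/' + entry->d_name;
            if (QT_LSTAT(full.constData(), &buf) == 0)
                ++count;
        }
        QT_CLOSEDIR(dir);
        qDebug() << count << "entries in" << timer.elapsed() << "ms";
    }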

Executing the same test using KIO::listDir:

500.000 files: ~4.500ms
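
For reference, the KIO side of the test is something like this (again a
sketch; the path is a placeholder):

    #include <KIO/ListJob>
    #include <QCoreApplication>
    #include <QElapsedTimer>
    #include <QUrl>
    #include <QDebug>

    int main(int argc, char **argv)
    {
        QCoreApplication app(argc, argv);
        QElapsedTimer timer;
        timer.start();
        KIO::ListJob *job = KIO::listDir(QUrl::fromLocalFile("/path/to/bigdir"),
                                         KIO::HideProgressInfo);
        int count = 0;
        // Count the entries as they stream in from the slave.
        QObject::connect(job, &KIO::ListJob::entries,
                         [&count](KIO::Job *, const KIO::UDSEntryList &list) {
            count += list.count();
        });
        QObject::connect(job, &KIO::ListJob::result, [&](KJob *) {
            qDebug() << count << "entries in" << timer.elapsed() << "ms";
            app.quit();
        });
        return app.exec();
    }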

Note: my recent incremental cleanups already made it about a second
faster than it used to be; it used to be around 5.500ms. However, the
current speed is still ~7x slower than using raw C++ and Qt. My goal
is to get it to within 2x of raw C++/Qt.

Now you could say that KIO uses a multiprocess approach and can thus
be expected to be a bit slower. That is exactly what I expect as
well, but not a 7x difference. So I did another benchmark: testing how
long it takes the KIO slave itself (the file slave) to list the
directory without sending anything back to the client. That gives me
an accurate timing for listing a folder inside the slave. Be aware
that for this I also disabled batching, so it really is only doing the
listdir and the UDSEntry creation.

The numbers:
500.000 files: ~2.700ms - the patch I used to measure this:
http://p.sc2.nl/p4cs6a26o
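
In spirit, the measurement adds UDSEntry creation on top of the raw
loop from above (a sketch only; the real change is in the patch linked
above, and createUDSEntry in the file slave fills in more fields than
this):

    #include <qplatformdefs.h>
    #include <kio/udsentry.h>
    #include <QByteArray>
    #include <QElapsedTimer>
    #include <QString>
    #include <QDebug>

    static void listWithUDSEntries(const QByteArray &path)
    {
        QElapsedTimer timer;
        timer.start();
        QT_DIR *dir = QT_OPENDIR(path.constData());
        if (!dir)
            return;
        int count = 0;
        while (QT_DIRENT *ep = QT_READDIR(dir)) {
            QT_STATBUF buf;
            const QByteArray full = path + '/' + ep->d_name;
            if (QT_LSTAT(full.constData(), &buf) != 0)
                continue;
            // Build the entry, but never hand it to listEntry(), so no
            // batching and no socket traffic is involved.
            KIO::UDSEntry entry;
            entry.insert(KIO::UDSEntry::UDS_NAME, QString::fromLocal8Bit(ep->d_name));
            entry.insert(KIO::UDSEntry::UDS_SIZE, buf.st_size);
            entry.insert(KIO::UDSEntry::UDS_FILE_TYPE, buf.st_mode & QT_STAT_MASK);
            entry.insert(KIO::UDSEntry::UDS_ACCESS, buf.st_mode & 07777);
            entry.insert(KIO::UDSEntry::UDS_MODIFICATION_TIME, buf.st_mtime);
            ++count;
        }
        QT_CLOSEDIR(dir);
        qDebug() << count << "UDSEntries in" << timer.elapsed() << "ms";
    }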

And that is very surprising to me. This number means that all the CPU
work we do per entry is enough to slow down the IO speed. And my CPU
isn't slow by any means, yet there is enough time spent on the CPU to
hold back my SSD's performance.

Whatever you think of this insane optimization, I think that the CPU
should never be the cause of slowing the SSD down.

So I want to fix this. The CPU should not slow the SSD down, but that
can't easily be done. In fact, my SSD seems so insanely fast that I
can't even do this on the same thread without slowing the SSD down.

My theoretical solution (and I really hope to get feedback on this) is
to introduce a worker thread in slavebase.cpp. The slave should be
reduced in functionality: it should not create a UDSEntry anymore.
Instead, it should send the stat result to slavebase.cpp (there is no
function for that yet). SlaveBase then puts it in a QVector which
would be shared with the actual worker thread. The thread would only
read from the QVector, which takes away the need to care for thread
safety. The worker thread should then process the entries (in a new
short-lived vector, I guess), create UDSEntry objects and send them
over the socket connection to the client.
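
To make the idea concrete, here is a rough sketch of the handoff (all
names are hypothetical, this is not existing SlaveBase API, and I've
used a small lock-protected queue rather than the plain shared QVector
described above):

    #include <qplatformdefs.h>
    #include <QByteArray>
    #include <QMutex>
    #include <QMutexLocker>
    #include <QThread>
    #include <QVector>
    #include <QWaitCondition>

    struct StatResult {            // what the slave would hand over
        QByteArray name;
        QT_STATBUF buf;
    };

    class EntryWorker : public QThread
    {
    public:
        // Called from the IO thread; must stay as cheap as possible.
        void enqueue(const StatResult &r)
        {
            QMutexLocker locker(&m_mutex);
            m_pending.append(r);
            m_wait.wakeOne();
        }
        void finish()
        {
            QMutexLocker locker(&m_mutex);
            m_done = true;
            m_wait.wakeOne();
        }
    protected:
        void run() override
        {
            for (;;) {
                QVector<StatResult> batch;     // the short-lived vector
                {
                    QMutexLocker locker(&m_mutex);
                    while (m_pending.isEmpty() && !m_done)
                        m_wait.wait(&m_mutex);
                    m_pending.swap(batch);     // grab everything queued so far
                    if (batch.isEmpty() && m_done)
                        return;
                }
                for (const StatResult &r : batch) {
                    Q_UNUSED(r);
                    // Here the worker would create the UDSEntry from r.buf
                    // and push it over the socket connection to the client.
                }
            }
        }
    private:
        QMutex m_mutex;
        QWaitCondition m_wait;
        QVector<StatResult> m_pending;
        bool m_done = false;
    };

In this sketch the socket writes would also live in run(), which ties
into the SlaveBase::send question below.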

This way the IO time can be as fast as possible, with the remaining
time spent in a worker thread. The only real issue I see here is how
to handle the current SlaveBase::send function. That is executed in
the same thread as the slave and thus will still block IO while
sending a batch. I think I need to move this into the worker thread as
well, right?

I'm looking forward to your feedback.

Cheers,
Mark

