KIO directory listing - CPU slows down SSD

Wed May 14 10:50:18 UTC 2014

On Sunday, May 11, 2014 21:57:58 Mark Gaiser wrote:
> Note: my recent incremental cleanups already made it about a second
> faster than it used to be. It used to be around 5.500ms. However, the
> current speed is still ~7x slower than using raw C++ and Qt. My goal
> is to get it within 2x slower than raw C++/Qt.

i did some benchmarking and profiling of this in the past and threading won't 
help that much. i did actually have a pretty simple threading model 
implemented which provided a fairly significant speed up that was noticeable 
(e.g. 20%+) in the large (>50k files) folder case, but that was really just 
masking the problem: it offloaded the reaaaaaally slow part so that i/o could 
continue, which is classic latency vs throughput. the real solution is to cut 
latency here, by which i mean get rid of UDSEntry and stop using Qt's slow 
collection classes for things like kio_file. and because it was offloading a 
really slow part that came in large chunks (collections of dirents) it didn't 
parallelize all that nicely: on large dirs i/o would finish quickly and the 
thread pool would starve as each thread pushed its way through the UDSEntry 
dance as jobs piled up from the now unencumbered i/o path. the latency of 
threading killed throughput as a result.

with kio_file, the reaaaaaaally slow bit, and i mean REAAAAALY slow, as in 90% 
of wall clock time, is populating and serializing UDSEntrys. in particular, 
adding fields is very expensive. each field is added one at a time and the 
mechanism for storing those fields is horrendously slow. honestly, i don't even 
know why UDSEntry is used for the IOSlave side: it means creating huge numbers 
of small objects which each have their own collection objects holding a 
further inefficient structure (the Field struct in the FieldHash). each of those 
objects has a private data member that is a QSharedData ...

UDSEntry makes sense for the client side (though these days i'd rather just 
have a more efficient private data structure with a QAIM in front of that, but 
compatibility must be preserved) ... but for the ioslave side, especially ones 
like kio_file, it's completely needless overhead as a result of an 
overgeneralization in the code.

for kio_file, the ioslave side basically does:

* list entries via the protocol (e.g. readdir in the case of kio_file)
* turn each entry into a UDSEntry
* add the UDSEntry to a cold store
* pump each UDSEntry out to the client once the cold store is full

what would make so much more sense imho is having kio_file write directly to a 
stream ... which is exactly what those UDSEntry objects end up doing anyways! 
they are created one at a time, and then when there are N of them (or N 
seconds have passed) they serialize out to a QDataStream. for kio_file, the 
UDSEntry fields are NEVER changed and they are NEVER queried individually on 
the ioslave side. they are bulked up and then drained, and for no particular 
reason other than it lets one re-use UDSEntry and have a symmetry on both 
sides of the code base.
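
to make the "bulk up, then drain" pattern concrete, here is a minimal sketch in plain standard C++ rather than Qt. it is not the KIO code: Entry stands in for UDSEntry, BatchingSender and its thresholds are hypothetical names, and the wire layout (little-endian count, id, length, bytes) is an assumption made only for illustration. the point is that every entry becomes a small object with its own container, only to be walked once and serialized when the batch is flushed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// hypothetical stand-in for UDSEntry: one object per directory entry,
// each owning its own field container.
struct Entry {
    std::map<std::uint32_t, std::string> fields;
};

// hypothetical batcher: collects entries, then serializes them all at once
// when maxEntries have piled up or maxAge has passed since the last flush.
class BatchingSender {
public:
    using Clock = std::chrono::steady_clock;

    BatchingSender(std::size_t maxEntries, std::chrono::milliseconds maxAge)
        : m_maxEntries(maxEntries), m_maxAge(maxAge), m_lastFlush(Clock::now()) {
        assert(maxEntries > 0);
    }

    void add(Entry e) {
        m_pending.push_back(std::move(e));
        if (m_pending.size() >= m_maxEntries || Clock::now() - m_lastFlush >= m_maxAge)
            flush();
    }

    // walk every stored entry and every field once, purely to serialize them
    void flush() {
        for (const Entry &e : m_pending) {
            appendU32(static_cast<std::uint32_t>(e.fields.size()));
            for (const auto &f : e.fields) {
                appendU32(f.first);
                appendU32(static_cast<std::uint32_t>(f.second.size()));
                m_wire.insert(m_wire.end(), f.second.begin(), f.second.end());
            }
        }
        m_pending.clear();
        m_lastFlush = Clock::now();
        ++m_flushes;
    }

    std::size_t pending() const { return m_pending.size(); }
    std::size_t flushes() const { return m_flushes; }
    const std::vector<char> &wire() const { return m_wire; }

private:
    // assumed layout: 32-bit little-endian integers
    void appendU32(std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            m_wire.push_back(static_cast<char>((v >> (8 * i)) & 0xff));
    }

    std::size_t m_maxEntries;
    std::chrono::milliseconds m_maxAge;
    Clock::time_point m_lastFlush;
    std::vector<Entry> m_pending;
    std::vector<char> m_wire;
    std::size_t m_flushes = 0;
};
```

nothing between add() and flush() ever reads or modifies a stored field, so the per-entry objects only exist to be serialized a moment later.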

it would be fairly trivial to have a class that has an API very much like 
UDSEntry but instead has a QByteArray internally that it just writes 
everything to. the contract with its users would be:

* one entry at a time, serialized
* no changing data once put there
* no getting the data back out

the class would then just batch up data into its QByteArray and then shunt it 
across the wire when needed. no middle men, no 100s of 1000s of objects and 
QLists and << operators. just an API sth like:

	startEntry
	addField
	endEntry

internally it could either batch up the fields and then stream them out when 
endEntry is called, or (and this is what i would do personally) on startEntry 
it would set up the header but with bad data (e.g. field count of 0) and then 
start appending the fields as they come in and increment a field count variable. 
when endEntry is called, update the header with the right field count. this 
would be much faster, though it would require a strictly serial approach to 
creating items. all 'safeties' such as "if you set the 'is a symlink' field 
twice, it gets updated" would be lost, and it would rely much more on the 
ioslave being well behaved. batching up the fields internally (so the 
equivalent of one UDSEntry's 
internals) and then writing them out would be enough to return these safeties, 
but for kio_file i don't think that would *ever* be useful.
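
a minimal sketch of the "placeholder header, patch it on endEntry" variant, again in plain standard C++ with a std::vector<char> standing in for the QByteArray. the class name EntryStreamWriter, the entryCount() and take() helpers, and the wire layout (little-endian u32 field count, then id, length and bytes per field) are all assumptions for illustration; only the startEntry / addField / endEntry shape comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// hypothetical write-only serializer; not part of KIO, and the wire layout
// here is an illustrative assumption, not the real protocol.
class EntryStreamWriter {
public:
    // reserve a 4-byte field count header holding 0; patched in endEntry()
    void startEntry() {
        assert(!m_open);
        m_headerPos = m_buf.size();
        m_buf.resize(m_buf.size() + 4);
        m_fieldCount = 0;
        m_open = true;
    }

    // append one field straight into the buffer: id, length, raw bytes
    void addField(std::uint32_t id, const std::string &value) {
        assert(m_open);
        const std::size_t pos = m_buf.size();
        m_buf.resize(pos + 8);
        putU32(pos, id);
        putU32(pos + 4, static_cast<std::uint32_t>(value.size()));
        m_buf.insert(m_buf.end(), value.begin(), value.end());
        ++m_fieldCount;
    }

    // overwrite the placeholder header with the real field count
    void endEntry() {
        assert(m_open);
        putU32(m_headerPos, m_fieldCount);
        m_open = false;
        ++m_entryCount;
    }

    std::size_t entryCount() const { return m_entryCount; }

    // hand the batched bytes to the caller (e.g. to send) and start over
    std::vector<char> take() {
        assert(!m_open);
        std::vector<char> out;
        out.swap(m_buf);
        m_entryCount = 0;
        return out;
    }

private:
    // assumed layout: 32-bit little-endian integers
    void putU32(std::size_t pos, std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            m_buf[pos + i] = static_cast<char>((v >> (8 * i)) & 0xff);
    }

    std::vector<char> m_buf;
    std::size_t m_headerPos = 0;
    std::uint32_t m_fieldCount = 0;
    std::size_t m_entryCount = 0;
    bool m_open = false;
};
```

a caller would do startEntry(), a fixed sequence of addField() calls, then endEntry() per directory entry, and call take() whenever the batch should go across the wire. note that adding the same field id twice simply writes it twice: this is the loss of 'safeties' described above.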

as for determinate order of fields (to take advantage of the static list of 
QStrings to share QString data trick), that would be preserved as the order of 
addField calls in kio_file is fully deterministic: it is the exact same order 
every time.

so while i disagree with some of dfaure's arguments, in particular that we 
should care about single CPU systems (in practice, they simply do not exist 
anymore for KDE software; even mobile is nearly all multicore now) and that 
the intermediate data structures have a lot of overhead (UDSEntry is already 
the definition of "intermediate data structure" and "overhead"; using readdir_r 
instead of readdir has exceptionally little impact next to that) ... he is 
essentially correct: the answer is not threading. it is decreasing the number 
of operations, such as by kicking UDSEntry to the curb on the ioslave side.

UDSEntry could still use optimization on the client side of the ioslave 
relationship, of course ...

-- 
Aaron J. Seigo