KIO directory listing - CPU slows down SSD
Aaron J. Seigo
aseigo at kde.org
Wed May 14 10:50:18 UTC 2014
On Sunday, May 11, 2014 21:57:58 Mark Gaiser wrote:
> Note: my recent incremental cleanups already made it about a second
> faster than it used to be. It used to be around 5.500ms. However, the
> current speed is still ~7x slower than using raw C++ and Qt. My goal
> is to get it within 2x slower than raw C++/Qt.
i did some benchmarking and profiling of this in the past and threading won't
help that much. i did actually have a pretty simple threading model
implemented which provided a fairly significant speed up that was noticeable
(e.g. 20%+) in the large (>50k files) folder case, but that was really just
masking the problem: it offloaded the reaaaaaally slow part so that i/o could
continue, which is classic latency vs throughput. the real solution is to cut
latency here, by which i mean get rid of UDSEntry and stop using Qt's slow
collection classes for things like kio_file. and because it was offloading a
really slow part that came in large chunks (collections of dirents) it didn't
parallelize all that nicely: on large dirs i/o would finish quickly and the
thread pool would starve as each thread pushed its way through the UDSEntry
dance as jobs piled up from the now unencumbered i/o path. the latency of
threading killed throughput as a result.
with kio_file, the reaaaaaaally slow bit, and i mean REAAAAALY slow, as in 90%
of wall clock time, is populating and serializing UDSEntrys. in particular,
adding fields is very expensive. each field is added one at a time and the
mechanism for storing those fields is horrendously slow. honestly, i don't even
know why UDSEntry is used for the IOSlave side: it means creating huge numbers
of small objects which each have their own collection objects holding a
further inefficient structure (the Field struct in the FieldHash). each of those
objects has a private data member that is a QSharedData ...
UDSEntry makes sense for the client side (though these days i'd rather just
have a more efficient private data structure with a QAIM in front of that, but
compatibility must be preserved) ... but for the ioslave side, especially ones
like kio_file, it's completely needless overhead as a result of an
overgeneralization in the code.
for kio_file, the ioslave side basically does:
* list entries via the protocol (e.g. readdir in the case of kio_file)
* turn each entry into a UDSEntry
* add the UDSEntry to a cold store
* pump each UDSEntry out to the client once the cold store is full
what would make so much more sense imho is having kio_file write directly to a
stream ... which is exactly what those UDSEntry objects end up doing anyways!
they are created one at a time, and then when there are N of them (or N
seconds have passed) they serialize out to a QDataStream. for kio_file, the
UDSEntry fields are NEVER changed and they are NEVER queried individually on
the ioslave side. they are bulked up and then drained, and for no particular
reason other than it lets one re-use UDSEntry and have a symmetry on both
sides of the code base.
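to make the "bulked up and then drained" shape concrete, here is a minimal sketch of that pattern in plain C++ (no Qt). everything in it is an assumption for illustration: Entry, BatchingLister and the byte layout are hypothetical stand-ins, not the real UDSEntry or the real KIO wire format, and the "N seconds have passed" timer is left out. the point is only that every entry becomes a small object with its own hash container, which is then walked once and thrown away:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a UDSEntry-like object:
// one hash container per directory entry.
struct Entry {
    std::unordered_map<std::uint32_t, std::string> fields;
};

// Hypothetical lister: queues entry objects, then serializes them in bulk.
class BatchingLister {
public:
    explicit BatchingLister(std::size_t batchSize) : m_batchSize(batchSize) {
        assert(batchSize > 0);
    }

    // Queue one entry; drain the queue once it holds batchSize entries.
    void add(Entry entry) {
        m_pending.push_back(std::move(entry));
        if (m_pending.size() >= m_batchSize)
            flush();
    }

    // Serialize every queued entry (illustrative layout: field count, then
    // id/length/bytes per field) and drop the intermediate objects.
    void flush() {
        for (const Entry &e : m_pending) {
            appendU32(static_cast<std::uint32_t>(e.fields.size()));
            for (const auto &kv : e.fields) {
                appendU32(kv.first);
                appendU32(static_cast<std::uint32_t>(kv.second.size()));
                m_wire.insert(m_wire.end(), kv.second.begin(), kv.second.end());
            }
        }
        m_pending.clear();
    }

    std::size_t pendingCount() const { return m_pending.size(); }
    const std::vector<char> &wire() const { return m_wire; }

private:
    // Append a 32-bit value in native byte order (a simplification).
    void appendU32(std::uint32_t v) {
        const char *p = reinterpret_cast<const char *>(&v);
        m_wire.insert(m_wire.end(), p, p + sizeof v);
    }

    std::size_t m_batchSize;
    std::vector<Entry> m_pending;
    std::vector<char> m_wire;
};
```

nothing ever reads or modifies an Entry between add() and flush() here, so the per-entry containers are pure overhead in this sketch.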
it would be fairly trivial to have a class that has an API very much like
UDSEntry but instead has a QByteArray internally that it just writes
everything to. the contract with its users would be:
* one entry at a time, serialized
* no changing data once put there
* no getting the data back out
the class would then just batch up data into its QByteArray and then shunt it
across the wire when needed. no middle men, no 100s of 1000s of objects and
QLists and << operators. just an API sth like:
startEntry
addField
endEntry
internally it could either batch up the fields and then stream them out when
endEntry is called, or (and this is what i would do personally) on startEntry
it would set up the header but with bad data (e.g. field count of 0) and then
start appending the fields as they come in and increment a field count variable.
when endEntry is called, update the header with the right field count. this
would be much faster, though it would require a strictly serial approach to
creating items. all 'safeties' such as "if you set the 'is a symlink' field
twice, it gets updated" would be lost, and it would rely much more on the
ioslave being well
behaved. batching up the fields internally (so the equivalent of one UDSEntry's
internals) and then writing them out would be enough to return these safeties,
but for kio_file i don't think that would *ever* be useful.
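a rough sketch of the "placeholder header, patch it in endEntry" variant, again in plain C++ so it stands alone. the class name EntryStreamWriter, the helper readU32, the numeric field ids and the byte layout (32-bit count, then id/length/bytes per field, native byte order) are all assumptions made up for this example; a real version would presumably sit on a QByteArray and match whatever the client side expects to deserialize:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical append-only writer: no per-entry objects, no lookups,
// no way to change or read back a field once it has been written.
class EntryStreamWriter {
public:
    // Begin an entry: remember where its header is and write a
    // placeholder field count of 0.
    void startEntry() {
        assert(!m_open); // strictly serial: one entry at a time
        m_open = true;
        m_fieldCount = 0;
        m_headerPos = m_buffer.size();
        appendU32(0);
    }

    // Append one field straight into the buffer and bump the counter.
    void addField(std::uint32_t fieldId, const std::string &value) {
        assert(m_open);
        appendU32(fieldId);
        appendU32(static_cast<std::uint32_t>(value.size()));
        m_buffer.insert(m_buffer.end(), value.begin(), value.end());
        ++m_fieldCount;
    }

    // Finish the entry: overwrite the placeholder with the real count.
    void endEntry() {
        assert(m_open);
        std::memcpy(m_buffer.data() + m_headerPos, &m_fieldCount,
                    sizeof m_fieldCount);
        m_open = false;
        ++m_entryCount;
    }

    // Hand the accumulated bytes to the caller and start a new batch.
    std::vector<char> takeBatch() {
        assert(!m_open);
        std::vector<char> out;
        out.swap(m_buffer);
        m_entryCount = 0;
        return out;
    }

    std::size_t entryCount() const { return m_entryCount; }
    std::size_t byteCount() const { return m_buffer.size(); }

private:
    // Append a 32-bit value in native byte order (a simplification).
    void appendU32(std::uint32_t v) {
        const char *p = reinterpret_cast<const char *>(&v);
        m_buffer.insert(m_buffer.end(), p, p + sizeof v);
    }

    std::vector<char> m_buffer;
    std::size_t m_headerPos = 0;
    std::size_t m_entryCount = 0;
    std::uint32_t m_fieldCount = 0;
    bool m_open = false;
};

// Helper for inspecting a batch: read a native-order 32-bit value.
inline std::uint32_t readU32(const std::vector<char> &bytes, std::size_t offset) {
    std::uint32_t v = 0;
    std::memcpy(&v, bytes.data() + offset, sizeof v);
    return v;
}
```

a slave would call startEntry(), a fixed sequence of addField() calls and endEntry() once per dirent, then takeBatch() whenever it wants to ship data. note that calling addField() twice with the same id simply writes the field twice, which is exactly the lost 'safety' described above.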
as for determinate order of fields (to take advantage of the static list of
QStrings to share QString data trick), that would be preserved as the order of
addField calls in kio_file is fully deterministic: it is the exact same order
every time.
so while i disagree with some of dfaure's arguments, in particular that we
should care about single CPU systems (in practice, they simply do not exist
anymore for KDE software; even mobile is nearly all multicore now) and that
the intermediate data structures have a lot of overhead (UDSEntry is already
the definition of "intermediate data structure" and "overhead"; using readdir_r
instead of readdir has exceptionally little impact next to that) ... he is
essentially correct: the answer is not threading. it is decreasing the
number of operations, such as by kicking UDSEntry to the curb on the ioslave
side. UDSEntry could still use optimization on the client side of the ioslave
relationship, of course ...
--
Aaron J. Seigo
More information about the Kde-frameworks-devel
mailing list