KIO directory listing - CPU slows down SSD

Mark Gaiser markg85 at gmail.com
Thu May 29 14:32:28 UTC 2014


On Thu, May 29, 2014 at 12:21 AM, Aaron J. Seigo <aseigo at kde.org> wrote:
> On Wednesday, May 28, 2014 21:12:43 Mark Gaiser wrote:
>> You've written that with the assumption of backwards compatibility.
>> It's a nice idea, but why should we even try to remain backwards
>> compatible?
>
> The question should be inverted: why *should* we break compatibility?
>
> Least change always wins, because least change means:
>
> * least effort
> * retaining knowledge (of existing code people know)
> * lower chance of introducing new bugs
>
> The question is never "why shouldn't we change things" but "why should we
> change things"...

You're right. In a company I would do it like that as well.

However, I'm not doing this for a company :)
It's just a hobby for me to optimize it to levels that match raw speed
(without any KIO).
I'm "trying" to improve things by making them simpler. If that
requires a new code path: fine with me.

>
>> It's all internal to SlaveBase so we can change whatever
>> we want if that suits a more optimized design.
>
> If it is more optimized, yes.... which is why I did a bunch of measurements on
> the seek-back-and-set method to see if it was slow. (It isn't ...)
>
>> Consider this patch (lives for a month):
>> http://p.sc2.nl/pgfto3npy
>>
>> I don't see any harm in introducing a second path for slaves to send
>> their entries to the client. This new path from the patch above
>> completely eliminates the need for the UDSEntry object from a SlaveBase
>> point of view. It's then up to the slaves themselves to make use of it.
>
> Yes, this much makes sense. It allows us to preserve the wire format while
> making a significant optimization...
>
> preserving the wire format is particularly desirable as the format is not
> formally described anywhere ("the code is the documentation") and so I would
> consider it highly fragile.
>
>> With some work the current listEntries(const UDSEntryList &list) could
>> even be adjusted to the streaming based mechanism (aka, inject a
>> "startEntry" into the stream right before it does "stream << entry"),
>> which would make all slaves use the new path.
>
> Do you mean listEntry? listEntries is already taking a batch of UDSEntry
> objects so not much is won there. Modifying listEntry, however, to use the
> streaming method would be a win.

No, I mean listEntries.
I also mean completely removing the SlaveBase::listEntry call from
slaves. The slaves fill a stream which would then be sent to
the client by calling (the newly proposed) SlaveBase::listEntries.

What I mean here is that the current listEntries function that takes a
UDSEntryList can be modified to use the new path.
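
To make that a bit more concrete, here's a rough, self-contained sketch
(plain Qt, no KIO) of the kind of thing I have in mind. The field keys and
the listEntries(const QByteArray &) hand-off are just placeholders, not the
actual wire format:

#include <QByteArray>
#include <QDataStream>
#include <QDebug>
#include <QIODevice>
#include <QString>

// Serialize one entry straight into the buffer: field count first, then
// (key, value) pairs. Keys 1 and 2 are placeholders for the real UDS fields.
static void appendEntry(QDataStream &stream, const QString &name, qlonglong size)
{
    stream << quint32(2);          // number of fields in this entry
    stream << quint32(1) << name;  // placeholder key 1: file name
    stream << quint32(2) << size;  // placeholder key 2: file size
}

int main()
{
    QByteArray buffer;
    QDataStream out(&buffer, QIODevice::WriteOnly);

    appendEntry(out, QStringLiteral("file_a.txt"), 1234);
    appendEntry(out, QStringLiteral("file_b.txt"), 5678);

    // In a real slave this buffer would be handed to the proposed
    // SlaveBase::listEntries(const QByteArray &) instead of being printed.
    qDebug() << "serialized" << buffer.size() << "bytes for 2 entries";
    return 0;
}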

>
>> Thinking about this again, i won't even need a "endEntry" call as long
>> as each new entry starts with a "startEntry".
>
> The last entry can be closed in the listEntries(const QByteArray&) method.
> Which means endEntry() is unnecessary even with the field count.
+1
>
>> I can then use that to
>> detect the start of a new entry.
>
> Not a great idea, imho. It means checking EVERY field read. It also means
> having to read the next field as if it *was* "startEntry" and then if it isn't
> re-reading it as a "normal" field. That means double deserialization .. but
> even worse, if a field happens to start with the same bit sequence.. ouch. No,
> field count is a good idea.

this ..
>
>> also removes the need to know how many fields are present in a given
>> entry and conveniently also removes the need to move back in a
>> datastream :)
>
> Moving back in the datastream is stupidly cheap. That was the entire point of
> the benchmarking I did :) There's no reason to try and avoid it ...

and this convince me that this is the way to go. Checking for
"startEntry" for every entry is indeed overkill if moving back in the
data stream is so cheap.
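
For illustration, the seek-back-and-set variant of that helper could look
roughly like this; again the keys are placeholders and the helper name is
made up:

#include <QByteArray>
#include <QDataStream>
#include <QIODevice>
#include <QString>

// Write a placeholder count, serialize whatever fields this entry happens
// to have, then seek back and patch in the real count.
static void appendEntry(QDataStream &stream, const QString &name,
                        qlonglong size, bool includeSize)
{
    const qint64 countPos = stream.device()->pos();
    stream << quint32(0);                          // placeholder field count

    quint32 fields = 0;
    stream << quint32(1) << name; ++fields;        // placeholder key 1: name
    if (includeSize) {
        stream << quint32(2) << size; ++fields;    // placeholder key 2: size
    }

    const qint64 endPos = stream.device()->pos();
    stream.device()->seek(countPos);               // seek back ...
    stream << fields;                              // ... overwrite the count
    stream.device()->seek(endPos);                 // ... and continue appending
}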

>
>> Something else that comes in mind here is the current batching. Right
>> now each batch is 200 entries max. But if we know the actual byte size
>> (which we do when filling a stream) then we might as well change the
>> batching to not exceed X number of kb.
>
> Yep.
>
>> That might be more efficient
>> batching. We obviously keep the timer. Send all entries currently
>> collected once 300ms has passed.
>
> Agreed ... in fact, for the *first* buffer I'd make the timer even shorter to
> give the initial impression of speed. It's a complete hack and wouldn't really
> be any faster (the contrary), but I bet it would be noticeable in the GUI.
>
> (perhaps even a back-off algo -> first time-out is 50ms; if a buffer gets sent
> due to timeout then back-off to 100; repeat until you hit 300 .. easy to
> implement and should hopefully get first items to the client quicker)

This is theory, and it works very well. In theory!
In practice things work a bit differently.

The GUI seems to stutter _while_ new entries flow in. Test it in
Dolphin on a massive folder. In fact, the applications that use KIO::listDir
for listing folders only have interfaces that become usable once all
entries are fetched. So while it's a very nice idea in theory, in
practice there isn't much to gain.

Another issue is sorting. You don't often see views "as they are".
It's almost always sorted by X or grouped by X, which effectively forces
those applications to wait until everything is fetched, then sort/group,
and only then is the view usable for you as a user. I wish it were
different, but it sadly isn't in my experience.

I am trying to change this in my accretion project, where I don't care
about the order of the entries. I only care about the entries visible
in the current view and only sort those using std::nth_element and
std::partial_sort. It already works very neatly but needs more
optimization.
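
Roughly, that visible-window sorting looks like this (plain std::string
entries stand in for real directory entries, and the helper name is made up):

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Return the entries that would occupy rows [first, first + visible) of the
// fully sorted view, without sorting the whole listing.
std::vector<std::string> visibleSlice(std::vector<std::string> entries,
                                      std::size_t first, std::size_t visible)
{
    if (first >= entries.size())
        return {};
    const std::size_t last = std::min(first + visible, entries.size());

    // Put the correct element at position 'first'; everything before it is <=,
    // everything after it is >=.
    std::nth_element(entries.begin(), entries.begin() + first, entries.end());
    // Then sort just the window [first, last) out of the remaining tail.
    std::partial_sort(entries.begin() + first, entries.begin() + last, entries.end());

    return std::vector<std::string>(entries.begin() + first, entries.begin() + last);
}

int main()
{
    std::vector<std::string> names = {"e", "b", "a", "d", "c", "f"};
    for (const auto &n : visibleSlice(names, 1, 3))  // rows 1..3 of the sorted view
        std::cout << n << ' ';                       // prints: b c d
    std::cout << '\n';
}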

Unless, of course, you can draft up a simple GUI that works fast and
fluidly _while_ entries flow in.
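
Still, if we go that route, the size-based batching with Aaron's back-off
flush timer could look roughly like this. All names, the ~250 KB threshold
and the back-off steps are made up for illustration; none of this is
existing KIO API:

#include <QByteArray>
#include <QObject>
#include <QTimer>
#include <QtGlobal>
#include <functional>
#include <utility>

// Collect serialized entries; flush when the batch passes a byte threshold
// or when a single-shot timer fires. Only a timeout-driven flush backs the
// timer off (50 -> 100 -> 200 -> 300 ms).
struct EntryBatcher
{
    explicit EntryBatcher(std::function<void(const QByteArray &)> sendFn)
        : send(std::move(sendFn))
    {
        timer.setSingleShot(true);
        QObject::connect(&timer, &QTimer::timeout, [this]() {
            timeoutMs = qMin(timeoutMs * 2, 300);   // back off only on timeouts
            flush();
        });
    }

    void addSerializedEntry(const QByteArray &entry)
    {
        buffer.append(entry);
        if (buffer.size() >= maxBatchBytes) {
            flush();                                // size-driven flush, no back-off
        } else if (!timer.isActive()) {
            timer.start(timeoutMs);
        }
    }

    void flush()
    {
        timer.stop();
        if (buffer.isEmpty())
            return;
        send(buffer);                               // e.g. the proposed listEntries()
        buffer.clear();
    }

    std::function<void(const QByteArray &)> send;
    QByteArray buffer;
    QTimer timer;
    int timeoutMs = 50;
    const int maxBatchBytes = 256 * 1024;           // ~250 KB per batch
};
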
>
>> > soooo.. i think performance wise this should be just fine, seeing as one
>> > would probably be sending buffers on the order of some 100s of KB rather
>> > than 100s of MB ;)
>>
>> I can always increase the benchmark from 500,000 files to even more if
>> needed ^_-
>
> Heh .. but you'd never send 500k files at a time ... :) This is all *per buffer
> sent*, so if you send 400MB of data but do so in 250k chunks this extra
> seeking still won't show up in the callgraph.
>
> On the other hand, hopefully you can make this code fast enough that you'll be
> tempted to raise it to 500k files to REALLY push it. ;)
>
>
> .. you'll also find some perhaps useful cleanups and optimizations in the
> aseigo/cleanups branch in kio. feel free to cherry-pick.

You seem to like KIO :)
>
> --
> Aaron J. Seigo
> _______________________________________________
> Kde-frameworks-devel mailing list
> Kde-frameworks-devel at kde.org
> https://mail.kde.org/mailman/listinfo/kde-frameworks-devel
>

