KIO directory listing - CPU slows down SSD

Mark Gaiser markg85 at gmail.com
Mon May 12 12:07:23 UTC 2014


On Mon, May 12, 2014 at 1:09 PM, Milian Wolff <mail at milianw.de> wrote:
> On Monday 12 May 2014 12:34:33 Mark Gaiser wrote:
>> On Mon, May 12, 2014 at 10:55 AM, David Faure <faure at kde.org> wrote:
>> > On Sunday 11 May 2014 21:57:58 Mark Gaiser wrote:
>> >> My theoretical solution (and I really hope to get feedback on this) is
>> >> to introduce a worker thread in slavebase.cpp.
>> >
>> > First feedback: threads create complexity, so I'm very wary of using them
>> > unless absolutely necessary. In any case, *if* there's a good reason for
>> > one, do it in kio_file itself, don't impose it on all slaves (which are
>> > not ready for it).
>>
>> Good point. I wrote this with the intention of having it threaded (or
>> async using QtConcurrent) and having it work for all slaves. I guess
>> that's a bit too much at the moment.
>>
>> >> The slave should be
>> >> reduced in functionality. It should not make a UDSEntry anymore.
>> >> Instead, it should send the STAT object to slavebase.cpp (no function
>> >> for it yet). SlaveBase then puts it in a QVector object which would be
>> >> shared with the actual worker thread. The thread would only read from
>> >> the QVector, which takes away the need to care about thread safety.
>> >
>> > Wrong. A read and a write to the same area of memory create a race
>> > condition, by definition. You need to protect the vector with a mutex.
>> >
>> > Think of the case where append() has to reallocate, too... (but even if
>> > you
>> > take that case away with a reserve() call, you still have a race on the
>> > individual vector items, unless you use a mutex).
>>
>> Oh, I thought it would be safe if Thread X (think of it as the main thread)
>> would fill up a data structure and then notify Thread Y of changes.
>> Thread Y would then only read entries that Thread X has already placed
>> in the vector. I thought that would be safe without any need for
>> mutexes or race conditions?
>
> No. For the notifications alone you'll need a wait condition and thus a mutex
> and of course you must prevent any background thread from working while the
> foreground thread reads in the data. I really doubt this is going to be useful
> here.
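
Ok, that makes it clearer. So the handoff would have to look roughly
like this? A bare sketch of the producer/consumer pattern with made-up
names (PendingEntries and friends), not actual SlaveBase code:

#include <qplatformdefs.h>   // QT_STATBUF
#include <QMutex>
#include <QMutexLocker>
#include <QVector>
#include <QWaitCondition>

struct PendingEntries {
    QMutex mutex;
    QWaitCondition notEmpty;
    QVector<QT_STATBUF> queue;   // filled by the I/O side
    bool done = false;           // set once the listing is finished
};

// I/O side: append one stat result and wake the worker.
void push(PendingEntries &p, const QT_STATBUF &buf)
{
    QMutexLocker locker(&p.mutex);
    p.queue.append(buf);         // append may reallocate -> must hold the lock
    p.notEmpty.wakeOne();
}

// I/O side: signal that the listing is done so the worker can exit.
void finish(PendingEntries &p)
{
    QMutexLocker locker(&p.mutex);
    p.done = true;
    p.notEmpty.wakeAll();
}

// Worker side: take everything that is queued, or wait for more.
QVector<QT_STATBUF> takeBatch(PendingEntries &p)
{
    QMutexLocker locker(&p.mutex);
    while (p.queue.isEmpty() && !p.done)
        p.notEmpty.wait(&p.mutex);
    QVector<QT_STATBUF> batch;
    p.queue.swap(batch);         // hand the whole batch over in one go
    return batch;
}

So every append and every read goes through the mutex, and the wait
condition replaces the plain "notify" I had in mind.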
>
>> >> The
>> >> worker thread should then process the entries (in a new short-lived
>> >> vector, I guess), create UDSEntry objects and send them over the socket
>> >> connection to the client.
>> >
>> > Your analysis is that currently this happens:
>> > I/O, CPU, I/O, CPU, I/O, CPU, I/O.
>>
>> Exactly
>>
>> > You want to use threads so that one thread does
>> > I/O, I/O, I/O and the other receives that and does CPU, CPU, CPU.
>>
>> Exactly
>>
>> > I'm not sure this is even going to be faster; the additional overhead
>> > (context switching on single-CPU machines, intermediate data structures,
>> > mutex protection) will probably be bigger than the gain, since you're not
>> > using threads for what they were meant for, when it comes to performance:
>> > CPU operations being run in parallel over multiple threads.
>>
>> Yes, that's my fear as well.
>> Do you think QtConcurrent would help here? It's fairly simple to
>> defer UDSEntry creation to a single function and create it from
>> there, then stuff the result into the QVector. Here I would certainly
>> have to use a mutex ;)
>
> QtConcurrent is a bad idea. If at all, use ThreadWeaver. Also, taking the
> above into account: I think you'll pessimize performance for the usual
> workload of fewer items per folder. The context switching and synchronization
> will easily make things worse there.

I actually tried QtConcurrent now.
It is, as you expected, much slower.

It's nice stuff to play with, but in this case it's not worth it.
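
For reference, this is roughly the shape of what I tried (heavily
simplified; StatResult and collectStats are made-up names just to
illustrate it): map the stat results to UDSEntry objects on the global
thread pool and collect the results afterwards.

#include <qplatformdefs.h>   // QT_STATBUF
#include <QtConcurrent>
#include <QVector>
#include <kio/udsentry.h>

struct StatResult {               // what the slave collected per file
    QString name;
    QT_STATBUF buf;
};

static KIO::UDSEntry makeEntry(const StatResult &s)
{
    KIO::UDSEntry entry;
    entry.insert(KIO::UDSEntry::UDS_NAME, s.name);
    entry.insert(KIO::UDSEntry::UDS_SIZE,
                 static_cast<long long>(s.buf.st_size));
    entry.insert(KIO::UDSEntry::UDS_ACCESS, s.buf.st_mode & 07777);
    return entry;
}

// somewhere in the listing code:
QVector<StatResult> stats = collectStats(path);            // made-up helper
QFuture<KIO::UDSEntry> future = QtConcurrent::mapped(stats, makeEntry);
future.waitForFinished();
QList<KIO::UDSEntry> entries = future.results();

The per-item work is apparently too small to win anything back from
the thread pool and future overhead, exactly as predicted above.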
>
>> I'm just not sure about the possible overhead there. Would it be
>> more or less than with a separate thread?
>>
>> > I would favour much more an approach that reduces the number of CPU
>> > operations needed (-> making UDSEntry faster) than an approach that uses
>> > threads for this.
>
> +1
>
>> Ohh, but patches for that will arrive on ReviewBoard soon, once I've
>> figured out one last nasty issue :)
>> Even with those, I still see I/O, CPU, I/O, CPU... just with faster CPU times.
>
> Then this is a good thing. And maybe one can even optimize it further
> eventually. I have experienced this myself: optimize it, leave it alone for a
> few months, then take another look at the problem and find another way to
> speed it up.

I will certainly do that :)
>
> Furthermore, you should keep the "perceived" performance in mind. The user
> will see items arrive in the GUI instantly and thus KIO feels fast. If it now
> takes 2 or 3 s for 5k items, I don't think it's too bad. And if this becomes
> the bottleneck for an application, it probably should not use KIO in the
> first place but rather QDir directly. KIO's biggest feature is the network
> transparency after all. And if you add network latency into the mix, the CPU
> time becomes negligible.

Neh, that's not my view.
In my opinion we should be able to have the benefit of the KIO design
and still get close to the raw speed of C++/Qt. There are places that
can be tuned to gain speed without losing functionality. UDSEntry is
one, but sending the data over the socket (even just filling the
QDataStream) is another hotspot that can be tuned. How, I don't know. Yet!
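
To give an idea of the per-entry work I mean, here is a simplified
sketch (field list trimmed, exact calls may differ from the real
kio_file code; entryFromStat is a made-up helper name). Building the
UDSEntry is one hotspot; everything that later goes through
listEntries() gets serialised into a QDataStream and written to the
socket, which is the other.

#include <qplatformdefs.h>   // QT_STATBUF
#include <sys/stat.h>        // S_IFMT
#include <kio/udsentry.h>

// Hypothetical helper, just to show the shape of the per-entry work.
static KIO::UDSEntry entryFromStat(const QString &filename,
                                   const QT_STATBUF &buf)
{
    KIO::UDSEntry entry;
    entry.insert(KIO::UDSEntry::UDS_NAME, filename);
    entry.insert(KIO::UDSEntry::UDS_FILE_TYPE, buf.st_mode & S_IFMT);
    entry.insert(KIO::UDSEntry::UDS_ACCESS, buf.st_mode & 07777);
    entry.insert(KIO::UDSEntry::UDS_SIZE,
                 static_cast<long long>(buf.st_size));
    entry.insert(KIO::UDSEntry::UDS_MODIFICATION_TIME, buf.st_mtime);
    return entry;
}

Every one of those insert() calls, and every field that is streamed
into the QDataStream afterwards, shows up in the profile. That is the
part I want to shave down.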

But this is obviously an area I just like to play in, so I'm sure I
will find something :)

>
> Bye
>
> --
> Milian Wolff
> mail at milianw.de
> http://milianw.de

