KIO directory listing - CPU slows down SSD

Mon May 12 11:09:11 UTC 2014

On Monday 12 May 2014 12:34:33 Mark Gaiser wrote:
> On Mon, May 12, 2014 at 10:55 AM, David Faure <faure at kde.org> wrote:
> > On Sunday 11 May 2014 21:57:58 Mark Gaiser wrote:
> >> My theoretical solution (and I really hope to get feedback on this) is
> >> to introduce a worker thread in slavebase.cpp.
> > 
> > First feedback: threads create complexity, so I'm very wary of using them
> > unless absolutely necessary. In any case, *if* there's a good reason for
> > one, do it in kio_file itself, don't impose it on all slaves (which are
> > not ready for it).
> 
> Good point. I wrote this with the intention to have it threaded (or
> async using QtConcurrent) and have it working for all slaves. I guess
> that's a bit too much at this moment.
> 
> >> The slave should be
> >> reduced in functionality. It should not make a UDSEntry anymore.
> >> Instead, it should send the STAT object to slavebase.cpp (no function
> >> for it yet). SlaveBase then puts it in a QVector object which would be
> >> shared with the actual worker thread. The thread would only read from
> >> the QVector which takes away the need to care for thread safety.
> > 
> > Wrong. A read and a write to the same area of memory create a race
> > condition, by definition. You need to protect the vector with a mutex.
> > 
> > > Think of the case where append() has to reallocate, too... (but even if
> > > you take that case away with a reserve() call, you still have a race on
> > > the individual vector items, unless you use a mutex).
> 
> Oh, I thought I would be safe if Thread X (think of it as the main thread)
> would fill up a data structure and then notify Thread Y of changes.
> Thread Y would then only read entries that Thread X has already placed
> in the vector. I thought that would be safe without any care for
> mutexes or race conditions?

No. For the notifications alone you'll need a wait condition, and thus a mutex.
And of course you must prevent any background thread from working on the data
while the foreground thread reads it in. I really doubt this is going to be
useful here.
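
To make the point concrete, here is a minimal sketch of such a handoff. It uses the C++ standard library (std::mutex, std::condition_variable) rather than Qt's QMutex and QWaitCondition so that it compiles on its own; the idea is the same. StatRecord, SharedEntries and processInBackground are hypothetical names for illustration only, not existing KIO API.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the raw stat data a slave would collect.
struct StatRecord {
    int id;
};

// Hypothetical container shared between the I/O thread and the worker thread.
// Every access to m_entries and m_finished happens with m_mutex held.
class SharedEntries {
public:
    // Producer side: append under the lock, then wake the consumer.
    void push(const StatRecord &record) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_entries.push_back(record);
        }
        m_ready.notify_one();
    }

    // Producer side: signal that no more entries will arrive.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_finished = true;
        }
        m_ready.notify_one();
    }

    // Consumer side: block until there is data or the producer is done,
    // then move all pending entries into `out`.
    // Returns false once the producer has finished and nothing is left.
    bool takeAll(std::vector<StatRecord> &out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return !m_entries.empty() || m_finished; });
        out.clear();
        out.swap(m_entries);
        return !out.empty();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::vector<StatRecord> m_entries;
    bool m_finished = false;
};

// Pushes `count` records from the calling thread while a worker thread
// drains them in batches. Returns how many records the worker saw.
std::size_t processInBackground(int count) {
    assert(count >= 0);
    SharedEntries shared;
    std::size_t processed = 0; // written only by the worker, read after join()

    std::thread worker([&shared, &processed] {
        std::vector<StatRecord> batch;
        while (shared.takeAll(batch)) {
            processed += batch.size(); // the CPU-side work would go here
        }
    });

    for (int i = 0; i < count; ++i) {
        shared.push(StatRecord{i});
    }
    shared.finish();
    worker.join();
    return processed;
}
```

Calling processInBackground(1000) from main() should return 1000. Note that every push() and every takeAll() goes through the mutex, which is exactly the synchronization cost in question.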

> >> The
> >> worker thread should then process the entries (in a new short-lived
> >> vector, I guess), create UDSEntry objects and send them over the socket
> >> connection to the client.
> > 
> > Your analysis is that currently this happens:
> > I/O, CPU, I/O, CPU, I/O, CPU, I/O.
> 
> Exactly
> 
> > You want to use threads so that one thread does
> > I/O, I/O, I/O and the other receives that and does CPU, CPU, CPU.
> 
> Exactly
> 
> > I'm not sure this is going to even be faster, the additional overhead
> > (context switching on single-cpu machines, intermediate data structures,
> > mutex protection) will probably be bigger than the gain, since you're not
> > using threads for what they were meant for, when it comes to performance:
> > CPU operations being run in parallel over multiple threads.
> 
> Yes, my fear as well.
> Do you think QtConcurrent would help here? It's fairly simple
> to defer UDSEntry creation to a single function and create it from
> there, then stuff it in the QVector. Here I would certainly have to
> use a mutex ;)

QtConcurrent is a bad idea. If at all, use ThreadWeaver. Also, taking the
above into account: I think you'll pessimize performance for the usual
workload of fewer items per folder. The context switching and synchronization
will easily make things worse there.

> I'm just not sure about the possible overhead there. Would it be
> more or less than with a separate thread?
> 
> > I would favour much more an approach that reduces the amount of CPU
> > operations needed (-> making UDSEntry faster) than an approach that uses
> > threads for this.

+1

> Ohh, but patches for that will arrive in reviewboard soon, once I've
> figured out one last nasty issue :)
> Even with those, i still see I/O, CPU, I/O, CPU... Just faster CPU times.

Then this is a good thing. And maybe one can even optimize it further
eventually. I have experienced this myself: optimize it, leave it alone for a
few months, then take another look at the problem and find another way to
speed it up.

Furthermore, you should keep the "perceived" performance in mind. The user
will see items arrive in the GUI instantly, and thus KIO feels fast. If it now
takes 2 or 3s for 5k items, I don't think it's too bad. And if this becomes the
bottleneck for an application, it probably should not use KIO in the first
place, but rather QDir directly. KIO's biggest feature is the network
transparency, after all. And if you add network latency into the mix, the CPU
time becomes negligible.

Bye

-- 
Milian Wolff
mail at milianw.de
http://milianw.de