KIO directory listing - CPU slows down SSD

Fri Jun 6 09:10:00 UTC 2014

On Monday, June 2, 2014 20:54:11 Mark Gaiser wrote:
> On Mon, Jun 2, 2014 at 6:42 PM, Aaron J. Seigo <aseigo at kde.org> wrote:
> > On Thursday, May 29, 2014 16:32:28 Mark Gaiser wrote:
> >> On Thu, May 29, 2014 at 12:21 AM, Aaron J. Seigo <ase

> But don't you just move logic to the slave that way?

yes and no.

sorting and grouping are easily abstracted. sort/group currently happens 
client side and doesn't care where the data comes from. so while moving the 
logic to the slave, it should be achieved in a slave-neutral fashion.

> And lose flexibility in the apps using the slave (like dolphin?

if they need to sort / group in some way not supported by KIO they can request 
the results unsorted and then you have exactly the same situation we have now. 
there would be no loss of flexibility.

and really: how often do new sort / group methods appear in KDE file listings? 
very, very rarely. new methods could be also be moved to the ioslave side 
later on where they would be usable by all clients.

> Oh and
> complicating kio "a bit" to pass a sorting and/or grouping style.

yes, it would make the protocol more complex.

> Also, for a slave to give you the n items in a sorted way requires the
> slave to fetch _all_ items to do the sorting.

not all slaves, no. some protocols offer server-side functionality.

the file slave (among others) would indeed need to fetch all items, yes .. but 
that is *precisely* what happens currently on the client side anyways. so 
whatever mechanisms are employed client-side could be done in the slave, minus 
the IPC overhead and with whatever benefits can be gained from using lower-
level API in the slave.

> All that you will save is IPC traffic. It might

not necessarily; common use cases like "sort by name" may end up significantly 
faster if the slave can quickly gather file names and process them into order 
before stat'ing.

> not even be faster. Take a slow ftp for example. Without "slave side
> sorting" you would get your first results after 300ms, guaranteed.

slaves dealing with slow data sources could do incremental updates. done well 
the worst case should be equal to the current typical case.

i used the word "stream" and perhaps that was not the best word to use. it's 
more like synchronized models where population is informed by 

* sorting preference (inc "none")
* grouping preference (inc "none")
* offset (if any)
* hard and soft limits (if any)[1]

> With server side sorting you will have to wait however long the slave
> takes to fetch _all_ items. Then you can do a (partial) sort and send
> it to the client.

if you want perfect sorting (e.g. no re-layouts) then this is true in the 
current case too: the client has to wait for ALL results to be able to sort 
items correctly.

so let's assume that incremental sort is allowable and the streamed data can 
be updated by the slave. in that case one can view it as an incrementally 
updating window onto a dataset.

example: 1000 items to be sorted by name with a normal distribution across all 
values in the names. the client is interested in at least the first 100 items. 
the slave fetches the first 100 entries, sorts them, and they are immediately 
streamed to the client. 

the slave fetches the next 100 entries and adds them to the sorted set. 10 
items (statistically that should be about the rate) are no longer in the first 
100 items in the set; those 10 items are now streamed across with their index 
and updated indexes are sent for items at the previous indexes 89-99.

rinse / repeat until listing is complete.

now if sorting is done by name, this could result in even more speed-ups: only 
the items in the current window of results being sent would need to be stat'd. 
the rest of the items could be stat'd lazily (or when the window shifts)

obviously, for a slave listing youtube results that would be entirely 
unnecessary: simply query the youtube API for the exact result set desired.

> > KIO listing is all-or-nothing batch oriented; a stream-based approach that
> > supports seeking through listings that are pre-sorted/grouped in the slave
> > process would be moderately gorgeous. it would prevent more IPC than
> > necessary and allow the slave to use any&all service-specific features
> > for pre-sort/group of entries.
> 
> So you would save the stream in the slave side?

that would depend partially on the slave (a youtube video slave wouldn't need 
to do this; the file slave might). the client is also going to have a model in 
its process with the data that is populated, so there would always be data 
client-side.

> But how would you
> change the sorting if you just have a stream? Parse all data, sort it
> and put it in a stream again?

hard to say without some experimentation. my first thought would be to divide 
this into two cases: completed listing and listing in progress.

in the case of a completed listing it would happen entirely client-side.

in the case of a listing in progress, perhaps immediately re-sort/group the 
data already on the client side (for fast response) and then incrementally re-
populate from the slave which may elect to hold on to all results until 
listing is completed (for quick re-sort/group if needed)

a file manager might approach the streaming this way:

* if the directory has fewer than N entries, stream entries until the end (can 
be done by simply requesting from the stream until N entries arrive, and only 
go beyond that number if required). for MOST directories this would result in 
a fast listing where items don't move around in the UI and allow changes in 
sort/group to happen client-side

* if the directory has more than N entries, then seek around in the stream as 
needed and if a re-sort / re-group is requested then process what has been 
received already and wait for more responses. this is an edge case, however, 
as MOST directories should fall under the N cap

a file manager for mobile devices would have a smaller N and a desktop file 
manager would have a bigger N.

> > that's easy to do with QML. we have numerous examples of this in plasma
> > active, in fact. the real trick is ensuring that your entries come to you
> > pre- arranged so you don't show them moving around to the user.
> 
> If you have a view - where you don't scroll - then you're right. Then
> it's possible.
> If you have a view where entries flow in _while_ you are scrolling..
> well.. then you're screwed. The current listview just stutters when
> you do a trick like that (and yes, my pc is fast enough). I'm guessing
> it's doing a lot of re calculations and repaints when entries flow in
> and you move through them..?

or that the fetch / processing is happening in the same thread that is 
populating the QML. one of the common tricks for QML apps doing fluid image 
galleries from slow data sources is to fetch the image data in a separate 
thread and supply it as it comes in to the QML. then things very smooth.

-- 
Aaron J. Seigo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part.
URL: <http://mail.kde.org/pipermail/kde-frameworks-devel/attachments/20140606/3ce2253f/attachment-0001.sig>