KIO::ListJob::entries emits twice for folders with 100+ entries. Why?

Fri Nov 23 21:23:24 GMT 2012

On Fri, Nov 23, 2012 at 5:59 PM, Mark <markg85 at gmail.com> wrote:
> On Fri, Nov 23, 2012 at 5:33 PM, David Faure <faure at kde.org> wrote:
>> On Friday 23 November 2012 16:26:59 Mark wrote:
>>> 1. SlaveBase::listEntry is really using wrong names and types to get
>>> something done. For instance, d->totalSize is a KIO::filesize_t type
>>> and is being used to measure a list count. It's in reality no
>>> "filesize" at all! That just adds to the confusion.
>>
>> I understand your confusion. This actually documented, though:
>>
>>     /**
>>      * Call this in get and copy, to give the total size
>>      * of the file
>>      * Call in listDir too, when you know the total number of items.
>>      */
>>     void totalSize( KIO::filesize_t _bytes );
>>
>> Well, it was 2000 or so, someone thought reusing a method was a good idea :)
>
> Code re-using: oke. But this? Not ok! :p
>>
>>> 2. this is where things get really interesting! The actual
>>> file_unix.cpp listdir function seems wrong to me. [1]
>>> The function FileProtocol::listDir is basically having two distinct
>>> directory listings.
>>> 1 for the case where details are 0 (if)
>>> 1 for the case where the details are not 0 (else)
>>
>> Right. And I'm to blame for the details==0 code.
>>
>> Hmm, now I see the problem already: I forgot to call totalSize() in that code
>> path. For speed reasons in that code, I didn't want to iterate multiple times,
>> so I emit the stuff as it comes out of readdir...
>>
>>> Both cases use a different "tactic". If the details are 0 then a
>>> folder is read, file by file, and a listEntry(entry, false); is done
>>> for each file. So here you do have possible IO time.
>>> Now if the details are not 0 then the tactic is different. The
>>> folder is read in it's entirety inside a list (line 397 of [1]). Till
>>> that moment it hasn't done any calls to listEntry(...) yet! Rather
>>> iterates through the list again and builds up the UDSEntry stuff on
>>> line 430 (also [1]).
>>
>> Yes (because it wants to stat() each file, to provide full details).
>>
>>> While the step for details = 0 is faster (visually) it does include IO
>>> time and is therefore probably slower in time compared to details > 0.
>>
>> No, that is definitely wrong. They both start with the same I/O (readdir), and
>> then for details > 0 you get a lot more I/O (stat for each file).
>
> Ahh, didn't notice the stat call yet.
>>
>>> The details > 0 is only using CPU time and is probably faster thus
>>> that's probably why i'm not seeing a second signal in the "details >
>>> 0" case.
>>
>> Isn't it rather that the buffering in SlaveBase gets confused because it
>> doesn't know the total number of items to come (the famous "totalSize" int),
>> so it emits the items more often than in the details>0 case, where the number
>> of items is known?
>
> I don't think listEntry should should have any knowledge about how
> many items are going to pass it's function...
>>
>>> My preference is restructuring FileProtocol::listDir to one of the two
>>> cases that it's doing now and adjust FileProtocol::createUDSEntry to
>>> handle details = 0 properly.
>>
>> Moving code to createUDSEntry wouldn't give us much, since the whole point is
>> "no stat if details==0", so basically it would be an if at the beginning of
>> the method -- no behavior change overall.
>
> Yes indeed. That would be cleaner then it's currently done.
>>
>> If you want to let SlaveBase know about the total number of items to come,
>> then you need to create a QList of UDSEntry in the details==0 code path, add
>> the entries to the list, and then emit them. I don't like this, it uses more
>> CPU and memory, all this just to give a total to SlaveBase.
>>
>> I would prefer a fix that makes SlaveBase handle better the case where the
>> total number of items isn't known.
>>
>> Or I would just leave it as it is now, because honestly I don't see a big
>> problem with entries() being emitted in two chunks, it's perfectly valid.
>
> Yeah, i know. By now i fully agree with you there :) More signals for
> the win! The thing i can't figure out is why batching isn't used as
> that, batching. Right now it waits for a fixed number (100) and then
> decides to batch or not. I would just do that based on time alone and
> simply send out a batch when the time difference gets too large. In
> that case you would get multiple signals. As in more then 2 if needed
> :p
>>
>>> What's your opinion here?
>>> Note: These issues that i find never stay on the surface do they..
>>> This one is diving very deep in KIO, the shortcuts one was diving very
>>> deep in KDE internals and even Qt..
>>
>> Well, this is 2012. The easy bugs were fixed long ago, only the interesting
>> ones remain :-)
>
> interesting as in the ones that takes days to fix, are not trival to
> debug and have magic code paths, Yeah, i certainly saw a few of those
> ;)

Fixed :)
Or well, i have a patch for it now.

Here it is: http://paste.kde.org/613670/

The changes:
- It's actually doing batching now regardless of the details level
- Batching is time based only (300ms). So if the listing takes longer
then 300ms then all entries thus far will be send. That continues till
all entries are send.
- and (while at it) simplified file_unix.cpp's FileProtocol::listDir
function so that it only follows one path for listing folders. It
still has a check for details in it, but this "looks" cleaner to me.
And there is no more putting it in a list and iterating over that one
for the details != 0 case.

I tested this out on a folder with 100.000 files in it and in my case
i do get 2 entries signals now with no detail level provided (thus
quite a few details).
I get only one signal with all the entries in it if i pass details as
0. Now it behaves as i was expecting it in the first place.

Do you guys want me to put this on reviewboard or can i just push it
to 4.10? My patches are never perfect the first time so i guess you
guys will have some comments first :p