KIO::ListJob::entries emits twice for folders with 100+ entries. Why?

Mark markg85 at gmail.com
Fri Nov 23 16:59:30 GMT 2012


On Fri, Nov 23, 2012 at 5:33 PM, David Faure <faure at kde.org> wrote:
> On Friday 23 November 2012 16:26:59 Mark wrote:
>> 1. SlaveBase::listEntry is really using wrong names and types to get
>> something done. For instance, d->totalSize is a KIO::filesize_t type
>> and is being used to measure a list count. It's in reality no
>> "filesize" at all! That just adds to the confusion.
>
> I understand your confusion. This actually documented, though:
>
>     /**
>      * Call this in get and copy, to give the total size
>      * of the file
>      * Call in listDir too, when you know the total number of items.
>      */
>     void totalSize( KIO::filesize_t _bytes );
>
> Well, it was 2000 or so, someone thought reusing a method was a good idea :)

Code re-using: oke. But this? Not ok! :p
>
>> 2. this is where things get really interesting! The actual
>> file_unix.cpp listdir function seems wrong to me. [1]
>> The function FileProtocol::listDir is basically having two distinct
>> directory listings.
>> 1 for the case where details are 0 (if)
>> 1 for the case where the details are not 0 (else)
>
> Right. And I'm to blame for the details==0 code.
>
> Hmm, now I see the problem already: I forgot to call totalSize() in that code
> path. For speed reasons in that code, I didn't want to iterate multiple times,
> so I emit the stuff as it comes out of readdir...
>
>> Both cases use a different "tactic". If the details are 0 then a
>> folder is read, file by file, and a listEntry(entry, false); is done
>> for each file. So here you do have possible IO time.
>> Now if the details are not 0 then the tactic is different. The
>> folder is read in it's entirety inside a list (line 397 of [1]). Till
>> that moment it hasn't done any calls to listEntry(...) yet! Rather
>> iterates through the list again and builds up the UDSEntry stuff on
>> line 430 (also [1]).
>
> Yes (because it wants to stat() each file, to provide full details).
>
>> While the step for details = 0 is faster (visually) it does include IO
>> time and is therefore probably slower in time compared to details > 0.
>
> No, that is definitely wrong. They both start with the same I/O (readdir), and
> then for details > 0 you get a lot more I/O (stat for each file).

Ahh, didn't notice the stat call yet.
>
>> The details > 0 is only using CPU time and is probably faster thus
>> that's probably why i'm not seeing a second signal in the "details >
>> 0" case.
>
> Isn't it rather that the buffering in SlaveBase gets confused because it
> doesn't know the total number of items to come (the famous "totalSize" int),
> so it emits the items more often than in the details>0 case, where the number
> of items is known?

I don't think listEntry should should have any knowledge about how
many items are going to pass it's function...
>
>> My preference is restructuring FileProtocol::listDir to one of the two
>> cases that it's doing now and adjust FileProtocol::createUDSEntry to
>> handle details = 0 properly.
>
> Moving code to createUDSEntry wouldn't give us much, since the whole point is
> "no stat if details==0", so basically it would be an if at the beginning of
> the method -- no behavior change overall.

Yes indeed. That would be cleaner then it's currently done.
>
> If you want to let SlaveBase know about the total number of items to come,
> then you need to create a QList of UDSEntry in the details==0 code path, add
> the entries to the list, and then emit them. I don't like this, it uses more
> CPU and memory, all this just to give a total to SlaveBase.
>
> I would prefer a fix that makes SlaveBase handle better the case where the
> total number of items isn't known.
>
> Or I would just leave it as it is now, because honestly I don't see a big
> problem with entries() being emitted in two chunks, it's perfectly valid.

Yeah, i know. By now i fully agree with you there :) More signals for
the win! The thing i can't figure out is why batching isn't used as
that, batching. Right now it waits for a fixed number (100) and then
decides to batch or not. I would just do that based on time alone and
simply send out a batch when the time difference gets too large. In
that case you would get multiple signals. As in more then 2 if needed
:p
>
>> What's your opinion here?
>> Note: These issues that i find never stay on the surface do they..
>> This one is diving very deep in KIO, the shortcuts one was diving very
>> deep in KDE internals and even Qt..
>
> Well, this is 2012. The easy bugs were fixed long ago, only the interesting
> ones remain :-)

interesting as in the ones that takes days to fix, are not trival to
debug and have magic code paths, Yeah, i certainly saw a few of those
;)




More information about the kde-core-devel mailing list