KDirModelV2, KDirListerV2 and UDSEntryV2 suggestions

Wed Jan 9 15:42:25 UTC 2013

On Wed, Jan 9, 2013 at 2:51 PM, David Faure <faure at kde.org> wrote:
> On Wednesday 09 January 2013 11:15:20 Mark wrote:
>> A little more in depth questions for KDirLister and KFileItem. In my
>> profiling KFileItem ends up high due to various reasons, but
>> KDirLister is also a bit of a heavy resource hog due to its default
>> behavior of fetching all file information (thus at least 1 stat call
>> per file) which severely slows down the dir listing process for large
>> folders.
>
> This stat call happens in kio_file though, not in the GUI process where
> KDirLister lives, right?
> So I'm surprised that you see that when profiling...
> or is there a nasty stat() in KDirLister somewhere?

Ehm, well, I'm not really monitoring stat calls. I'm monitoring the
time it takes for a directory listing of 1 million files to become
available in my application. There I see a massive time difference
between details = 0 (no stat) and no details value provided (all
details). KDirLister ends up high in the stack because it parses a
massive QByteArray that comes from kio_file. I wasn't actually talking
about the stat call but more about the massive amount of data (as a
result of that stat call) that gets sent back.
>
>> My idea is as follows here. By default fetch as little information as
>> possible. So make KDirLister fetch the folder content using "details =
>> 0" by default. Perhaps with an additional KDirLister function (or
>> flag) to change its behavior to fetch all info (like it works now).
>> The magic should happen in KFileItem. Right now that class isn't lazy
>> loaded at all thus you get as many KFileItem instances as you have
>> entries in the folder you're listing. What i want to do here is make it
>> lazy loading. When the class constructs it should have the immediate
>> availability of the current UDSEntry without making anything heavy.
>> The model can then use this class just fine. When a KFileItem is
>> constructed it should (in the background) fetch the file to get all
>> the details. When that's done it should notify the model of the data
>> change which in turn makes the model update its data.
>
> You realize that "fetching the additional info" means an async kio job, in the
> general case (non local files), right?

Ah crap, I didn't think about that one. How can that be resolved?
>
> Surely we don't want to do this in KFileItem itself (value class, no QObject,
> no signals), but in KDirLister itself. In fact, we don't want a KIO::stat per
> file, but a full-details listing for everything in one go, otherwise there will
> be a lot of jobs and roundtrips to kioslaves involved. More complexity in
> KDirLister, though, isn't something I'm looking forward to.

I don't know. The extra complexity might be worth it if the visual
experience gets faster thanks to the lazy loading. More on this below.
>
> Did you check the speed difference between details==0 and details==2?
> For local files it's "just" one stat() call per file (yes that can be a lot of
> calls, but I'd like proof that this actually makes a difference compared to
> everything else). And for other protocols there's no difference, even (with FTP
> you have to list the directory, which gives you full details anyway), so the
> two-stage approach will only make it slower.

Didn't check that one.
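
For the local-file case, a rough way to check this could look like the
sketch below. It is plain C++17 (std::filesystem), not KIO, so it only
approximates what kio_file does and says nothing about the cost of
sending the data back to the GUI process. `ListingCost` and
`listDirectory` are made-up names for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result type: how many entries were seen and how long it took.
struct ListingCost {
    std::size_t entries = 0;
    std::chrono::microseconds elapsed{0};
};

// Walk one directory. With withStat == false only the names are read; with
// withStat == true one extra status lookup is forced per entry, which is
// roughly the "one stat() per file" case discussed above.
ListingCost listDirectory(const fs::path& dir, bool withStat) {
    assert(fs::is_directory(dir));

    ListingCost cost;
    const auto start = std::chrono::steady_clock::now();

    std::error_code ec;
    for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
        if (withStat) {
            std::error_code statEc;
            // Query by path so the lookup is not served from the entry's cache.
            (void)fs::symlink_status(it->path(), statEc);
        }
        ++cost.entries;
    }

    cost.elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);
    return cost;
}
```

Calling `listDirectory(dir, false)` and `listDirectory(dir, true)` on the
same large folder and comparing `elapsed` would give a first impression,
keeping in mind that the second run benefits from a warm disk cache.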
>
>> Using this approach the model can show data a lot faster (it doesn't
>> have to wait for the file details anymore) and will automatically
>> update its entry when KFileItem tells it to do so. The net result for
>> the user should be a much faster (visually) file browsing experience
>> which should be a lot less taxing on the CPU and memory.
>
> This only works in an icon view with no details and no filtering.
> If you ask for details under the icons (size, etc.) it will come in delayed,
> and if you use a list view, the details columns will come in delayed; visually
> this might be ok, or arguable, don't really know. This is starting to be a
> topic for kfm-devel so that the dolphin developers can give their input as
> well...

Dolphin happens to be a big user here, but I'm really not talking
about Dolphin specifically. The structures behind it are my target :)
Filtering will indeed be done afterwards, but you have to think about
it. What would be better?
1. A fast usable list that re-sorts itself when all data is in.
2. A - still fast - list but not usable till all data is in.
3. A really slow and actually unusable list even when all data is in and sorted.
My aim with this thread is to get to number 1. Dolphin right now is at
number 3. Do know that I'm talking about massive folders with 100.000+
items.
>
> For sure it will break or complicate filtering. It's more rare so I guess it's
> ok if it works delayed, but it still has to be taken into account.
>
> Overall, I'm just not sure the speed gain vs complexity ratio is high enough.
> We already identified "reading .desktop files and .directory files" as something
> that has to be done delayed, I would very much favor doing that first, it's
> clearly a much bigger performance issue for network-mounted paths. Martin
> Koller had started some time ago, but I'm not sure what the current status is;
> if he doesn't have time to finish maybe you can take over that :-)
>
> --
> David Faure, faure at kde.org, http://www.davidfaure.fr
> Working on KDE, in particular KDE Frameworks 5
>

I would indeed be very interested in seeing what Martin has done already :)

My "theory" is as follows:
-----
Show data as early as possible even if it's incomplete. Defer the
complete data set to:
1. when it actually needs to be shown (thus a _lot_ less data)
2. only fetch the extra data when it's really needed to display or
calculate something
I'm "guessing" a stat call per file will in fact be faster than doing
it all at once (lazy loading part, it will only be done when needed).
With this you can have millions of files and yet only have a few stat
calls for the files you actually see.
Using a two stage approach might be slower but it "should" be a lot
faster in visual terms. Because then you can actually use the file
browser even with millions of files while the details are dripping in.
Right now you have to wait quite a bit for your browser (dolphin) to
even become active. The net result might be slower, but the user
experience is very likely faster.
-----
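
A minimal sketch of that theory, in plain C++17 with no KIO or Qt
involved. `Details`, `LazyEntry`, `LazyListing` and the `Fetcher`
callback are made-up names for illustration, not existing KDE classes.
The fetch here is synchronous, while in the general (non-local) case it
would have to be an async job, as David points out above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the expensive per-file data (what a stat gives you).
struct Details {
    std::uint64_t size = 0;
};

// Hypothetical stand-in for a cheap listing entry: the name is known right
// away, the details stay empty until somebody actually needs them.
struct LazyEntry {
    std::string name;
    std::optional<Details> details;
};

class LazyListing {
public:
    using Fetcher = std::function<Details(const std::string&)>;

    LazyListing(std::vector<std::string> names, Fetcher fetch)
        : fetch_(std::move(fetch))
    {
        entries_.reserve(names.size());
        for (auto& n : names) {
            entries_.push_back(LazyEntry{std::move(n), std::nullopt});
        }
    }

    std::size_t size() const { return entries_.size(); }
    const std::string& name(std::size_t i) const { return entries_.at(i).name; }
    bool hasDetails(std::size_t i) const { return entries_.at(i).details.has_value(); }

    // Number of times the expensive fetch was really executed.
    std::size_t fetchCount() const { return fetchCount_; }

    // Fetch details only for the rows in [first, last), e.g. the rows that are
    // currently visible. Rows that already have details are skipped.
    void ensureDetails(std::size_t first, std::size_t last) {
        assert(first <= last && last <= entries_.size());
        for (std::size_t i = first; i < last; ++i) {
            if (!entries_[i].details) {
                entries_[i].details = fetch_(entries_[i].name);
                ++fetchCount_;
            }
        }
    }

private:
    std::vector<LazyEntry> entries_;
    Fetcher fetch_;
    std::size_t fetchCount_ = 0;
};
```

A view would call `ensureDetails()` with the range of rows it is about to
paint, so a folder with a million entries only costs as many fetches as
there are rows the user actually scrolls past.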

In all fairness, this is just one big experiment because I'm working
on a QML application that will use these classes a lot. In my
experiments thus far it turned out that KDE right now is doing "too
much" where it could do a lot less, be faster at it and show the data
earlier, as soon as it drips in. Right now the same is also
possible but it's notably slow and I'm trying to speed up the source
of that slowness (the classes discussed in here). No, QML is not
slow :)

Cheers,
Mark