Review Request 108540: Add some benchmarks for KFileItemModel, improve worst-case performance of KFileItemModel::insertItems() and KFileItemModel::removeItems()

Frank Reininghaus frank78ac at googlemail.com
Tue Jan 22 11:16:43 GMT 2013



> On Jan. 22, 2013, 8:50 a.m., Mark Gaiser wrote:
> > Hi Frank,
> > 
> > I guess I know who you mean by "everyone is invited to join the fun and look at other classes" ;) I try to limit myself to kdelibs in this case, where Dolphin would (or could) benefit automatically.
> > 
> > I do have a few suggestions for you.
> > 1. You talk about std::copy, but why do you want to copy anyway? Why don't you just use std::move? (C++11)
> > 2. Why do you insert based on id? Why don't you just append the new list followed by a resort of the entire list?
> > 3. If the new item list is bigger than the current item list, then perhaps it might be worth considering a list swap (QList::swap):
> >  - store the current list in some temp value
> >  - swap the new list to the current list
> >  - append the old items to the new list
> > 
> > I hope these suggestions make sense :)
> > 
> > Also very nice results in the graphs, good job! I'm glad I inspired you to improve Dolphin for massive folders ^_-
> > 
> > Cheers,
> > Mark
> 
> Frank Reininghaus wrote:
>     Hi Mark,
>     
>     > I guess i know who you mean by "everyone is invited to join the fun and look at other classes"
>     
>     I really mean everyone, including you, of course :-)
>     
>     > 1. You talk about std::copy, but why do you want to copy anyway? Why don't you just use std::move?
>     
>     We have to copy/move items inside a container. std::move(container) moves the entire contents of a container if I'm not mistaken. The benefit of std::copy (or the range-based std::move algorithm) compared to assigning the items one by one is that it saves explicit loops.
>     
>     > 2. Why do you insert based on id? Why don't you just append the new list followed by a resort of the entire list?
>     
>     Because we need to know the positions where the items were inserted. We could of course determine these also after a full resorting of the combined list, but I can imagine quite a few not-so-uncommon cases where this might be a lot more expensive (consider inserting a single new item into a sorted list).
>     
>     > 3. If the new item list is bigger than the current item list, then perhaps it might be worth considering a list swap (QList::swap):
>     
>     That might be faster in some cases, but I doubt that it will be beneficial in the average case. It would require two different code paths for the determination of the 'inserted item ranges'.

>> 2. Why do you insert based on id? Why don't you just append the new list followed by a resort of the entire list?
>
> Because we need to know the positions where the items were inserted. We could of course determine these also after a full resorting of the combined list, but I can imagine quite a few not-so-uncommon cases where this might be a lot more expensive (consider inserting a single new item into a sorted list).

Thinking about it again: it might make sense to first sort the new items and then merge them with the existing items using the existing 'merge' function. This would require less code - right now, insertItems() basically contains an alternative 'merge' implementation. One could then update the hash m_items and use that to create the KItemRangeList.
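As a rough sketch of that idea (plain STL containers instead of QList and the model's internal types, so all names here are illustrative stand-ins, not the actual KFileItemModel code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of "sort the new items, then merge": the existing list is already
// sorted, so after sorting the M new items we can merge the two sorted
// ranges in O(N + M) instead of doing M separate O(N) insertions.
std::vector<int> insertSorted(std::vector<int> existing, std::vector<int> newItems)
{
    std::sort(newItems.begin(), newItems.end());  // O(M log M)
    const std::ptrdiff_t middle = static_cast<std::ptrdiff_t>(existing.size());
    existing.insert(existing.end(), newItems.begin(), newItems.end());
    // Both halves [begin, middle) and [middle, end) are sorted,
    // so one in-place merge pass produces the combined sorted list.
    std::inplace_merge(existing.begin(), existing.begin() + middle, existing.end());
    return existing;
}
```

The positions of the inserted items (for building the KItemRangeList) could then be looked up afterwards via the m_items hash, as suggested above.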

This approach might also improve performance for the case that a single item is inserted somewhere in the sorted list. Currently, we need O(N) comparisons due to the linear search in that case.
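With a sorted list, the insertion point for a single new item can instead be found with O(log N) comparisons via a binary search. A minimal plain-STL illustration (again not the actual model code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Index at which a single new item would be inserted into a sorted list,
// found with O(log N) comparisons instead of a linear scan.
std::size_t insertionIndex(const std::vector<int>& sorted, int item)
{
    return static_cast<std::size_t>(
        std::lower_bound(sorted.begin(), sorted.end(), item) - sorted.begin());
}
```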

A similar approach could be used for removeItems(); we would not only be faster, but also need less code overall. I'll try that.
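For the removal side, the one-pass idea would look roughly like this (std::remove_if on a std::vector as a stand-in for the real QList-based code; the set of items to remove is a hypothetical parameter for illustration):

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Remove a whole batch of items in a single stable pass: every surviving
// element is moved at most once, so removing M of N items is O(N)
// rather than the O(N * M) of one erase() call per removed item.
std::vector<int> removeItems(std::vector<int> items,
                             const std::unordered_set<int>& toRemove)
{
    items.erase(std::remove_if(items.begin(), items.end(),
                               [&toRemove](int v) { return toRemove.count(v) > 0; }),
                items.end());
    return items;
}
```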


- Frank


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/108540/#review25961
-----------------------------------------------------------


On Jan. 22, 2013, 7:48 a.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/108540/
> -----------------------------------------------------------
> 
> (Updated Jan. 22, 2013, 7:48 a.m.)
> 
> 
> Review request for Dolphin.
> 
> 
> Description
> -------
> 
> I think it might be a good idea to make sure that Dolphin behaves better when handling many files and started looking at places in the code which have sub-optimal performance. There are probably quite a few of them - I'll start with KFileItemModel. Of course, everyone is invited to join the fun and look at other classes :-) My first target is inserting and removing items. The patch does two things:
> 
> 1. Add a kfileitemmodelbenchmark that tests two things: inserting many files in two equally sized batches, and removing one half from a set of many files. The half that is added/removed can be the first or second half in a sorted list of files, or all odd files. I've made the benchmark a regular executable rather than a unit test. The reasoning is that unit tests are often run automatically, and their results are only looked at if something goes wrong, so running benchmarks with every "make test" run might waste CPU cycles. Nonetheless, I do test the model state for correctness; I think that can never hurt.
> 
> When passing the -xml option to the benchmark, the XML results can be transformed to nice graphs using qtestlib-tools: 
> http://blog.qt.digia.com/blog/2008/12/05/qtestlib-now-with-nice-graphs-pointing-upwards/
> I'll attach screenshots of "before" and "after" benchmark results. The values on the y-axis are milliseconds.
> 
> 2. The problem with KFileItemModel::insertItems() and KFileItemModel::removeItems() is that they insert/remove the items one by one in the QList m_itemData. This was easy to implement, but inserting/removing is O(N) for QList, and if we do that N times, then we can get O(N^2) complexity, which is not good. My test results indeed show that this happens in some cases. The only cases that work well are "add many items at the back" and "remove many items from the back".
> 
> To resolve this, I propose to first calculate only the item ranges that are added/removed, and do the actual inserting/removing once this is finished, such that the final position for each item is known and every item has to be moved only once. This does make the code a bit more complicated, but it ensures that adding/removing N items is O(N). One might be able to do it with less code by replacing the int-based indices in the functions with iterators and using std::copy/std::copy_backward to do some of the work, but I didn't want to change too many things at the same time.
> 
> Any thoughts? If there is agreement that this makes sense, I'll start to look at other places in those functions and other parts of KFileItemModel. I have some ideas already - for example, I think that we can make sure that removing items is no more expensive than inserting them (currently, removal is the more expensive operation - see the benchmark results).
> 
> 
> Diffs
> -----
> 
>   dolphin/src/kitemviews/kfileitemmodel.h 304161a 
>   dolphin/src/kitemviews/kfileitemmodel.cpp 69db217 
>   dolphin/src/tests/CMakeLists.txt 6575eec 
>   dolphin/src/tests/kfileitemmodelbenchmark.cpp PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/108540/diff/
> 
> 
> Testing
> -------
> 
> Yes, unit tests still work fine. Benchmark results attached (obtained with a release build).
> 
> 
> File Attachments
> ----------------
> 
> Benchmark results for master
>   http://git.reviewboard.kde.org/media/uploaded/files/2013/01/22/master.png
> Benchmark results with patch
>   http://git.reviewboard.kde.org/media/uploaded/files/2013/01/22/patched.png
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>



More information about the kfm-devel mailing list