Review Request 112979: Make use of multi-threading in KItemListSizeHintResolver

Sun Dec 22 14:20:11 GMT 2013

> On Oct. 1, 2013, 12:04 a.m., Frank Reininghaus wrote:
> > Thanks for looking into this! The layouting procedure is indeed one of the major bottlenecks when entering a directory, and I was also thinking that making use of multiple CPU cores is the easiest way for short-term improvements without major modifications in the layouting code. Your approach looks good to me - I think I would have also tried something like that.
> > 
> > I'm a bit surprised/disappointed though that it saves only 1/3 of the time. In can confirm that observation on my system (with a rather old dual-core AMD 64X2 5000+ CPU, which has to my knowledge no "Turbo boost" feature if only one core is busy, which might otherwise have been an explanation for this). I can currently see two possible explanations for this:
> > 
> > (a) Could it be that QtConcurrent::blockingMap() adds a lot of overhead to each function call? Maybe it's designed for the case that each invocation of the function is expensive (in which case the overhead wouldn't matter much). However, in our case, a single call of "itemSizeHint" is not extremely expensive - it's the large number of calls that is the problem.
> > 
> > But I don't know if that assumption is correct. Maybe QtConcurrent::blockingMap() is clever enough to split the input list into only a few large chunks.
> > 
> > 
> > (b) Maybe some parts of the font/text code in Qt, which is called by KStandardItemListWidgetInformant::itemSizeHint(), are protected by a mutex internally. Then there could be several causes for performance loss. If a large part of the time is spent under mutex protection, then the gain from parallelization is severely limited by Amdahl's Law. Moreover, if the mutex is locked/released frequently by several competing threads, then the performance will suffer greatly because of mutex contention.
> > 
> > 
> > An interesting experiment would be to test your patch on a system with more than 2 physical cores and check how much it improves the performance. If anyone with such a system is willing to try this:
> > 
> > * Uncomment the line "// #define KITEMLISTVIEWLAYOUTER_DEBUG" in dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp
> > * Either switch on all debug output with kdebugdialog or change the kDebug() in this file to qDebug()
> > * Open a folder with many files (100,000 or more) in Icons View (possibly reload it a couple of times with F5).
> > * Apply Emmanuel's patch and repeat the procedure.
> > 
> > If mutex contention is indeed an issue here, then I would expect that 4 or more cores will not bring a much larger performance improvement (compared to the 1/3 that we get with 2 cores), or that it would even get worse.
> > 
> > 
> > Another idea that I had recently (not tested yet, and orthogonal to your patch in the sense that it might improve the performance even in the single-thread case, and that it could be combined with a QtConcurrent approach to bring further improvements):
> > 
> > Much of the code inside KStandardItemListWidgetInformant::itemSizeHint() does exactly the same every time. In Icons View, this is everything except for the calculation of the number of lines for the item text. To improve this, one could replace the function
> > 
> > QSizeF KStandardItemListWidgetInformant::itemSizeHint(int index, const KItemListView* view) const
> > 
> > by
> > 
> > void KStandardItemListWidgetInformant::itemSizeHint(QVector<QSizeF>& sizeHints, const KItemListView* view) const
> > 
> > or something like that, do most of the things only once, and then calculate only the number of lines in a loop.
> > 
> > A rough draft:
> > 
> > void KStandardItemListWidgetInformant::itemSizeHint(QVector<QSizeF> sizeHints, const KItemListView* view) const
> > {
> >     //...
> > 
> >     switch (static_cast<const KStandardItemListView*>(view)->itemLayout()) {
> >     case KStandardItemListWidget::IconsLayout: {
> > 
> >         // Stuff that applies to all items, like...
> >         const qreal itemWidth = view->itemSize().width();
> >         const qreal maxWidth = itemWidth - 2 * option.padding;
> >         const qreal additionalRolesHeight = additionalRolesCount * option.fontMetrics.lineSpacing();
> > 
> >         for (int i = 0; i < sizeHints.count(); ++i) {
> >             if (!sizeHints.at(i).isEmpty()) {
> >                 continue;
> >             }
> > 
> >             const QString text = KStringHandler::preProcessWrap(itemText(index, view));
> > 
> >             // Calculate the number of lines and the "textHeight"
> >             //...
> > 
> >             const qreal totalHeight = qMin(textHeight + additionalRolesHeight, maxTextHeight);
> >             sizeHints[i] = QSizeF(itemWitdth, totalHeight);
> >         }
> > 
> >     // ...
> >     }
> > }
> > 
> >  
> > 
> > Maybe it's worth trying how much time that saves, and how it can be combined with your QtConcurrent idea?
> >
> 
> Emmanuel Pescosta wrote:
>     > Maybe it's worth trying how much time that saves, and how it can be combined with your QtConcurrent idea?
>     
>     Sounds great :)
>     I'll test it!
> 
> Frank Reininghaus wrote:
>     I did have a chance to test your patch on a computer with 4 CPU cores, and restricted the number of threads by calling
>     
>     QThreadPool::globalInstance()->setMaxThreadCount(number);
>     
>     before the QtConcurrent call, with values for 'number' between 1 and 4. When using more than 2 threads, the time needed for the layout is not reduced further, but increases slightly. Moreover, the total CPU time required as reported by 'time dolphin /path/to/large/folder' increases significantly when using more than 1 thread. So it appears that contention is really a problem here, and the 'size hint' calculation using Qt's text engine stuff is not suitable for parallel computation.
>     
>     Have you tried the "calculate all size hints in all function" approach yet?

Thanks for testing it with a 4 core machine!

Patch "calculate all size hints at once": http://pastebin.com/N0FCm04f
Patch "calculate all size hints at once - incl. multithreading": http://pastebin.com/t69bpz5f

The results (average time / 1.000.000 items / 1 additional role / 2 CPU cores):
1. Dolphin unpatched: 
   a. Icons:   12357 ms
   b. Compact: 18742 ms
   c. Details: 86 ms
2. Dolphin patched with "calculate all size hints at once": 
   a. Icons:   11792 ms
   b. Compact: 18322 ms
   c. Details: 21 ms
3. Dolphin patched with "calculate all size hints at once - incl. multithreading": 
   a. Icons:   7288 ms
   b. Compact: 10798 ms
   c. Details: 354 ms

The real performance benefits are too small for these big changes, so I think we should discard this patch. ;)

What do you think?

- Emmanuel

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/112979/#review41057
-----------------------------------------------------------

On Sept. 28, 2013, 9:07 p.m., Emmanuel Pescosta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/112979/
> -----------------------------------------------------------
> 
> (Updated Sept. 28, 2013, 9:07 p.m.)
> 
> 
> Review request for Dolphin.
> 
> 
> Repository: kde-baseapps
> 
> 
> Description
> -------
> 
> Make use of multi-threading in KItemListSizeHintResolver.
> 
> We need the item size hints of all items in KItemListLayouter::doLayout(),
> so instead of resolving it sequentially during layout calculation, 
> we can resolve all item size hints in parallel before we start the 
> layout calculation.
> 
> 
> Diffs
> -----
> 
>   dolphin/src/kitemviews/private/kitemlistsizehintresolver.h 486f9b6 
>   dolphin/src/kitemviews/private/kitemlistsizehintresolver.cpp 0e2286b 
>   dolphin/src/kitemviews/private/kitemlistviewlayouter.h 306fcd3 
>   dolphin/src/kitemviews/private/kitemlistviewlayouter.cpp d6e78ae 
> 
> Diff: https://git.reviewboard.kde.org/r/112979/diff/
> 
> 
> Testing
> -------
> 
> Everything works fine.
> 
> It is about 1/3 faster on my machine (tested with 100k items in item view mode).
> 
> 
> Thanks,
> 
> Emmanuel Pescosta
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.kde.org/mailman/private/kfm-devel/attachments/20131222/6e68b33f/attachment.htm>