Review Request 115432: Only initialize the hash m_items in KFileItemModel if it is needed

Mon Feb 3 09:50:02 GMT 2014

> On Feb. 2, 2014, 8 p.m., Mark Gaiser wrote:
> > Hi Frank,
> > 
> > You're adding quite a bit of complexity.
> > 
> > Is it an option to just remove m_items and change the storage of m_itemData? That is already one big list with all data in it.
> > If you change that to a QHash<KUrl, KFileItem> and drop m_items you reduce quite a bit it seems. It will certainly be a bit bigger in memory, but you get rid of m_data completely so there might be no tradeoff memory wise (right?).
> > 
> > In terms of speed.. I don't know which one will be faster. I'm guessing a QList wins there.
> 
> Frank Reininghaus wrote:
>     Hi Mark, thanks for your comment. I'm not sure if I fully understand what you mean - a QHash<KUrl, KFileItem> is an unordered container, but the files have to appear in the view in a well-defined order, and we also need index-based access. We would lose that if we only used the hash for storage, and then we woudl have to find some other way to establish the "index -> KFileItem" mapping.
> 
> Mark Gaiser wrote:
>     Hi Frank,
>     
>     Ahh, i didn't think about that. In my models for Accretion i use a QSortFilterProxyModel for the sorting and index which works very well, but that isn't an option in Dolphin due to it's completely different model system.
> 
> Frank Reininghaus wrote:
>     I don't think that this has anything to do with the different model/view system. Even if you use a QSortFilterProxyModel to sort a subclass of QAbstractItemModel, then the "source model" must provide index-based access to the items in the model (and keeping the contents (or pointers to the contents) in an ordered container inside the source model is the easiest way to achieve that). Or are you saying that Accretion keeps all data in a QHash and lets QSortFilterProxyModel handle the rest? I would be very surprised if that was possible.
> 
> Mark Gaiser wrote:
>     "Or are you saying that Accretion keeps all data in a QHash and lets QSortFilterProxyModel handle the rest? I would be very surprised if that was possible."
>     
>     Exactly! well, nearly exactly. I actually store all entries in a QVector.
>     Now you probably want to know how that is done.
>     I'm going to skip class name reasoning and much of how it all works and jump right in class names and small descriptions.
>     
>     When you open a folder it will be stored in a KDirectory class under m_filteredEntries: https://gitorious.org/kdirchainrebuild/master/source/kdirectoryprivate_p.cpp
>     
>     My models have a bit of a layered structure:
>     DirListModel: https://gitorious.org/kdirchainrebuild/master/source/models/dirlistmodel.cpp The model actually having all entries. The outside world (QML) never sees this model.
>     
>     A DirGroupedModel (which inherits a QSortFilterProxyModel) https://gitorious.org/kdirchainrebuild/master/source/models/dirgroupedproxymodel.h is what is known to the outside world. This model knows the before mentioned DirListModel. DirGroupedModel is "just" taking care of grouping a folder by:
>     - name
>     - extension
>     - type
>     - ... much more possibilities exactly one must always be chosen. I also have a type called None for no grouping but uses the same logic.
>     
>     Once you have chosen a type to group on then DirGroupedModel creates new DirGroupedProxyModel (https://gitorious.org/kdirchainrebuild/master/source/models/dirgroupedproxymodel.cpp) instances for each unique mime (assuming we wanted mime types). The you end up with sub models like so:
>     - model with image/jpg items
>     - model with inode type (folders)
>     - ...
>     Important here is that each sub model is put in DirGroupedProxyModel along with a type to filter on (image/jpg, ...). DirGroupedProxyModel then takes care of the filtering. Also important here is that the initial model with all items in it (DirListModel) is passed in each sub model using setSourceModel. Passing the initial model every time is something i still have to look at since it smells like a possible bottleneck.
>     
>     This allows me to group by type and within each group sort in any way i want. I first wanted this in Dolphin, but it seemed very difficult to patch dolphin up to support sorting per group.
>     It is far from easy and took me a good week to understand and implement in Accretion. This structure allows me to do everything by index and works very well. I guess QSortFilterProxyModel is doing the mapping from it's new index to the index in the source model and is probably also redirecting data(...) role calls back to the source model.
>     
>     It really works wonderful, you should try it sometime :)
>     Oh and sorry for - again - being extremely off topic in a reviewrequest. If you want to know more, feel free to send me a mail.

Right, so to summarize our discussion:

Mark: Use a QHash for everything!

Frank: That doesn't work. We need a container that keeps its items in a well-defined order and provides index-based access!

Mark: The QSortFilterProxyModel does the trick!

Frank: Really? Does QSortFilterProxyModel sort the items in your QHash?

Mark: Exactly! well, nearly exactly. I actually store all entries in a QVector.

QVector is a container that stores all items in a well-defined order and provides index-based access, so I have no further questions now ;-)

About the sorting/grouping stuff: This is really off-topic, but since you brought it up: it's certainly a nice idea, but to be honest, I'm not sure if this feature is useful enough for a large enough user group to justify adding not only an additional layer of indirection (in the form of another proxy model) between the view and the actual data, but also a considerable amount of visual and conceptual clutter to a default file manager. I could imagine that quite a few users will be confused if there is not only a "Sort by -> Name/Date/etc." menu, but also a "Group by -> Name/Date/etc." one. But let us please not continue that discussion in this review request.

- Frank

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/115432/#review48778
-----------------------------------------------------------

On Feb. 2, 2014, 7:03 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/115432/
> -----------------------------------------------------------
> 
> (Updated Feb. 2, 2014, 7:03 p.m.)
> 
> 
> Review request for Dolphin.
> 
> 
> Bugs: 329494
>     http://bugs.kde.org/show_bug.cgi?id=329494
> 
> 
> Repository: kde-baseapps
> 
> 
> Description
> -------
> 
> KFileItemModel has a member
> 
> QHash<KUrl, int> m_items
> 
> which makes it possible to find the index of an item if the URL is known. m_items is updated every time an item is added to/removed from the current directory, or if the sort order changes. This is convenient in many situations, but sometimes, this hash also causes problems:
> 
> 
> 1. In some corner cases, the model can get into a state where the indexes in m_items are not correct. There were, e.g.,
> 
> https://bugs.kde.org/show_bug.cgi?id=324371
> https://bugs.kde.org/show_bug.cgi?id=325359
> 
> which I could reproduce by doing a search in Details View and expanding folders, such that the same URL appeared twice in the view (but only once in m_items).
> 
> But I think that there are other ways to make the relationship between m_items and the model contents inconsistent. A crash that looks like it is most likely caused by such problems is
> 
> https://bugs.kde.org/show_bug.cgi?id=329494
> 
> 
> 2. Initializing m_items costs CPU cycles and memory, which is wasteful if m_items is not used at all.
> 
> 
> Therefore, I propose to make the following changes:
> 
> (a) Do not require that all URLs are in m_items any more. Instead, the new requirement is: if there are N URLs in m_items, it is guaranteed that they are the first N URLs in the model.
> 
> (b) Replace most calls of m_items.value(url) by index(url) in KFileItemModel. Note that some local variables 'index' had to be renamed for this to work.
> 
> (c) In KFileItemModel::index(const KUrl&), check if the URL is in m_items already. If that is not the case, add 1000 URLs to the hash, and check again. Repeat until either the URL is found, or all URLs are in the hash (-1 is returned in the latter case). Note that we always know where to continue adding URLs: if m_items.count() is N, then "m_itemData.at(i)->item.url()" is the first URL that is not in m_items.
> 
> BTW, my first version of index(const KUrl&) added URLs to the hash one by one, and checked every time if the next URL is the right one by comparing the URLs with ==. However, this was slow and required a lot more memory because comparing URLs with the == operator triggers a parsing of the URL, which needs memory and CPU cycles. I don't know if this is still the case in Qt 5, but in any case, adding new URLs in larger batches and then checking if the hash contains the searched URL should have a lower amortized cost in any case.
> 
> (d) In insertItems() and removeItems(), clear the hash m_items. It will be re-populated the next time index() is called. Note that Dolphin < 4.11 also cleared the entire hash in these cases (it re-created the entire hash immediately though). In contrast to that, 4.11 and 4.12 only update the URLs/indexes starting from the first added/removed item. This can be problematic though - in bug 329494, we get a crash when trying to remove one of the removed URLs from m_items.
> 
> 
> Diffs
> -----
> 
>   dolphin/src/kitemviews/kfileitemmodel.h 0224290 
>   dolphin/src/kitemviews/kfileitemmodel.cpp b3c56e1 
> 
> Diff: https://git.reviewboard.kde.org/r/115432/diff/
> 
> 
> Testing
> -------
> 
> I could not find any regressions. Even if I could never reproduce the crash in bug 329494, I'm confident that the crash is fixed because the line where it tries to access the URL of a removed KFileItem and then crashes is removed (and there is no other access to the KFileItem in that function).
> 
> Memory usage when loading a directory with 100,000 files is reduced by about 5 MiB in all view modes (from about 150 to 145) if previews are disabled. (If previews are enabled, KFileItemModelRolesUpdater calls the index(KUrl&) method, which fills m_items, and memory usage is as before).
> 
> The time for loading a directory with 100,000 files is reduced by about 200 ms on my system (to about 4.3 seconds in Icons View, 3.7 in Compact and 2.8 in Details).
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.kde.org/mailman/private/kfm-devel/attachments/20140203/87d06c55/attachment.htm>