Review Request 113355: Reduce UDSEntry memory usage with implicit sharing

Tue Oct 22 09:32:43 BST 2013

> On Oct. 21, 2013, 4:26 p.m., Jan Kundrát wrote:
> > Have you tried a naive implementation with a std::vector<std::pair<Key,Value>>? You say that a typical use case has eight entries; that's a very small number where a well-tuned vector could easily beat the O(1) of QHash or the O(log n) of QMap by its O(n) semantics. Linear scan might very well turn out to be faster due to its better cache locality and the associated memory streaming benefits. The STL class has a lower overhead than an implicitly shared QVector (and you do not need implicit sharing of the actual entries, do you?).
> > 
> > Anyway, the point is, if the number is small enough, the big-O notation does not necessarily matter.
> 
> Milian Wolff wrote:
>     This is very true. Regarding std::vector not sharing, this is OK since UDSEntryPrivate _is_ sharing. Also, QString in Field is also shared (and with this patch actually makes use of this). I'd be interested in the performance of a (std::)vector/pair approach.
> 
> Frank Reininghaus wrote:
>     I fully agree that big-O is not always the most important thing. However, kioslaves can in principle add many more than 8 fields to a UDSEntry, and I had thought that there is a reason why it has always used a QHash.
>     
>     About the "std::vector<std::pair<Key,Value>>" idea: sounds interesting. I would assume that this approach uses more memory than my current patch though (if Value == Field). Note that my patch only adds one pointer to the UDSEntry to keep track of the uint keys (assuming that the QHash is shared with many UDSEntries), i.e., 8 bytes on a 64 bit system.
>     
>     In a std::pair<uint, Field>, you would add a uint to every field of the entry, and the compiler might actually add some padding to preserve the alignment of the members of the "Field", such that the uint effectively needs 8 bytes for every entry.
>     
>     Another approach would be to store a QVector<Field>/std::vector<Field> and a separate vector containing the uint keys. A linear scan of the keys would be much faster if they are all stored next to each other, right? In a std::vector<std::pair<uint, Field>>, the different keys would be many bytes apart.
> 
> Jan Kundrát wrote:
>     Regarding the "reason why it has always used a QHash" -- this might be true, or perhaps a programmer used the first thing which came across their mind. I do this all the time.
>     
>     You are right on the analysis of the memory consumption -- storing the keys in the vector (or in the QHash) does have an overhead. The advantage is that it will save you at least one pointer indirection compared to an implicitly shared QMap/QHash/... of keys. That might not be a good choice in this context.
>     
>     I do not have any numbers comparing performance of a linear scan over a vector<pair> on one hand and a scan of a vector<int>. Your idea might or might not be correct; the memory prefetch in CPUs is known to be an impressive piece of silicon. It's entirely possible that the speed improvement of a single vector<int> would get neutralized by a missing prefetch of the target vector<Field>.
>     
>     How does your current patch work when you replace a QVector with a std::vector?
> 
> Milian Wolff wrote:
>     I just thought some more about it and have a potentially even more performant approach, albeit more complicated code-wise. Food for thought though:
>     
>     Just use a std::vector<Field>, but add the uint (uds field) to Field. Then encapsulate m_str and m_long in a union and add a bool m_isStr or similar. I.e.:
>     
>         struct Field {
>           // ctors ...
>           union { // should be sizeof 8
>             QString m_str;
>             long long m_long;
>           };
>           uint m_uds; // sizeof 4 but would incur a 4byte padding
>           bool m_isStr; // partially fill padding
>         };
>     
>     If I'm not mistaken, this has the same size as the current Field struct. Then you'd fill this in a std::vector which gets sorted after it is filled by m_uds. In the lookup functions you can then use binary searches (though at an estimated size of 8, a simple linear search might perform better).
>     
>     /me would be interested in the performance of this approach. Note that you should still keep the QString cache to use implicit sharing there. The benefit here is that you have _no_ overhead as far as I can see.
>     
>     Cheers

Unfortunately, you cannot put a QString (or any other type with non-trivial constructor/destructor) into a union. It's easy to see why: when destructing such an object, how can the compiler know if the 8-byte object that is stored in the union is the d-pointer of a QString or a long long?

- Frank

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/113355/#review42115
-----------------------------------------------------------

On Oct. 21, 2013, 6:23 p.m., Frank Reininghaus wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/113355/
> -----------------------------------------------------------
> 
> (Updated Oct. 21, 2013, 6:23 p.m.)
> 
> 
> Review request for kdelibs.
> 
> 
> Repository: kdelibs
> 
> 
> Description
> -------
> 
> This patch is based on some discussions on the kfm-devel mailing list, see http://lists.kde.org/?t=137935784800003&r=1&w=2
> 
> Mark found out that KIO's UDSEntry class is one of the major consumers of memory in applications which use KIO to list directories with a large number of files, and I found a way use implicit sharing to drastically reduce the amount of memory it needs. Many thanks to Milian for his great blog post http://milianw.de/blog/katekdevelop-sprint-vienna-2012-take-1, without which I would probably not have had such ideas.
> 
> 
> 1. The problem
> 
> The UDSEntry keeps all sorts of information about files which can be stored in a string (name, user, group, etc.) or in a long long (file size, modification time, etc.). All these data can be accessed with a uint key, and UDSEntry returns the corresponding QString or long long in O(1) time. Internally, it stores the data in a QHash<uint, Field>, where Field is a struct that has a QString and a long long member.
> 
> The problem is that QHash needs quite a lot of memory to provide the O(1) access, see http://doc.qt.digia.com/qq/qq19-containers.html for details, and that the minimum capacity of a QHash seems to be 17, even though the number of entries in a UDSEntry is often 8 in the rather typical standard kio_file use case.
> 
> 
> 2. Proposed solution
> 
> (a) We can store the "Fields" in a QVector<Field>, which needs only very little overhead compared to the memory that the actual "Fields" need. When loading a UDSEntry from a QDataStream, we just append all "Fields" to this QVector one by one. Moreover, we need a QHash<uint, int>, which maps each key to the index of its Field in the QVector. This restructuring alone does not reduce the memory usage, of course.
> 
> (b) The key observation is that the QDataStream, which KIO::ListJob reads the UDSEntries from, typically provides the different "Fields" in exactly the same order. This means that the QHash<uint, int> is usually exactly the same for every UDSEntry, and we can make use of implicit sharing to store only one copy of this QHash. I've modified
> 
> void UDSEntryPrivate::load(QDataStream &s, UDSEntry &a)
> 
> such that it remembers the most recent QHash<uint, int> and just adds an implicitly shared copy of it to "a" if the order of the Fields has not changed.
> 
> (c) Moreover, some of the QString Fields in the UDSEntries in one directory are often the same, like, e.g., the user and the group. The "load" function also remembers the most recently read values for each Field in a static QVector<QString> and just puts an implicitly shared copy into the UDSEntry if possible.
> 
> 
> 3. Possible disadvantages
> 
> (a) When using the "remove" member, the new version of UDSEntry does not remove the Field from the QVector<Field>. This means that removing and adding a "Field" repeatedly would let the memory usage grow indefinitely. According to David (http://lists.kde.org/?l=kfm-devel&m=138052519927973&w=2), this doesn't matter though because no known user of UDSEntry uses its remove() member. Maybe we should remove remove (pun stolen grom David) in the frameworks branch then?
> 
> (b) In principle, the old version of UDSEntryPrivate::load(QDataStream&, UDSEntry&) was reentrant. This is not the case for my changed version. Reentrancy could be restored rather easily by protecting the access to the static data with a mutex, but given that most of KIO is not supposed to be used from outside the main thread AFAIK, I don't know if this is necessary.
> 
> 
> 4. Changes since the first version of the patch which I posted in http://lists.kde.org/?l=kfm-devel&m=137962995805432&w=2
> 
> (a) Implemented the minor changes suggested by David in http://lists.kde.org/?l=kfm-devel&m=137975442807965&w=2
> 
> (b) Added a unit test to verify that storing and loading UDSEntries from a stream works even if the order of the fields is permuted, and some fields are removed or added in between.
> 
> (c) Fixed a bug which was uncovered by the test: cachedUdsFields.erase(cachedUdsFields.begin() + i, cachedUdsFields.end()) instead of cachedUdsFields.erase(cachedUdsFields.begin() + i)
> 
> (d) Use QVector::reserve to reserve the appropriate size for the QVector<Field>. Saves some time when loading the UDSEntry, and reduces the memory usage further.
> 
> (e) Changed the type of the loop variable from quint32 to int to fix some compiler warnings.
> 
> 
> Diffs
> -----
> 
>   kio/kio/udsentry.cpp 1e1f503 
>   kio/tests/CMakeLists.txt 1019312 
>   kio/tests/udsentrybenchmark.cpp PRE-CREATION 
>   kio/tests/udsentrytest.h PRE-CREATION 
>   kio/tests/udsentrytest.cpp PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/113355/diff/
> 
> 
> Testing
> -------
> 
> Old and new unit tests pass. The memory usage of Dolphin when loading a directory with 100,000 files in Icons View is reduced from 165.4 MB to 113.3 MB. Any application that uses a file dialog, a KDirLister, or anything else that uses a KIO::ListJob to list directory contents should benefit from similar savings.
> 
> 
> Thanks,
> 
> Frank Reininghaus
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20131022/60eec650/attachment.htm>