KDirModelV2, KDirListerV2 and UDSEntryV2 suggestions

Mark markg85 at gmail.com
Tue Feb 5 13:05:11 UTC 2013


On Tue, Feb 5, 2013 at 11:47 AM, David Faure <faure+bluesystems at kde.org> wrote:
> On Tuesday 05 February 2013 11:05:35 Mark wrote:
>> On Tue, Feb 5, 2013 at 10:19 AM, David Faure <faure+bluesystems at kde.org> wrote:
>> > On Tuesday 05 February 2013 09:01:21 Mark wrote:
>> >> The thing i'm puzzling over most right now is how i can optimize
>> >> UDSEntry. Internally it's a hash and that's very visible in profiling.
>> >> Also in KFileItem one part that i find a little strange is this line:
>> >> http://quickgit.kde.org/?p=kdelibs.git&a=blob&h=6667a90ee9e1d57488bb7e085167658f2fb9f172&hb=533b48c610319f3ad67e6f5f0cbb65028b009b8f&f=kio%2Fkio%2Fkfileitem.cpp
>> >> (line 290). That line is causing a chain of performance penalties,
>> >> which is very odd because i'm testing this benchmark with 500.000
>> >> files, not directories. It should not even end up in that if.
>> >
>> > You're reading the if() wrong.
>> > When used via KDirLister, KFileItem is constructed with a base URL and a
>> > UDSEntry. The base URL is the url of the directory (so urlIsDirectory is
>> > true) and the UDSEntry contains the filename (from the kioslave). So
>> > m_url.addPath(m_strName) is done, in order to construct the full URL to
>> > the file.
>>
>> Ahh, okay. It's not that obvious from the code. Thank you for clarifying that
>> one.
>
> (It's documented in the API docs for the KFileItem constructor)
>
>> > The thing is, KDirListerCache keeps all KFileItems in cache, for faster
>> > directory browsing (of already-visited dirs). So if you want to reduce
>> > memory usage, implement an LRU mechanism in KDirListerCache, to throw out
>> > the oldest unused dirs. Would help in real life -- not really in your
>> > testcase though (one huge directory).
>>
>> My intention is to make it fast enough to not even need a cache.
>> Though i'm guessing that goal won't be reached since caching will
>> still be useful for slower media.
>
> Yeah, for FTP and such we'll always want a cache.
> For local files, well, surely a cache will always be faster than starting a
> kioslave, listing a directory, transferring that over a socket, and decoding
> that. I already find the file dialog a bit slow to show up with a dir listing.
> Let's limit its memory usage, but let's not get rid of the cache altogether.
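
For reference, an LRU mechanism of the kind suggested above could look roughly like this. It is a minimal sketch in plain C++ and not KDirListerCache code: DirListingLru, its methods and the use of std::vector<std::string> as a stand-in for the cached KFileItems are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical LRU cache of directory listings, keyed by directory URL.
// Not part of KIO; a real version would hold KFileItems, not strings.
class DirListingLru
{
public:
    using Listing = std::vector<std::string>; // stand-in for a list of items

    explicit DirListingLru(std::size_t maxDirs) : m_maxDirs(maxDirs) {}

    // Insert or replace a listing and mark it as most recently used.
    // Evicts the least recently used directories beyond the limit.
    void insert(const std::string &dirUrl, Listing listing)
    {
        auto it = m_index.find(dirUrl);
        if (it != m_index.end())
            m_entries.erase(it->second);
        m_entries.emplace_front(dirUrl, std::move(listing));
        m_index[dirUrl] = m_entries.begin();
        while (m_entries.size() > m_maxDirs) {
            m_index.erase(m_entries.back().first);
            m_entries.pop_back();
        }
    }

    // Return the cached listing and mark it as recently used,
    // or nullptr if the directory is not cached.
    const Listing *find(const std::string &dirUrl)
    {
        auto it = m_index.find(dirUrl);
        if (it == m_index.end())
            return nullptr;
        m_entries.splice(m_entries.begin(), m_entries, it->second);
        return &it->second->second;
    }

    std::size_t size() const { return m_entries.size(); }

private:
    using Entry = std::pair<std::string, Listing>;
    std::size_t m_maxDirs;
    std::list<Entry> m_entries; // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> m_index;
};
```

Limiting by number of directories is the simplest choice here; limiting by total item count would fit the "one huge directory" case better, but needs a bit more bookkeeping.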
>
>> > Well, the kfileitems are kept around, and each kfileitem has a KUrl in it,
>> > which is kept too. I'm surprised that this would be the main use of
>> > memory though. Well, it's the biggest field in KFileItem, indeed.
>> >
>> > We could of course construct this KUrl on demand (so that the "directory"
>> > part of it is shared amongst all KFileItems, via QString's implicit
>> > sharing)... This would shift the balance towards "more CPU, less memory",
>> > so one would have to check the performance impact of such a change.
>>
>> Just wondering - since this will likely be KF5 material when patched -
>> will this be any better with QUrl in Qt5? Or is QUrl just as "heavy"
>> as KUrl?
>
> Good point: QUrl in Qt5 can be twice as small, because Qt4 kept both the
> encoded and the decoded versions of the fields. I say "can", because I suppose
> it filled that on demand, so I don't know if we were actually filling both
> variants of every field. More details below, in fact.
>
>> Also, let's discuss the memory usage a bit since that really shocks me.
>> I'm having a folder with 500.000 files (generated). All 0 bytes. The
>> filename is like this:
>> a000000.txt - a500000.txt
>> with the path:
>> /home/mark/massive_files/
>>
>> Now if we do a very rough calculation that means one complete full url
>> looks like this:
>> file:///home/mark/a000000.txt
>> That line is 29 characters, thus let's say 29 bytes as well.
>
> That would be in a char*. But now think QString, every character is a QChar,
> i.e. 2 bytes.
>
>> Let's say we
>> need a bit more than that for bookkeeping in QString, perhaps some
>> other unexpected stuff, so let's make it 48 bytes (just to be generous)
>
> QUrlPrivate looks rather like this, in Qt4. I added comments with expected
> memory usage (in bytes, on a 64bit machine) for each field.
>
>     QAtomicInt ref;  // 4
>     QString scheme; // "file" -> 8 + 8 [d pointer] + 32 [QString::Data]
>     QString userName; // null -> 8 [d pointer]
>     QString password; // null -> 8
>     QString host;  // null -> 8
>     QString path; // "file:///home/mark/a00000000.txt" -> 58 + 8 + 32
>     QByteArray query; // null -> 8
>     QString fragment; // null -> 8
>     QByteArray encodedOriginal; // hopefully null -> 8, gone in Qt5
>     QByteArray encodedUserName; // hopefully null -> 8, gone in Qt5
>     QByteArray encodedPassword; // hopefully null -> 8, gone in Qt5
>     QByteArray encodedPath; // hopefully null -> 8, gone in Qt5
>     QByteArray encodedFragment; // hopefully null -> 8, gone in Qt5
>     int port; // 4
>     QUrl::ParsingMode parsingMode; // 4, gone in Qt5
>     bool hasQuery; // 1
>     bool hasFragment; // 1
>     bool isValid; // 1
>     bool isHostValid; // 1
>     char valueDelimiter; // 1, gone in Qt5 [moved to QUrlQuery]
>     char pairDelimiter; // 1, gone in Qt5 [moved to QUrlQuery]
>     int stateFlags; // 4, gone in Qt5
>     QMutex mutex; // 8, gone in Qt5
>     QByteArray encodedNormalized; // full url -> 29 + 8 + 32, gone in Qt5
>     QUrlErrorInfo errorInfo; // char*, char*, 2*char, total 20, only 8 in Qt5
>
> Total estimated memory usage for QUrl("file:///home/mark/a000000.txt"):
> in Qt4: 345 bytes, plus d pointer = 353 bytes.
> in Qt5: 206 bytes, plus d pointer = 214 bytes.
>
> You were way underestimating this, with "48 bytes" :)

Pff, you can say that again. I wasn't expecting that much
"bookkeeping" for just one url to occur.. Very interesting stuff!
>
>> If we multiply that by 500.000 we get:
>> 48 * 500.000 = 24000000 bytes (22.9 MB)
>
> 353 * 500000 = 176500000 = 168.3 MB

So much! That has to be lowered. It just doesn't seem ok to
expand the size of a dataset from ~20MB to ~160MB. That's roughly an
8-to-1 ratio.
I wonder how much CPU overhead you would get if you calculate
everything "when needed" and not store it. I mean, storing username,
password, host, query, fragment and some others "might" be useful when
it's needed, but for a local filesystem this is a little wasteful
imho.
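
A quick sanity check of the numbers above, as a minimal sketch in plain C++. The per-URL sizes are the estimates quoted in this thread (48, 353 and 214 bytes); the constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Per-URL sizes in bytes, taken from the estimates quoted in this thread.
constexpr std::uint64_t kNaiveBytesPerUrl = 48;  // the first rough guess
constexpr std::uint64_t kQt4BytesPerUrl = 353;   // Qt4 estimate incl. d pointer
constexpr std::uint64_t kQt5BytesPerUrl = 214;   // Qt5 estimate incl. d pointer

// Total bytes needed to keep `count` URLs of `bytesPerUrl` bytes each.
constexpr std::uint64_t totalBytes(std::uint64_t bytesPerUrl, std::uint64_t count)
{
    return bytesPerUrl * count;
}

// Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double toMiB(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

For 500000 files, totalBytes() gives 24000000, 176500000 and 107000000 bytes, which toMiB() turns into roughly 22.9, 168.3 and 102.0 MiB. So the Qt5 layout alone would save about 66 MiB in this testcase, if the estimates hold.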

Imagine you would design a completely new system here that has
performance and memory efficiency as key points. How would you design
this? I would probably make a small tradeoff: have the most common
functions store their data locally and calculate the rest when needed
without storing anywhere.
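
To make that a bit more concrete, here is a minimal sketch of the "store the common part, compute the rest on demand" idea, in plain C++ with std::string standing in for QString/KUrl. LightFileItem and its members are hypothetical names, not KIO API, and the sharing is done with std::shared_ptr instead of QString's implicit sharing.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical lightweight item: stores only the file name plus a shared
// pointer to the directory URL, and assembles the full URL when asked.
class LightFileItem
{
public:
    LightFileItem(std::shared_ptr<const std::string> directoryUrl, std::string name)
        : m_directoryUrl(std::move(directoryUrl)), m_name(std::move(name))
    {
        assert(m_directoryUrl); // this sketch assumes a non-null directory
    }

    const std::string &name() const { return m_name; }

    // Built on every call: costs CPU each time, but nothing is stored.
    std::string url() const
    {
        std::string result = *m_directoryUrl;
        if (result.empty() || result.back() != '/')
            result += '/';
        result += m_name;
        return result;
    }

private:
    std::shared_ptr<const std::string> m_directoryUrl; // one copy per directory
    std::string m_name;                                // per-file data
};
```

With this layout, 500.000 items in one directory hold 500.000 names but only a single copy of the directory string. Whether the repeated concatenation in url() is cheap enough is exactly what would need benchmarking.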

>
> And that's just the KUrl in the KFileItems. I surely hope we share that KUrl
> with the hash in KDirListerCache...

I don't know.
>
> QUrl gives us fast parsing, but maybe we should use QStrings as the URL in
> KFileItem and in KDirListerCache's storage. However this would increase the
> risks of mixing up strings and urls, plus losing CPU time in reparsings.

That's something i will have to benchmark once i get there.
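
On the risk of mixing up strings and urls: one possible mitigation is a thin wrapper type around the string, so the compiler refuses to convert between the two silently. This is only a sketch under assumptions, in plain C++ with std::string in place of QString; UrlString is a made-up name and scheme() assumes a well-formed absolute URL.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical strong type: a URL kept as a plain string, but not implicitly
// convertible from std::string, so paths and URLs cannot be mixed by accident.
class UrlString
{
public:
    explicit UrlString(std::string encoded) : m_encoded(std::move(encoded)) {}

    const std::string &toString() const { return m_encoded; }

    // Parse-free scheme lookup: everything before the first ':'.
    // Assumes an absolute URL; returns an empty string if there is no ':'.
    std::string scheme() const
    {
        const std::string::size_type colon = m_encoded.find(':');
        return colon == std::string::npos ? std::string()
                                          : m_encoded.substr(0, colon);
    }

    friend bool operator==(const UrlString &a, const UrlString &b)
    {
        return a.m_encoded == b.m_encoded;
    }

private:
    std::string m_encoded;
};

// The explicit constructor blocks silent string -> URL conversions.
static_assert(!std::is_convertible<std::string, UrlString>::value,
              "a plain string must not convert to UrlString implicitly");
```

This keeps the storage at one string per URL and avoids a full parse for simple lookups, but anything beyond that (host, query, fragment) would still need a real parse on demand.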
>
> Another possible conclusion: who in their right mind puts 500.000 files in a
> directory? :-)

Ha, funny you ;) This is just a testcase that performs .. not optimal
.. If this performs well then i'm guessing all other things will
perform equally well or better. Certainly not worse.


More information about the Kde-frameworks-devel mailing list