KDirModelV2, KDirListerV2 and UDSEntryV2 suggestions

David Faure faure+bluesystems at kde.org
Tue Feb 5 10:47:31 UTC 2013


On Tuesday 05 February 2013 11:05:35 Mark wrote:
> On Tue, Feb 5, 2013 at 10:19 AM, David Faure <faure+bluesystems at kde.org> wrote:
> > On Tuesday 05 February 2013 09:01:21 Mark wrote:
> >> The thing i'm puzzling most with right now is how i can optimize
> >> UDSEntry. Internally it's a hash and that's very visible in profiling.
> >> Also in KFileItem one part that i find a little strange is this line:
> >> http://quickgit.kde.org/?p=kdelibs.git&a=blob&h=6667a90ee9e1d57488bb7e085167658f2fb9f172&hb=533b48c610319f3ad67e6f5f0cbb65028b009b8f&f=kio%2Fkio%2Fkfileitem.cpp
> >> (line 290). That line is causing a chain of performance penalties.
> >> Which is very odd because i'm testing this benchmark with 500.000
> >> files, not directories. It should not even end up in that if.
> > 
> > You're reading the if() wrong.
> > When used via KDirLister, KFileItem is constructed with a base URL and a
> > UDSEntry. The base URL is the url of the directory (so urlIsDirectory is
> > true) and the UDSEntry contains the filename (from the kioslave). So
> > m_url.addPath(m_strName) is done, in order to construct the full URL to
> > the file.
> 
> Ahh oke. It's not that obvious from the code. Thank you for clarifying that
> one.

(It's documented in the API docs for the KFileItem constructor)

> > The thing is, KDirListerCache keeps all KFileItems in cache, for faster
> > directory browsing (of already-visited dirs). So if you want to reduce
> > memory usage, implement a LRU mechanism in KDirListerCache, to throw out
> > the oldest unused dirs. Would help in real life -- not really in your
> > testcase though (one huge directory).
> 
> My intention is to make it fast enough to not even need a cache.
> Though i'm guessing that goal won't be reached since caching will
> still be useful for slower media.

Yeah, for FTP and such we'll always want a cache.
For local files, well, surely a cache will always be faster than starting a 
kioslave, listing a directory, transferring that over a socket, and decoding 
that. I already find the file dialog a bit slow to show up with a dir listing.
Let's limit its memory usage, but let's not get rid of the cache altogether.
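To make the "limit its memory usage" idea concrete, here is a minimal sketch of an LRU cache keyed by directory URL. It is only an illustration: LruDirCache and DirListing are hypothetical names, the listing is a plain vector of strings rather than KFileItems, and nothing here reflects how KDirListerCache is actually structured.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cached directory listing; the real cache
// holds KFileItems, not plain strings.
using DirListing = std::vector<std::string>;

// Hypothetical LRU cache: keeps at most maxDirs listings and throws out
// the least recently used directory when that limit is exceeded.
class LruDirCache {
public:
    explicit LruDirCache(std::size_t maxDirs) : m_maxDirs(maxDirs) {}

    // Store (or replace) a listing and mark it as most recently used.
    void insert(const std::string &dirUrl, DirListing listing) {
        auto it = m_index.find(dirUrl);
        if (it != m_index.end()) {
            m_order.erase(it->second);
            m_index.erase(it);
        }
        m_order.emplace_front(dirUrl, std::move(listing));
        m_index[dirUrl] = m_order.begin();
        // Evict from the back (least recently used) until within the limit.
        while (m_order.size() > m_maxDirs) {
            m_index.erase(m_order.back().first);
            m_order.pop_back();
        }
    }

    // Return the cached listing (or nullptr) and mark it as most recently used.
    const DirListing *find(const std::string &dirUrl) {
        auto it = m_index.find(dirUrl);
        if (it == m_index.end())
            return nullptr;
        m_order.splice(m_order.begin(), m_order, it->second);
        return &it->second->second;
    }

    bool contains(const std::string &dirUrl) const { return m_index.count(dirUrl) != 0; }
    std::size_t size() const { return m_order.size(); }

private:
    using Entry = std::pair<std::string, DirListing>;
    std::size_t m_maxDirs;
    std::list<Entry> m_order; // front = most recently used
    std::unordered_map<std::string, std::list<Entry>::iterator> m_index;
};
```

A limit counted in directories is the simplest policy; with one huge directory (as in the testcase) a limit counted in items or bytes would probably be more useful.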

> > Well, the kfileitems are kept around, and each kfileitem has a KUrl in it,
> > which is kept too. I'm surprised that this would be the main use of
> > memory though. Well, it's the biggest field in KFileItem, indeed.
> > 
> > We could of course construct this KUrl on demand (so that the "directory"
> > part of it is shared amongst all KFileItems, via QString's implicit
> > sharing)... This would shift the balance towards "more CPU, less memory",
> > so one would have to check the performance impact of such a change.
> 
> Just wondering - since this will likely be KF5 material when patched -
> will this be any better with QUrl in Qt5? Or is QUrl just as "heavy"
> as KUrl?

Good point: QUrl in Qt5 can be twice as small, because Qt4 kept both the 
encoded and the decoded versions of the fields. I say "can", because I suppose 
it filled that on demand, so I don't know if we were actually filling both 
variants of every field. More details below, in fact.

> Also, let's discuss the memory usage a bit since that really shocks me.
> I'm having a folder with 500.000 files (generated). All 0 bytes. The
> filename is like this:
> a000000.txt - a500000.txt
> with the path:
> /home/mark/massive_files/
> 
> Now if we do a very rough calculation that means one complete full url
> looks like this:
> file:///home/mark/a000000.txt
> That line is 29 characters, thus let's say 29 bytes as well.

That would be in a char*. But now think QString: every character is a QChar, 
i.e. 2 bytes.

> Let's say we
> need a bit more than that for bookkeeping in QString, perhaps some
> other unexpected stuff, so let's make it 48 bytes (just to be generous)

QUrlPrivate looks rather like this in Qt4. I added comments with the expected 
memory usage (in bytes, on a 64-bit machine) for each field.

    QAtomicInt ref;  // 4
    QString scheme; // "file" -> 8 + 8 [d pointer] + 32 [QString::Data]
    QString userName; // null -> 8 [d pointer]
    QString password; // null -> 8
    QString host;  // null -> 8
    QString path; // "file:///home/mark/a000000.txt" -> 58 + 8 + 32
    QByteArray query; // null -> 8
    QString fragment; // null -> 8
    QByteArray encodedOriginal; // hopefully null -> 8, gone in Qt5
    QByteArray encodedUserName; // hopefully null -> 8, gone in Qt5
    QByteArray encodedPassword; // hopefully null -> 8, gone in Qt5
    QByteArray encodedPath; // hopefully null -> 8, gone in Qt5
    QByteArray encodedFragment; // hopefully null -> 8, gone in Qt5
    int port; // 4
    QUrl::ParsingMode parsingMode; // 4, gone in Qt5
    bool hasQuery; // 1
    bool hasFragment; // 1
    bool isValid; // 1
    bool isHostValid; // 1
    char valueDelimiter; // 1, gone in Qt5 [moved to QUrlQuery]
    char pairDelimiter; // 1, gone in Qt5 [moved to QUrlQuery]
    int stateFlags; // 4, gone in Qt5
    QMutex mutex; // 8, gone in Qt5
    QByteArray encodedNormalized; // full url -> 29 + 8 + 32, gone in Qt5
    QUrlErrorInfo errorInfo; // char*, char*, 2*char, total 20, only 8 in Qt5

Total estimated memory usage for QUrl("file:///home/mark/a000000.txt"):
in Qt4: 345 bytes, plus d pointer = 353 bytes.
in Qt5: 206 bytes, plus d pointer = 214 bytes.

You were way underestimating this, with "48 bytes" :)
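For anyone who wants to check the totals, the per-field estimates from the list above can be added up mechanically. This is just the arithmetic, with the numbers copied from the comments; the helper names (sharedByBoth, qt4Only, and so on) are made up for this sketch, and the 32-byte Data header is the same rough estimate used above, not a measured value.

```cpp
#include <cstddef>

// Estimates copied from the field list above (bytes, 64-bit machine).
constexpr std::size_t kDPtr = 8;        // one d pointer
constexpr std::size_t kStringData = 32; // estimated QString::Data / QByteArray data header

// Fields that exist in both the Qt4 and the Qt5 layout.
constexpr std::size_t sharedByBoth() {
    return 4                            // ref
         + (8 + kDPtr + kStringData)    // scheme "file": 4 QChars
         + 3 * kDPtr                    // userName, password, host (null)
         + (58 + kDPtr + kStringData)   // path estimate: 29 QChars
         + 2 * kDPtr                    // query, fragment (null)
         + 4                            // port
         + 4;                           // hasQuery, hasFragment, isValid, isHostValid
}

// Fields marked "gone in Qt5" in the list above.
constexpr std::size_t qt4Only() {
    return 5 * kDPtr                    // encodedOriginal .. encodedFragment (null)
         + 4                            // parsingMode
         + 2                            // valueDelimiter, pairDelimiter
         + 4                            // stateFlags
         + 8                            // mutex
         + (29 + kDPtr + kStringData);  // encodedNormalized: full url as bytes
}

constexpr std::size_t errorInfoQt4 = 20;
constexpr std::size_t errorInfoQt5 = 8;

constexpr std::size_t qt4Private() { return sharedByBoth() + qt4Only() + errorInfoQt4; }
constexpr std::size_t qt5Private() { return sharedByBoth() + errorInfoQt5; }

// The totals quoted above, without and with the d pointer in QUrl itself.
static_assert(qt4Private() == 345, "Qt4 estimate");
static_assert(qt4Private() + kDPtr == 353, "Qt4 estimate plus d pointer");
static_assert(qt5Private() == 206, "Qt5 estimate");
static_assert(qt5Private() + kDPtr == 214, "Qt5 estimate plus d pointer");
```

If the static_asserts compile, the sums match the figures given above.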

> If we multiply that by 500.000 we get:
> 48 * 500.000 = 24000000 bytes (22.8 MB)

353 * 500000 = 176500000 = 168.3 MB

And that's just the KUrl in the KFileItems. I surely hope we share that KUrl 
with the hash in KDirListerCache...
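The scaling to 500.000 items is simple enough to sketch as well. The two helpers below are hypothetical names for this illustration, and "MB" is taken as 1024 * 1024 bytes, which is what the 22.8 and 168.3 figures above imply.

```cpp
// Total bytes for `count` items at `perItem` bytes each.
constexpr unsigned long long totalBytes(unsigned long long perItem, unsigned long long count) {
    return perItem * count;
}

// Convert bytes to MB, taking 1 MB = 1024 * 1024 bytes.
constexpr double bytesToMiB(unsigned long long bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// The Qt4 estimate: 353 bytes per URL, 500.000 URLs.
static_assert(totalBytes(353, 500000) == 176500000ULL, "matches the figure above");
```

Plugging in the 214-byte Qt5 estimate instead gives 107000000 bytes, i.e. roughly 102 MB by the same calculation, assuming the per-field estimates hold.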

QUrl gives us fast parsing, but maybe we should use QStrings as the URL in 
KFileItem and in KDirListerCache's storage. However this would increase the 
risks of mixing up strings and urls, plus losing CPU time in reparsings.
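A rough sketch of what "strings in the item, URL on demand" could look like, with the directory part shared between all items of one directory. This uses std::shared_ptr and std::string as stand-ins for QString's implicit sharing; LightFileItem is a hypothetical type, not the real KFileItem, and the sketch assumes a non-null directory string and ignores percent-encoding entirely.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical item type: stores the shared directory URL plus its own
// file name, and only assembles the full URL when asked for it.
class LightFileItem {
public:
    LightFileItem(std::shared_ptr<const std::string> dirUrl, std::string name)
        : m_dirUrl(std::move(dirUrl)), m_name(std::move(name)) {}

    // Build the full URL on demand: more CPU per call, less memory per item.
    std::string url() const {
        std::string result = *m_dirUrl;
        if (result.empty() || result.back() != '/')
            result += '/';
        result += m_name;
        return result;
    }

    const std::string &name() const { return m_name; }

    // True if both items point at the very same directory string object.
    bool sharesDirectoryWith(const LightFileItem &other) const {
        return m_dirUrl == other.m_dirUrl;
    }

private:
    std::shared_ptr<const std::string> m_dirUrl; // one copy per directory
    std::string m_name;                          // one copy per item
};
```

Every call to url() allocates and copies, which is exactly the "more CPU, less memory" trade-off, so the performance impact would still have to be measured.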

Another possible conclusion: who in their right mind puts 500.000 files in a 
directory? :-)

-- 
David Faure, faure at kde.org, http://www.davidfaure.fr
Sponsored by BlueSystems and KDAB to work on KDE Frameworks


