Some optimization opportunities
Daniël Mantione
daniel at deadlock.et.tudelft.nl
Sat Jun 19 11:02:07 CEST 2004
Hi,
This week I dropped my laptop unfortunately it didn't survive. Now I
find myself using my old K6 again. Not that I don't have a fast desktop,
but often it runs that other commercial operating system and it's often
more convenient to use the k6 than reboot my computer.
Well, of course everything feels a lot slower. I did some investigations
wether I could tweak things a bit. One of the first things I did was to
convert all icons to 256 colours. I'm not sure if KDE will convert them
back to 32-bit colours, but if it doesn't this'll reduce their memory
usage 4 times.
Next I started to investigate what KDE programs do at startup. So I used
strace on a konqueror that was starting up and I found some things I'd
like to share with you. (The KDE version in use here is 3.2.1.)
First interresting observation, loading an icon:
32761 15:36:39.647614 access("/usr/opt/kde3/share/apps/konqueror/icons/crystalsvg/16x16/actions/www.png", R_OK) = -1 ENOENT (No such file or directory)
32761 15:36:39.648313 access("/etc/opt/kde3/share/icons/crystalsvg/16x16/actions/www.png", R_OK) = -1 ENOENT (No such file or directory)
32761 15:36:39.648922 access("/usr/opt/kde3/share/icons/crystalsvg/16x16/actions/www.png", R_OK) = -1 ENOENT (No such file or directory)
32761 15:36:39.649527 access("/etc/opt/kde3/share/icons/crystalsvg/16x16/apps/www.png", R_OK) = -1 ENOENT (No such file or directory)
32761 15:36:39.650119 access("/usr/opt/kde3/share/icons/crystalsvg/16x16/apps/www.png", R_OK) = -1 ENOENT (No such file or directory)
32761 15:36:39.650712 access("/etc/opt/kde3/share/icons/crystalsvg/16x16/devices/www.png", R_OK) = -1 ENOENT (No such file or directory)
32761 15:36:39.651355 access("/usr/opt/kde3/share/icons/crystalsvg/16x16/devices/www.png", R_OK) = -1 ENOENT (No such file or directory)
32761 15:36:39.651961 access("/etc/opt/kde3/share/icons/crystalsvg/16x16/filesystems/www.png", R_OK) = -1 ENOENT (No such file or directory)
32761 15:36:39.652560 access("/usr/opt/kde3/share/icons/crystalsvg/16x16/filesystems/www.png", R_OK) = 0
32761 15:36:39.653258 open("/usr/opt/kde3/share/icons/crystalsvg/16x16/filesystems/www.png", O_RDONLY|O_LARGEFILE) = 12
32761 15:36:39.653793 fstat64(12, {st_mode=S_IFREG|0644, st_size=926, ...}) = 0
32761 15:36:39.654312 fstat64(12, {st_mode=S_IFREG|0644, st_size=926, ...}) = 0
32761 15:36:39.654765 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40e6e000
32761 15:36:39.655238 _llseek(12, 0, [0], SEEK_SET) = 0
32761 15:36:39.656136 read(12, "\211PNG\r\n\32\n\0\0\0\rIHDR\0\0\0\20\0\0\0\20\10\6\0\0"..., 4096) = 926
32761 15:36:39.657881 close(12) = 0
Apparently, KDE needs 6 ms to find an icon, while needing just 4 ms to
load and decompress it. From these 4 ms, a lot of time is spent between
opening the file and reading the data in it, so the actual decompression
time is not really a lot. I speculate that the png file is decompressed
before the file is closed, and not much interresting happens between the
"access" system calls.
If this is true, there is a lot of system call overhead. It does seem
there is room for optimization here. First, it seems nonsense to be to
search for icons in the etc directory, which accounts for half of the
"access" system calls. More seriously, an "access" system call is done for
each subdirectory. This is a scalability issue, if more subdirectories
will be added in the future, things will slow down.
I'm going to throw all my icons in a single subdirectory.
Lastly, there are some unnecessary system calls while loading the image.
"fstat64" is being called twice and the "_llseek" system call is a nop.
The next interresting point I saw was during opening op qtrc:
32761 15:36:37.710200 open("/usr/lib/qt3//etc/settings/qtrc", O_RDONLY|O_LARGEFILE) = 5
32761 15:36:37.711074 fstat64(5, {st_mode=S_IFREG|0644, st_size=6529, ...}) = 0
32761 15:36:37.712417 fstat64(5, {st_mode=S_IFREG|0644, st_size=6529, ...}) = 0
32761 15:36:37.712924 old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40979000
32761 15:36:37.713365 read(5, "[3.1]\nlibraryPath=/root/.kde/lib"..., 4096) = 4096
32761 15:36:37.732684 brk(0) = 0x80ae000
32761 15:36:37.733075 brk(0x80af000) = 0x80af000
32761 15:36:37.748649 brk(0) = 0x80af000
32761 15:36:37.749015 brk(0x80b0000) = 0x80b0000
32761 15:36:37.763967 brk(0) = 0x80b0000
32761 15:36:37.764336 brk(0x80b1000) = 0x80b1000
32761 15:36:37.767200 read(5, "hi Mincho^eAr Pl Kaitim Big5^eAr"..., 4096) = 2433
32761 15:36:37.781304 brk(0) = 0x80b1000
32761 15:36:37.781690 brk(0x80b2000) = 0x80b2000
32761 15:36:37.792246 brk(0) = 0x80b2000
32761 15:36:37.792614 brk(0x80b3000) = 0x80b3000
32761 15:36:37.799938 fstat64(5, {st_mode=S_IFREG|0644, st_size=6529, ...}) = 0
32761 15:36:37.800689 close(5) = 0
32761 15:36:37.801111 munmap(0x40979000, 4096) = 0
32761 15:36:37.801552 fcntl64(4, F_SETLKW64, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}, 0xbfffdff0) = 0
32761 15:36:37.802112 close(4) = 0
32761 15:36:37.803218 access("/root/.qt/qtrc", F_OK) = -1 ENOENT (No such file or directory)
32761 15:36:37.805648 brk(0) = 0x80b3000
32761 15:36:37.806039 brk(0x80b4000) = 0x80b4000
32761 15:36:37.818747 brk(0) = 0x80b4000
32761 15:36:37.819153 brk(0x80b5000) = 0x80b5000
32761 15:36:37.826493 brk(0) = 0x80b5000
32761 15:36:37.826891 brk(0x80b6000) = 0x80b6000
32761 15:36:37.834954 brk(0) = 0x80b6000
32761 15:36:37.835353 brk(0x80b7000) = 0x80b7000
32761 15:36:37.843527 brk(0) = 0x80b7000
32761 15:36:37.843926 brk(0x80b8000) = 0x80b8000
32761 15:36:37.852115 brk(0) = 0x80b8000
32761 15:36:37.852516 brk(0x80b9000) = 0x80b9000
32761 15:36:37.860777 brk(0) = 0x80b9000
32761 15:36:37.861219 brk(0x80ba000) = 0x80ba000
32761 15:36:37.870319 brk(0) = 0x80ba000
32761 15:36:37.870715 brk(0x80bb000) = 0x80bb000
32761 15:36:37.877668 open("/usr/X11R6/lib/X11/locale/locale.alias", O_RDONLY) = 4
Apparently a lot of memory is needed here, only it is allocated 4kb at a
time. It's unclear how many system call overhead there is, as likely a lot
happens between each system call. It is however interresting to see that
it takes about 90 ms to parse qtrc (the time between opening and closing
the file), which is a lot compared to decompressing the png files. 20 kb
of memory is allocated, qtrc is 6529 bytes large.
I noted an interresting behaviour when libraries are being loaded:
32761 15:36:39.070284 open("/opt/kde3/lib/libmng.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
32761 15:36:39.070793 open("/usr/lib/qt3/lib/libmng.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)
32761 15:36:39.071641 open("/etc/ld.so.cache", O_RDONLY) = 13
32761 15:36:39.072059 fstat64(13, {st_mode=S_IFREG|0644, st_size=101052, ...}) = 0
32761 15:36:39.072505 old_mmap(NULL, 101052, PROT_READ, MAP_PRIVATE, 13, 0) = 0x40e6e000
32761 15:36:39.072938 close(13) = 0
32761 15:36:39.073332 open("/usr/lib/libmng.so.1", O_RDONLY) = 13
This behaviour might be beyond control of the KDE team, but nevertheless
it is interresting to see that time is wasted trying to find libraries in
directories while they cannot be found. LD_LIBRARY_PATH was empty, so it
would seem to be more logical that the library is being looked up in
ld.so.cache right away. Also strange is that ld.so.cache is being opened
and closed for each library that needs to be loaded. The àmount of time
this exercise wastes is limited though, just 3 ms.
Another observation:
32761 15:36:42.449984 gettimeofday({1087565802, 450171}, NULL) = 0
32761 15:36:42.450714 gettimeofday({1087565802, 450905}, NULL) = 0
32761 15:36:42.451667 gettimeofday({1087565802, 451861}, NULL) = 0
32761 15:36:42.452777 gettimeofday({1087565802, 452980}, NULL) = 0
32761 15:36:42.453542 gettimeofday({1087565802, 453736}, NULL) = 0
32761 15:36:42.454275 gettimeofday({1087565802, 454467}, NULL) = 0
32761 15:36:42.455007 gettimeofday({1087565802, 455198}, NULL) = 0
32761 15:36:42.455732 gettimeofday({1087565802, 455924}, NULL) = 0
32761 15:36:42.456456 gettimeofday({1087565802, 456650}, NULL) = 0
32761 15:36:42.457184 gettimeofday({1087565802, 457373}, NULL) = 0
32761 15:36:42.457913 gettimeofday({1087565802, 458105}, NULL) = 0
32761 15:36:42.458727 gettimeofday({1087565802, 458925}, NULL) = 0
32761 15:36:42.459436 ioctl(5, FIONREAD, [0]) = 0
32761 15:36:42.459841 write(6, "\0", 1) = 1
32761 15:36:42.460624 gettimeofday({1087565802, 460817}, NULL) = 0
32761 15:36:42.461707 gettimeofday({1087565802, 461913}, NULL) = 0
32761 15:36:42.462460 gettimeofday({1087565802, 462654}, NULL) = 0
32761 15:36:42.463461 gettimeofday({1087565802, 463667}, NULL) = 0
32761 15:36:42.464204 gettimeofday({1087565802, 464393}, NULL) = 0
32761 15:36:42.465155 gettimeofday({1087565802, 465353}, NULL) = 0
32761 15:36:42.466002 gettimeofday({1087565802, 466196}, NULL) = 0
32761 15:36:42.466913 gettimeofday({1087565802, 467113}, NULL) = 0
32761 15:36:42.467762 gettimeofday({1087565802, 467961}, NULL) = 0
32761 15:36:42.468747 gettimeofday({1087565802, 468946}, NULL) = 0
32761 15:36:42.469698 gettimeofday({1087565802, 469896}, NULL) = 0
32761 15:36:42.470562 gettimeofday({1087565802, 470756}, NULL) = 0
32761 15:36:42.475218 gettimeofday({1087565802, 475440}, NULL) = 0
32761 15:36:42.475899 gettimeofday({1087565802, 476089}, NULL) = 0
32761 15:36:42.476416 gettimeofday({1087565802, 476590}, NULL) = 0
32761 15:36:42.477673 gettimeofday({1087565802, 478056}, NULL) = 0
32761 15:36:42.479127 gettimeofday({1087565802, 479343}, NULL) = 0
32761 15:36:42.479735 gettimeofday({1087565802, 479921}, NULL) = 0
32761 15:36:42.480452 gettimeofday({1087565802, 480642}, NULL) = 0
32761 15:36:42.482380 gettimeofday({1087565802, 482612}, NULL) = 0
32761 15:36:42.482945 gettimeofday({1087565802, 483115}, NULL) = 0
32761 15:36:42.483548 gettimeofday({1087565802, 483726}, NULL) = 0
32761 15:36:42.484345 gettimeofday({1087565802, 484540}, NULL) = 0
Calling gettimeofday this often seems a little strange to me and certainly
an important part of the 35 ms here is wasted due to system call overhead.
I hope this info is usefull in further optimizing KDE. I am not subscribed
to this list, however I will monitor the archives the next days. If you
can cc me in possible replies that would be nice though, since it's hardly
to reply to a message in a list archive.
Greetings,
Daniël Mantione
More information about the Kde-optimize
mailing list