accumulating projects in kate lead to excessive kate startup time due to git
Milian Wolff
mail at milianw.de
Sun Jun 19 20:37:21 BST 2022
On Sonntag, 19. Juni 2022 21:10:27 CEST you wrote:
> On Samstag, 18. Juni 2022 14:15:42 CEST Milian Wolff wrote:
> > Hey all,
> >
> > Kate took ~4s to show its main window on my beefy workstation with lots of
> > RAM, CPUs and speedy NVMe disks. I found this quite odd and wondered about
> > the reason, so I sat down and profiled it. Perf shows a lot of external git
> > processes running sequentially, which I could also replicate with strace:
> <snip>
>
> > b) Can we query the git status in parallel for all projects, instead of
> > serially? My machine has 12 cores and 24 threads, and the NVMe disk and
> > RAM should also allow this.
>
> Sorry, hit send too early...
>
> You can download the perfparser file here:
>
> https://milianw.de/files/kate.slow.startup.perfparser
>
> You can open that in hotspot and then go to the off-CPU time flame graph.
> Basically all of that comes from _really_ slow memory allocations, which is
> a first for me. It seems like my system is suffering from some extreme
> slowdowns in `int_malloc` - but only in kate. Other applications don't show
> this behavior, and I'm unsure where this comes from... See the excessively
> slow calls to rwsem_down_read_slowpath from _int_malloc, even in the main
> thread. Looking at the main thread, for example, we see ~1s of off-CPU time
> from _int_realloc via _FcConfigParse::FcStrBufData alone!
>
> I'll try to continue to figure this out
OK, this is apparently totally unrelated to git and Kate. Thiago, do you
happen to have any insight here? Is it known that using QProcess can badly
degrade the runtime behavior of malloc in other threads?
Here's a small example to trigger this behavior already:
https://invent.kde.org/-/snippets/2239
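(The snippet isn't inlined here; the following is a minimal sketch of what it
plausibly does, based on the description above and the `--with-subprocess`
flag used in the runs below. The thread count, iteration counts, and
allocation sizes are my assumptions, not the actual snippet's values.)
```cpp
// Hypothetical reconstruction of the linked snippet: every hardware thread
// hammers malloc with short-lived allocations while (optionally) also
// spawning the fast `ls /tmp` subprocess 100 times via QProcess.
#include <QCoreApplication>
#include <QProcess>
#include <cstdlib>
#include <cstring>
#include <thread>
#include <vector>

int main(int argc, char** argv)
{
    QCoreApplication app(argc, argv);
    const bool withSubprocess = argc > 1
        && std::strcmp(argv[1], "--with-subprocess") == 0;

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < std::thread::hardware_concurrency(); ++i) {
        threads.emplace_back([withSubprocess] {
            for (int run = 0; run < 100; ++run) {
                // exercise the allocator with lots of small, short-lived
                // blocks, touching the memory so pages actually fault in
                for (int j = 0; j < 10000; ++j) {
                    void* p = std::malloc(64 + j % 4096);
                    std::memset(p, 0, 64);
                    std::free(p);
                }
                if (withSubprocess) {
                    // each run also spawns the (by itself fast) `ls /tmp`
                    QProcess::execute(QStringLiteral("ls"),
                                      {QStringLiteral("/tmp")});
                }
            }
        });
    }
    for (auto& t : threads)
        t.join();
    return 0;
}
```
Run it once plainly and once with `--with-subprocess` to compare, as in the
perf stat runs below.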
I have nproc == 24. Let's run this without any external processes:
```
$ perf stat -r 5 ./slow-malloc

 Performance counter stats for './slow-malloc' (5 runs):

          6,868.17 msec task-clock                # 12.781 CPUs utilized          ( +-  0.82% )
            35,262      context-switches          #  5.078 K/sec                  ( +-  0.73% )
             1,518      cpu-migrations            # 218.590 /sec                  ( +- 10.47% )
           477,765      page-faults               # 68.797 K/sec                  ( +-  0.23% )
    27,414,859,033      cycles                    #  3.948 GHz                    ( +-  0.88% )  (84.46%)
     9,269,828,127      stalled-cycles-frontend   # 33.46% frontend cycles idle   ( +-  0.80% )  (84.58%)
     2,503,409,257      stalled-cycles-backend    #  9.04% backend cycles idle    ( +-  1.38% )  (82.85%)
    12,211,168,505      instructions              #  0.44 insn per cycle
                                                  #  0.77 stalled cycles per insn ( +-  0.26% )  (82.54%)
     2,699,403,475      branches                  # 388.710 M/sec                 ( +-  0.34% )  (82.99%)
         7,276,801      branch-misses             #  0.27% of all branches        ( +-  0.68% )  (84.56%)

           0.53735 +- 0.00317 seconds time elapsed  ( +-  0.59% )
```
So far so good. Now let's also run `ls /tmp`, which by itself is plenty fast:
```
$ time ls /tmp
real 0m0.006s
user 0m0.000s
sys 0m0.006s
```
Doing that a hundred times per thread, as in the example file above, should
only take ~600ms of wall-clock time (100 runs × ~6ms per run, with the
threads executing in parallel). But instead this is what I observe:
```
$ perf stat -r 5 ./slow-malloc --with-subprocess

 Performance counter stats for './slow-malloc --with-subprocess' (5 runs):

         26,197.00 msec task-clock                #  4.373 CPUs utilized          ( +-  0.29% )
           148,400      context-switches          #  5.669 K/sec                  ( +-  2.19% )
            11,287      cpu-migrations            # 431.174 /sec                  ( +-  2.25% )
         1,559,820      page-faults               # 59.587 K/sec                  ( +-  0.22% )
    99,501,234,050      cycles                    #  3.801 GHz                    ( +-  0.15% )  (85.67%)
    30,922,803,968      stalled-cycles-frontend   # 31.18% frontend cycles idle   ( +-  0.17% )  (85.00%)
    21,809,486,987      stalled-cycles-backend    # 21.99% backend cycles idle    ( +-  0.74% )  (84.85%)
    62,524,522,174      instructions              #  0.63 insn per cycle
                                                  #  0.49 stalled cycles per insn ( +-  0.17% )  (84.84%)
    14,128,484,480      branches                  # 539.721 M/sec                 ( +-  0.27% )  (85.23%)
       114,841,497      branch-misses             #  0.82% of all branches        ( +-  0.26% )  (85.86%)

            5.9904 +- 0.0258 seconds time elapsed  ( +-  0.43% )
```
And perf off-CPU profiling with hotspot again shows the excessive wait time in
rwsem_down_read_slowpath when _int_malloc hits the asm_exc_page_fault code.
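If this is mmap_lock contention (fork() holds the lock for writing while it
duplicates the parent's page tables, and every page fault takes it for
reading), then a Qt-free reduction along the following lines should show the
same pattern. This is my assumption about the mechanism, not something I have
verified; a sketch:
```cpp
// Qt-free reduction of the suspected mechanism: fork() duplicates the
// parent's page tables while holding mmap_lock for writing, so page faults
// in the other threads pile up in rwsem_down_read_slowpath meanwhile.
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>
#include <cstring>
#include <thread>
#include <vector>

int main()
{
    std::vector<std::thread> faulters;
    for (unsigned i = 0; i < std::thread::hardware_concurrency(); ++i) {
        faulters.emplace_back([] {
            for (int j = 0; j < 50000; ++j) {
                // 256 KiB is above glibc's default mmap threshold, so each
                // malloc/free maps and unmaps fresh pages, and touching them
                // triggers page faults that take mmap_lock for reading
                char* p = static_cast<char*>(std::malloc(256 * 1024));
                std::memset(p, 1, 256 * 1024);
                std::free(p);
            }
        });
    }
    for (int i = 0; i < 100; ++i) {
        const pid_t child = fork(); // takes mmap_lock for writing
        if (child == 0)
            _exit(0); // child exits immediately; an exec would be similar
        waitpid(child, nullptr, 0);
    }
    for (auto& t : faulters)
        t.join();
    return 0;
}
```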
Any insight would be welcome, or suggestions on how to better handle this in
user code.
Thanks
--
Milian Wolff
mail at milianw.de
http://milianw.de