application not responding (ANR) handling

Harald Sitter sitter at kde.org
Fri Aug 2 10:27:25 BST 2024


This has been implemented more or less as described. The first ANR
report has already arrived on Sentry :)

On Sun, Jul 14, 2024 at 7:27 PM Harald Sitter <sitter at kde.org> wrote:
>
> Ciao!
>
> A while ago I was thinking about ANR handling but then forgot about
> it. A recent malfunction reminded me of it, so I thought I should
> write down some musings. Maybe y'all have some input as well.
>
> Right now we don't really know when our applications deadlock because
> kwin somewhat gracefully kills the process when it detects no answer
> to window actions, leaving no trace of the malfunction for debugging.
> Even outside that feature it's exceptionally hard for a user to
> generate an ANR report because the user either needs to SEGV the app
> manually (at which point kcrash and drkonqi kick in), or attach a
> debugger (requiring basically developer-level knowledge). All in all a
> garbage situation.
>
> It's actually a bit tricky to solve because currently it seems neither
> POSIX nor Linux has a concept of ANR defects, so we need some custom
> metadata on top.
>
> Here's my thinking...
>
> - The way kwin does the killing is in a helper binary that more or
> less simply calls kill() on the stuck pid
> - The kill helper could write some trivial metadata to
> .cache/kwin/anr/$exe.$bootid.$pid.$time_at_time_of_crash.json (the
> name format is the one used by coredumpd as well FWIW)
> - It could then send ABRT instead of KILL as the first signal
> - It should probably also make sure the pid actually shoved off
> within some timeout, or else send KILL
> - KCrash kicks in and does the handover dance with drkonqi
> - DrKonqi can check for the ANR metadata and then mark the report ANR
> for sentry and bugzilla
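
The helper steps in the list above could be sketched roughly as
follows. This is a minimal illustration in C++17 on Linux/POSIX, not
kwin's actual helper: the function names (anrFileName,
writeAnrMetadata, abortThenKill), the JSON content, the 50 ms poll
interval and the idea of passing the boot id in as a parameter are all
assumptions made for the example.

```cpp
#include <chrono>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <thread>

#include <errno.h>
#include <signal.h>
#include <sys/types.h>

namespace fs = std::filesystem;

// Builds "$exe.$bootid.$pid.$time.json". The unit of `time` is an assumption.
std::string anrFileName(const std::string &exe, const std::string &bootId,
                        pid_t pid, long long time)
{
    return exe + "." + bootId + "." + std::to_string(pid) + "."
        + std::to_string(time) + ".json";
}

// Writes a trivial marker file into `dir`. The JSON keys are hypothetical.
bool writeAnrMetadata(const fs::path &dir, const std::string &exe,
                      const std::string &bootId, pid_t pid, long long time)
{
    std::error_code ec;
    fs::create_directories(dir, ec);
    if (ec) {
        return false;
    }
    std::ofstream out(dir / anrFileName(exe, bootId, pid, time));
    out << "{\"pid\": " << pid << ", \"reason\": \"anr\"}\n";
    return out.good();
}

// Sends SIGABRT, waits up to `timeout` for the process to disappear and
// falls back to SIGKILL. Returns true if SIGKILL was not needed.
// Note: kill(pid, 0) still succeeds for an unreaped zombie.
bool abortThenKill(pid_t pid, std::chrono::milliseconds timeout)
{
    if (pid <= 0) {
        return false; // pid <= 0 would address process groups, refuse that
    }
    if (kill(pid, SIGABRT) != 0) {
        return errno == ESRCH; // ESRCH: the process is already gone
    }
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        if (kill(pid, 0) != 0 && errno == ESRCH) {
            return true; // the crash handler or core dump finished
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
    kill(pid, SIGKILL); // still around after the timeout
    return false;
}
```

A caller would first call writeAnrMetadata() and then something like
abortThenKill(pid, std::chrono::seconds(5)); the five seconds are again
only an example value. On Linux the boot id can be read from
/proc/sys/kernel/random/boot_id.
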
>
> 3rd party software would still get ABRT, and if they have a crash
> handler they'll be able to handle it. It will look like a random
> ABRT at a glance, but they'll at least have the possibility of
> noticing that something is deadlocking in their software. I don't
> think there's a better solution for them right now, seeing as we have
> no platform way to tell them this ABRT was an ANR. Of course, if they
> are running outside a sandbox they are free to also pick up the kwin
> ANR metadata.
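
For a crash handler that wants to pick up that metadata, the lookup
could be as small as the sketch below. It assumes the
$exe.$bootid.$pid.$time.json naming from the list above and that the
boot id contains no dots (it is a UUID on Linux). AnrEntry,
parseAnrFileName and hasAnrMetadata are hypothetical names, not
existing DrKonqi or KCrash API.

```cpp
#include <exception>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

struct AnrEntry {
    std::string exe;
    std::string bootId;
    long pid;
    long long time;
};

// Splits "$exe.$bootid.$pid.$time.json" from the right, so that the
// executable name itself may contain dots (e.g. "org.kde.kate").
std::optional<AnrEntry> parseAnrFileName(const std::string &name)
{
    const std::string suffix = ".json";
    if (name.size() <= suffix.size()
        || name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return std::nullopt;
    }
    std::string rest = name.substr(0, name.size() - suffix.size());
    std::string fields[3]; // filled in the order: time, pid, boot id
    for (auto &field : fields) {
        const auto dot = rest.rfind('.');
        if (dot == std::string::npos) {
            return std::nullopt;
        }
        field = rest.substr(dot + 1);
        rest.erase(dot);
    }
    if (rest.empty()) {
        return std::nullopt;
    }
    try {
        // std::stol/std::stoll throw if a field does not start with a number.
        return AnrEntry{rest, fields[2], std::stol(fields[1]), std::stoll(fields[0])};
    } catch (const std::exception &) {
        return std::nullopt;
    }
}

// True if `dir` holds ANR metadata for this pid in the current boot.
bool hasAnrMetadata(const fs::path &dir, const std::string &bootId, long pid)
{
    std::error_code ec;
    // On error (e.g. missing directory) the iterator is empty.
    for (const auto &entry : fs::directory_iterator(dir, ec)) {
        const auto parsed = parseAnrFileName(entry.path().filename().string());
        if (parsed && parsed->bootId == bootId && parsed->pid == pid) {
            return true;
        }
    }
    return false;
}
```

Matching on the boot id as well as the pid avoids mistaking a stale
file from an earlier boot, where the pid may have been reused, for a
fresh ANR.
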
>
> When no crash handlers of any sort are installed ABRT will by default
> cause a core dump, which then ideally goes into a crash handler daemon
> like coredumpd. In fact, with coredumpd the user is then able to
> excavate such deadlock traces via drkonqi's crashed process viewer,
> improving the debugging UX of deadlocks in general.
>
> Since systemd also sends ABRT when a service watchdog barks, this
> also allows us to notice daemon deadlocks on systems where drkonqi
> covers all software (e.g. plasma-mobile).
>
> Any further thoughts on this?
>
> HS