DVCS crashes and KJob (bug 172309)

Evgeniy Ivanov powerfox at kde.ru
Sun Oct 12 23:36:39 UTC 2008


Andreas Pakulat wrote:
> On 13.10.08 00:09:01, Evgeniy Ivanov wrote:
>> Andreas Pakulat wrote:
>>> On 11.10.08 17:15:46, Evgeniy Ivanov wrote:
>>>> Hi all,
>>>> I've found the following thing in KJob and I think it is causing most
>>>> of our DVCS crashes:
>>>> http://api.kde.org/4.x-api/kdelibs-apidocs/kdecore/html/kjob_8cpp-source.html#l00186
>>>> Look, start() is implemented in DVCSjob and it can emit result() (or
>>>> connect KProcess to slots that emit it). But it can emit result() before
>>>> the loop is exec'ed, and then there will be no signal left to stop it
>>>> (deadlock or crash, depending on deferred deletion).
>>>> I have committed DVCS tests (vcs/dvcs/tests) which prove something is
>>>> really wrong with deferred deletion, see dvcsjobTest.cpp:
>>>> First I get a deadlock in the first job->exec(). I debugged and saw that
>>>> DVCSjob emitted result() before the loop was exec'ed.
>>>> QProcess::waitForStarted() before starting a process fixed it.
>>> Interesting that this doesn't seem to be a problem with the svn plugin. It
>>> also has code that calls emitResult inside start() and AFAIK it is also
>>> used with KJob::exec(). However it could be that svn is just more relaxed
>>> in the case of wrong inputs, for example svn info just checks whether the
>>> given location is valid (i.e. KUrl::isValid) and if it's not it might
>>> still fail further down the road...
>> The QProcess::error() signal is delivered very fast (faster than
>> finished()). The problem is with the error() signal, or when I set the
>> job to failed without running it.
> 
> But that sounds like a problem in your code. There's no speed difference in
> delivering signals to slots. A signal is always a direct function call
> (unless you do signal slots across thread boundaries or force delayed
> signal delivery via queued connections). So what's the problem? You start
> a process and at some point the process errors, then a slot in your job is
> called, which in turn should call emitResult after setting a failed
> message. I can't see how that would pose a problem.

The error() signal can be emitted earlier than the finished() signal. For
example, in the FailedToStart case (which I emulate) there is nothing for Qt
to do, so it does not wait for the process to finish.
In this case we have a deadlock in KJob.
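
To illustrate the ordering issue, here is a toy model in plain C++. It does
not use Qt or KDE classes: ToyLoop, ToyJob and runBlocking are made-up names,
not KJob's real code. It assumes, as I understand QEventLoop to behave, that
a quit() issued before exec() is forgotten. The "check finished first" guard
is only one possible way out and is not necessarily what the patch discussed
in this thread does.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Toy stand-in for an event loop (NOT Qt's QEventLoop).
class ToyLoop {
public:
    void post(std::function<void()> ev) { events_.push(std::move(ev)); }
    void quit() { quitRequested_ = true; }

    // Returns true if quit() was called while the loop was running.
    // Returns false if the loop ran out of events without a quit(); a real
    // event loop would block forever at that point.
    bool exec() {
        quitRequested_ = false;  // assumption: an earlier quit() is forgotten
        while (!quitRequested_) {
            if (events_.empty()) return false;  // nothing left to wake us up
            auto ev = std::move(events_.front());
            events_.pop();
            ev();
        }
        return true;
    }

private:
    std::queue<std::function<void()>> events_;
    bool quitRequested_ = false;
};

enum class Emit { Synchronous, Queued };

// Toy job that "fails to start", so it finishes as soon as start() is called.
class ToyJob {
public:
    ToyJob(ToyLoop& loop, Emit mode) : loop_(loop), mode_(mode) {}

    void start() {
        if (mode_ == Emit::Synchronous) {
            emitResult();                        // fires before exec()
        } else {
            loop_.post([this] { emitResult(); });  // fires inside exec()
        }
    }
    bool finished() const { return finished_; }

private:
    void emitResult() {
        assert(!finished_);  // a job finishes only once
        finished_ = true;
        loop_.quit();
    }

    ToyLoop& loop_;
    Emit mode_;
    bool finished_ = false;
};

// Mimics the start-then-exec pattern. Returns true if the wait ended cleanly.
bool runBlocking(Emit mode, bool checkFinishedFirst) {
    ToyLoop loop;
    ToyJob job(loop, mode);
    job.start();
    if (checkFinishedFirst && job.finished()) return true;  // hypothetical guard
    return loop.exec();
}
```

In this model runBlocking(Emit::Synchronous, false) reports the stuck case,
while either queuing the emission or checking finished() before entering the
loop lets the wait end normally.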

> 
> A job cannot be re-used. It's a fire-and-forget/fetch result thing. You can
> only start a job once, let it run to its end or error out. After that you
> can fetch data (sometimes even before that) and then throw it away.

My fault. I misunderstood "fire-and-forget" and thought I could do
fire-fire-fire and forget when no cartridges are left :)
But this mistake exists in the tests only (fixed in dvcs, maybe still present
in the git tests).

>> But it's a heisenbug: we removed the problem in tests, but it's still in
>> the code.
> 
> Huh? I thought the problem was that kjob doesn't quit the event loop, when
> start() calls emitResult? So the patch should fix both the test and the
> real code. If it doesn't, then the test is useless (for the described
> problem, it still shows a potential deadlock in KJob).

The problem was in the tests (and it gave the same backtrace). I removed bzr
and hg, and now I can't reproduce this bug in KDevelop (with only git
installed).
Heisenbug...

> 
> Anyway, before we post to kde-core-devel let's make sure we understand the
> problem properly.

Well, we have a broken test which can be fixed by your patch.

>> It would be cool if someone tried to run a test from vcs/dvcs/tests to
>> confirm it fails without KJob's patch.
>> I tried helgrind... But my hardware didn't like it...
> 
> Helgrind is a multi-thread debugging tool, as far as I understood dvcs
> doesn't use multiple threads. If it does that's pretty strange as QProcess
> is already asynchronous and hence there's no need for threads.

I thought menus run the actions in different threads.
Btw, it has shown a few problems in duchain on start-up (with ~/.kdevdu*
removed).


>> I don't have any idea what's wrong now and can't reproduce it in tests.
>> Will continue next weekend. If you have any tips on what can be done,
>> you're welcome to share :)
> 
> I don't know the code. Maybe I can convince myself that this bug is more
> important than playing games (I'm currently a bit addicted to a new one I
> got) tomorrow and look deeper into this.

I have reproduced it in the vcs/dvcs/test2 application. It looks similar to
the situation we have. The second timer (timer 2) breaks the application. I'm
not familiar with threads, processes, etc. Can you please have a look at the
test2 application?



-- 
Cheers, Evgeniy.
Key fingerprint: F316 B5A1 F6D2 054F CD18 B74A 9540 0ABB 1FE5 67A3





More information about the KDevelop-devel mailing list