Review Request: Fix crash due to deleteLater being sent to the wrong object sometimes

Simeon Bird bladud at gmail.com
Sun Dec 23 23:16:50 GMT 2012



> On Dec. 11, 2012, 7:23 a.m., Frank Reininghaus wrote:
> > Thanks for the patch!
> > 
> > > If updateItemStates was called from slotThreadFinished,
> > > m_updateItemStatesThread would be different when slotThreadFinished was
> > > over. deleteLater would be called on the *new* m_updateItemStatesThread,
> > > instead of the just-finished one, causing it to be deleted before
> > > it had finished running, and a crash.
> > 
> > Are you sure? VersionControlObserver::updateItemStates() connects the finished() signal and the deleteLater() slot of the *same* thread, so I don't see how a thread can be deleted before it's finished.
> > 
> > About the "} else if(!m_updateItemStatesThread){" changes: could you elaborate on why this could fix a crash? Moreover, could it be a problem that your new implementation may return an empty list 'actions' if lockPlugin() fails?
> > 
> > Sorry if my questions are stupid! I have little experience with the version control stuff ;-)
> 
> Simeon Bird wrote:
>     I'm not sure, but I have a 2-sigma confidence :) 
>     
>     This is how I figured it out: 
>     
>     I was getting this crash occasionally, so I tried removing the unlock/relock pair. 
>     This was slower, of course, so then I tried making the lock a QReadWriteLock. 
>     Rather to my surprise, I then got the crash all the time! And in the logs there was the message:
>     
>     QThread: Destroyed while thread is still running
>     
>     Which means that a delete event has to be passed to the QThread *before* the finished() signal is emitted. 
>     
>     Here is what I think happens: 
>     
>     updateItemStates() connects m_updateItemStatesThread's finished() to slotThreadFinished() and m_updateItemStatesThread->deleteLater() (in that order). So when the event is fired, Qt calls slotThreadFinished(), and then once it has finished, calls m_updateItemStatesThread->deleteLater().
>     
>     BUT m_updateItemStatesThread is changed in slotThreadFinished, and thus does not point to the same object that emitted finished()!
>     
>     Most of the time we have m_updateItemStatesThread=0 (which just leaks), but sometimes we call updateItemStates() from slotThreadFinished(), which assigns m_updateItemStatesThread to a new QThread, which then gets deleted too early.
>     
>     
>     As to the } else if(!m_updateItemStatesThread){ bit:
>     
>     The code tries to get the plugin lock if m_updateItemStatesThread != 0. But getting the plugin lock may fail (because m_updateItemStatesThread is holding onto it too long). If it does fail to get the lock, however, 
>     the code used the plugin anyway, which is not safe, because it may be being changed by m_updateItemStatesThread::setData at the time. 
>     I think returning an empty action list is fine, because it will just mean that some file will not have the "git add" menu items.
>     I think in practice that the plugin lock can always be obtained though, so it is a bit academic.
>     
>     You want I should put all this in the commit messages? :)
> 
> Frank Reininghaus wrote:
>     First of all, thanks for your work! I really appreciate it. The only reason why I ask so many questions is that I want to make sure that we don't push something that just hides the real bug (making it harder to find and fix it later on - I've seen such things happen) rather than fixing it.
>     
>     > Here is what I think happens: 
>     > 
>     > updateItemStates() connects m_updateItemStatesThread's finished() to slotThreadFinished() and m_updateItemStatesThread->deleteLater() (in that order). So when the event is fired, 
>     > Qt calls slotThreadFinished(), and then once it has finished, calls m_updateItemStatesThread->deleteLater().
>     >
>     > BUT m_updateItemStatesThread is changed in slotThreadFinished, and thus does not point to the same object that emitted finished()!
>     > 
>     > Most of the time we have m_updateItemStatesThread=0 (which just leaks), but sometimes we call updateItemStates() from slotThreadFinished(), which assigns m_updateItemStatesThread 
>     > to a new QThread, which then gets deleted too early.
>     
>     Just to make sure that I really understand this: You are saying that we fist have an UpdateItemStatesThread "thread1". Its finished() signal is connected to its own deleteLater() slot and to VersionControlObserver::slotThreadFinished(). When the latter is called, a new UpdateItemStatesThread "thread2" might be created in VersionControlObserver::updateItemStates(). This function connects "thread2"'s finished() signal to "thread2"'s deleteLater() slot. I don't see a problem here - "thread1"'s finished() signal is still connected to its own deleteLater() slot, not to the one of "thread2".
>     
>     If I understand your statement
>     
>     > BUT m_updateItemStatesThread is changed in slotThreadFinished, and thus does not point to the same object that emitted finished()!
>     
>     correctly, then you are saying that now "thread1"'s finished() signal invokes "thread2"'s deleteLater() slot? AFAIK, this is not possible. The connection between "thread1"'s signal and its own slot is not changed just because a member of the VersionControlObserver changes.
>     
>     > As to the } else if(!m_updateItemStatesThread){ bit:
>     >
>     > The code tries to get the plugin lock if m_updateItemStatesThread != 0. But getting the plugin lock may fail (because m_updateItemStatesThread is holding onto it too long). 
>     > If it does fail to get the lock, however, 
>     > the code used the plugin anyway, which is not safe, because it may be being changed by m_updateItemStatesThread::setData at the time. 
>     
>     But UpdateItemStatesThread::setData() is only called by the VersionControlObserver, so this can't really happen at the same time as a call to VersionControlObserver::actions() because both are supposed to run in the main thread, right? Moreover, UpdateItemStatesThread::setData() only changes the m_plugin member of the thread (by assigning a new value to the pointer), so I don't really see why we have to protect accesses to the m_plugin member of the VersionControlObserver at all, but maybe I'm missing something.
>     
>     > I think in practice that the plugin lock can always be obtained though, so it is a bit academic.
>     
>     As I said, I don't see why the lock is needed at all, maybe we should ask Peter about it.
>     
>     > You want I should put all this in the commit messages? :)
>     
>     Not necessarily, but I want to understand why a change makes sense before it is pushed :-)
>
> 
> Simeon Bird wrote:
>     > First of all, thanks for your work! I really appreciate it. The only reason why I ask so many questions is that I want to make sure that 
>     > we don't push something that just hides the real bug (making it harder to find and fix it later on - I've seen such things happen) rather > than fixing it.
>     
>     You're very welcome! Sorry for the late reply, real life intervened. And please do ask the questions, because I am new to this, and especially because in this case you are right. 
>     
>     > correctly, then you are saying that now "thread1"'s finished() signal invokes "thread2"'s deleteLater() slot? AFAIK, this is not possible. > The connection between "thread1"'s signal and its own slot is not changed just because a member of the VersionControlObserver changes.
>     
>     That is exactly what I was saying. Also, you're right, signals and slots do indeed work like that (I checked with a test program), and the event really should not be delivered to the wrong item. Furthermore, I can still get the crash, just less often with all these patches. 
>     So I think at the moment that moving the deleteLater around doesn't fix it, but it changes whatever race is happening just enough to make it less likely to occur. 
>     
>     Also there was a deadlock bug in the posted code (I forgot to change the order the mutexes are taken in setData() ), 
>     which was causing the loading to hang sometimes, hiding the crash. 
>     
>     So to summarize:
>     
>     I have this crash bug. It persists. 
>     
>     I'm getting this message:
>     
>     QThread: Destroyed while thread is still running
>     
>     from gdb, which should be completely impossible. The QThread is only deleted from once the finished() signal is emitted. 
>     QObject auto-deletion can't be happening here because the calling object is manifestly still there, blocking in setData (right?).
>     
>     > But UpdateItemStatesThread::setData() is only called by the VersionControlObserver, so this can't really happen at the same time as a call 
>     > to VersionControlObserver::actions() because both are supposed to run in the main thread, right? Moreover, 
>     > UpdateItemStatesThread::setData() only changes the m_plugin member of the thread (by assigning a new value to the pointer), so I don't 
>     > really see why we have to protect accesses to the m_plugin member of the VersionControlObserver at all, but maybe I'm missing something.
>     
>     Ah, I see what you mean. I thought when reading the code (but I didn't check this) that plugin->beginRetrieval() could internally have some state which could affect the output of plugin->actions(), and that was why the mutex was there. I don't think the git plugin has any such thing (and for that endRetrieval is a no-op). 
>     I also tried disabling the locking, and it didn't seem to make much difference, so it's up to you.
> 
> Simeon Bird wrote:
>     Wait, m_plugin is a KVersionControlPlugin, which is a QObject, and we're using it from multiple threads. 
>     Is that the real cause of the crash here?
> 
> Frank Reininghaus wrote:
>     > Wait, m_plugin is a KVersionControlPlugin, which is a QObject, and we're using it from multiple threads. 
>     > Is that the real cause of the crash here?
>     
>     Accessing a QObject from multiple threads is in principle not a problem, provided that no threads call non-thread-safe methods or read and write data at the same time.
>     
>     > I'm getting this message:
>     >
>     > QThread: Destroyed while thread is still running
>     
>     Then one could put a gdb breakpoint in the function where that message is printed (probably QThread's destructor?) and check where that call comes from. That won't help much though if it comes from a signal-slot connection between threads. But if the thread is still running, its finished() signal cannot have been emitted yet, so it should not be the finished->deleteLater connection that is responsible for this.
>     
>     > QObject auto-deletion can't be happening here because the calling object is manifestly still there, blocking in setData (right?).
>     
>     What exactly do you mean with 'QObject auto-deletion'? AFAIK, a QObject is only deleted if delete/deleteLater are used directly, or if the destructor of its parent is executed (but the UpdateItemStatesThread does not have a parent at all).
>     
>     Another idea what might be causing a crash: if the UpdateItemStates thread is deleted while the main thread is trying to lock the mutex in UpdateItemStatesThread::setData(), that could be a problem. But I don't see how that could happen.

One thing that is confusing me as I look through more code. I don't really see why there should be any locking at all in setData or, actually, in retrievedItems or ItemStates. setData is only called before the thread is started and the others are only called after. 

I think if UpdateItemStates thread were deleted while the mutex were locked, that would be ok, because the pluginMutex is static.

My worry about QObject auto-deletion is that m_plugin is deleted in ~VersionControlObserver. So if VersionControlObserver were destroyed, m_plugin would go too, which would be bad if the QThread were still running. But I don't think that's happening either, because we see VersionControlObserver hanging in setData. 

I didn't think of sticking a breakpoint in the dtor, but you're right, if it is being deleted not in a signal-slot connection that would work.
Let's see if I can figure this out this time...


- Simeon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/107656/#review23301
-----------------------------------------------------------


On Dec. 10, 2012, 4:27 a.m., Simeon Bird wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/107656/
> -----------------------------------------------------------
> 
> (Updated Dec. 10, 2012, 4:27 a.m.)
> 
> 
> Review request for Dolphin and Frank Reininghaus.
> 
> 
> Description
> -------
> 
> Three commits here:
> 
> commit bd89a3ebe4f546e2032676c4b445befd9606e877
> 
>     Fix a potential crash if the plugin lock could not be taken
>     in VersionControlObserver.
> 
> (the "} else if(!m_updateItemStatesThread){" changes: I didn't see this, and I think it is unlikely, but best to be safe.)
> 
> 
> commit b1473eb1247e9168805ac92964cab11a10d6a0e2
> 
>     Make the UpdateItemStatesThread use a QReadWriteLock for protection
>     instead of a QMutex, now the previous crash is fixed and we can.
>     
>     Makes listings a bit faster.
> 
> commit 8f17e50800a7769521e496ab324d608ec0e36df4
> 
>     VersionControlObserver:
>     
>     If updateItemStates was called from slotThreadFinished,
>     m_updateItemStatesThread would be different when slotThreadFinished was
>     over. deleteLater would be called on the *new* m_updateItemStatesThread,
>     instead of the just-finished one, causing it to be deleted before
>     it had finished running, and a crash.
>     
>     Instead call deleteLater from slotThreadFinished directly.
>     
>     BUG: 302264
>     FIXED-IN: 4.10
> 
> This should really fix bug 302264 this time. 
> 
> 
> This addresses bug 302264.
>     http://bugs.kde.org/show_bug.cgi?id=302264
> 
> 
> Diffs
> -----
> 
>   dolphin/src/views/versioncontrol/updateitemstatesthread.h f0f91d7 
>   dolphin/src/views/versioncontrol/updateitemstatesthread.cpp e07d72c 
>   dolphin/src/views/versioncontrol/versioncontrolobserver.cpp 42e00de 
> 
> Diff: http://git.reviewboard.kde.org/r/107656/diff/
> 
> 
> Testing
> -------
> 
> Bug is fixed! 
> 
> It was very easy to reproduce (for me) once the window for it was made wider by using QReadWriteLock in UpdateItemStatesThread, and is no longer there.
> 
> 
> Thanks,
> 
> Simeon Bird
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.kde.org/mailman/private/kfm-devel/attachments/20121223/d89d35fc/attachment.htm>


More information about the kfm-devel mailing list