segv in setCaptureComplete
Hy Murveit
murveit at gmail.com
Fri Aug 13 22:09:09 BST 2021
This error occurs because the guider changes state while capture is
executing waitForFinished() waiting for the solver.
I don't understand how to set that up in TestEkosCaptureWorkflow, and would
prefer if you (Wolfgang) took that on.
In general, I don't really understand the intricacies of capture.cpp and am
uncomfortable making broad changes there.
I have sent in an MR to try and fix the bug I described, but not the
general problem.
https://invent.kde.org/education/kstars/-/merge_requests/387
Please check out that MR carefully.
Also, I did a survey and found ~180 references to 'activeJob->' in
capture.cpp.
I looked closer and saw that most were protected references, e.g. with
something like
if (activeJob == nullptr)
return;
at the start of a method, and if there is no real parallel processing going
on, and no calls to wait... then this kind of protection should be
effective.
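
To make the limits of that entry guard concrete, here is a minimal sketch. It is not the real capture.cpp code: SequenceJob, CaptureSketch, completeCapture() and the 'wait' callback are hypothetical stand-ins, and the callback only simulates "something else runs while the method is waiting". Under those assumptions, the guard at the top passes, the pointer is cleared during the wait, and only a second check after the wait avoids the null dereference.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for the real job class.
struct SequenceJob
{
    std::string filterName;
};

// Hypothetical stand-in for the capture module; not the real Ekos::Capture.
class CaptureSketch
{
public:
    SequenceJob *activeJob = nullptr;

    // Simulates an abort path (e.g. triggered by a guiding failure)
    // that clears the active job.
    void stop() { activeJob = nullptr; }

    // 'wait' stands in for a blocking call during which other code may run.
    // Returns false if there is no job before or after the wait.
    bool completeCapture(const std::function<void()> &wait, std::string &filterOut)
    {
        if (activeJob == nullptr)   // guard at method entry
            return false;

        wait();                     // activeJob may be cleared while waiting

        if (activeJob == nullptr)   // re-check after the wait
            return false;

        filterOut = activeJob->filterName;
        return true;
    }
};
```

Calling completeCapture() with a 'wait' that invokes stop() returns false instead of crashing; without the second check, the same call would dereference a null pointer.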
There are many methods with unprotected access of activeJob, though:
setCaptureComplete
checkDithering
processJobCompletionStage1
processJobCompletionStage2
prepareActiveJobStage1
prepareActiveJobStage2
preparePreCaptureActions
updatePrepareState
executeJob
setCurrentADU
checkLightFrameScopeCoverOpen
checkDarkFramePendingTasks
checkFlatFramePendingTasks
processPostCaptureCalibrationStage
scriptFinished
It seems likely that some/all of these should also test for activeJob
== nullptr.
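
One way to cover many methods without a hand-written test in each is the accessor idea Wolfgang raises in the quoted message below. The following is only a sketch of what that could look like in C++, not a proposal for the actual code: NoActiveJob, CaptureSketch, job(), handleImage() and processJobCompletion() are all hypothetical names, and whether exceptions are acceptable in capture.cpp is an open question.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the real job class.
struct SequenceJob
{
    int count = 0;
};

// Hypothetical exception type signalling "no active job".
class NoActiveJob : public std::runtime_error
{
public:
    NoActiveJob() : std::runtime_error("no active capture job") {}
};

// Hypothetical stand-in for the capture module; not the real Ekos::Capture.
class CaptureSketch
{
public:
    void setActiveJob(SequenceJob *job) { m_activeJob = job; }

    // Central place where a missing job is caught and handled.
    // Returns false if the job disappeared during processing.
    bool handleImage()
    {
        try
        {
            processJobCompletion();
            return true;
        }
        catch (const NoActiveJob &)
        {
            return false;
        }
    }

private:
    // Every access goes through this accessor instead of the raw pointer.
    SequenceJob &job() const
    {
        if (m_activeJob == nullptr)
            throw NoActiveJob();
        return *m_activeJob;
    }

    // Example of a method that would otherwise use activeJob-> unprotected.
    void processJobCompletion() { job().count++; }

    SequenceJob *m_activeJob = nullptr;
};
```

Note that this only helps if job() is called again after every wait; a reference obtained before a wait would still dangle in the same way as the raw pointer.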
Hy
On Fri, Aug 13, 2021 at 12:55 PM Wolfgang Reissenberger <
sterne-jaeger at openfuture.de> wrote:
> What about generally avoiding a null value here?
>
> > Am 13.08.2021 um 18:04 schrieb Jasem Mutlaq <mutlaqja at ikarustech.com>:
> >
> > I agree with Wolfgang that we need a better solution for this issue.
> > For the time being, checking if activeJob is null is the way to
> > resolve it.
> >
> > --
> > Best Regards,
> > Jasem Mutlaq
> >
> >
> > On Fri, Aug 13, 2021 at 4:54 PM Wolfgang Reissenberger
> > <sterne-jaeger at openfuture.de> wrote:
> >>
> >> Hm, tricky, this looks like a problem that we need to solve more
> generally. Obviously, using one simple pointer for the active job is not
> thread safe. In Java, I would solve this with an accessor function that
> throws a specific exception, that can be caught in a central place. But I’m
> not sure what the appropriate answer in C++ is...
> >>
> >> But anyway, let’s start developing a solution for it like we did it
> with all the other edge cases: step 1 is to develop a test case that (more
> or less) makes this problem reproducible. A good starting point could be
> extending TestEkosCaptureWorkflow, which I created a couple of weeks ago.
> >>
> >> Hy, could you give it a try? I could assist…
> >>
> >> Wolfgang
> >>
> >> Am 13.08.2021 um 06:02 schrieb Hy Murveit <murveit at gmail.com>:
> >>
> >> TL;DR
> >>
> >> I got a segv in bad weather the other night.
> >> I did trace down what caused it, see below, but I think I should leave
> the solution to someone more experienced in the intricacies of capture
> (e.g. Wolfgang or Jasem).
> >>
> >> Why it crashed:
> >>
> >> You can see the end of the log below (log stops on segv after these 20
> lines).
> >> You can also see the backtrace below it.
> >>
> >> Bottom line, it's clear from the gdb backtrace that it dies in
> Capture::setCaptureComplete(), and I believe it died trying to reference
> the activeJob variable
> >> which is set to nullptr because the guiding state changed from
> "Reacquiring" to "Aborted".
> >>
> >> When the guiding state changed from "Reacquiring" to "Aborted" in
> setGuideStatus():
> >> - it called processGuidingFailed(),
> >> - which called abort()
> >> - which called stop()
> >> - which sets activeJob = nullptr.
> >>
> >> Meanwhile, while all that was happening, setCaptureComplete was waiting
> for the solver to run:
> >> QFuture<bool> result = m_ImageData->findStars(ALGORITHM_SEP);
> >> result.waitForFinished();
> >>
> >> and while it was waiting, activeJob was set to nullptr by the guiding
> status change. So, after the wait completed, setCaptureComplete
> >> probably failed on emit captureComplete(filename,
> activeJob->getFilterName(), ...)
> >> or perhaps below that on "if (activeJob->getCount())"
> >> because activeJob is now nullptr.
> >>
> >> Solution:
> >>
> >> We could simply protect the access of activeJob by testing if it's not
> null.
> >> That would certainly work for the emit captureComplete (just don't emit
> it if guiding has failed).
> >>
> >> However, I'm not sure what to do about the second access.
> >> Should it just do this in both places?
> >>
> >> if (activeJob == nullptr) return IPS_OK;
> >>
> >> Anyway, I'll leave this for Jasem and/or Wolfgang.
> >>
> >> Hy
> >>
> >>
> >>
> >> [2021-08-12T02:28:23.284 PDT INFO ][ org.kde.kstars.ekos.capture] -
> "Received image 16 out of 60."
> >> [2021-08-12T02:28:23.286 PDT DEBG ][ org.kde.kstars.fits] -
> Sextract with: "1-HFR-Default"
> >> [2021-08-12T02:28:23.580 PDT DEBG ][ org.kde.kstars.indi] -
> Image received. Mode: "Guide" Size: 313920
> >> [2021-08-12T02:28:23.581 PDT DEBG ][ org.kde.kstars.fits] -
> Reading file buffer ( "306.6 KiB" )
> >> [2021-08-12T02:28:23.614 PDT DEBG ][ org.kde.kstars.ekos.guide] -
> Received guide frame.
> >> [2021-08-12T02:28:23.614 PDT DEBG ][ org.kde.kstars.ekos.guide] -
> Multistar: findTopStars 10
> >> [2021-08-12T02:28:23.615 PDT DEBG ][ org.kde.kstars.fits] -
> Sextract with: "1-Guide-Default"
> >> [2021-08-12T02:28:23.679 PDT DEBG ][ org.kde.kstars.indi] -
> Rainbow Astro RSF : "[DEBUG] CMD <:Fp#> "
> >> [2021-08-12T02:28:23.679 PDT DEBG ][ org.kde.kstars.indi] -
> Rainbow Astro RSF : "[DEBUG] RES <:FP-04.815#> "
> >> [2021-08-12T02:28:23.700 PDT DEBG ][ org.kde.kstars.ekos.scheduler] -
> Scheduler iteration never set up.
> >> [2021-08-12T02:28:23.708 PDT DEBG ][ org.kde.kstars.ekos.guide] -
> "Select # x y flux HFR SNR score"
> >> [2021-08-12T02:28:23.708 PDT DEBG ][ org.kde.kstars.ekos.guide] -
> No suitable star detected.
> >> [2021-08-12T02:28:23.708 PDT INFO ][ org.kde.kstars.ekos.guide] -
> "Failed to find any suitable guide stars. Aborting..."
> >> [2021-08-12T02:28:23.709 PDT DEBG ][ org.kde.kstars.ekos.capture] -
> Guiding state changed from "Reacquiring" to "Aborted"
> >> [2021-08-12T02:28:23.710 PDT INFO ][ org.kde.kstars.ekos.capture] -
> "Autoguiding stopped. Aborting..."
> >> [2021-08-12T02:28:23.768 PDT INFO ][ org.kde.kstars.ekos.capture] -
> "CCD capture aborted"
> >> [2021-08-12T02:28:23.770 PDT DEBG ][ org.kde.kstars.ekos.guide] -
> Reset non guiding dithering position
> >> [2021-08-12T02:28:23.771 PDT DEBG ][ org.kde.kstars.ekos.capture] -
> setMeridianFlipStage: "MF_READY"
> >> [2021-08-12T02:28:23.837 PDT INFO ][ org.kde.kstars.ekos.guide] -
> "Autoguiding aborted."
> >> [2021-08-12T02:28:23.837 PDT DEBG ][ org.kde.kstars.ekos.guide] -
> Aborting "Reacquiring"
> >>>
> >>
> >>
> >>
> >>
> >> (gdb) bt
> >> #0 0x0000aaaaaafcfe64 in Ekos::Capture::b() (this=this@entry=0xaaaab68c02b0)
> >> at /home/hy/Projects/kstars/kstars/ekos/capture/sequencejob.h:153
> >> #1 0x0000aaaaaafd0b70 in
> Ekos::Capture::processData(QSharedPointer<FITSData> const&)
> (this=0xaaaab68c02b0, data=...)
> >> at /home/hy/Projects/kstars/kstars/ekos/capture/capture.cpp:1700
> >> #2 0x0000fffff5416d5c in () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> >> #3 0x0000aaaaaaddbe68 in ISD::CCD::newImage(QSharedPointer<FITSData>
> const&) (this=this@entry=0xaaaaafe245c0, _t1=...)
> >> at
> /home/hy/Projects/kstars-build/kstars/KStarsLib_autogen/FRI4DANIHA/moc_indiccd.cpp:413
> >> #4 0x0000aaaaaaea668c in ISD::CCD::handleImage(ISD::CCDChip*, QString
> const&, _IBLOB*, QSharedPointer<FITSData>)
> >> (this=this@entry=0xaaaaafe245c0, targetChip=targetChip@entry=0xaaaab1a8dca0,
> filename=..., bp=bp@entry=0xffff94014950, data=...) at
> /home/hy/Projects/kstars/kstars/indi/indiccd.cpp:1699
> >> #5 0x0000aaaaaaead8d0 in ISD::CCD::processBLOB(_IBLOB*)
> (this=0xaaaaafe245c0, bp=0xffff94014950)
> >> at /usr/include/c++/10/bits/atomic_base.h:325
> >> #6 0x0000fffff5416d5c in () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> >> #7 0x0000aaaaaaddd1cc in ClientManager::newINDIBLOB(_IBLOB*)
> (this=<optimized out>, _t1=<optimized out>)
> >> at
> /home/hy/Projects/kstars-build/kstars/KStarsLib_autogen/FRI4DANIHA/moc_clientmanager.cpp:369
> >> #8 0x0000fffff540cbfc in QObject::event(QEvent*) () at
> /lib/aarch64-linux-gnu/libQt5Core.so.5
> >> #9 0x0000fffff5e1a480 in QApplicationPrivate::notify_helper(QObject*,
> QEvent*) ()
> >> at /lib/aarch64-linux-gnu/libQt5Widgets.so.5
> >> #10 0x0000fffff53dc56c in QCoreApplication::notifyInternal2(QObject*,
> QEvent*) () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> >> #11 0x0000fffff53df2c8 in
> QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) ()
> >> at /lib/aarch64-linux-gnu/libQt5Core.so.5
> >> #12 0x0000fffff543ae68 in () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> >> #13 0x0000fffff4410c30 in g_main_context_dispatch () at
> /lib/aarch64-linux-gnu/libglib-2.0.so.0
> >> #14 0x0000fffff4410ec8 in () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
> >> #15 0x0000fffff4410f94 in g_main_context_iteration () at
> /lib/aarch64-linux-gnu/libglib-2.0.so.0
> >> #16 0x0000fffff543a304 in
> QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)
> ()
> >> at /lib/aarch64-linux-gnu/libQt5Core.so.5
> >> #17 0x0000fffff53da97c in
> QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) ()
> >> at /lib/aarch64-linux-gnu/libQt5Core.so.5
> >> #18 0x0000fffff53e3a7c in QCoreApplication::exec() () at
> /lib/aarch64-linux-gnu/libQt5Core.so.5
> >> #19 0x0000aaaaaab9956c in main(int, char**) (argc=<optimized out>,
> argv=<optimized out>)
> >> at /home/hy/Projects/kstars/kstars/main.cpp:393
> >> (gdb)
> >>
> >>
> >>
> >>
>
>