segv in setCaptureComplete

Jasem Mutlaq mutlaqja at ikarustech.com
Fri Aug 13 17:04:56 BST 2021


I agree with Wolfgang that we need a better solution for this issue.
For the time being, checking if activeJob is null is the way to
resolve it.

--
Best Regards,
Jasem Mutlaq


On Fri, Aug 13, 2021 at 4:54 PM Wolfgang Reissenberger
<sterne-jaeger at openfuture.de> wrote:
>
> Hm, tricky, this looks like a problem that we need to solve more generally. Obviously, using one simple pointer for the active job is not thread safe. In Java, I would solve this with an accessor function that throws a specific exception, which can be caught in a central place. But I’m not sure what the appropriate answer in C++ is...
>
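For illustration only, here is a minimal C++ sketch of the accessor idea above. Everything in it is an assumption made for the example: SequenceJob is a two-field stand-in, and CaptureModel, NoActiveJobError and completeCapture() are hypothetical names, not existing KStars code. It turns a null dereference into a catchable error; it does not by itself make the pointer thread safe.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the real SequenceJob class.
struct SequenceJob {
    std::string filterName;
    int count = 0;
};

// Hypothetical exception type meaning "there is no active job right now".
struct NoActiveJobError : std::runtime_error {
    NoActiveJobError() : std::runtime_error("no active capture job") {}
};

class CaptureModel {
public:
    void setActiveJob(SequenceJob *job) { m_activeJob = job; }

    // Accessor: callers never dereference the raw pointer themselves.
    SequenceJob &activeJob() const {
        if (m_activeJob == nullptr)
            throw NoActiveJobError();
        return *m_activeJob;
    }

    // The "central place": the exception is caught before it can leave
    // this function. Returns false if the job disappeared.
    bool completeCapture(std::string &filterOut) const {
        try {
            filterOut = activeJob().filterName;
            return true;
        } catch (const NoActiveJobError &) {
            return false;
        }
    }

private:
    SequenceJob *m_activeJob = nullptr;
};
```

One caveat on the design: as far as I know, the Qt documentation advises against letting an exception propagate out of a slot, so in this sketch the catch sits inside the same function that uses the accessor rather than somewhere further up the event loop.
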
> But anyway, let’s start developing a solution for it as we did with all the other edge cases: step 1 is to develop a test case that (more or less) makes this problem reproducible. A good starting point could be extending TestEkosCaptureWorkflow, which I created a couple of weeks ago.
>
> Hy, could you give it a try? I could assist…
>
> Wolfgang
>
> Am 13.08.2021 um 06:02 schrieb Hy Murveit <murveit at gmail.com>:
>
> TL;DR
>
> I got a segv in bad weather the other night.
> I did trace down what caused it, see below, but I think I should leave the solution to someone more experienced in the intricacies of capture (e.g. Wolfgang or Jasem).
>
> Why it crashed:
>
> You can see the end of the log below (log stops on segv after these 20 lines).
> You can also see the backtrace below it.
>
> Bottom line, it's clear from the gdb backtrace that it dies in Capture::setCaptureComplete(), and I believe it died trying to reference the activeJob variable
> which was set to nullptr because the guiding state changed from "Reacquiring" to "Aborted".
>
> When the guiding state changed from "Reacquiring" to "Aborted" in setGuideStatus():
> - it called processGuidingFailed(),
> - which called abort()
> - which called stop()
> - which sets activeJob = nullptr.
>
> Meanwhile, while all that was happening, setCaptureComplete was waiting for the solver to run:
>             QFuture<bool> result = m_ImageData->findStars(ALGORITHM_SEP);
>             result.waitForFinished();
>
> and while it was waiting, activeJob was set to nullptr by the guiding status change. So, after the wait completed, setCaptureComplete
> probably failed on emit captureComplete(filename, activeJob->getFilterName(), ...)
> or perhaps below that on "if (activeJob->getCount())"
> because activeJob is now nullptr.
>
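The sequence described above can be modelled in a few lines of plain C++. This is only a sketch under stated assumptions: Job and CaptureModel are hypothetical stand-ins, the method bodies are reduced to the single effect that matters here, and the `wait` callback stands in for result.waitForFinished(). It says nothing about how the status change actually gets delivered during the wait in the real code.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for SequenceJob.
struct Job {
    std::string filterName;
};

class CaptureModel {
public:
    explicit CaptureModel(Job *job) : activeJob(job) {}

    // Models setGuideStatus() seeing "Reacquiring" -> "Aborted".
    void setGuideStatusAborted() { processGuidingFailed(); }

    // Models the part of setCaptureComplete() around the wait.
    // Returns true if activeJob is still valid once the wait is over.
    bool activeJobSurvivesWait(const std::function<void()> &wait) {
        wait();                       // stands in for result.waitForFinished()
        return activeJob != nullptr;  // false means the next access would crash
    }

private:
    // Call chain from the description, each step reduced to its essence.
    void processGuidingFailed() { abort(); }
    void abort() { stop(); }
    void stop() { activeJob = nullptr; }

    Job *activeJob;
};
```

Passing an empty callback models a normal frame; passing a callback that calls setGuideStatusAborted() models the guiding abort arriving during the wait, after which the pointer is gone.
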
> Solution:
>
> We could simply protect the access of activeJob by testing if it's not null.
> That would certainly work for the emit captureComplete (just don't emit it if guiding has failed).
>
> However, I'm not sure what to do about the second access.
> Should it just do this in both places?
>
> if (activeJob == nullptr) return IPS_OK;
>
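A rough sketch of what guarding both accesses could look like, again as a standalone model and not the real capture.cpp. The names Job, CaptureModel, emitted, lastCount, `wait` and `onEmit` are all made up for the example, and the IPState enum is a local copy of the INDI one so that the snippet compiles on its own. The second guard is there on the assumption that something connected to the emitted signal could itself reset activeJob.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Local stand-in for INDI's IPState enum, so the sketch is self-contained.
enum IPState { IPS_IDLE, IPS_OK, IPS_BUSY, IPS_ALERT };

// Hypothetical stand-in for SequenceJob.
struct Job {
    std::string filterName;
    int count = 0;
};

struct CaptureModel {
    Job *activeJob = nullptr;
    std::vector<std::string> emitted;  // records each modelled captureComplete emit
    int lastCount = -1;                // records the modelled getCount() access

    // `wait` stands in for result.waitForFinished(); `onEmit` stands in for
    // whatever runs in response to the captureComplete signal.
    IPState setCaptureComplete(const std::function<void()> &wait,
                               const std::function<void()> &onEmit) {
        wait();  // activeJob may be reset while we wait

        // First access: skip the emit if the job is gone.
        if (activeJob == nullptr)
            return IPS_OK;
        emitted.push_back(activeJob->filterName);
        onEmit();  // a receiver of the signal could also reset activeJob

        // Second access: re-check instead of trusting the earlier test.
        if (activeJob == nullptr)
            return IPS_OK;
        lastCount = activeJob->count;
        return IPS_OK;
    }
};
```

Whether IPS_OK is the right value to return once the job has been aborted is exactly the open question in the message; the sketch only shows where the checks would sit.
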
> Anyway, I'll leave this for Jasem and/or Wolfgang.
>
> Hy
>
>
>
> [2021-08-12T02:28:23.284 PDT INFO ][   org.kde.kstars.ekos.capture] - "Received image 16 out of 60."
> [2021-08-12T02:28:23.286 PDT DEBG ][           org.kde.kstars.fits] - Sextract with:  "1-HFR-Default"
> [2021-08-12T02:28:23.580 PDT DEBG ][           org.kde.kstars.indi] - Image received. Mode: "Guide" Size: 313920
> [2021-08-12T02:28:23.581 PDT DEBG ][           org.kde.kstars.fits] - Reading file buffer ( "306.6 KiB" )
> [2021-08-12T02:28:23.614 PDT DEBG ][     org.kde.kstars.ekos.guide] - Received guide frame.
> [2021-08-12T02:28:23.614 PDT DEBG ][     org.kde.kstars.ekos.guide] - Multistar: findTopStars 10
> [2021-08-12T02:28:23.615 PDT DEBG ][           org.kde.kstars.fits] - Sextract with:  "1-Guide-Default"
> [2021-08-12T02:28:23.679 PDT DEBG ][           org.kde.kstars.indi] - Rainbow Astro RSF : "[DEBUG] CMD <:Fp#> "
> [2021-08-12T02:28:23.679 PDT DEBG ][           org.kde.kstars.indi] - Rainbow Astro RSF : "[DEBUG] RES <:FP-04.815#> "
> [2021-08-12T02:28:23.700 PDT DEBG ][ org.kde.kstars.ekos.scheduler] - Scheduler iteration never set up.
> [2021-08-12T02:28:23.708 PDT DEBG ][     org.kde.kstars.ekos.guide] - "Select      #   x      y      flux    HFR  SNR   score"
> [2021-08-12T02:28:23.708 PDT DEBG ][     org.kde.kstars.ekos.guide] - No suitable star detected.
> [2021-08-12T02:28:23.708 PDT INFO ][     org.kde.kstars.ekos.guide] - "Failed to find any suitable guide stars. Aborting..."
> [2021-08-12T02:28:23.709 PDT DEBG ][   org.kde.kstars.ekos.capture] - Guiding state changed from "Reacquiring" to "Aborted"
> [2021-08-12T02:28:23.710 PDT INFO ][   org.kde.kstars.ekos.capture] - "Autoguiding stopped. Aborting..."
> [2021-08-12T02:28:23.768 PDT INFO ][   org.kde.kstars.ekos.capture] - "CCD capture aborted"
> [2021-08-12T02:28:23.770 PDT DEBG ][     org.kde.kstars.ekos.guide] - Reset non guiding dithering position
> [2021-08-12T02:28:23.771 PDT DEBG ][   org.kde.kstars.ekos.capture] - setMeridianFlipStage:  "MF_READY"
> [2021-08-12T02:28:23.837 PDT INFO ][     org.kde.kstars.ekos.guide] - "Autoguiding aborted."
> [2021-08-12T02:28:23.837 PDT DEBG ][     org.kde.kstars.ekos.guide] - Aborting "Reacquiring"
>
>
>
>
>
> (gdb) bt
> #0  0x0000aaaaaafcfe64 in Ekos::Capture::b() (this=this@entry=0xaaaab68c02b0)
>     at /home/hy/Projects/kstars/kstars/ekos/capture/sequencejob.h:153
> #1  0x0000aaaaaafd0b70 in Ekos::Capture::processData(QSharedPointer<FITSData> const&) (this=0xaaaab68c02b0, data=...)
>     at /home/hy/Projects/kstars/kstars/ekos/capture/capture.cpp:1700
> #2  0x0000fffff5416d5c in  () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> #3  0x0000aaaaaaddbe68 in ISD::CCD::newImage(QSharedPointer<FITSData> const&) (this=this@entry=0xaaaaafe245c0, _t1=...)
>     at /home/hy/Projects/kstars-build/kstars/KStarsLib_autogen/FRI4DANIHA/moc_indiccd.cpp:413
> #4  0x0000aaaaaaea668c in ISD::CCD::handleImage(ISD::CCDChip*, QString const&, _IBLOB*, QSharedPointer<FITSData>)
>     (this=this@entry=0xaaaaafe245c0, targetChip=targetChip@entry=0xaaaab1a8dca0, filename=..., bp=bp@entry=0xffff94014950, data=...) at /home/hy/Projects/kstars/kstars/indi/indiccd.cpp:1699
> #5  0x0000aaaaaaead8d0 in ISD::CCD::processBLOB(_IBLOB*) (this=0xaaaaafe245c0, bp=0xffff94014950)
>     at /usr/include/c++/10/bits/atomic_base.h:325
> #6  0x0000fffff5416d5c in  () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> #7  0x0000aaaaaaddd1cc in ClientManager::newINDIBLOB(_IBLOB*) (this=<optimized out>, _t1=<optimized out>)
>     at /home/hy/Projects/kstars-build/kstars/KStarsLib_autogen/FRI4DANIHA/moc_clientmanager.cpp:369
> #8  0x0000fffff540cbfc in QObject::event(QEvent*) () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> #9  0x0000fffff5e1a480 in QApplicationPrivate::notify_helper(QObject*, QEvent*) ()
>     at /lib/aarch64-linux-gnu/libQt5Widgets.so.5
> #10 0x0000fffff53dc56c in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> #11 0x0000fffff53df2c8 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) ()
>     at /lib/aarch64-linux-gnu/libQt5Core.so.5
> #12 0x0000fffff543ae68 in  () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> #13 0x0000fffff4410c30 in g_main_context_dispatch () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
> #14 0x0000fffff4410ec8 in  () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
> #15 0x0000fffff4410f94 in g_main_context_iteration () at /lib/aarch64-linux-gnu/libglib-2.0.so.0
> #16 0x0000fffff543a304 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) ()
>     at /lib/aarch64-linux-gnu/libQt5Core.so.5
> #17 0x0000fffff53da97c in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) ()
>     at /lib/aarch64-linux-gnu/libQt5Core.so.5
> #18 0x0000fffff53e3a7c in QCoreApplication::exec() () at /lib/aarch64-linux-gnu/libQt5Core.so.5
> #19 0x0000aaaaaab9956c in main(int, char**) (argc=<optimized out>, argv=<optimized out>)
>     at /home/hy/Projects/kstars/kstars/main.cpp:393
> (gdb)
>
>
>
>