scheduler issue

Eric Dejouhanet eric.dejouhanet at gmail.com
Tue Dec 8 08:16:31 GMT 2020


Clear, I am convinced this approach is OK.

So, for instance, even if the Scheduler is not running, the algorithmic execution of the meridian flip (Capture suspending, Guide suspending and Focus aborting, each on its own, when Mount notifies) should be described and implemented in the Scheduler strategy code?

I think this is what we discussed last year, but the implementation was not ready for that change at that time. It is probably possible now. 

eric.dejouhanet at gmail.com - https://astronomy.dejouha.net



Original message


From: sterne-jaeger at t-online.de
Sent: 8 December 2020 08:52
To: eric.dejouhanet at gmail.com
Cc: hy at murveit.com; mutlaqja at ikarustech.com; kstars-devel at kde.org
Subject: Re: scheduler issue


I’m fully with you: to a certain extent, each module should manage its own failures and at least try to restart its last procedure a certain number of times.


On the other hand, we need an orchestration layer that coordinates actions, as we currently have inside Capture and Scheduler. They are far from perfect, since the states are spread across the code. But the general logic is OK: a failure in a module is handed back to the orchestration layer.


What I would try to avoid is one module actively controlling others. What we have (and that’s good from my perspective) is that modules listen to state changes from others and draw their own conclusions.


So in general, I think our architecture is appropriate, but I would really appreciate it if we cleaned up the orchestration layer, i.e. Capture and Scheduler, and introduced dedicated state machines there.
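
A minimal sketch of what a dedicated state machine in the orchestration layer could look like. It is plain C++ without Qt, and every name in it (OrchState, Event, Orchestrator) is hypothetical and not taken from the KStars code base. The point is only that all transitions live in one table instead of being spread across the code, and that modules report events rather than controlling each other.

```cpp
#include <map>
#include <utility>

// Hypothetical orchestration states and module events (not KStars identifiers).
enum class OrchState { Idle, Guiding, Capturing, Recovering, Aborted };
enum class Event { GuidingStarted, CaptureStarted, GuideFailed, GuideRecovered, RetriesExhausted };

class Orchestrator
{
    public:
        OrchState state() const { return m_state; }

        // Apply an event reported by a module.
        // Returns true if it caused a transition; unknown (state, event) pairs are ignored.
        bool handle(Event e)
        {
            const auto &t = table();
            const auto it = t.find({m_state, e});
            if (it == t.end())
                return false;
            m_state = it->second;
            return true;
        }

    private:
        using Table = std::map<std::pair<OrchState, Event>, OrchState>;

        // The single place where every allowed transition is declared.
        static const Table &table()
        {
            static const Table t =
            {
                {{OrchState::Idle,       Event::GuidingStarted},   OrchState::Guiding},
                {{OrchState::Guiding,    Event::CaptureStarted},   OrchState::Capturing},
                {{OrchState::Guiding,    Event::GuideFailed},      OrchState::Recovering},
                {{OrchState::Capturing,  Event::GuideFailed},      OrchState::Recovering},
                {{OrchState::Recovering, Event::GuideRecovered},   OrchState::Guiding},
                {{OrchState::Recovering, Event::RetriesExhausted}, OrchState::Aborted},
            };
            return t;
        }

        OrchState m_state = OrchState::Idle;
};
```

In this sketch a guiding failure during capture always lands in Recovering, so no module has to decide on its own who restarts what.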


Wolfgang



On 08.12.2020 at 08:28, Eric Dejouhanet <eric.dejouhanet at gmail.com> wrote:


Thanks Wolfgang, 


I agree with your analysis.


I added a test to verify that guiding would always be restarted if it was required (and disabled) at the time Focus was starting its procedure. But that makes Focus responsible for restarting Guide. So we are saying that it should be the same for Capture, with the risk that requests are mixed up. Doesn't that mean that Guide should integrate that task? Should we distinguish a failure, managed by Scheduler, and a procedure follow-up, managed by the module itself? 

eric.dejouhanet at gmail.com - https://astronomy.dejouha.net

From: sterne-jaeger at t-online.de
Sent: 8 December 2020 07:51
To: hy at murveit.com
Cc: eric.dejouhanet at gmail.com; mutlaqja at ikarustech.com
Subject: Re: scheduler issue


Dear all,
I’m not 100% sure, but to me it looks like the problem is located in the Capture module which, only in the case of a meridian flip (MF), is responsible for restarting the guiding procedure.


From what we can see in the logs, the Capture module receives a GUIDE_CALIBRATION_ERROR event. For this event it calls processGuidingFailed(). This is the critical part:


    else if (meridianFlipStage == MF_GUIDING)
    {
        if (++retries >= 3)
        {
            appendLogText(i18n("Post meridian flip calibration error. Aborting..."));
            abort();
        }
    }
    autoGuideReady = false;




The condition meridianFlipStage == MF_GUIDING is true, but this is the first failure, so the retry limit is not reached and Capture does not abort.

My explanation (without testing) is that Capture is idling. The Scheduler also does nothing, since it only reacts to guiding problems when it starts a job.


As a result, simply nothing happens - as we could see from the logs.
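
A minimal sketch of the missing branch, assuming the explanation above is right. It is plain C++ without Qt. PostFlipGuidingRecovery, FlipStage and the two hooks are hypothetical stand-ins for the Capture internals, not the actual fix: below the retry limit the handler asks for guiding to be restarted instead of silently returning.

```cpp
#include <functional>

// Hypothetical stand-in for the meridian flip stage tracked by Capture.
enum class FlipStage { None, Guiding };

struct PostFlipGuidingRecovery
{
    int retries = 0;
    int maxRetries = 3;                   // same limit as in the quoted snippet
    std::function<void()> restartGuiding; // assumed hook: ask Guide to calibrate again
    std::function<void()> abortCapture;   // assumed hook: hand the failure to the orchestration layer

    // Called whenever Guide reports a calibration error.
    void onCalibrationError(FlipStage stage)
    {
        if (stage != FlipStage::Guiding)
            return; // not in the post-flip guiding stage, nothing to recover here

        if (++retries >= maxRetries)
        {
            if (abortCapture)
                abortCapture();   // give up, the scheduler can then react to the abort
        }
        else if (restartGuiding)
        {
            restartGuiding();     // the branch the quoted code does not have
        }
    }
};
```

With a limit of 3, the first two calibration errors trigger a restart request and the third one aborts, so the system never sits idle after a single failure.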


If somebody wants to fix it in the next two weeks, I could assist and advise. As a first step, I would warmly recommend creating a new test case for MF testing calibration error handling.


All the best
Wolfgang


—
Wolfgang Reissenberger


www.sterne-jaeger.de
TSA-120 + FSQ-85 | Avalon Linear + M-zero | Moravian G2-8300 + ASI 1600mm pro


On 08.12.2020 at 07:08, Hy Murveit <murveit at gmail.com> wrote:


Just to be clear, at this point I'm not taking ownership of this, but really just acting as a bug reporter.

If one of you wants to jump in and fix, or put it on a future TODO list, that would be great from my perspective.
On the other hand, if someone thinks I should fix it, then I'd need coaching. 
I'd prefer the former, but would do the latter if necessary.


Hy


On Mon, Dec 7, 2020 at 9:47 PM Eric Dejouhanet <eric.dejouhanet at gmail.com> wrote:

Hello Hy, 


From the code you pasted, the state change is done before the state is emitted, so that seems proper. I think Jasem nailed it: some state management is missing either in Capture, to report an abort when guiding recovery fails, or in Scheduler, to manage the guiding failure. The fact that it follows the meridian flip means, in my opinion, that we are progressively approaching the root cause of the remaining stability issues, and that we really need to isolate that flip process as much as we can.

eric.dejouhanet at gmail.com - https://astronomy.dejouha.net

From: murveit at gmail.com
Sent: 7 December 2020 22:56
To: eric.dejouhanet at gmail.com
Reply-To: hy at murveit.com
Cc: hy at murveit.com; mutlaqja at ikarustech.com; sterne-jaeger at t-online.de
Subject: Re: scheduler issue


Eric et al,


Not clear on how this works, but the code is copied below.
Note the emit newStatus().
What is the state change you are referring to?


FWIW, I've included the code snippet below (pretty complicated code to follow--there's a state machine in guide.cpp -- I guess that's either controlling the internal guider or PHD2), and there's a calibration state machine in internalguider.cpp. From what I can tell, I don't think it's the issue you described, but I'm not too familiar with these state machines, nor the scheduler.


Hy




From internalguider.cpp, line 505
void InternalGuider::processCalibration()
{
    pmath->performProcessing();

    if (pmath->isStarLost())
    {
        emit newLog(i18n("Lost track of the guide star. Try increasing the square size or reducing pulse duration."));
        reset();

        calibrationStage = CAL_ERROR;
        emit newStatus(Ekos::GUIDE_CALIBRATION_ERROR);
        emit calibrationUpdate(GuideInterface::CALIBRATION_MESSAGE_ONLY, i18n("Calibration Failed: Lost guide star."));
        return;
    }

    ...


From internalguider.cpp, line 542

void InternalGuider::reset()
{
    state = GUIDE_IDLE;
    //calibrationStage = CAL_IDLE;
    connect(guideFrame, SIGNAL(trackingStarSelected(int, int)), this, SLOT(trackingStarSelected(int, int)),
            Qt::UniqueConnection);
}



guide.cpp:2381
        connect(guider, &Ekos::GuideInterface::newStatus, this, &Ekos::Guide::setStatus);



guide.cpp:1935
void Guide::setStatus(Ekos::GuideState newState)
{
    if (newState == state)
    {
        // pass through the aborted state
        if (newState == GUIDE_ABORTED)
            emit newStatus(state);
        return;
    }

    GuideState previousState = state;

    state = newState;
    emit newStatus(state);

    switch (state)
    {
        ...        

        case GUIDE_IDLE:
        case GUIDE_CALIBRATION_ERROR:
            setBusy(false);
            manualDitherB->setEnabled(false);
            break;




On Mon, Dec 7, 2020 at 12:31 PM Eric Dejouhanet <eric.dejouhanet at gmail.com> wrote:

Could it be that the emission of the guide failure is done before Guide's state is changed? That was the case for Focus, and Scheduler's immediate reaction for a new autofocus was thus rejected.
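
A minimal sketch of the ordering problem asked about here, in plain C++ without Qt signals. The Module class and its method names are hypothetical. A listener that reacts synchronously and queries the module's state sees the old value if the notification is sent before the state is stored.

```cpp
#include <functional>
#include <utility>
#include <vector>

enum class GuideState { Calibrating, CalibrationError };

// Hypothetical module with synchronous listeners, standing in for Guide or Focus.
class Module
{
    public:
        using Listener = std::function<void(GuideState)>;

        void subscribe(Listener l) { m_listeners.push_back(std::move(l)); }
        GuideState state() const { return m_state; }

        // Problematic order: listeners run while state() still returns the old value.
        void failEmitFirst()
        {
            notify(GuideState::CalibrationError);
            m_state = GuideState::CalibrationError;
        }

        // Safe order: store the new state first, then notify.
        void failStoreFirst()
        {
            m_state = GuideState::CalibrationError;
            notify(m_state);
        }

    private:
        void notify(GuideState s)
        {
            for (auto &l : m_listeners)
                l(s);
        }

        GuideState m_state = GuideState::Calibrating;
        std::vector<Listener> m_listeners;
};
```

A listener that calls state() inside its callback observes Calibrating with failEmitFirst() and CalibrationError with failStoreFirst(), which is why an immediate reaction can be rejected in the first case.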


Unfortunately I haven't had time to check the log yet. 

eric.dejouhanet at gmail.com - https://astronomy.dejouha.net

From: murveit at gmail.com
Sent: 7 December 2020 21:04
To: mutlaqja at ikarustech.com
Reply-To: hy at murveit.com
Cc: hy at murveit.com; sterne-jaeger at t-online.de; eric.dejouhanet at gmail.com
Subject: Re: scheduler issue


It was indeed after a meridian flip.
Jo also sent the .analyze file to me (attached) and here's a zoomed-in screenshot from the time of the issue.






Hy


On Mon, Dec 7, 2020 at 11:43 AM Jasem Mutlaq <mutlaqja at ikarustech.com> wrote:

Hello Hy,


Do you know why it was calibrating? This wasn't after a meridian flip, correct? What's happening is that the scheduler handles calibration failures only if it was in the middle of its steps, i.e.


Track --> Focus --> Align --> Guide --> Capture. If calibration fails at "Guide", the scheduler handles that. Right now, once capturing has started, the scheduler just logs the guide calibration results but does not handle them. The Capture module should have aborted when calibration failed, and that abort would then have been handled by the scheduler. But again, what would cause calibration in the middle of capture? A meridian flip?
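
The behaviour described above can be sketched as a small decision function. This is plain C++, and Step, Reaction and both function names are hypothetical, not scheduler code. The first function mirrors the current behaviour as described, the second one the suggested behaviour where a failure during capture leads to an abort that the scheduler can then handle.

```cpp
// Hypothetical names for the scheduler steps listed above.
enum class Step { Track, Focus, Align, Guide, Capture };

// Hypothetical reactions to a guide calibration failure.
enum class Reaction { RestartGuiding, LogOnly, AbortCapture };

// Behaviour as described: the failure is handled only during the Guide step.
Reaction reactionAsDescribed(Step current)
{
    return current == Step::Guide ? Reaction::RestartGuiding : Reaction::LogOnly;
}

// Suggested behaviour (an assumption based on the paragraph above):
// during Capture the failure aborts the capture so the scheduler can pick it up.
Reaction reactionSuggested(Step current)
{
    switch (current)
    {
        case Step::Guide:
            return Reaction::RestartGuiding;
        case Step::Capture:
            return Reaction::AbortCapture;
        default:
            return Reaction::LogOnly;
    }
}
```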


--
Best Regards,
Jasem Mutlaq







On Sun, Dec 6, 2020 at 11:30 PM Hy Murveit <murveit at gmail.com> wrote:

Eric, Jasem,



Reporting a possible scheduler bug.


Jo (@ElCorazon) sent me a log
https://www.dropbox.com/s/n8icvn90fhunjfl/log_20-53-07.txt.gz?dl=0

which I analyzed and my conclusion is that star detection caused guider calibration to fail at 01:15:54 (see below)


[2020-12-05T01:15:54.614 CST INFO ][     org.kde.kstars.ekos.guide] - "Lost track of the guide star. Try increasing the square size or reducing pulse duration."
[2020-12-05T01:15:54.617 CST DEBG ][   org.kde.kstars.ekos.capture] - Guiding state changed from "Calibrating" to "Calibration error"



and (ignoring that issue) the scheduler recognized, I suppose, that guider failed


[2020-12-05T01:15:54.624 CST DEBG ][ org.kde.kstars.ekos.scheduler] - Guide State "Calibration error"



but the scheduler didn't seem to restart the guiding calibration. Basically nothing happens until 1:57:56 when I assume Jo restarted things.


[2020-12-05T01:57:56.852 CST INFO ][ org.kde.kstars.ekos.scheduler] - Scheduler is stopping...



I assume the scheduler should try and restart the guider, but there are no .guide nor .scheduler messages between 1:15:54 and 1:57:56


Hy






