D14965: Improve Scheduler robustness against INDI disconnections

Eric Dejouhanet noreply at phabricator.kde.org
Tue Aug 21 08:16:55 BST 2018


TallFurryMan created this revision.
TallFurryMan added reviewers: mutlaqja, wreissenberger.
Herald added a project: KDE Edu.
Herald added a subscriber: kde-edu.
TallFurryMan requested review of this revision.

REVISION SUMMARY
  If Ekos loses its connection to INDI during the shutdown procedure, bypass the parking procedure and proceed directly to the shutdown script.
  When a DBus error occurs while trying to control INDI devices (slewing/tracking, guiding, focusing or capturing), abort the current job, mark INDI as disconnected in the Scheduler state machine and stop Ekos.
  Make the Scheduler timer verify the Ekos and INDI states, so that communication failures can be recovered from immediately by restarting Ekos and INDI.
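  
  The three behaviors above can be sketched as a small state machine. This is only an illustrative model, not the code in scheduler.cpp: the type SchedulerSketch, its enums and all of its member functions are hypothetical names, and the restart of Ekos and INDI is reduced to a boolean outcome instead of the real asynchronous DBus exchange.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified states; the real Scheduler tracks more of them.
enum class EkosState { Idle, Ready };
enum class IndiState { Idle, Ready };
enum class JobState { Idle, Busy, Aborted };
enum class ShutdownStep { ParkDevices, RunScript };

struct SchedulerSketch {
    EkosState ekos = EkosState::Idle;
    IndiState indi = IndiState::Idle;
    JobState job = JobState::Idle;
    bool stopped = false;            // set when recovery fails
    std::vector<std::string> log;    // stands in for the Scheduler log

    // Shutdown: parking needs INDI, so skip it when the connection is gone.
    ShutdownStep nextShutdownStep() const {
        return indi == IndiState::Ready ? ShutdownStep::ParkDevices
                                        : ShutdownStep::RunScript;
    }

    // A DBus call controlling a device (slew, guide, focus, capture) failed:
    // abort the running job and mark INDI and Ekos as down.
    void onDeviceCallFailed(const std::string& operation) {
        log.push_back("DBus error during " + operation);
        if (job == JobState::Busy)
            job = JobState::Aborted;
        indi = IndiState::Idle;
        ekos = EkosState::Idle;
        assert(needsRecovery());     // postcondition of this handler
    }

    // True when the periodic timer should attempt a restart.
    bool needsRecovery() const {
        return !stopped &&
               (ekos != EkosState::Ready || indi != IndiState::Ready);
    }

    // Periodic timer: verify both states and try to recover.
    // restartSucceeds is an assumption standing in for the real outcome.
    // Returns true when Ekos and INDI are usable after the tick.
    bool onTimerTick(bool restartSucceeds) {
        if (!needsRecovery())
            return !stopped;
        if (restartSucceeds) {
            ekos = EkosState::Ready;
            indi = IndiState::Ready;
            log.push_back("Ekos and INDI restarted");
            return true;
        }
        stopped = true;              // serious failure: the Scheduler stops
        log.push_back("Restart failed, Scheduler stopped");
        return false;
    }
};
```

  In this model, a failed slew call leaves the job aborted and both states idle; the next timer tick either brings both back to ready or stops the Scheduler, and a shutdown requested while INDI is down goes straight to the script.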
  
  A few situations can lead to INDI disconnections:
  
  - A transient network issue that closes the TCP stream, in which case reopening the stream restores normal operation (with the running job either continuing or aborted).
  - A serious network issue that prevents access to the INDI server, in which case Ekos will fail to restart and the Scheduler will stop.
  - A crash of one of the drivers, in which case Ekos might be able to reconnect to a new instance of the driver, or will loop trying to use the missing driver until it comes up again.
  
  It is difficult to handle every situation properly.
  For instance, when capturing, the CCD driver may remain in capture mode, leaving Ekos unable to recover.
  The disconnection may also fail to trigger a DBus error and only be caught while the Scheduler is checking the state of the job.
  In that situation, Ekos might keep its own control state for a feature while the crash has reset the properties of that feature, so the state Ekos holds becomes invalid and unusable.
  
  Because this robustness improvement only triggers when a communication error occurs, it is not expected to have side-effects on the normal behavior of the Scheduler.
  
  Another issue currently prevents all tests from being run: the Profile field of the scheduler job is not handled properly.
  It is currently not possible to have different scheduler jobs use different profiles, and once the Scheduler has used a particular profile, it is unable to switch to another by itself.

TEST PLAN
  Create a scheduler job using the Simulator, with Tracking enabled, to give the tester time to kill the Simulator server.
  Start the Scheduler, and when it connects and starts to slew, use a terminal to find the PID of the INDI server ("ps -aef | grep indiserver") and kill it ("kill <pid>").
  Ekos will immediately register the disconnection, but unfortunately will not tell the Scheduler about it.
  Without the fix, the Scheduler hangs waiting for the slew to finish and must be stopped manually.
  With the fix, the Scheduler notices the DBus communication error, aborts the running job and attempts to restart Ekos and reconnect to INDI.
  Several test runs are needed so that the Simulator is killed at different stages of job execution.

REPOSITORY
  R321 KStars

BRANCH
  bugfix__shutdown_parking_with_no_indi (branched from master)

REVISION DETAIL
  https://phabricator.kde.org/D14965

AFFECTED FILES
  kstars/ekos/scheduler/scheduler.cpp

To: TallFurryMan, mutlaqja, wreissenberger
Cc: kde-edu, narvaez, apol