[AR] Re: Flight Controller Features

  • From: Henry Spencer <hspencer@xxxxxxxxxxxxx>
  • To: Arocket List <arocket@xxxxxxxxxxxxx>
  • Date: Tue, 12 Jan 2016 12:11:06 -0500 (EST)

On Tue, 12 Jan 2016, Robert Watzlavick wrote:

> Properly recovering from an exception is something frequently overlooked, even by experienced folks. Often, rebooting the processor is the default behavior, and that may not necessarily help things. For errant pointers, there's not much you can do if all processes share the same address space...

Disagree. Even if there's a big enough mess to require a restart, that doesn't have to mean a *cold* start. Every now and then, put aside a copy of all the data you'd need to restart gracefully (if perhaps not entirely seamlessly), with a checksum on it. When restarting, check the checksum, and if it's right, restart from the set-aside data rather than from a clean slate. In practice, you need two set-aside areas, so you can update one while leaving the other alone, in case you crash in the middle of the update.

This does have a considerable impact on how you write the software. It's not something you can retrofit into existing code. But done well, it can be very effective. Every one of those "program alarms" during Armstrong and Aldrin's descent to the Moon meant that the LM computer had rebooted, without ever missing a FBW control cycle.

(Once warm-start was working, the Apollo software developers developed a habit of hitting the reset button at random times during test runs. If the software fell over and died, something was badly wrong, and finding and fixing it was immediately top priority.)

Henry
