[AR] Re: Flight Controller Features

  • From: Dave McMillan <skyefire@xxxxxxxxxxxx>
  • To: arocket@xxxxxxxxxxxxx
  • Date: Thu, 14 Jan 2016 09:16:32 -0500



On 1/12/2016 12:11 PM, Henry Spencer wrote:


(Once warm-start was working, the Apollo software developers developed a habit of hitting the reset button at random times during test runs. If the software fell over and died, something was badly wrong, and finding and fixing it was immediately top priority.)

Netflix's "Chaos Monkey" is apparently this, automated. IIRC, Netflix deliberately uses the Monkey to crash some percentage of their servers at random, 24/7, so their automated failovers are "exercised" almost constantly. It seems to have worked quite well for them.
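
To make the "exercised failover" idea concrete, here is a toy simulation. It is a minimal sketch under stated assumptions, not Netflix's actual tool: the `Server`/`Cluster` classes, the `kill_prob` rate, and the rule that dead servers are restarted at the end of every tick are all hypothetical. Each tick a random server may be crashed, traffic keeps flowing to the survivors, and the `degraded` counter shows how often the failover path actually gets used.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    alive: bool = True


@dataclass
class Cluster:
    """Hypothetical pool of interchangeable servers behind a simple failover router."""
    servers: List[Server]
    served: int = 0    # requests answered by some live server
    dropped: int = 0   # requests lost because nothing was up
    degraded: int = 0  # requests answered while at least one server was down

    def handle_request(self) -> None:
        # Failover: route to any live server; drop only if none are up.
        live = [s for s in self.servers if s.alive]
        if not live:
            self.dropped += 1
            return
        self.served += 1
        if len(live) < len(self.servers):
            self.degraded += 1  # the failover path was exercised

    def restart_dead(self) -> None:
        # Assumed automated recovery: every crashed server comes back by next tick.
        for s in self.servers:
            s.alive = True


def chaos_monkey(cluster: Cluster, rng: random.Random, kill_prob: float) -> None:
    """Crash at most one randomly chosen server per tick."""
    if rng.random() < kill_prob:
        rng.choice(cluster.servers).alive = False


def run(ticks: int = 1000, seed: int = 1, kill_prob: float = 0.2) -> Cluster:
    rng = random.Random(seed)
    cluster = Cluster([Server(f"srv{i}") for i in range(3)])
    for _ in range(ticks):
        chaos_monkey(cluster, rng, kill_prob)
        cluster.handle_request()  # traffic keeps flowing while a server is down
        cluster.restart_dead()
    return cluster


if __name__ == "__main__":
    c = run()
    print(f"served={c.served} dropped={c.dropped} degraded={c.degraded}")
```

The point of the design is the `degraded` count: the failover code runs hundreds of times per simulated run, so a bug in it would show up quickly instead of lurking until a real outage.
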
The last presentation I saw from someone at Planetary Resources said (again, IIRC) that the Arkyd telescopes would run multiple parallel virtual machines "voting" on outputs, hosted on commodity (non-space-rated) hardware, with something like the Chaos Monkey deliberately inflicting random failures so that the VMs were all getting rebooted semi-regularly (semi-randomly?). The way it was explained, a "stack" of VMs was effectively immune to single-event upsets, so SEU vulnerability was limited to the hypervisor. That was at least three years ago, though. It's been noted on Parabolic Arc that PR has been utterly silent about any test results from the one telescope they've orbited so far (recently re-entered).
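
The voting idea can be sketched as plain majority voting (triple modular redundancy when n = 3). This is an illustration under assumptions, not PR's design: `replica_compute` is a hypothetical stand-in for whatever each VM calculates, and an SEU is modeled as a single flipped bit in one replica's result.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count * 2 > len(outputs) else None


def replica_compute(x: int, upset: bool = False) -> int:
    """Hypothetical per-VM computation; `upset` flips one bit to model an SEU."""
    result = x * x
    return result ^ 0b100 if upset else result


def voted_step(x: int, upset_replica: Optional[int] = None, n: int = 3) -> Optional[int]:
    # Every replica computes the same thing; at most one is corrupted.
    outputs = [replica_compute(x, upset=(i == upset_replica)) for i in range(n)]
    return majority_vote(outputs)


if __name__ == "__main__":
    print(voted_step(7))                   # all replicas agree
    print(voted_step(7, upset_replica=1))  # one corrupted replica is outvoted
```

Note that `majority_vote` itself runs in one place, which mirrors the point above: the replicas mask a single upset, but whatever hosts the voter (the hypervisor, in PR's description) is still a single point of SEU vulnerability.
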
