[AR] Re: Flight Controller Features
- From: Dave McMillan <skyefire@xxxxxxxxxxxx>
- To: arocket@xxxxxxxxxxxxx
- Date: Thu, 14 Jan 2016 09:16:32 -0500
On 1/12/2016 12:11 PM, Henry Spencer wrote:
> (Once warm-start was working, the Apollo software developers developed
> a habit of hitting the reset button at random times during test runs.
> If the software fell over and died, something was badly wrong, and
> finding and fixing it was immediately top priority.)
Netflix's "Chaos Monkey" is apparently this, automated. IIRC,
Netflix deliberately uses the Monkey to crash some percentage of their
servers at random, 24/7, so their automated failovers are "exercised"
almost constantly. It seems to have worked quite well for them.
The last presentation I saw from someone at Planetary Resources
said (again, IIRC) that the Arkyd telescopes would run on multiple
parallel virtual machines "voting" on outputs and running on commodity
(non-space-rated) hardware, with something like the Chaos Monkey
deliberately inflicting random failures so that the VMs were all getting
rebooted semi-regularly (semi-randomly?). The way it was explained, a
"stack" of VMs was effectively immune to single-event upsets, so SEU
vulnerability was limited to the hypervisor. That was at least three
years ago, though. It's been noted on Parabolic Arc that PR has been
utterly silent about any test results from the one telescope they've
orbited (recently re-entered) so far.
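For anyone who wants to poke at the idea, here is a rough toy sketch of the "voting stack plus deliberate reboots" scheme as I understood it. To be clear, this is my own illustration and not anything from PR or Netflix: the Replica class, the vote() and step() helpers, and the upset/reboot probabilities are all made up, and an "upset" is modelled very crudely as a flag that makes a replica return a wrong answer until it is rebooted.

```python
import random
from collections import Counter


class Replica:
    """One hypothetical redundant VM; all replicas compute the same function."""

    def __init__(self, name):
        self.name = name
        self.corrupted = False  # crude stand-in for state damaged by an SEU

    def reboot(self):
        # A restart throws away whatever state was corrupted.
        self.corrupted = False

    def compute(self, x):
        result = x * x
        # A corrupted replica returns a wrong answer until it is rebooted.
        return result + 1 if self.corrupted else result


def vote(outputs):
    """Return the strict-majority output, or None if no majority exists."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def step(replicas, x, rng, upset_prob=0.0, reboot_prob=0.0):
    """One cycle: inject random upsets, vote on outputs, then randomly reboot."""
    for r in replicas:
        if rng.random() < upset_prob:
            r.corrupted = True
    result = vote([r.compute(x) for r in replicas])
    for r in replicas:
        if rng.random() < reboot_prob:
            r.reboot()  # the "Monkey" part: restart regardless of health
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    stack = [Replica(f"vm{i}") for i in range(3)]
    for cycle in range(5):
        print(cycle, step(stack, 4, rng, upset_prob=0.1, reboot_prob=0.3))
```

With three replicas, one corrupted replica is outvoted; two corrupted in the same way would win the vote with the wrong answer, which is presumably why you want the reboots frequent enough that upsets don't accumulate. Note that nothing here protects the voter itself, which matches the point about the hypervisor being the remaining vulnerability.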