November 5, 2018 8:08 PM, "Alexander von Gluck IV" <kallisti5@xxxxxxxxxxx> wrote:
November 5, 2018 5:32 PM, "Alexander von Gluck IV" <kallisti5@xxxxxxxxxxx> wrote:
Good evening,
I plan on performing some maintenance tonight at 7:00 pm CDT (estimated 1 hr 30 mins):
1) OS Upgrades (Fedora 28 -> 29)
2) Docker network change to assist in hairpin NAT functionality
Impact:
* git
* website haiku-os.org -> www
* pretty much everything honestly
The OS upgrades should be uneventful.
I did another 28 -> 29 server upgrade and the outage was < 8 minutes.
Item #2 is more experimental. We have a limitation in our current
infrastructure which prevents containers from reaching services exposed
externally on maui. I found a Docker tweak (thanks to Alex Smith for suggesting
some better search keywords) which should enable hairpin NAT within Docker.
The backout plan for #2 is pretty simple (just remove a docker flag).
I'll keep everyone updated via this thread.
Thanks!
-- Alexander von Gluck IV
Good evening,
Stuff is still down, unfortunately. Something didn't go well (maybe the new
Docker network flag; unsure at the moment) and the server never came back.
After around 15 minutes of the server being unavailable, I went ahead and
ordered a KVM from Hetzner (our dedicated server host).
We're now approaching an hour with zero response from Hetzner support on a
high-priority support issue.
I've exercised all of the automatic tools they offer and don't have any
solutions at the moment other than waiting for someone at Hetzner support to
wake up.
Sorry for the down time, we need to figure out something better going forward.
-- Alex
Good evening,
A status update.
After around an hour and a half, Hetzner finally got back to us with a KVM. By
the time I finally got access to maui, the server was hosed. I did a few
reboots (with large delays in between) during the outage trying to get things
back up, and I suspect one of those reboots interrupted the OS upgrade.
For those wondering, the filesystems mount fine, but all the binaries return
I/O errors when launched (trying to chroot, etc.).
For the moment, I'm quickly standing up a VM at an alternate hosting provider
(Scaleway), attached to the Haiku, Inc. email and my credit card, and am going
to get a "bare minimum" setup of gerrit + the repos running.
I want to have a long, hard discussion with the sysadmin team for feedback.
Most of the outages since going live with maui have been related to the
limitations in the physical configuration of the server and the difficulty of
accessing basic troubleshooting tools (software mdadm RAIDs, etc.).
I'll be working this evening until we get a bare minimum deployment rolled out.
-- Alex