[haiku-development] Introducing baron.haiku-os.org ...

Hi there,

Some of you will already know that baron.haiku-os.org is our dedicated server,
which will become the new home for:
- Haiku's subversion repository
- dev.haiku-os.org (i.e. trac) 
- www.haiku-os.org 
- at a later stage: possibly more services, depending on how well the server
  copes with the load

For the technical details of the server hardware, please see this description:
        http://www.hetzner.de/en/hosting/produkte_rootserver/eq4/

In this mail, I'd like to give an overview of the different services that
have been established on baron, the backup plans and the disaster recovery
measures.

Ok, it's going to be a longish mail, please bear with me ... ;-)

The general idea of the setup on baron is that the different services should
be isolated as well as possible, to ensure that a single service going
crazy will not cause any problems for the other services running on baron.
In order to make this possible, a set of virtual machines (VMs) has been
created, each of which offers a specific service.

Basically, baron and all the VMs on it can be described as independent 
systems, so I am going to group the established services by machine:

*****************************
the systems (real or virtual)
*****************************
+++++++++++++++++++++
baron (188.40.89.152)
+++++++++++++++++++++
The real machine runs the VMs as standard Linux services on qemu with the
KVM virtualizer, and it works as a router for all the network traffic
directed to itself and to all the VMs. Since this means that baron is vital
for all the offered services (SPOF and all that), the idea is to avoid running
any vulnerable services on the server directly and to move all of them into
VMs.

Nevertheless, it is currently running these external services (i.e. reachable
from the outside):
- sshd allowing remote logins for system admins only (ssh-key authentication
  only, password authentication has been disabled)
- rsyncd serving the haiku-r1alpha1 release files as mirror seed to a
  limited number of IP addresses (known mirrors)
- apache serving the haiku-r1alpha1 release files to the outside world
  (since this saturated our 100-Mbit network connection for many hours on
  Sep. 14th and 15th, I would really like to shut down this apache service
  once any of the VMs is in real use)

Additionally, there are some internal services, listening on localhost only 
(so you have to login via ssh to access them):
- the ntop traffic analyzer, meant for admins to be able to get an idea what's
  going on with respect to network traffic (for baron itself and for all VMs)
- apache serving the web-frontend of the collectd system state logger, meant
  for admins to get an idea about the state/load of the server
- the postfix mail server has been set up to handle outbound mails, so that
  the server can send administrative mails to haiku-sysadmin@xxxxxxxxxxxxx;
  inbound mail never passes the firewall

++++++++++++++++++++++++++
the svn VM (188.40.89.182)
++++++++++++++++++++++++++
This VM is meant to offer all Haiku code repositories, i.e. the current
subversion repository and other repositories, too (like mercurial and/or
git).

These are the external services:
- sshd allowing remote login for the sysadmins and for every committer, in
  order to access svn+ssh://svn.haiku-os.org/srv/svn/repos/haiku
- apache serving http://svn.haiku-os.org as the base of all subversion
  repositories (currently only one repo exists, namely 'haiku').
  Transformation from an anonymous checkout to an authenticated one is done
  transparently: whenever you try to write to the repository, subversion will
  ask you to enter your login credentials.
- apache serving http://hg.haiku-os.org as the base of all mercurial
  repositories. There are some trac-related repos for the code organization
  of our own trac installation, and there are two read-only repos, haiku-trunk
  and buildtools-trunk, which can be cloned. Note, however, that these repos
  are not updated automatically yet. It should be trivial to do that, but I'd
  like to get subversion going first; then we can make up our mind about how
  to get the most out of a mercurial service.

Please note the absence of an https:// service, as there's no need to protect
our code from being read by anyone. Authentication is implemented via
HTTP digest auth, so no password is ever transferred over the network.
If there's high demand, we could add an https:// service *just* for the
authentication, but I personally find the whole SSL-certificate process
broken enough not to even bother. If you think otherwise and feel that your
password absolutely needs to be protected by an encrypted channel, please
tell!
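To illustrate why digest auth keeps the password off the wire, here is a
minimal Python sketch of the client-side response computation from RFC 2617,
assuming the common MD5 algorithm with qop=auth (which variant our apache
setup actually negotiates is not stated here). The function names are
hypothetical, and the sample values are the worked example from RFC 2617
itself, not real credentials.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute an RFC 2617 Digest response (MD5, qop=auth).

    The password is only used locally to build HA1; what travels over
    the network is the final hash plus the nonces.
    """
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


if __name__ == "__main__":
    # Worked example from RFC 2617, section 3.5
    print(digest_response("Mufasa", "testrealm@host.com", "Circle Of Life",
                          "GET", "/dir/index.html",
                          "dcd98b7102dd2f0e8b11d0f600bfb0c093",
                          "00000001", "0a4f113b"))
```

The server performs the same computation from its stored HA1 value and
compares the results, so it never needs to see the password itself.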

Additionally, there are internal services, listening on localhost only, or 
not reachable via network at all:
- apache serving the web-frontend of the collectd system state logger, meant
  for admins to get an idea about the state/load of the server
- the postfix mail server has been set up to handle outbound mails, such that
  the server can send administrative mails to haiku-sysadmin@xxxxxxxxxxxxxx
- svnmirror, a cron-job that pulls new changesets from svn.berlios.de into
  the local subversion repository every 30 minutes. Once we have switched to
  svn.haiku-os.org as our svn server, the syncing will be reversed, i.e.
  any local commits will be pushed to the berlios svn repo every 30 minutes.
- the usual svn hook scripts have been copied from berlios and slightly 
  adjusted:
  + the commit messages will appear to have come from the real email address
    of the committer, not from e.g. zooey@xxxxxxxxxxxxxxxx, as that address
    is not working
  + commit messages will be sent to haiku-commits@xxxxxxxxxxxxx instead of
    haiku-commits@xxxxxxxxxxxxxxxx (all subscribers are going to be moved to
    that list before we switch to the new subversion repo)
  + every commit will be synced immediately to the read-only subversion repo
    on the dev VM (for trac)
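The address rewriting in the first hook item could look roughly like the
following Python sketch. It only builds the notification mail; the mapping
from svn logins to real addresses, the example.org addresses and the function
name are hypothetical placeholders, and the actual hook scripts (adapted from
berlios) may work quite differently.

```python
from email.message import EmailMessage

# Hypothetical mapping from svn login names to working email addresses.
COMMITTER_ADDRESSES = {
    "jdoe": "jdoe@example.org",
}
# Hypothetical placeholder for the new commit list address.
COMMIT_LIST = "haiku-commits@example.org"


def build_commit_mail(author, revision, log_message, changed_paths):
    """Build a commit notification that appears to come from the committer."""
    msg = EmailMessage()
    # Unknown authors fall back to the list address (an assumption).
    msg["From"] = COMMITTER_ADDRESSES.get(author, COMMIT_LIST)
    msg["To"] = COMMIT_LIST
    msg["Subject"] = f"r{revision} - {author}"
    body = [log_message.strip(), "", "Changed paths:"]
    body += [f"  {path}" for path in changed_paths]
    msg.set_content("\n".join(body) + "\n")
    return msg


if __name__ == "__main__":
    mail = build_commit_mail("jdoe", 33000, "Fixed a typo.",
                             ["haiku/trunk/README"])
    print(mail)
```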

++++++++++++++++++++++++++
the dev VM (188.40.89.174)
++++++++++++++++++++++++++
This VM is going to replace dev.haiku-os.org (Haiku's trac installation).

These are the external services:
- sshd allowing remote login for the sysadmins
- apache serving a *test* installation of trac at http://vmdev.haiku-os.org.
  It is a work in progress.

Additionally, there are internal services, listening on localhost only, or 
not reachable via network at all:
- apache serving the web-frontend of the collectd system state logger, meant
  for admins to get an idea about the state/load of the server
- the postfix mail server handles outbound mails, such that the server can
  send administrative mails to haiku-sysadmin@xxxxxxxxxxxxx

++++++++++++++++++++++++++
the web VM (188.40.89.175)
++++++++++++++++++++++++++
This VM is going to replace www.haiku-os.org, but it doesn't even exist yet.

**********************
Regular Administration
**********************
++++++++++++++++++++++++++++
haiku-sysadmin@xxxxxxxxxxxxx
++++++++++++++++++++++++++++
All potential sysadmins (people who can sudo to root on the systems) are
expected to be subscribed to that mailing list, so that they can follow every
discussion, learn, and (hopefully) do something about any problems.

However, that list is not limited to sysadmins; everyone interested is
invited to join it.

Currently, the following people have the permissions to sudo to root on all 
systems:

    Axel, Ingo, Matt, Niels, Urias and myself

So we have 6 admins spread across the US and Europe. 

N.B.: this list is open for discussion, so if you'd like to get yourself or 
someone else onto that list (or if you'd like to get yourself removed), 
please tell.

+++++++++++++
Daily Summary
+++++++++++++
Every system sends a daily summary mail to haiku-sysadmin@xxxxxxxxxxxxx, 
which contains the following info:

- info about current uptime, load, disk and network state
- any pending software updates (security patches) available via the update
  software repository
- the result of any backups
- the result of any other synchronization process (e.g. svnmirror)
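As a rough idea of how the first item of such a summary could be gathered,
here is a minimal Python sketch for Linux (it reads /proc/uptime). The
function names and output layout are assumptions; network state, pending
updates and backup/sync results would come from the respective tools and
are left out.

```python
import os
import shutil
import socket


def uptime_seconds():
    """Read the system uptime from /proc/uptime (Linux only)."""
    try:
        with open("/proc/uptime") as f:
            return float(f.read().split()[0])
    except OSError:
        return None


def daily_summary(paths=("/",)):
    """Return a plain-text block with uptime, load and disk usage."""
    lines = [f"Daily summary for {socket.gethostname()}"]
    up = uptime_seconds()
    if up is not None:
        lines.append(f"uptime: {up / 86400:.1f} days")
    load1, load5, load15 = os.getloadavg()
    lines.append(f"load average: {load1:.2f} {load5:.2f} {load15:.2f}")
    for path in paths:
        usage = shutil.disk_usage(path)
        percent = 100 * usage.used / usage.total
        free_gib = usage.free // 2**30
        lines.append(f"disk {path}: {percent:.1f}% used, {free_gib} GiB free")
    return "\n".join(lines)


if __name__ == "__main__":
    print(daily_summary())
```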

++++++++++++
Backup Plans
++++++++++++
Currently, a three-fold backup plan has been implemented:

ROTATED FOLDER BACKUP (OFF-SITE, PUSH)

Every system (baron, svn and dev) runs a cron-job around 6:30 CEST that
creates a zipped tar archive of the system folders that are considered
important and copies that archive over to Matt Madia's server.
Once there, each archive is rotated into a historical archive span, such
that the last 50 (or 100) days are preserved, as well as the state of
every month's 1st. This should protect us against any erroneous deletions or
other changes that weren't noticed immediately.
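The retention rule described above can be expressed as a small Python sketch.
It assumes one archive per day, identified by its date, uses the 50-day
variant and keeps the monthly archives indefinitely; the function name is
hypothetical, and whatever runs on Matt's server may implement this
differently.

```python
from datetime import date, timedelta


def archives_to_keep(archive_dates, today, keep_days=50):
    """Return the archive dates that survive rotation, oldest first.

    An archive is kept if it is one of the last keep_days daily archives
    (today included) or if it was taken on the 1st of a month.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(d for d in archive_dates if d > cutoff or d.day == 1)


if __name__ == "__main__":
    today = date(2009, 10, 1)
    history = [today - timedelta(days=i) for i in range(200)]
    kept = archives_to_keep(history, today)
    print(len(kept), kept[0], kept[-1])
```

Everything not returned by the function would be deleted from the history.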

The following folders are being archived:
/etc     - as it contains vital setup information
/root    - as it contains the admin-notes
/home    - as it contains the public ssh-keys of all users
/var/log - as it contains logfiles that might be important to have if
           anything goes seriously wrong

Additionally, on the svn VM, the administrative section of /srv/svn 
(everything except the repo-data itself) is being archived, too.
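Creating such an archive could be sketched in Python with the standard
tarfile module. The naming scheme <hostname>-<date>.tar.gz and the function
name are assumptions, and copying the result over to Matt's server is left
out.

```python
import os
import tarfile
from datetime import date

# Folders listed above; on the svn VM, the administrative part of /srv/svn
# would be added to this list.
DEFAULT_FOLDERS = ["/etc", "/root", "/home", "/var/log"]


def create_folder_backup(folders, target_dir, hostname, today=None):
    """Create <hostname>-<YYYY-MM-DD>.tar.gz containing the given folders."""
    today = today or date.today()
    name = f"{hostname}-{today.isoformat()}.tar.gz"
    archive = os.path.join(target_dir, name)
    with tarfile.open(archive, "w:gz") as tar:
        for folder in folders:
            if os.path.exists(folder):
                # Drop the leading "/" so the archive holds relative paths.
                tar.add(folder, arcname=folder.lstrip("/"))
    return archive
```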

FULL SYSTEM BACKUP (ON-SITE, PUSH)

Once per day, around 6:00 CEST, every system is rsynced more or less
completely (leaving out only recreatable data) to a specific folder on baron.
From that sync folder, once per week for every system (spread across
weekdays), an archive is created, encrypted with root's gpg key and then
copied to our backup space (100 GB) at Hetzner. These archives are small
enough that all of them should fit into those 100 GB easily.

The purpose of these archives is to be able to restore complete systems 
swiftly.
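One simple way to spread the weekly archives across weekdays is a
round-robin assignment, sketched below in Python. The mail doesn't say which
system is archived on which day, so the schedule produced here is purely
illustrative; the gpg encryption and the upload to the Hetzner backup space
are not shown.

```python
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekly_archive_schedule(systems):
    """Assign each system a weekday, round-robin in alphabetical order."""
    return {name: WEEKDAYS[i % len(WEEKDAYS)]
            for i, name in enumerate(sorted(systems))}


def systems_due(schedule, weekday):
    """Return the systems whose weekly archive is due on the given weekday."""
    return sorted(name for name, day in schedule.items() if day == weekday)


if __name__ == "__main__":
    schedule = weekly_archive_schedule(["baron", "svn", "dev"])
    print(schedule)
    print(systems_due(schedule, "Tue"))
```

Spreading the work this way keeps the daily load on baron and the upload
volume roughly constant instead of archiving every system on the same night.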

FULL SYSTEM BACKUP (OFF-SITE, PULL)

Around 7:00 CEST, my own (private) server synchronizes an internal copy of
every system via rsync (from the sync folder on baron, so the VMs do not
have to bear the traffic). Since that makes use of rsync's hard link
optimization feature, every day of each system's history should be
available.
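The hard link optimization presumably refers to rsync's --link-dest option.
The Python sketch below mimics the idea: each daily snapshot looks like a
complete copy, but files unchanged since the previous snapshot are hard links
to it, so they take up no additional space. Unlike rsync, which by default
decides based on file size and modification time, this sketch compares file
contents; the function name is hypothetical.

```python
import filecmp
import os
import shutil


def snapshot(source, previous, target):
    """Copy source into target, hard-linking files unchanged since previous.

    previous is the last snapshot directory, or None for the first run.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the existing inode
            else:
                shutil.copy2(src, dst)  # new or modified: store a fresh copy
```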

This is done to protect the data against any malicious attacks from within
the server, which could potentially destroy all backups on the ftp backup
space and on Matt's server, too. As this is a pull-only backup, an attacker
would hardly have a way to get into my server and destroy this data as well.
However, restoring the data would take a considerable amount of time, as I've
only got a 1 Mbit upstream.
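To put the 1 Mbit upstream into perspective, the back-of-the-envelope
calculation below estimates the raw transfer time. The 10 GB system size
used in the example is a made-up figure, and the calculation ignores
protocol overhead and compression.

```python
def restore_hours(size_gb: float, upstream_mbit: float = 1.0) -> float:
    """Hours needed to transfer size_gb (decimal GB) at upstream_mbit Mbit/s."""
    megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 Mbit
    return megabits / upstream_mbit / 3600  # seconds -> hours


if __name__ == "__main__":
    print(f"{restore_hours(10):.1f} h at 1 Mbit/s")
    print(f"{restore_hours(10, 100) * 60:.0f} min at 100 Mbit/s")
```

For a hypothetical 10 GB system, that is roughly 22 hours at 1 Mbit/s,
compared to about 13 minutes at baron's 100 Mbit/s.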

***********************
Disaster recovery plans
***********************
Of course, there are a lot of things that can go wrong, so there can't be a
complete list, but I thought it would be a good idea to know beforehand what
to do in the most likely failure situations.
As a result, I am going to collect and describe recovery plans for different 
likely problems and put those, along with other administrative info, into the 
'admin-notes' folder in root's home.

Basically, all of these plans will describe the most efficient way to restore 
whole systems or single services from all the different available backups.

Phew! - that's it for now, I'm sure we'll learn a lot more during the process 
of migrating all the services to baron. Keep your fingers crossed ;-)

I'll try to keep you informed, but if you have any questions/suggestions, 
please bring them forward.

cheers,
        Oliver
