[haiku-development] Introducing baron.haiku-os.org ...
- From: Oliver Tappe <zooey@xxxxxxxxxxxxxxx>
- To: haiku-development@xxxxxxxxxxxxx, haiku-web@xxxxxxxxxxxxx
- Date: Fri, 25 Sep 2009 18:38:33 +0200
Hi there,
some of you will already know that baron.haiku-os.org is our dedicated server
which will become the new home for:
- haiku's subversion repository
- dev.haiku-os.org (i.e. trac)
- www.haiku-os.org
- at a later stage: possibly more services, depending on how well the server
copes with the load
For the technical details of the server hardware, please see this description:
http://www.hetzner.de/en/hosting/produkte_rootserver/eq4/
In this mail, I'd like to give an overview of the different services that
have been established on baron, the backup plans and disaster recovery
measures.
Ok, it's going to be a longish mail, please bear with me ... ;-)
The general idea of the setup on baron is that the different services should
be isolated as well as possible, to ensure that a single service going
crazy will not cause any problems for the other services running on baron.
In order to make this possible, a set of virtual machines (VMs) has been
created, each of which is offering a specific service.
Basically, baron and all the VMs on it can be described as independent
systems, so I am going to group the established services by machine:
*****************************
the systems (real or virtual)
*****************************
+++++++++++++++++++++
baron (188.40.89.152)
+++++++++++++++++++++
The real machine is running the VMs as standard Linux services on qemu with
the KVM virtualizer and it works as a router for all the network traffic
directed to itself and all the VMs. Since this means that baron is vital for
all the offered services (SPOF and all that), the idea is to not run any
vulnerable services on the server directly but move all of these into a VM.
Nevertheless, it is currently running these external services (i.e. reachable
from the outside):
- sshd allowing remote logins for system admins only (via ssh-key only,
password authentication has been disabled)
- rsyncd serving the haiku-r1alpha1 release files as mirror seed to a
limited number of IP addresses (known mirrors)
- apache serving the haiku-r1alpha1 release files to the outside world
(considering the fact that this had our 100-Mbit network connection
saturated for many hours on Sep. 14th and 15th, I would really like to
close the apache service once any of the VMs is in real use).
Additionally, there are some internal services, listening on localhost only
(so you have to log in via ssh to access them):
- the ntop traffic analyzer, meant for admins to be able to get an idea what's
going on with respect to network traffic (for baron itself and for all VMs)
- apache serving the web-frontend of the collectd system state logger, meant
for admins to get an idea about the state/load of the server
- the postfix mail server has been set up to handle outbound mails, such that
the server can send administrative mails to haiku-sysadmin@xxxxxxxxxxxxx;
inbound mail never passes the firewall
++++++++++++++++++++++++++
the svn VM (188.40.89.182)
++++++++++++++++++++++++++
This VM is meant to offer all haiku code repositories, i.e. the current
subversion repository and other repositories, too (like mercurial and/or
git).
These are the external services:
- sshd allowing remote login for the sysadmins and for every committer, in
order to access svn+ssh://svn.haiku-os.org/srv/svn/repos/haiku
- apache serving http://svn.haiku-os.org as the base of all subversion
repositories (currently only one repo exists, namely 'haiku').
Transformation from an anonymous checkout to an authenticated one is done
transparently: whenever you try to write to the repository, subversion will
ask you to enter your login credentials.
- apache serving http://hg.haiku-os.org as the base of all mercurial
repositories. There are some trac-related repos for the code organization
of our own trac installation and there are two read-only repos, haiku-trunk
and buildtools-trunk, which can be cloned from. Note, however, that
these repos are not updated automatically yet. It should be trivial to do
that, but I'd like to get subversion going first and then we can make up
our minds about how to get the most out of a mercurial service.
Please note the absence of a https:// service, as there's no need to protect
our code from being read by anyone. Authentication is implemented via
http-digest auth, so only a hash derived from the password, never the
password itself, is transferred over the network.
If there's high demand, we could add a https:// service *just* for the
authentication, but I personally find the whole SSL-certificate process
broken enough to not even bother. If you think otherwise and feel your
password absolutely needs to be protected by an encrypted channel, please
tell!
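
As a rough illustration of why digest auth keeps the password off the wire,
here is a minimal sketch of the client-side response calculation as defined
in RFC 2617 (MD5 with qop=auth). Whether the apache setup on the svn VM uses
exactly this variant is an assumption; the function and parameter names are
made up for the example.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the hex MD5 digest of a string, as used by RFC 2617."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the 'response' value a client sends for HTTP digest auth.

    Only this hash goes over the network; the password itself is used
    locally to build HA1 and is never transmitted.
    """
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret part
    ha2 = md5_hex(f"{method}:{uri}")                 # request part
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

The server stores (or can compute) HA1, repeats the same calculation with the
nonce it handed out and compares the results.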
Additionally, there are internal services, listening on localhost only, or
not reachable via network at all:
- apache serving the web-frontend of the collectd system state logger, meant
for admins to get an idea about the state/load of the server
- the postfix mail server has been set up to handle outbound mails, such that
the server can send administrative mails to haiku-sysadmin@xxxxxxxxxxxxxx
- svnmirror, a cron-job that pulls new changesets from svn.berlios.de into
the local subversion repository every 30 minutes. Once we have switched to
use svn.haiku-os.org as our svn server, the syncing will be reversed, i.e.
any local commits will be pushed to the berlios svn repo every 30 minutes.
- the usual svn hook scripts have been copied from berlios and slightly
adjusted:
+ the commit messages will appear to have come from the real email address
of the committer, not from e.g. zooey@xxxxxxxxxxxxxxxx, as that address
is not working
+ commit messages will be sent to haiku-commits@xxxxxxxxxxxxx instead of
haiku-commits@xxxxxxxxxxxxxxxx (all subscribers are going to be moved to
that list before we switch to the new subversion repo)
+ every commit will be synced immediately to the read-only subversion repo
on the dev VM (for trac)
++++++++++++++++++++++++++
the dev VM (188.40.89.174)
++++++++++++++++++++++++++
This VM is going to replace dev.haiku-os.org (haiku's trac installation).
These are the external services:
- sshd allowing remote login for the sysadmins
- apache serving a *test* installation of trac at http://vmdev.haiku-os.org.
It is work-in-progress.
Additionally, there are internal services, listening on localhost only, or
not reachable via network at all:
- apache serving the web-frontend of the collectd system state logger, meant
for admins to get an idea about the state/load of the server
- the postfix mail server handles outbound mails, such that the server can
send administrative mails to haiku-sysadmin@xxxxxxxxxxxxx
++++++++++++++++++++++++++
the web VM (188.40.89.175)
++++++++++++++++++++++++++
This VM is going to replace www.haiku-os.org, but it doesn't even exist yet.
**********************
Regular Administration
**********************
++++++++++++++++++++++++++++
haiku-sysadmin@xxxxxxxxxxxxx
++++++++++++++++++++++++++++
All potential sysadmins (people that can sudo to root on the systems) are
expected to be subscribed to that mailing list, such that they can follow any
discussion and can learn and (hopefully) do something about any problems.
However, that list is not limited to sysadmins; everyone interested is
invited to join it.
Currently, the following people have the permissions to sudo to root on all
systems:
Axel, Ingo, Matt, Niels, Urias and myself
So we have 6 admins spread across the US and Europe.
N.B.: this list is open for discussion, so if you'd like to get yourself or
someone else onto that list (or if you'd like to get yourself removed),
please tell.
+++++++++++++
Daily Summary
+++++++++++++
Every system sends a daily summary mail to haiku-sysadmin@xxxxxxxxxxxxx,
which contains the following info:
- info about current uptime, load, disk and network state
- any pending software updates (security patches) available via the update
software repository
- the result of any backups
- the result of any other synchronization process (e.g. svnmirror)
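
To give an idea of what the system-state part of such a summary could look
like, here is a minimal sketch that only covers load and disk usage. It is
not the script actually running on the systems; the function name and the
output format are assumptions, and uptime, network state, pending updates
and backup results are left out.

```python
import os
import shutil
import socket


def system_summary(path="/"):
    """Build a short plain-text summary of load and disk usage.

    Uses os.getloadavg(), which is only available on Unix-like systems.
    """
    load1, load5, load15 = os.getloadavg()
    usage = shutil.disk_usage(path)
    gib = 2 ** 30
    lines = [
        f"host: {socket.gethostname()}",
        f"load average: {load1:.2f} {load5:.2f} {load15:.2f}",
        f"disk {path}: {usage.free / gib:.1f} GiB free "
        f"of {usage.total / gib:.1f} GiB",
    ]
    return "\n".join(lines)
```

A cron-job could pipe the returned text into the local mail server, which
then forwards it to the sysadmin list.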
++++++++++++
Backup Plans
++++++++++++
Currently, a three-fold backup plan has been implemented:
ROTATED FOLDER BACKUP (OFF-SITE, PUSH)
Every system (baron, svn and dev) runs a cron-job around 6:30 CEST that
creates a zipped tar archive of the system folders that are considered
important and copies that archive over to Matt Madia's server.
Once there, each archive is rotated into a historical archive span
such that the last 50 (or 100) days are preserved as well as the state of
every month's 1st. This should protect us against any erroneous deletions or
other changes which weren't noticed immediately.
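
The retention rule described above can be sketched as follows. This is only
an illustration of the policy, not the rotation script on Matt's server; the
function name and the default of 50 days are assumptions taken from the
description.

```python
from datetime import date, timedelta


def archives_to_keep(archive_dates, today, keep_days=50):
    """Return the set of archive dates that survive rotation.

    An archive is kept if it is one of the last `keep_days` days
    (including today) or if it was taken on the 1st of a month.
    """
    cutoff = today - timedelta(days=keep_days)
    return {d for d in archive_dates if d > cutoff or d.day == 1}
```

Everything not in the returned set can be deleted, so the space needed grows
by only one archive per month once the daily window is full.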
The following folders are being archived:
/etc - as it contains vital setup information
/root - as it contains the admin-notes
/home - as it contains the public ssh-keys of all users
/var/log - as it contains logfiles that might be important to have if
anything goes seriously wrong
Additionally, on the svn VM, the administrative section of /srv/svn
(everything except the repo-data itself) is being archived, too.
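
Creating such an archive could look roughly like the sketch below. It assumes
gzip compression (the description only says "zipped") and uses a made-up
function name; copying the result to the remote server is not shown.

```python
import tarfile


def create_archive(folders, archive_path):
    """Pack the given folders into one gzip-compressed tar archive.

    Paths are stored without the leading slash (e.g. 'var/log/...'),
    so the archive can be unpacked safely into any directory.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for folder in folders:
            tar.add(folder, arcname=str(folder).lstrip("/"))
    return archive_path
```

On a real system this would be called with something like
["/etc", "/root", "/home", "/var/log"] and needs to run as root in order to
read all files.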
FULL SYSTEM BACKUP (ON-SITE, PUSH)
Once per day around 6:00 CEST, every system is rsynced more or less
completely (leaving out only recreatable data) to a specific folder on baron.
From that sync folder, once per week for every system (spread across
weekdays) an archive is created and encrypted with root's gpg key and then
copied to our backup space (100 GB) at Hetzner. These archives are small
enough that all should fit into those 100 GB easily.
The purpose of these archives is to be able to restore complete systems
swiftly.
FULL SYSTEM BACKUP (OFF-SITE, PULL)
Around 7:00 CEST, my own (private) server synchronizes an internal copy of
every system via rsync (from the syncs folder on baron, so the VMs do not
have to bear the traffic). Since that makes use of rsync's hard-link
optimization feature, every day of the respective system's history should be
available.
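
The idea behind hard-linked snapshots (presumably what rsync's --link-dest
option provides here) can be sketched like this: files that did not change
since the previous snapshot are hard-linked instead of copied, so every day
looks like a full copy but only changed files take up new space. The sketch
is a simplified stand-in for rsync, handles only regular files and
directories, and all names in it are made up.

```python
import filecmp
import os
import shutil
from pathlib import Path


def snapshot(source, previous, target):
    """Create a snapshot of `source` in `target`.

    Files identical to their counterpart in `previous` (an older
    snapshot on the same filesystem, or None) are hard-linked;
    everything else is copied.
    """
    source, target = Path(source), Path(target)
    previous = Path(previous) if previous else None
    target.mkdir(parents=True, exist_ok=True)
    for src in source.rglob("*"):
        rel = src.relative_to(source)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous else None
        if old is not None and old.is_file() \
                and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)        # unchanged: share the data blocks
        else:
            shutil.copy2(src, dst)   # new or changed: store a real copy
```

Deleting an old snapshot is safe, because the data of a hard-linked file is
only freed once the last snapshot referring to it is gone.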
This is done to protect the data against any malicious attacks from within
the server, which could potentially destroy all backups on the ftp backup
space and on Matt's server, too. As this is a pull-only backup, there's
hardly a way for the attacker to get into my server and destroy this data,
too.
However, restoring the data would take a considerable amount of time, as I've
only got a 1 Mbit upstream.
***********************
Disaster recovery plans
***********************
Of course, there's a lot of things that can go wrong, so there can't be a
complete list, but I thought it would be a good idea to know beforehand what
to do in the most likely failure situations.
As a result, I am going to collect and describe recovery plans for different
likely problems and put those, along with other administrative info, into the
'admin-notes' folder in root's home.
Basically, all of these plans will describe the most efficient way to restore
whole systems or single services from all the different available backups.
Phew! - that's it for now, I'm sure we'll learn a lot more during the process
of migrating all the services to baron. Keep your fingers crossed ;-)
I'll try to keep you informed, but if you have any questions/suggestions,
please bring them forward.
cheers,
Oliver