[openbeosstorage] Re: DiskDevice API v2.1

From: "Ingo Weinhold" <bonefish@xxxxxxxxxxxxxxx>
To: openbeosstorage@xxxxxxxxxxxxx
Date: Thu, 10 Apr 2003 00:37:19 +0200 CEST
"Axel Dörfler"  <axeld@xxxxxxxxxxxxxxxx> wrote:
> "Ingo Weinhold" <bonefish@xxxxxxxxxxxxxxx> wrote:
> > > I thought the compatibility argument came from suggesting to
> > > shift the superblock forward to make room for the log, though.
> > > Wouldn't that confuse BFS?
> > I thought of putting the log between the superblock and the bitmap,
> > so that it wouldn't need to be moved when the partition is resized.
> > I don't know how BFS locates the bitmap (an entry in the superblock
> > or a fixed position?) -- perhaps that would even be possible without
> > breaking compatibility.
> 
> Sure, it would be possible. Only the bitmap has a fixed location on
> disk - it starts at block number 1 (so the exact position depends on
> the block size used in BFS).

Er, I actually wanted to put it between the superblock and the bitmap. 
That doesn't seem to be possible, then.
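
To spell out the arithmetic: a tiny sketch, assuming the superblock
lives in block 0 (my assumption, not stated above) and using a few
example block sizes that I picked only for illustration. The function
name is made up.

```python
# The bitmap starts at block 1 (per Axel's description), so its byte
# offset is simply one block size.
BITMAP_START_BLOCK = 1


def bitmap_byte_offset(block_size: int) -> int:
    """Byte offset of the block bitmap for a given block size."""
    return BITMAP_START_BLOCK * block_size


# Example block sizes, chosen for illustration only.
for block_size in (1024, 2048, 4096, 8192):
    print(block_size, bitmap_byte_offset(block_size))
```

If the superblock sits in block 0, there is no whole block left between
it and the bitmap, whatever the block size.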

> The position of the log is written down in the super block. You could
> even change the size afterwards, if you wanted to (note, I haven't
> checked that the original BFS would work with a different log size,
> but I would be very surprised if it didn't).

That's the FS's log, if I understand you correctly. I don't think it's
a good idea to mix the partitioning log with the FS's own log.

> > > > I'd say it's almost impossible to do with the way we are
> > > > booting.
> > > > We currently have almost 800 bytes to load the 2nd stage boot
> > > > loader from a BFS disk. Now add locating (any chunks) and
> > > > parsing of that log file to it, and you'd undoubtedly need
> > > > about 16 kB more :)
> > Firstly, I wouldn't allow the log to be non-contiguous at any time
> > (or if I did, it would need to be *very* easy to reconstruct the
> > complete one, e.g. the first part could contain a pointer to where
> > the second part starts). Secondly, we have that 16 KB of code, if
> > we like. Who says that the meta data space can only contain logging
> > data? We can put as much code into it as we want to. I even think
> > we have to put code into it, since where else would the code
> > replaying the log come from if the system went down while resizing
> > the boot partition?
> 
> That's right, I would have placed code like that into the spare 
> partition, so that would also go into that area when it's directly on 
> the moved partition.

OK.

> > > > And worst case could be that these 800 bytes are also split in
> > > > half, which would be really impossible to fix.
> > > Sorry, you kinda lost me here.
> > I'd read it as: the system died while moving the partition, and
> > only its first block had already been moved. I believe this can be
> > prevented, though.
> 
> Not really. The hard disk kinda guarantees (or at least everybody 
> assumes) that reading/writing one block is an atomic operation.
> Now, the boot block is always the block that the BIOS will jump to. 
> Updating the partition block entry to the new location is not a good 
> idea, because other operating systems shouldn't boot without being 
> aware that everything is currently nuts. But of course, that's always 
> not a good idea. 
> A (non-movable) spare partition would make sure that OpenBeOS will be
> booted as long as the partition change is not completed (because it
> can replace the MBR with a version that will load it).
> Although, now that I wrote it, a similar solution might also be 
> possible with an area on the moved partition. But it would require an 
> untouched area on the disk that can be used while the job is being 
> performed (which would be very similar to a spare partition).

That's a good idea! And no, I don't think it would be similar to a 
spare partition, since that space is only needed temporarily while the 
partitioning jobs are being performed.

So, how about the following procedure:

1. Re-partitioning requests are issued. A note about them is written to
a file on the boot partition.
2. The system asks the partitioning system for a certain amount of
contiguous free space on the disk.
2. a) If there is no free space, the system asks a non-affected FS (or
several, one after another) to reserve a contiguous chunk of space. If
that also fails, the requests will either not be journalled or not be
processed at all.
3. The info about the reserved space is also written to the file on the
boot partition.
4. All drivers and code needed to recover the disk in case of an
interruption are written to the reserved space, along with a
description of the disk's partition layout.
[An interruption up to this point won't have any destructive effect.]
5. The MBR is changed so that only one partition exists, located at
the reserved space.
[An interruption and reboot from this point on will cause the temporary
partition to be booted. It contains all the code and data needed to
recover the disk, and will do so.]
6. The requests are processed. Space on the temporary partition is used
for logging.
7. The requests are finished. The MBR is restored (or rather set to
whatever it should look like now).
8. The file on the boot partition is updated. The reserved space is
freed.
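
To make the two bracketed remarks concrete, here is a rough sketch of
what a reboot would do depending on the last step that completed. It is
only an illustration: the names `Step` and `action_after_interruption`
are made up, and the cleanup wording for the early and late cases is my
assumption about what "no destructive effect" would mean in practice.

```python
from enum import IntEnum


class Step(IntEnum):
    """Steps of the procedure above; all names are illustrative."""
    NONE = 0                # nothing done yet
    REQUESTS_NOTED = 1      # step 1: requests noted on the boot partition
    SPACE_RESERVED = 2      # step 2 / 2a: contiguous space found or reserved
    SPACE_NOTED = 3         # step 3: reserved space recorded in the file
    RECOVERY_WRITTEN = 4    # step 4: recovery code and layout written
    MBR_REDIRECTED = 5      # step 5: MBR lists only the temporary partition
    REQUESTS_PROCESSED = 6  # step 6: jobs done, logged on temporary partition
    MBR_RESTORED = 7        # step 7: MBR set to the final layout
    CLEANED_UP = 8          # step 8: file updated, reserved space freed


def action_after_interruption(last_completed: Step) -> str:
    """What a reboot does if the system died after `last_completed`."""
    if last_completed < Step.MBR_REDIRECTED:
        # Nothing destructive has happened; the old layout is intact.
        return "boot normally; old layout intact"
    if last_completed < Step.MBR_RESTORED:
        # The MBR points at the temporary partition with the recovery code.
        return "boot temporary partition; recover the disk from the log"
    if last_completed < Step.CLEANED_UP:
        # Assumption: only the bookkeeping of step 8 is left to do.
        return "boot normally; finish cleanup"
    return "boot normally; nothing to do"


for step in Step:
    print(step.name, "->", action_after_interruption(step))
```

The interesting boundary is between steps 4 and 5: before it the
machine still boots the old layout, after it the temporary partition
takes over until step 7 has completed.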

If the boot partition is not affected by the changes, the whole process
can be simplified by logging to a file on the boot partition. For step
2 there are some more options. E.g. resizing a partition doesn't need
external logging if the FS is journalled anyway. And if more than one
partition is to be changed, the changes can be done consecutively, so
that the log can be relocated for each job.

It may be a bit of a challenge to generalize such a process to work 
with arbitrary partitioning systems.

> > I still believe you can't have a software RAID setup on the boot
> > partition if the BIOS doesn't support it. So, if the RAID layout
> > info was in that chunk, which itself is located on the boot
> > partition, the boot partition can't live on a RAID disk anyway.
> 
> Also don't forget that the file system doesn't know about RAID. It's 
> the level below the file system that achieves RAID.
> You need to have a certain block of data on a hard drive in the
> software RAID that boots the system and knows about the RAID
> structure.

I guess that's roughly what I wanted to say. :-)

CU, Ingo
Follow-Ups:
- [openbeosstorage] Logging/RAID
  - From: Tyler Dauwalder
References:
- [openbeosstorage] Re: DiskDevice API v2.1
  - From: Axel Dörfler