
[openbeosstorage] Re: DiskDevice API 2.x, Kernelland Draft
- From: "Axel Dörfler" <axeld@xxxxxxxxxxxxxxxx>
- To: openbeosstorage@xxxxxxxxxxxxx
- Date: Fri, 06 Jun 2003 16:05:08 +0200 CEST
"Ingo Weinhold" <bonefish@xxxxxxxxxxxxxxx> wrote:
> On Thu, 05 Jun 2003 12:56:21 +0200 CEST "Axel Dörfler"
> <axeld@pinc-software.de> wrote:
> > First of all, I haven't had a deep look at it yet, because I didn't
> > find the time. But I somehow wanted to answer this one now :)
> Hehe. :-)
> You know, of course, that you can't sneak out of having a look at the
> kernel stuff I proposed -- being the main kernel developer, not to
> mention team lead. ;-)
Uh oh, okay, I'll try ;-)
> > As long as we can't change the nested structure, it would be pretty
> > simple, because the partitions are easily identified by their ID -
> > and there are only two methods, moving and resizing, which can
> > easily be differentiated.
> If you mean by changing the nested structure, moving a partition
> within the hierarchy (e.g. make it child of another parent), that
> shouldn't be allowed, I think. At least it could turn out to be quite
> tricky. Otherwise changing the hierarchy, like creating and deleting
> partitions shouldn't be any problem.
I meant the former, and I think that should be simply forbidden :-)
> > Having those shadow partitions (although I don't like the name
> > much, something like "target partitions" would make clearer that the
> > current partitions should be changed
> As I said, I was lacking a better name... :-)
> I find `target partition' a bit general/vague, though.
At least it does point in a direction ;-) target_partition might be a
bit vague, but I really wouldn't understand shadow_partition if I
didn't already know about it.
But anyway, do I understand the procedure correctly?
First, you'd need to get all current partitions on a disk, but I
somehow don't see how this would be possible using that API? Where do I
get those partition IDs from?
Currently, the standard way to iterate over all disk devices is to
iterate over all entries under the /dev/disk path. But that will return
all disks, whether partitioned or not, and whether present or not (in
the case of removable media). Once you have a device, you'd ask for its
partitions using ioctl().
So, assuming we somehow got to those partition IDs, would it be correct
to do this:
prepare_disk_device_modifications(device);
// this will lock the disk device API (or just this device, if
// possible, or even several devices for a software RAID)
defragment_partition(partition);
resize_partition(partition, 100*1024*1024);
// this will add the jobs to the job list
commit_disk_device_modifications(device, ...);
// this will finally trigger the modifications to be made
// and unlock the API - or will it first process all jobs
// and then unlock the API?
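To make the question concrete, here is a minimal userland mock-up of the prepare/queue/commit flow sketched above. It is only an illustration under assumptions: the DeviceSession class and its Prepare()/Queue()/Commit() methods are hypothetical stand-ins, not part of the proposed API, and the choice to unlock only after all jobs have run is just one of the two possible answers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one disk device's modification session.
class DeviceSession {
public:
    // Mirrors prepare_disk_device_modifications(): take the lock.
    bool Prepare() {
        if (fLocked)
            return false;   // someone else is already modifying the device
        fLocked = true;
        return true;
    }

    // Mirrors defragment_partition()/resize_partition(): only record the job.
    bool Queue(std::string name, std::function<void()> job) {
        if (!fLocked)
            return false;   // jobs may only be queued after Prepare()
        fJobs.emplace_back(std::move(name), std::move(job));
        return true;
    }

    // Mirrors commit_disk_device_modifications(): run all jobs in order,
    // then unlock (assumption: unlocking happens after processing).
    std::vector<std::string> Commit() {
        assert(fLocked);
        std::vector<std::string> done;
        for (auto& entry : fJobs) {
            entry.second();
            done.push_back(entry.first);
        }
        fJobs.clear();
        fLocked = false;
        return done;
    }

    bool IsLocked() const { return fLocked; }

private:
    bool fLocked = false;
    std::vector<std::pair<std::string, std::function<void()>>> fJobs;
};
```

Nothing touches the "disk" until Commit() is called; queuing a job outside a session simply fails.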
Where do we need the shadow partition name anyway? It doesn't seem to
be part of the user API.
Also, the real user API is the C++ API, right? So the user will never
come across all those user_ prefixes - because if he did, I would
consider dumping them.
And if we have all these int32 IDs we could think about adding a new
type for them, like partition_id.
> > > > initialize_partition() does perhaps need a bit more discussion,
> > > > since there exists the planned fs_initialize_volume() function
> > > > (<be/kernel/fs_volume.h>), which has largely intersecting
> > > > functionality (cf. userland_interface.h for some more thoughts).
> > > My vote is for ditching fs_initialize_volume() and adding support
> > > for registering files as disk devices. What would the arguments be
> > > for keeping fs_initialize_volume() (other than the regular file
> > > problem)?
> > No, actually, that would be kinda stupid IMO. A file system needs a
> > device (or file) to initialize its structure on. It starts at 0 and
> > has the length of the whole device (partition or file). Anything
> > else would make it complicated.
> > Now, why should we hide the direct method of initializing a file
> > system, and force the user to get the whole BPartition tree, the
> > need to search for the right partition, disabling the possibility of
> > creating file systems in regular files, etc.
> > I would rather remove the initialize_partition() function, and have
> > something like (I would guess it already exists):
> > status_t BDiskDeviceList::GetPartition(BPartition &partition,
> >     const char *deviceName);
> > (dunno if this class would be the right container, though)
> >
> > status_t BPartition::GetDeviceName(char *deviceName);
> >
> > and then just call fs_initialize_volume() using that deviceName.
> Since initialize_partition() is more general -- it also initializes
> partitioning system, not only file systems -- it definitely cannot be
> removed. It even has a quite different semantics, for it doesn't do
> anything destructive immediately, but only operates on shadow
> partitions.
Yeah, I noticed that now :-)
> Regarding fs_initialize_volume(), it would be at best a convenience
> function, nothing more. To reply to your arguments: Your first
> paragraph just doesn't apply. Both calls initialize_partition() and
> fs_initialize_volume() end at the same FS hook, which gets a
> (partition) device path.
What I meant was that the partition must exist at this point in time;
if it does, then there is no problem.
> The latter method is not more direct than the former one. Well, more
> convenient, if you mean that, but not more direct with respect to the
> functions involved. The call cannot be directly passed to the FS, but
> has to go through the disk device manager, since it must be
> coordinated with other operations on the disk device. Now things get
> a bit difficult, for fs_initialize_volume() is synchronous, while the
> disk device jobs aren't. Moreover the disk device could be locked by a
> userland API user, so that the thread couldn't even get the job
> scheduled -- it could simply fail in this case, though.
>
> As I mentioned, creating a file system in a file could be addressed
> by providing an API to register files as disk devices. A worthwhile
> thing to do, I think, since that would even allow initializing
> partitioning systems.
>
> To sum it up, I would see fs_initialize_volume() mainly as a
> convenience function for one purpose, to create FSs in files. I would
> discourage application on partition devices (how did the caller get
> hold of the partition device path, anyway?).
Okay, you got me thinking :-) And I am also unsure about the
justification of my previous rant :)
What I completely missed was the fact that it really makes sense to
direct any calls to fs_initialize_volume() through the disk device
manager. It wouldn't be necessary to do so, because as long as there is
a device (in /dev/disk/...), *anybody* can write to it.
But of course, direct access could be dangerous in this case, so
directing that call through the disk device manager would add some
value.
What I also don't understand with this API is how to create a new
partition. And if you want initialize_partition() to accept a file
system and a partitioning system in the same place, they would have to
share the same namespace, right?
Is there a way for a user application to differentiate between the two?
I would be a little bit surprised if our general "mkfs" would create a
partition on the disk.
We might also add a "name" field to initialize_partition() since almost
all disk systems share this property (IIRC only Intel style
partitioning doesn't have it). I think I would find it cleaner if there
are two functions to do that, anyway, even if embedded in the disk
device manager context. For example, resizing a partition needs *two*
resize calls internally, one to the partition, and one to the file
system which is on that partition.
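The two internal resize calls mentioned above can be sketched like this. It is only an illustration: the Partition struct, resize_with_fs() and the step names are made up for this mail, and the ordering rule (shrink the file system before the partition, grow the partition before the file system) is the usual one for resizing tools, not something the draft specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model: a partition and the file system living on it.
// For simplicity the file system is assumed to fill its partition.
struct Partition {
    int64_t size;    // bytes reserved by the partitioning system
    int64_t fsSize;  // bytes covered by the file system
};

// Performs both internal resize calls and returns the order they ran in.
std::vector<std::string> resize_with_fs(Partition& partition, int64_t newSize)
{
    std::vector<std::string> steps;
    if (newSize < partition.size) {
        // Shrinking: the file system must give up the space first.
        partition.fsSize = newSize;
        steps.push_back("resize file system");
        partition.size = newSize;
        steps.push_back("resize partition");
    } else if (newSize > partition.size) {
        // Growing: the partition must provide the space first.
        partition.size = newSize;
        steps.push_back("resize partition");
        partition.fsSize = newSize;
        steps.push_back("resize file system");
    }
    // The file system must never extend beyond its partition.
    assert(partition.fsSize <= partition.size);
    return steps;
}
```

Keeping the two calls separate internally makes the ordering explicit, which is part of why two functions look cleaner to me.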
OTOH I think it would be nice to have an add_partition() function - if
we can do resize_partition(), why shouldn't we be able to do this?
What steps would be involved to create 4 partitions on a given (empty)
disk? I would like something like this (may not fit perfectly in the
proposed API, though):
prepare_disk_device_modifications(device);
initialize_disk_system(device);
add_partition(device, "first", 100*1024*1024, "active=true");
add_partition(device, "second", 50*1024*1024, NULL);
...
commit_disk_device_modifications(device);
As you mentioned, the only problem which we would still have is with
the image files. Though I would really like us to be able to simply
register/publish them as a device, it would also be nice to create a
file system on them without having to register them.
For example, that would be the use for the fs_shell, but perhaps also
other things, although I don't have a good idea right now.
Or to put it this way: I'd like to have no real differentiation between
block devices and files - they are almost the same from the OS point of
view, so why shouldn't we keep it that way? Registering a file as a
device would certainly be a step in the right direction - but it would
be even nicer if that weren't necessary at all.
We would also need to differentiate between file system images and disk
images - but I guess we're doing this anyway.
Well, I think it would be okay to not consider the fs_shell case, but
to have mkfs register files automatically - the one thing we should
support, though, is still being able to address this file using its
standard path. I.e. something like:
$ mkfs test.image
$ makebootable test.image
Shouldn't fail even if "makebootable" needs a device to operate on.
Dunno how we should do that right now, though. Maybe it's not even a
top priority - but it'd be nice to have it.
> > We can cancel all jobs - there would pop up a requester which says
> > "Canceling the operation will recreate the initial state - this can
> > take a while", and just reverse the thing.
> > Shouldn't be too hard, at least not for moving a partition around.
> Yep, for that task it would work. OTOH, something like initializing a
> partition is not so easy to be undone. ;-) Even more fun it will be,
> when several jobs are in progress in parallel.
Sure :-)
> > How ugly it gets for resizing a partition would be the job of the
> > file system to judge on. But since we already need to have logging
> > for those jobs, reversing the operation at any point shouldn't be
> > impossible (or too ugly) at all.
> We need logging for those jobs? I planned to gracefully leave that
> out for R1. :-P
Well, I would also say that we don't implement any logging for this
stuff in R1. *But* the API shouldn't reflect this in any way, I guess.
It should just make sure that the user will have all the information he
needs - like "Pressing cancel will destroy all data on disk" vs.
"Pressing cancel will restore the initial state of the currently
processed job" :-)
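A minimal sketch of what "the API gives the user the information he needs" could look like: each job only has to say what canceling would do, whether or not logging is implemented underneath. The CancelEffect enum, JobInfo struct and cancel_warning() are hypothetical names for illustration; only the two messages come from this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical: what canceling does to a job in progress.
enum class CancelEffect {
    kRestoresInitialState,  // e.g. moving a partition can be reversed
    kDestroysData           // e.g. initializing cannot be undone
};

// Hypothetical job description as a UI would see it.
struct JobInfo {
    std::string name;
    CancelEffect cancelEffect;
};

// Picks the requester text for the job that is currently processed.
std::string cancel_warning(const JobInfo& job)
{
    switch (job.cancelEffect) {
        case CancelEffect::kRestoresInitialState:
            return "Pressing cancel will restore the initial state of the "
                "currently processed job";
        case CancelEffect::kDestroysData:
            return "Pressing cancel will destroy all data on disk";
    }
    assert(false && "unknown CancelEffect");
    return std::string();
}
```

With something like this, an R1 implementation without logging would simply report kDestroysData more often, and the API itself would not have to change later.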
Tyler Dauwalder <tyler@xxxxxxxxxxxxx> wrote:
> Yes. I just think it's clearer for someone who's writing their first
> fs/partition add-on if there's an explicit function there for each
> operation rather than making them dig thru documentation or headers
> elsewhere to figure out what operations they can/should support, and
> how they should support them. One could multiplex the entire fs
> add-on suite thru a single function, but I think that's a much less
> attractive way to do it than the current setup.
>
> So really, I actually prefer the no multiplexing route in both cases;
> I just think it's tolerable in the syscalls, since we're basically the
> only ones who'll be using them, and I can see the argument for not
> polluting the syscall namespace.
That would also be the only problem I see - but since *we* are creating
all syscall names ourselves, and we have beautiful names such as
"prepare_disk_device_modifications" I can hardly think of any other use
for that name :-)
Also, having them as separate calls makes it easier to check their
arguments.
> > As I said, I was lacking a better name... :-)
> > I find `target partition' a bit general/vague, though.
> I actually really like the name "shadow partitions". I think it does
> a pretty good job of capturing the idea of what's going on. :-)
>
> Other ideas:
>
> CreateWorkingPartition()
> CreateEditablePartition()
> CreateWorkingCopyPartition()
> CreateEditableCopyPartition()
> ...
>
> I like shadow partitions just fine, though. :-)
Well, if you insist on it... :-)
> > with other operations on the disk device. Now things get a bit
> > difficult, for fs_initialize_volume() is synchronous, while the
> > disk device jobs aren't. Moreover the disk device could be locked by
> > a userland API user, so that the thread couldn't even get the job
> > scheduled -- it could simply fail in this case, though.
> Yes, this is the main concern I have. fs_initialize_volume() needs to
> play nice with the rest of the DiskDevice API if it's kept around.
> Besides, who's really going to use fs_initialize_volume() anyway?
> Isn't it only programs like mkbfs...? It seems to me that those should
> be rewritten to use the more native DiskDevice API anyway. It really
> wouldn't be that much more work, and one could write a nice, general
> command-line initialization app that would work with any partition/fs
> add-on on the system that supports initialize_partition() using the
> DiskDevice API.
That was the plan anyway, as you could derive from the existence of the
fs_initialize_volume() function - it didn't exist in R5.
> > To sum it up, I would see fs_initialize_volume() mainly as a
> > convenience function for one purpose, to create FSs in files. I
> > would discourage application on partition devices (how did the
> > caller get hold of the partition device path, anyway?).
> I agree.
Well, the standard method to get a device is by looking in the /dev
path - and this is perfectly legal; we can't do anything about it, nor
should we.
OTOH we should make sure that at least our own APIs play nicely
together.
> > We need logging for those jobs? I planned to gracefully leave that
> > out for R1. :-P
> Yes, let's get this clear. First I thought we were leaving it out for
> R1, then I thought it was in, then out again, and now... :-) Which is
> it?
The possibility of logging at the file system level should always be
there; its implementation doesn't have to be, though (even in R2).
The partition resizing stuff itself doesn't have to be logged for R1,
IIRC :)
> > > It could be nice to have this, and it would also be a bit faster.
> > > I am not sure if it's rendering the API inconsistent there,
> > > though. One could argue that the partition related functions (not
> > > initialize()) are a bit separated from the rest anyway.
> > Either way would work for me. So, if you have any preferences...
> I prefer passing the device. :-)
Okay, so let's do that, then :-)
Adios...
Axel.