[openbeosstorage] Re: DiskDevice API 2.x, Kernelland Draft

> > Here's comments on the discussion. I'll try to address the inline
> > questions and comments in the headers tomorrow.
> 
> That'd be great. :-)

Sorry, I didn't get home until late tonight and I'm beat after replying 
to this behemoth. Tomorrow. :-)

> > All in all, I like it. :-) Aside from the few questions I have 
> > below,
> > I
> > say go ahead and start on it. I'm willing to help if you're
> > interested;
> > I have finals next week, so may not find too much time this weekend,
> > but I bet I could dig up a little.
> 
> No need to sweat it. :-)
> I can start on my own, and you can join me when RL leaves you more
> time. No problem at all.

Okay. :-)

> > I'd also like to finish up the
> > kernel utils proposal I'm working on, though; those could be useful
> > along the way.
> 
> Yep, that would certainly be nice.

Okay, I will focus on the kernel utils and get that stuff done then.

> > > 1.4 Modifications
> > > 
> > > The C++ API is mapped in a straight forward way to respective
> > > syscalls.
> > > We have the `meta' calls prepare_disk_device_modifications(),
> > > commit_disk_device_modifications(),
> > > cancel_disk_device_modifications(),
> > > and is_disk_device_modified() (corresponding to
> > > BDiskDevice::CommitModifications(),...), and a syscall per
> > > modifying
> > > BPartition method.
> > 
> > I don't undertand what the shadow parameter to these functions is
> > for.
> 
> Mmh, good question. I was just a bit too enthusiastic, when adding 
> that
> parameter. It first seemed obvious to me that every call involving a
> partition would also need to specify whether that would be a shadow
> partition or not. I then removed it from almost all functions, after
> realizing, that in most cases this is totally clear, anyway. I guess, 
> I
> just overlooked these functions, since for them it is obvious, too.

Okay. :-)

> > > To save some functions, the supports_() and validate_*() functions
> > > could be multiplexed to supports_partition_operation() and
> > > validate_partition_operation(). A proposal, what the parameters 
> > > for
> > > these functions could be, is given in
> > > supports_validates_parameters.h.
> > 
> > I don't like the idea of multiplexing here, mostly because everyone
> > who
> > writes a module will have to deal with parameter interpretation. If
> > we
> > have separate functions, we write interpretation code that's run
> > after
> > the syscalls enter kernelland, and module implementors simply
> > implement
> > the necessary functions.
> 
> So, if I understand you correctly, you think, the multiplexed syscalls
> are fine, but in the module API no multiplexing should be used?

Yes. I just think it's clearer for someone who's writing their first 
fs/partition add-on if there's an explicit function there for each 
operation rather than making them dig thru documentation or headers 
elsewhere to figure out what operations they can/should support, and 
how they should support them. One could multiplex the entire fs add-on 
suite thru a single function, but I think that's a much less attractive 
way to do it than the current setup.

So really, I actually prefer the no multiplexing route in both cases; I 
just think it's tolerable in the syscalls, since we're basically the 
only ones who'll be using them, and I can see the argument for not 
polluting the syscall namespace.

> > Tyler Dauwalder <tyler@xxxxxxxxxxxxx> wrote:
> > > > When BDiskDevice::CommitModifications() (the bracketing method
> > > > for
> > > > PrepareModifications()) is invoked, the shadow hierarchy is
> > > > compared
> > > > with the original one and respective jobs (KDiskDeviceJob) are
> > > > created
> > > > and scheduled (put into a KDiskDeviceJobQueue).
> > > This seems a bit odd to me. Shouldn't the shadow hierachy just 
> > > keep
> > > track of the appropriate job queue necessary to reach the state of
> > > the
> > > shadow hierarchy from the state of the original hierachy? I don't
> > > see
> > > how one could easily infer a proper scheduling job sequence from
> > > just
> > > the beginning and ending states of the hierarchy.
> > 
> > As long as we can't change the nested structure, it would be pretty
> > simple, because the partitions are easily identified by their ID -
> > and
> > there are only two methods, moving and resizing, which can easily be
> > differentiated.
> 
> If you mean by changing the nested structure, moving a partition 
> within
> the hierarchy (e.g. make it child of another parent), that shouldn't 
> be
> allowed, I think. At least it could turn out to be quite tricky.
> Otherwise changing the hierarchy, like creating and deleting 
> partitions
> shouldn't be any problem.

No, I was also assuming a reasonable set of rules as you were. I just 
didn't think it through well enough.

> > Having those shadow partitions (although I don't like the name much,
> > something like "target partitions" would make clearer that the
> > current
> > partitions should be changed
> 
> As I said, I was lacking a better name... :-)
> I find `target partition' a bit general/vague, though.

I actually really like the name "shadow partitions". I think it does a 
pretty good job of capturing the idea of what's going on. :-)

Other ideas:

CreateWorkingPartition()
CreateEditablePartition()
CreateWorkingCopyPartition()
CreateEditableCopyPartition()
...

I like shadow partitions just fine, though. :-)

> >) would greatly simplify the handling of
> > the job queue from userland.
> > That way, as a user, you can fiddle around with the partitions to no
> > end; there is no reason you change them directly from one state to
> > another. You first shrink one partition, then move another one
> > around,
> > then shrink the first one a bit more, etc.
> 
> Yes, that's what I had in mind.

Yeah, that actually would be really nice. I have a tendency to think 
too much of how Partition Magic works, which is not always the most 
convenient way. :-)

> Regarding infering what jobs need to be executed to come from the
> current disk device state to the specified one, that shouldn't be a 
> big
> problem. Which partitions need to be created and deleted, directly
> follows from which shadow partitions don't have a current partition or
> which current partitions don't have a shadow partition. Comparing
> positions and sizes of partitions is easy, and the parameters 
> shouldn't
> cause trouble either. For defragmenting, reparing and initialization
> (interesting, if with the same disk system) simple flags will do.
> 
> The basic assumption is, that the operations on partitions commute
> nicely. At least I couldn't think of any situation where that wouldn't
> be the case. If we indeed encounter a problem with that, it should be
> easy to also keep track of the exact operations the user requested.

No, neither could I find such a case today. I just didn't try any of 
the examples I had in my head last night, and it was late enough that I 
should have. :-)

> > [...]
> > > > initialize_partition() does perhaps need a bit more discussion,
> > > > since
> > > > there exists the planned fs_initialize_volume() function (<be/
> > > > kernel/
> > > > fs_volume.h>), which has largely intersecting functionality (cf.
> > > > userland_interface.h for some more thoughts).
> > > My vote is for ditching fs_initialize_volume() and adding support
> > > for
> > > registering files as disk devices. What would the arguments be for
> > > keeping fs_initialize_volume() (other than the regular file
> > > problem)?
> > 
> > No, actually, that would be kinda stupid IMO. A file system needs a
> > device (or file) to initialize its structure on. It starts at 0 and
> > has
> > the length of the whole device (partition or file). Anything else
> > would
> > make it complicated.
> > Now, why should we hide the direct method of initializing a file
> > system, and force the user to get the whole BPartition tree, the 
> > need
> > to search for the right partition, disabling the possibility of
> > creating file systems in regular files, etc.
> > I would rather remove the initialize_partition() function, and have
> > something like (I would guess it already exists):
> > status_t BDiskDeviceList::GetPartition(BPartition &partition, const
> > char *deviceName);
> > (dunno if this class would be the right container, though)
> > 
> > status_t BPartition::GetDeviceName(char *deviceName);
> > 
> > and then just call fs_initialize_volume() using that deviceName.
> 
> Since initialize_partition() is more general -- it also initializes
> partitioning system, not only file systems -- it definitely cannot be
> removed. It even has a quite different semantics, for it doesn't do
> anything destructive immediately, but only operates on shadow
> partitions.
> 
> Regarding fs_initialize_volume(), it would be at best a convenience
> function, nothing more. To reply to your arguments: Your first
> paragraph just doesn't apply. Both calls initialize_partition() and
> fs_initialize_volume() end at the same FS hook, which gets a
> (partition) device path.
> 
> The latter method is not more direct than the former one. Well, more
> convenient, if you mean that, but not more direct with respect to the
> functions involved. The call cannot be directly passed to the FS, but
> has to go through the disk device manager, since it must be 
> coordinated
> with other operations on the disk device. Now things get a bit
> difficult, for fs_initialize_volume() is synchronous, while the disk
> device jobs aren't. Moreover the disk device could be locked by a
> userland API user, so that the thread couldn't even get the job
> scheduled -- it could simply fail in this case, though.

Yes, this is the main concern I have. fs_initialize_volume() needs to 
play nice with the rest of the DiskDevice API if it's kept around. 
Besides, who's really going to use fs_initialize_volume() anyway? Isn't 
it only programs like mkbfs...? It seems to me that those should be 
rewritten to use the more native, DiskDevice API anyway. It really 
wouldn't be that much more work, and one could write a nice, general 
command-line initialization app that would work with any partition/fs 
add-on on the system that supports initalize_partition() using the 
DiskDevice API.

> As I mentioned, creating file system in file could by addressed by
> providing an API to register files as disk devices. A worthwile thing
> to do, I think, since that would even allow to initialize partitioning
> systems.

Yes, I really like this idea.

> To sum it up, I would see fs_initialize_volume() mainly as a
> concenience function for one purpose, to create FSs in files. I would
> discourage application on partition devices (how did the caller get
> hold of the partition device path, anyway?).

I agree.

> > > > As written in KDiskDeviceJob.h, I'm not sure if we want to add a
> > > > Cancel() method to cancel a job in progress. Scheduled jobs can
> > > > easily
> > > > be canceled by removing them from the job queue. I just realize,
> > > > that
> > > > we don't provide any methods in the userland API for canceling a
> > > > job.
> > > Yes, I think we should support cancel, for things like defragging
> > > that
> > > can be interrupted. But we should also allow the job to specify if
> > > it
> > > allows cancelling or not.
> 
> Don't know, if that's necessary. Jobs, that can't be cancelled, will
> simply finish their task.

So are you proposing we just always display cancellation as an option, 
and if the task doesn't support it, things just keep quietly plugging 
away when you click the "Cancel" button? I think we should offer the 
user some indication of which tasks can be canceled gracefully and 
which can't.

> > We can cancel all jobs - there would pop up a requester which says
> > "Canceling the operation will recreate the initial state - this can
> > take a while", and just reverse the thing.
> > Shouldn't be too hard, at least not for moving a partition around.
> 
> Yep, for that task it would work. OTOH, something like initializing a
> partition is not so easy to be undone. ;-) Even more fun it will be,
> when several jobs are in progress in parallel.

Okay, I can see supporting reversing for the currently running task 
only, even in the case of multiple simultaneous tasks (they *are* 
disjoint by definition, so it shouldn't matter if they're running 
simultaneously or not), but I don't know that we can expect every 
implementation to be so thorough as to implement graceful cancellation, 
let alone reversing, for every operation. I'd like a way to know 
whether an operation is uninterruptible, cancellable, or reversible, 
and then indicate this through the UI somehow. 

> > How ugly it gets for resizing a partition would be the job of the
> > file
> > system to judge on. But since we already need to have logging for
> > those
> > jobs, reversing the operation at any point shouldn't be impossible
> > (or
> > too ugly) at all.
> 
> We need logging for those jobs? I planned to gracefully leave that out
> for R1. :-P

Yes, let's get this clear. First I thought we were leaving it out for 
R1, then I thought it was in, then out again, and now... :-) Which is 
it?

> > > > 3.3 FS Add-Ons
> > > > 
> > > > The FS interface functions are similar to the ones of partition
> > > > module
> > > > interface, except for a few differences: To fit better into the
> > > > rest
> > > > of the FS add-on interface, none of functions is passed a file
> > > > descriptor, but a path of the partition device instead.
> > > Wouldn't it be better to just open the proper device once and pass
> > > it
> > > around? This way you could also ensure the device is only accessed
> > > read-only during scanning.
> > 
> > It could be nice to have this, and it would also be a bit faster. I
> > am
> > not sure if it's rendering the API inconsistent there, though. One
> > could argue that the partition related functions (not initialize())
> > are
> > a bit separated from the rest anyway.
> 
> Either way would work for me. So, if you have any preferences...

I prefer passing the device. :-)

-Tyler

Other related posts: