Go to the FreeLists Home Page Home Signup Help Login
 



[openbeosstorage] || [Date Prev] [06-2003 Date Index] [Date Next] || [Thread Prev] [06-2003 Thread Index] [Thread Next]

[openbeosstorage] Re: DiskDevice API 2.x, Kernelland Draft

  • From: "Ingo Weinhold" <bonefish@xxxxxxxxxxxxxxx>
  • To: openbeosstorage@xxxxxxxxxxxxx
  • Date: Fri, 06 Jun 2003 03:41:14 +0200 CEST
On Wed, 04 Jun 2003 23:54:03 -0700 Tyler Dauwalder <tyler@xxxxxxxxxxxxx
> wrote:
> Here's comments on the discussion. I'll try to address the inline 
> questions and comments in the headers tomorrow.

That'd be great. :-)

> All in all, I like it. :-) Aside from the few questions I have below, 
> I 
> say go ahead and start on it. I'm willing to help if you're 
> interested; 
> I have finals next week, so may not find too much time this weekend, 
> but I bet I could dig up a little.

No need to sweat it. :-)
I can start on my own, and you can join me when RL leaves you more 
time. No problem at all.

> I'd also like to finish up the 
> kernel utils proposal I'm working on, though; those could be useful 
> along the way. 

Yep, that would certainly be nice.

[...]
> > Disk device data can be retrieved via get_disk_device_data(). If a
> > buffer large enough is supplied, the disk device manager stores the
> > complete user_disk_device_data/user_partition_data tree into it.
> > Otherwise it fails and returns the required buffer size in 
> > neededSize.
> > The caller should allocate a buffer of that size and retry. Since
> > there's no locking or something like that, that prevents the kernel
> > structures from being changed, the second call might fail again. 
> > The
> > game continues until it succeeds or some other error occurs. The
> > alternative would be to make locking of the kernel structures 
> > available
> > to the userland. I don't really like that idea, though.
> 
> No, the game idea seems fine to me. If it turns out to really not 
> work 
> well for some reason, we can bite the bullet and add locking.

Yes, that's what I think, too.

[...]
> > 1.4 Modifications
> > 
> > The C++ API is mapped in a straight forward way to respective 
> > syscalls.
> > We have the `meta' calls prepare_disk_device_modifications(),
> > commit_disk_device_modifications(), 
> > cancel_disk_device_modifications(),
> > and is_disk_device_modified() (corresponding to
> > BDiskDevice::CommitModifications(),...), and a syscall per 
> > modifying
> > BPartition method.
> 
> I don't undertand what the shadow parameter to these functions is 
> for.

Mmh, good question. I was just a bit too enthusiastic, when adding that 
parameter. It first seemed obvious to me that every call involving a 
partition would also need to specify whether that would be a shadow 
partition or not. I then removed it from almost all functions, after 
realizing, that in most cases this is totally clear, anyway. I guess, I 
just overlooked these functions, since for them it is obvious, too.

[...]
> > 3.1.1 Scanning
> > 
> > Scanning means identification of on-disk partitions and retrieving
> > information about them. This covers quite exactly the functionality
> > already implemented for the disk_scanner stuff. Identify() asks a 
> > disk
> > system whether it knows about the format of a given partition. It
> > returns a priority, a float value between 0 and 1, indicating how 
> > good
> > the disk system thinks it can deal with the partition (or a value <
> >  0,
> > if it has no clue), and a cookie that can hold arbitrary data 
> > helping
> > to speed up the further process. The disk system with the highest
> > priority is asked to Scan() the partition. It fills out missing 
> > data 
> > of
> > the KPartition (name, type, block size, flags,...) and, in case of 
> > a
> > partitioning system, adds KPartitions for the subpartitions. The 
> > cookie
> > returned by Identify() is freed in Scan(); for the other disk 
> > systems
> > FreeIdentifyCookie() is invoked.
> > 
> > Currently Scan() might set a content cookie for the supplied 
> > partition
> > and a cookie for each created subpartition. When freeing the 
> > KPartition
> > objects (or initializing with another disk system) FreeCookie() and
> > FreeContentCookie() are invoked to free those cookies. I'm not 
> > sure, 
> > if
> > these cookies are such a good idea. They should definitely contain 
> > no
> > additional information -- all information that cannot be 
> > explicitely
> > represented by KPartition are to go into the parameters and content
> > parameters fields. The intention was, that cookies could speed up
> > requests, for data could be represented in a more convenient way.
> 
> I say leave them in for now and let's see how useful they are in the 
> actual implementation. If we leave them in, we just make it clear 
> what 
> they're to be used for.

Yes, it might be a good idea to see how useful they are for the modules 
we implement, first. If they turn out to be of little help, we can drop 
them.

> > To save some functions, the supports_() and validate_*() functions
> > could be multiplexed to supports_partition_operation() and
> > validate_partition_operation(). A proposal, what the parameters for
> > these functions could be, is given in 
> > supports_validates_parameters.h.
> 
> I don't like the idea of multiplexing here, mostly because everyone 
> who 
> writes a module will have to deal with parameter interpretation. If 
> we 
> have separate functions, we write interpretation code that's run 
> after 
> the syscalls enter kernelland, and module implementors simply 
> implement 
> the necessary functions.

So, if I understand you correctly, you think, the multiplexed syscalls 
are fine, but in the module API no multiplexing should be used?

[...]


On Thu, 05 Jun 2003 12:56:21 +0200 CEST "Axel Dörfler" <axeld@pinc-
software.de> wrote:
> First of all, I haven't had a deep look at it yet, because I didn't 
> find the time. But I somehow wanted to answer this one now :)

Hehe. :-)
You know, of course, that you can't sneak out of having a look at the 
kernel stuff I proposed -- being the main kernel developer, not to 
mention team lead. ;-)

> Tyler Dauwalder <tyler@xxxxxxxxxxxxx> wrote:
> > > When BDiskDevice::CommitModifications() (the bracketing method 
> > > for
> > > PrepareModifications()) is invoked, the shadow hierarchy is 
> > > compared
> > > with the original one and respective jobs (KDiskDeviceJob) are 
> > > created
> > > and scheduled (put into a KDiskDeviceJobQueue). 
> > This seems a bit odd to me. Shouldn't the shadow hierachy just keep 
> > track of the appropriate job queue necessary to reach the state of 
> > the 
> > shadow hierarchy from the state of the original hierachy? I don't 
> > see 
> > how one could easily infer a proper scheduling job sequence from 
> > just 
> > the beginning and ending states of the hierarchy.
> 
> As long as we can't change the nested structure, it would be pretty 
> simple, because the partitions are easily identified by their ID - 
> and 
> there are only two methods, moving and resizing, which can easily be 
> differentiated.

If you mean by changing the nested structure, moving a partition within 
the hierarchy (e.g. make it child of another parent), that shouldn't be 
allowed, I think. At least it could turn out to be quite tricky. 
Otherwise changing the hierarchy, like creating and deleting partitions 
shouldn't be any problem.

> Having those shadow partitions (although I don't like the name much, 
> something like "target partitions" would make clearer that the 
> current 
> partitions should be changed

As I said, I was lacking a better name... :-)
I find `target partition' a bit general/vague, though.

>) would greatly simplify the handling of 
> the job queue from userland.
> That way, as a user, you can fiddle around with the partitions to no 
> end; there is no reason you change them directly from one state to 
> another. You first shrink one partition, then move another one 
> around, 
> then shrink the first one a bit more, etc.

Yes, that's what I had in mind.

Regarding infering what jobs need to be executed to come from the 
current disk device state to the specified one, that shouldn't be a big 
problem. Which partitions need to be created and deleted, directly 
follows from which shadow partitions don't have a current partition or 
which current partitions don't have a shadow partition. Comparing 
positions and sizes of partitions is easy, and the parameters shouldn't 
cause trouble either. For defragmenting, reparing and initialization 
(interesting, if with the same disk system) simple flags will do.

The basic assumption is, that the operations on partitions commute 
nicely. At least I couldn't think of any situation where that wouldn't 
be the case. If we indeed encounter a problem with that, it should be 
easy to also keep track of the exact operations the user requested.

> [...]
> > > initialize_partition() does perhaps need a bit more discussion, 
> > > since
> > > there exists the planned fs_initialize_volume() function (<be/
> > > kernel/
> > > fs_volume.h>), which has largely intersecting functionality (cf.
> > > userland_interface.h for some more thoughts).
> > My vote is for ditching fs_initialize_volume() and adding support 
> > for 
> > registering files as disk devices. What would the arguments be for 
> > keeping fs_initialize_volume() (other than the regular file 
> > problem)?
> 
> No, actually, that would be kinda stupid IMO. A file system needs a 
> device (or file) to initialize its structure on. It starts at 0 and 
> has 
> the length of the whole device (partition or file). Anything else 
> would 
> make it complicated.
> Now, why should we hide the direct method of initializing a file 
> system, and force the user to get the whole BPartition tree, the need 
> to search for the right partition, disabling the possibility of 
> creating file systems in regular files, etc.
> I would rather remove the initialize_partition() function, and have 
> something like (I would guess it already exists):
> status_t BDiskDeviceList::GetPartition(BPartition &partition, const 
> char *deviceName);
> (dunno if this class would be the right container, though)
> 
> status_t BPartition::GetDeviceName(char *deviceName);
> 
> and then just call fs_initialize_volume() using that deviceName.

Since initialize_partition() is more general -- it also initializes 
partitioning system, not only file systems -- it definitely cannot be 
removed. It even has a quite different semantics, for it doesn't do 
anything destructive immediately, but only operates on shadow 
partitions.

Regarding fs_initialize_volume(), it would be at best a convenience 
function, nothing more. To reply to your arguments: Your first 
paragraph just doesn't apply. Both calls initialize_partition() and 
fs_initialize_volume() end at the same FS hook, which gets a 
(partition) device path.

The latter method is not more direct than the former one. Well, more 
convenient, if you mean that, but not more direct with respect to the 
functions involved. The call cannot be directly passed to the FS, but 
has to go through the disk device manager, since it must be coordinated 
with other operations on the disk device. Now things get a bit 
difficult, for fs_initialize_volume() is synchronous, while the disk 
device jobs aren't. Moreover the disk device could be locked by a 
userland API user, so that the thread couldn't even get the job 
scheduled -- it could simply fail in this case, though.

As I mentioned, creating file system in file could by addressed by 
providing an API to register files as disk devices. A worthwile thing 
to do, I think, since that would even allow to initialize partitioning 
systems.

To sum it up, I would see fs_initialize_volume() mainly as a 
concenience function for one purpose, to create FSs in files. I would 
discourage application on partition devices (how did the caller get 
hold of the partition device path, anyway?).

> > > As written in KDiskDeviceJob.h, I'm not sure if we want to add a
> > > Cancel() method to cancel a job in progress. Scheduled jobs can 
> > > easily
> > > be canceled by removing them from the job queue. I just realize, 
> > > that
> > > we don't provide any methods in the userland API for canceling a 
> > > job.
> > Yes, I think we should support cancel, for things like defragging 
> > that 
> > can be interrupted. But we should also allow the job to specify if 
> > it 
> > allows cancelling or not. 

Don't know, if that's necessary. Jobs, that can't be cancelled, will 
simply finish their task.

> We can cancel all jobs - there would pop up a requester which says 
> "Canceling the operation will recreate the initial state - this can 
> take a while", and just reverse the thing.
> Shouldn't be too hard, at least not for moving a partition around.

Yep, for that task it would work. OTOH, something like initializing a 
partition is not so easy to be undone. ;-) Even more fun it will be, 
when several jobs are in progress in parallel.

> How ugly it gets for resizing a partition would be the job of the 
> file 
> system to judge on. But since we already need to have logging for 
> those 
> jobs, reversing the operation at any point shouldn't be impossible 
> (or 
> too ugly) at all.

We need logging for those jobs? I planned to gracefully leave that out 
for R1. :-P

> > > 3.3 FS Add-Ons
> > > 
> > > The FS interface functions are similar to the ones of partition 
> > > module
> > > interface, except for a few differences: To fit better into the 
> > > rest 
> > > of the FS add-on interface, none of functions is passed a file 
> > > descriptor, but a path of the partition device instead. 
> > Wouldn't it be better to just open the proper device once and pass 
> > it 
> > around? This way you could also ensure the device is only accessed 
> > read-only during scanning. 
> 
> It could be nice to have this, and it would also be a bit faster. I 
> am 
> not sure if it's rendering the API inconsistent there, though. One 
> could argue that the partition related functions (not initialize()) 
> are 
> a bit separated from the rest anyway.

Either way would work for me. So, if you have any preferences...

CU, Ingo






[ Home | Signup | Help | Login | Archives | Lists ]

All trademarks and copyrights within the FreeLists archives are owned by their respective owners.
Everything else ©2007 Avenir Technologies, LLC.