[haiku-development] Defining the Haiku UUID for GPT and other uses

From: André Braga <meianoite@xxxxxxxxx>
To: "haiku-development@xxxxxxxxxxxxx" <haiku-development@xxxxxxxxxxxxx>
Date: Thu, 14 May 2009 09:18:01 -0300
Hello,

I have been thinking about the best way to generate UUIDs under a
Haiku namespace.

The most straightforward way to do this would probably be:

$ uuid -v 5 "ns:DNS" "haiku-os.org"
0ddd4753-72de-578d-baa0-eb66ed464aca

and then we define this UUID as the namespace for all objects related to Haiku.
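For anyone without the OSSP `uuid` tool at hand, the same name-based (Version 5) derivation can be sketched with Python's standard `uuid` module. This assumes that `"ns:DNS"` in the command above is the RFC 4122 DNS namespace, which Python exposes as `uuid.NAMESPACE_DNS`. The name `HAIKU_NS` is just an illustrative choice:

```python
import uuid

# Version 5 UUIDs are a SHA-1 hash of (namespace UUID bytes + name bytes),
# so the same namespace and name always produce the same UUID.
HAIKU_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "haiku-os.org")

print(HAIKU_NS)          # compare with the `uuid -v 5` output above
print(HAIKU_NS.version)  # 5
```

Because the result is deterministic, anyone can recompute the namespace UUID from the domain name alone. No registry is needed.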

While there's no logical reason why anyone would object to this (me
included!), there are some semantic issues involved, and some rather
emotional ones as well ;)

For example: when generating objects under the Haiku namespace, should
we use it as a namespace in the canonical sense, or should we just
increment the timestamp (time_low) field?

Looking at the examples found on these URLs:
http://fxr.watson.org/fxr/source/sys/gpt.h?v=FREEBSD#L71
http://fxr.watson.org/fxr/source/common/sys/efi_partition.h?v=OPENSOLARIS#L77
http://en.wikipedia.org/wiki/GUID_Partition_Table#Partition_type_GUIDs

We notice that:
    -    The majority of vendors use "sequential" Version 1 UUIDs
(usually by incrementing the timestamp field) to represent different
partition labels, and leave most of the rest of the UUID bytes intact;
    -    Most of the remaining vendors use Version 4 (random-number based) UUIDs;
    -    Apple uses its own UUID "version" (ahem.);
    -    and at least one UUID is actually an ASCII string! The UUID
for a BIOS Boot Partition is 21686148-6449-6E6F-744E-656564454649, and
when you byteswap its first three fields (the first 8 octets), as GPT
does when it stores GUIDs on disk, it spells "Hah!IdontNeedEFI". Who
said those partition guys have no sense of humour? ;D
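The byteswap trick can be checked with Python's standard `uuid` module. `UUID.bytes_le` yields the mixed-endian layout that GPT uses on disk: the first three fields (time_low, time_mid, time_hi_and_version) are little-endian, and the last eight bytes are stored unchanged. Here is a minimal sketch:

```python
import uuid

bios_boot = uuid.UUID("21686148-6449-6E6F-744E-656564454649")

# Canonical (big-endian) byte order: the text comes out scrambled.
print(bios_boot.bytes)     # b'!haHdInotNeedEFI'

# GPT on-disk order: first three fields byteswapped, rest untouched.
print(bios_boot.bytes_le)  # b'Hah!IdontNeedEFI'
```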


Back to Haiku: let's explore the possibilities. Using the previously
generated UUID as a proper namespace, we could generate further UUIDs
like this (the BFS versions are hypothetical, for illustration only):

$ uuid -v5 0ddd4753-72de-578d-baa0-eb66ed464aca "BFS v1 Volume"
2ba1f451-d675-5762-8757-62c0381257a5
$ uuid -v5 0ddd4753-72de-578d-baa0-eb66ed464aca "BFS v2 Volume"
665acffd-32ac-5bd5-bb91-b335cbe3db6e
$ uuid -v5 0ddd4753-72de-578d-baa0-eb66ed464aca "BFS v3 Volume"
78ed21f8-a44f-5bef-b22d-e78262d7d354
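A sketch of the "proper namespace" approach in Python follows. It assumes `uuid.uuid5` matches `uuid -v5` (both implement RFC 4122 Version 5). `haiku_object_uuid` is a hypothetical helper name, and the printed values should be checked against the CLI output above rather than taken from here:

```python
import uuid

# The Haiku namespace UUID proposed above, now used as the namespace
# argument in the same way "ns:DNS" was used to create it.
HAIKU_NS = uuid.UUID("0ddd4753-72de-578d-baa0-eb66ed464aca")

def haiku_object_uuid(name: str) -> uuid.UUID:
    """Hypothetical helper: derive a Version 5 UUID for a named Haiku object."""
    return uuid.uuid5(HAIKU_NS, name)

for label in ("BFS v1 Volume", "BFS v2 Volume", "BFS v3 Volume"):
    print(label, haiku_object_uuid(label))
```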

On the other hand, we could use a simple increment (the result is still
a legal-looking UUIDv5, except that it could also be the hash of some
other name, which could hypothetically lead to a UUID collision
somewhere in the Universe, given infinite time) and #define that the
following UUIDs correspond to volumes formatted with each BFS version:
v1: 0ddd475*4*-72de-578d-baa0-eb66ed464aca
v2: 0ddd475*5*-72de-578d-baa0-eb66ed464aca
v3: 0ddd475*6*-72de-578d-baa0-eb66ed464aca
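The increment scheme only changes the first 32-bit field and leaves the remaining 96 bits alone. Here is a minimal Python sketch, where `bump_first_field` is a hypothetical helper:

```python
import uuid

HAIKU_NS = uuid.UUID("0ddd4753-72de-578d-baa0-eb66ed464aca")

def bump_first_field(base: uuid.UUID, n: int) -> uuid.UUID:
    """Hypothetical helper: add n to the first 32-bit field, keep the rest."""
    time_low = (base.time_low + n) & 0xFFFFFFFF   # wrap within 32 bits
    rest = base.int & ((1 << 96) - 1)             # the other 96 bits, unchanged
    # The first field occupies the top 32 bits of the 128-bit value.
    return uuid.UUID(int=(time_low << 96) | rest)

for version in (1, 2, 3):
    print(f"v{version}:", bump_first_field(HAIKU_NS, version))
# v1: 0ddd4754-72de-578d-baa0-eb66ed464aca
# v2: 0ddd4755-72de-578d-baa0-eb66ed464aca
# v3: 0ddd4756-72de-578d-baa0-eb66ed464aca
```

The version and variant bits are untouched, so the result still parses as a Version 5 UUID even though no name hashes to it.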

Advantages of the proper NS: there is a well-defined and unambiguous
way of generating the UUID of any object.
Disadvantages of the proper NS: the namespace itself is no longer
recognizable in the generated UUID, which ends up looking completely
random...

Advantages of the simple increment: you could consider the namespace
UUID as defining the beginning of a range of names that exist under
the Haiku namespace, and you can easily see what objects belong to
what namespace by looking at the UUID suffix.
Disadvantages of the simple increment: the UUID for each object has to
be #defined, since there's no way to derive it mnemonically from the
root namespace.


Alternatively...

$ uuid -d 1ed80000-bb23-1601-802a-4861696b7521
encode: STR:     1ed80000-bb23-1601-802a-4861696b7521
        SIV:     40998376052907112114899479663169205537
decode: variant: DCE 1.1, ISO/IEC 11578:1996
        version: 1 (time and node based)
        content: time:  2001-08-31 00:00:00.000000.0 UTC
                 clock: 42 (usually random)
                 node:  48:61:69:6b:75:21 (global unicast)

This UUID has some nice properties, some delightfully easter-egg-ish:
    -    The time field relates to the foundation of the OpenBeOS
project (sometime in August 2001; I couldn't pinpoint the exact day, if
it is known at all, so I just set it to the last day of the month);
    -    The clock field is set to 0x2a, which:
        o    Is the ASCII code for the asterisk
        o    Is 42 in decimal, otherwise known as the Answer to the
Ultimate Question of Life, the Universe, and Everything (AttUQoLtUaE);
and "42" read as hex is the ASCII code for the letter B! ;D
    -    And the node spells "Haiku!" in ASCII, hex notation. ;D
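The clock and node properties listed above can be checked with Python's standard `uuid` module. This sketch covers only the version, clock, and node claims:

```python
import uuid

haiku = uuid.UUID("1ed80000-bb23-1601-802a-4861696b7521")

print(haiku.version)                   # 1
print(haiku.clock_seq)                 # 42 (0x2a)
print(chr(haiku.clock_seq))            # *  (ASCII asterisk)
print(chr(0x42))                       # B  ("42" read as hex)
print(haiku.node.to_bytes(6, "big"))   # b'Haiku!'

# First node octet is 0x48: the multicast bit (0x01) and the
# locally-administered bit (0x02) are both clear, hence "global unicast".
first_octet = haiku.node >> 40
print(first_octet & 0x03 == 0)         # True
```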

Other advantages of this UUID are that it's trivial to recreate once
you know how it was generated, and that, because it is Version 1, it's
safe to set the low 32 bits of the timestamp (the first 4 octets, the
time_low field) to whatever we wish. This gives us about four billion
(2^32) unique objects as long as we don't care much about some seven
minutes in the middle of the night.
;)
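The "seven minutes" figure follows from the Version 1 timestamp unit, which is 100 nanoseconds. Here is a small worked check:

```python
# Version 1 timestamps count 100-nanosecond intervals.
TICKS_PER_SECOND = 10_000_000

codes = 2 ** 32                             # distinct time_low values
window_seconds = codes / TICKS_PER_SECOND   # real time those values span

print(codes)                          # 4294967296
print(window_seconds)                 # 429.4967296
print(round(window_seconds / 60, 2))  # 7.16
```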

Treating bb23-1601-802a-4861696b7521 as a suffix, 0x42465331 ("BFS1"
in ASCII) could be used to represent BFS v1 volumes, 0x424d5347
("BMSG") would represent BMessages, and versionable objects in general
could follow an XXXXXXNN convention (a three-byte type code followed by
a one-byte version).
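The prefix-plus-suffix scheme can be sketched as follows, assuming the four-byte code goes into the first field verbatim (big-endian, as in the UUID string form). `haiku_uuid` and `is_haiku_uuid` are hypothetical helper names:

```python
import uuid

# Shared suffix: everything after the first field of
# 1ed80000-bb23-1601-802a-4861696b7521.
HAIKU_SUFFIX = bytes.fromhex("bb231601802a4861696b7521")

def haiku_uuid(code: bytes) -> uuid.UUID:
    """Hypothetical helper: place a four-byte code in the first field."""
    if len(code) != 4:
        raise ValueError("code must be exactly four bytes")
    return uuid.UUID(bytes=code + HAIKU_SUFFIX)

def is_haiku_uuid(u: uuid.UUID) -> bool:
    """Hypothetical helper: True if u carries the Haiku suffix."""
    return u.bytes[4:] == HAIKU_SUFFIX

print(haiku_uuid(b"BFS1"))   # 42465331-bb23-1601-802a-4861696b7521
print(haiku_uuid(b"BMSG"))   # 424d5347-bb23-1601-802a-4861696b7521
```

Checking membership is a plain comparison of the last 12 bytes, which is the "see what belongs to what namespace by looking at the suffix" property in practice.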

In other words, by slightly subverting the (completely backwards!)
rules of UUIDv1 that mandate the node field to represent a MAC
address, and as long as we cram semantics into the first four bytes
(common practice in the BeOS world, as with the "what" field of a
BMessage), we can keep the best of both worlds. Another cool thing,
again analogous to the "what" field, is that as long as we know what
types of object to expect when reading the UUID, we get to know
whether the data identified by this UUID is stored in big or little
endian "for free".
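The following sketch shows the endianness detection. It assumes that the reader knows which four-character code to expect and that the writer stored the code as a native 32-bit integer. `detect_byte_order` is a hypothetical helper:

```python
import struct

EXPECTED = b"BFS1"  # the type code the reader expects to find

def detect_byte_order(first_four: bytes, expected: bytes = EXPECTED) -> str:
    """Hypothetical helper: infer the writer's byte order from the first field."""
    if first_four == expected:
        return "big"
    if first_four == expected[::-1]:
        return "little"
    raise ValueError("not the expected object type")

code = 0x42465331  # 'BFS1' as a 32-bit integer
print(detect_byte_order(struct.pack(">I", code)))  # big
print(detect_byte_order(struct.pack("<I", code)))  # little
```

This only works when the code is not a byte palindrome. A code such as "ABBA" would read the same in both byte orders.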

And I did mention that those look *gorgeous* in a hex dump, didn't I? :)
4246 5331 bb23 1601 802a 4861 696b 7521    BFS1.#...*Haiku!
3153 4642 bb23 1601 802a 4861 696b 7521    1SFB.#...*Haiku!
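The two dump lines can be reproduced with a small formatter. This sketch assumes the layout above: eight 2-byte hex groups, four spaces, then printable ASCII with '.' for everything else. `hexdump_line` is a hypothetical name:

```python
def hexdump_line(data: bytes) -> str:
    """Format bytes as 2-byte hex groups followed by printable ASCII."""
    groups = " ".join(data[i:i + 2].hex() for i in range(0, len(data), 2))
    text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)
    return f"{groups}    {text}"

suffix = bytes.fromhex("bb231601802a4861696b7521")
print(hexdump_line(b"BFS1" + suffix))  # big-endian first field
print(hexdump_line(b"1SFB" + suffix))  # little-endian first field
```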

What do you think?


Cheers,
A.
Follow-Ups:
- [haiku-development] Re: Defining the Haiku UUID for GPT and other uses
  - From: Rob Judd