[haiku-commits] Re: haiku: hrev47567 - src/kits/interface

  • From: pulkomandy <pulkomandy@xxxxxxxxxxxxx>
  • To: haiku-commits@xxxxxxxxxxxxx
  • Date: Mon, 28 Jul 2014 21:21:23 +0200

> > At least you learned something about FBC in the process which was the only
> > reason not to fix it myself. Hopefully it'll last a bit.
> 
> No, I haven't really, it's still a black box to me. I have yet to find
> a definitive resource on the problem. I looked in "Effective C++"
> Third Edition by Scott Meyer's and "The C++ Programming Language"
> Forth Edition by Bjarne Stroustrup and neither mention the fragile
> base class (aka fragile inheritance) problem nor detail what modifies
> the size and vtable of a class and how.

That's because it isn't a problem with the language, but with the
available implementations. These books describe only the language and
not the ABI which goes with it. This is also because C++ was originally
designed as a language for complete systems (where you compile
everything in a project at once and everything works together). Java
solves some of these problems by having a notion of Interface built into
the language, allowing different compilation units to rely on a stable
interface between each other. C++ has no such thing, and doesn't
prevent the implementation details of one compilation unit from leaking
into another (through inline methods, public fields which are referenced
by offset rather than by name, objects allocated on the stack with a
size decided at compile time, etc).

> 
> My understanding thus far is that the problem occurs when changing the
> order of or adding or removing non-inherited virtual methods or adding
> or removing private member variables from the class. However this is
> all derived from conjecture, I would prefer to learn from a definitive
> reference on the subject, but I haven't found one yet.

Actually any member variable, not just private ones.
This is because of how gcc implements things. There is a "vtable"
(virtual function table) which is just a table of all the virtual functions
available in a class. And there is the object itself which is
essentially a struct, with the same padding rules. The visibility of
fields/methods is only checked at compile time, and doesn't matter here.

We have problems with the size of the vtable because of the way
subclassing is done. Suppose we have a base class A with one virtual
method. Its vtable will look like this (omitting the
constructors/destructors for simplicity):

{
A::foo // offset 0
}

Now we subclass this in class B and add one more virtual method. The
vtable looks like this:

{
A::foo, // offset 0
B::bar  // offset 1
}

Any code using class B will invoke the virtual bar like this:

object->vtable[1]()

If we subclass this again, in a class C, and override B::bar, we get
this vtable for C:

{
A::foo
C::bar
}

Notice that code invoking vtable[1] will call the method in C. This is
how overriding works.
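
To make the slot lookup concrete, here is a minimal hand-rolled model of
it. This is only a sketch: real compilers generate these tables for you,
and the names (Object, Method, kVtableB, CallBar, ...) are made up for
the illustration.

```cpp
#include <cassert>
#include <string>

struct Object;
typedef std::string (*Method)(Object*);

// Every object starts with a pointer to the table of its dynamic class.
struct Object {
    const Method* vtable;
};

std::string A_foo(Object*) { return "A::foo"; }
std::string B_bar(Object*) { return "B::bar"; }
std::string C_bar(Object*) { return "C::bar"; }

// One table per class: slot 0 is foo, slot 1 is bar.
const Method kVtableB[] = { A_foo, B_bar };
const Method kVtableC[] = { A_foo, C_bar };  // C overrides bar in place

// What a caller compiled against B's header does for object->bar():
// the slot number 1 is baked into the caller's code.
std::string CallBar(Object* object)
{
    return object->vtable[1](object);
}

std::string CallBarOnB() { Object b = { kVtableB }; return CallBar(&b); }
std::string CallBarOnC() { Object c = { kVtableC }; return CallBar(&c); }

void SelfCheck()
{
    assert(CallBarOnB() == "B::bar");
    assert(CallBarOnC() == "C::bar");  // same slot, overridden method
}
```

The caller never names B::bar or C::bar; it only knows "slot 1", which
is why the slot numbers end up being part of the ABI.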

Now some ill-advised coder decides to add a new virtual method to A.
Suddenly our vtable for C looks like this:

{
A::foo
A::bad
C::bar
}

Code using vtable[1] is now calling A::bad! This is an ABI breakage. To
avoid this problem, Be decided to use two strategies. The first one is
to reserve some methods in each class, so the vtable looks like this for
A:

{
A::foo
A::reserved0
A::reserved1
...
}

These methods are never called nor overridden. Thus, they can safely be
replaced in later versions of the class by actual methods:

{
A::foo
A::bad
A::reserved1
...
}

This way we don't shift the offsets for subclasses and everything goes
fine.
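
A small model of both situations, again with function pointer tables
standing in for the compiler-generated vtables. Everything here (the
table names, the slot constants, OldCallerInvokesBar) is invented for
the sketch; it only mimics an old application binary that was compiled
with a fixed slot number for bar.

```cpp
#include <cassert>
#include <string>

typedef std::string (*Slot)();

std::string A_foo() { return "A::foo"; }
std::string A_bad() { return "A::bad"; }
std::string A_reserved0() { return "A::reserved0"; }
std::string A_reserved1() { return "A::reserved1"; }
std::string C_bar() { return "C::bar"; }

// No reserved slots: adding A::bad pushes bar from slot 1 to slot 2.
const Slot kPlainV1[] = { A_foo, C_bar };
const Slot kPlainV2[] = { A_foo, A_bad, C_bar };

// Reserved slots: A::bad takes over reserved0, bar stays in slot 3.
const Slot kReservedV1[] = { A_foo, A_reserved0, A_reserved1, C_bar };
const Slot kReservedV2[] = { A_foo, A_bad, A_reserved1, C_bar };

// Slot numbers hardcoded in an application built against version 1.
const int kBarSlotPlain = 1;
const int kBarSlotReserved = 3;

std::string OldCallerInvokesBar(const Slot* vtable, int slot)
{
    return vtable[slot]();
}

void SelfCheck()
{
    assert(OldCallerInvokesBar(kPlainV1, kBarSlotPlain) == "C::bar");
    assert(OldCallerInvokesBar(kPlainV2, kBarSlotPlain) == "A::bad");  // broken
    assert(OldCallerInvokesBar(kReservedV1, kBarSlotReserved) == "C::bar");
    assert(OldCallerInvokesBar(kReservedV2, kBarSlotReserved) == "C::bar");
}
```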

The second strategy used at Be comes in when we run out of reserved
methods. It is a bit less flexible. Many classes define a virtual method
called Perform. This method takes a "perform code", and its
implementation looks like this:

void A::Perform(int opcode, void* args)
{
        switch(opcode)
        {
                case OP_DOSOMETHING: DoSomething(args); return;
                // more cases
        }
}

This can emulate the behavior of virtual methods (you can override
Perform, and have it call other methods in subclasses for some ops). It
replaces the vtable with a switch to dispatch the calls, which is slower
but doesn't have the problems of shifting offsets when adding new
opcodes.
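
Here is a sketch of how a subclass can "override" one opcode and pass
the others on to its parent. The opcode values, the int return type and
the CallThrough helper are assumptions made so the result can be
observed; they are not taken from the actual Be/Haiku headers.

```cpp
#include <cassert>

// Hypothetical perform codes.
enum { OP_DOSOMETHING = 1, OP_DOSOMETHINGELSE = 2 };

class A {
public:
    virtual ~A() {}

    // The only virtual slot needed, however many opcodes get added later.
    virtual int Perform(int opcode, void* args)
    {
        int value = *static_cast<int*>(args);
        switch (opcode) {
            case OP_DOSOMETHING:
                return value + 1;
            case OP_DOSOMETHINGELSE:
                return value - 1;
        }
        return -1;  // unknown opcode
    }
};

class B : public A {
public:
    virtual int Perform(int opcode, void* args)
    {
        switch (opcode) {
            case OP_DOSOMETHING:  // "overrides" this one operation
                return *static_cast<int*>(args) * 2;
        }
        return A::Perform(opcode, args);  // everything else goes to the parent
    }
};

int CallThrough(A& object, int opcode, int value)
{
    return object.Perform(opcode, &value);
}
```

With an A, CallThrough(a, OP_DOSOMETHING, 20) gives 21; with a B it
gives 40, while OP_DOSOMETHINGELSE still reaches A's implementation.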

> 
> For instance I added 2 bools (1 byte each) while the class size went
> from 356 bytes to 360 bytes, increasing by 4 bytes. This suggests that
> there is some padding going on, but this isn't clear, is this compiler
> dependent or something that can be relied on? If I were to add another
> bool would the class size increase again, or would it stay 360 bytes
> deducting from the padding? Perhaps it depends on where I add the bool
> as it might pad differently?

It does, just like struct padding works.

If you have

class thing {
bool foo;
int bar;
};

3 bytes of padding are inserted between foo and bar (with the usual
4-byte aligned int). Up to 3 bools can be added there without changing
the class size and without shifting the other fields (the offsetof()
of each field is part of the ABI).

This matters because member lookup in the gcc C++ ABI is done by
offset into the object, rather than by resolving the members by name
(there could for example be an offset_of_thing_foo symbol exported by
the compilation unit where thing is compiled, which other compilation
units would rely on, but there is no such thing). Every compilation unit
parses the .h with the class layout and hardcodes the offsets of the
fields everywhere. So we must make sure the fields don't move, and this
is wrong:

// Changing field order breaks the ABI!
class thing {
int bar;
bool foo;
};
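
These layout claims can be checked with sizeof and offsetof. The sketch
below uses structs with public fields so offsetof can see them, and the
type names (ThingV1, ThingV2, ThingReordered) are invented. It assumes
a 1-byte bool and a 4-byte aligned int, which is what the usual x86
ABIs give you.

```cpp
#include <cassert>
#include <cstddef>

// Original layout: foo, 3 bytes of padding, then bar.
struct ThingV1 { bool foo; int bar; };

// Three bools added into the padding: nothing else moves.
struct ThingV2 { bool foo; bool added1; bool added2; bool added3; int bar; };

// Fields swapped: both offsets change.
struct ThingReordered { int bar; bool foo; };

bool PaddingReuseKeepsLayout()
{
    return sizeof(ThingV2) == sizeof(ThingV1)
        && offsetof(ThingV2, foo) == offsetof(ThingV1, foo)
        && offsetof(ThingV2, bar) == offsetof(ThingV1, bar);
}

bool ReorderingMovesFields()
{
    return offsetof(ThingReordered, bar) != offsetof(ThingV1, bar)
        && offsetof(ThingReordered, foo) != offsetof(ThingV1, foo);
}

void SelfCheck()
{
    assert(PaddingReuseKeepsLayout());
    assert(ReorderingMovesFields());
}
```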

The other problem is allocating objects. Once again, the size of the object
could be a symbol from the compilation unit where the class is defined, but
instead it is hardcoded everywhere. So a fixed space is reserved (on the stack
or with malloc) for the object, and attempting to use more will corrupt
memory or crash. Be solved this by adding reserved fields to the classes so
there is some space for them to grow later on. The equivalent of the
Perform trick for fields is to replace the last remaining reserved
field with a pointer to a private class, something like:

class MyClassExtended {
        // Can add more fields here. This class is not part of the public
        // API so its size can change at will.
};

// This is the public class which has run out of reserved fields
class MyClass {
        MyClassExtended* more;
};
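
A slightly fuller sketch of that trick, showing that the public class
keeps its size while the private one is free to grow. The V1/V2 names
and the fields are hypothetical, and the sketch assumes the last
reserved field was pointer-sized (a 32-bit reserved field would not be
enough on a 64-bit target).

```cpp
#include <cassert>

// Version 1 of the public class: one pointer-sized reserved field left.
class MyClassV1 {
public:
    MyClassV1() : fValue(0), _reserved(0) {}
private:
    int   fValue;
    void* _reserved;
};

// Private to the library, so it can gain fields in any release.
struct MyClassExtended {
    MyClassExtended() : fColor(0), fFlags(0) {}
    int fColor;
    int fFlags;
};

// Version 2: the reserved field became a pointer to the extension.
class MyClassV2 {
public:
    MyClassV2() : fValue(0), fMore(new MyClassExtended) {}
    ~MyClassV2() { delete fMore; }

    void SetColor(int color) { fMore->fColor = color; }
    int Color() const { return fMore->fColor; }

private:
    // Copying is disabled to keep the sketch short.
    MyClassV2(const MyClassV2&);
    MyClassV2& operator=(const MyClassV2&);

    int              fValue;
    MyClassExtended* fMore;
};

bool PublicSizeUnchanged()
{
    return sizeof(MyClassV1) == sizeof(MyClassV2);
}
```

In a real library the constructor, destructor and accessors would live
in the .cpp file, so the public header only needs a forward declaration
of MyClassExtended.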

> 
> I understand how struct padding works on x86 so I don't need you to
> lecture me on the details, and I assume that class padding works
> similarly, but, that is an assumption, one that I'd like to verify but
> don't have a good resource to do so. I don't want you to explain it to
> me here, but if you could point me to a good reference I'd appreciate
> it.

The only difference is there can be inheritance involved. Just remember
that the following are equivalent as far as padding goes:

class CA {
        bool foo;
};

class CB: public CA {
        bool bar;
};

// sizeof(CB): 2

struct SA {
        bool foo;
};

struct SB {
        SA parent;
        bool bar;
};

// sizeof(SB): also 2
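
If you would rather verify this than take my word for it, a check along
these lines does the job. It is a sketch that uses structs throughout
(visibility does not affect the layout), and the function name is made
up.

```cpp
#include <cassert>

struct CA { bool foo; };
struct CB : public CA { bool bar; };

struct SA { bool foo; };
struct SB { SA parent; bool bar; };

// Inheriting from CA and embedding SA as the first member
// should give the same object size.
bool SameSizeEitherWay()
{
    return sizeof(CB) == sizeof(SB);
}

void SelfCheck()
{
    assert(SameSizeEitherWay());
}
```

This only covers the simple case shown here. A class with virtual
methods also carries a vtable pointer, which a plain struct does not.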

-- 
Adrien.
