[openbeos] Re: A new BMessage implementation (Message4)

  • From: Michael Phipps <mphipps1@xxxxxxxxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Tue, 01 Nov 2005 18:46:28 -0500

I have a couple of thoughts on this.

I think that BMessages are absolutely critical and it is a good thing that 
someone so devoted and diligent is attacking them. :-D

I honestly think that we need to add to the benchmarks. I would like to see 
something where you pick 10 or so different system messages (i.e. messages that 
the OS creates) and create numbers of those. Example:
timer.start();
for (int i=0;i<10000;i++) {
        createQuitMessage();
        createQuitRequestedMessage();
        createWorkspaceChangedMessage();
        ...
        }
timer.stop();
timer.print();

Truthfully, I think that the initialization time for a BMessage will mean a 
WHOLE lot more than time to add fields because most of the messages out there 
don't use a whole lot of fields. But I could be wrong - that's what these tests 
are about. :-D

Secondly - I am concerned about having a different Flatten/Unflatten format. 
Only because of people (like me) who would like to upgrade their R5 system to 
Haiku by copying a couple of directories over. Losing all of my settings would 
irritate me. :-) Isn't it possible to check the version of the flattened data 
and instantiate it correctly?

Michael
        
On 2005-10-31 at 12:36:26 [-0500], mmlr@xxxxxxxx wrote:
> Hello Everyone
> 
> As you may have seen, there are currently 3 BMessage implementations in the 
> Haiku tree. The original one was written by Erik Jaesler and was based 
> completely on Templates. While this was nice, it did not perform 
> exceptionally well and is, by now, pretty much defunct. From that base I 
> wrote Message2 and Message3 some three months ago. They mostly work (Flatten 
> is broken in Message2, that's the reason for the good benchmarks) and tries 
> to use the R5 Message format. I'd like to introduce another BMessage 
> implementation (Message4). The following shall list the most important 
> advantages/disadvantages of the four implementations.
> 
> Message1: (original as of now)
> + Uses std:: containers
> + Supports R5 message format
> - Does not scale well
> - Slow
> 
> Message2:
> + Uses hash tables for lookups
> + Supports R5 message format
> + Simple
> - Extensive amount of runtime variables
> - Does not scale that well
> - Slow
> 
> Message3:
> + Uses hash tables for lookups
> + Uses one flat buffer
> + Scales very well
> + Supports (direct) unflattening of R5 message format
> + Fast
> - Flattening produces not 100% R5 message format output
> - Has many runtime variables that need to be recreated when unflattening
> - Complex
> 
> Message4:
> + Uses hash tables for lookups
> + Uses three flat buffers for header, field table and data
> + Scales good for most cases
> + Faster, especially for unflattening
> + Fairly straight forward
> - Does not support R5 message format
> - Flattened messages are bigger
> 
> The idea of Message4 was to reduce the amount of runtime variables that need 
> to be stored on flatten or recreated on unflatten operations. An example of 
> this would be the hash table. By including the hash table in the flattened 
> message, the unflatten time is greatly reduced. The flatten concept is to 
> just write the three flat buffers to the output buffer without any further 
> processing (except for storing the public what field). For unflatten only 
> copying the input buffer into the right flat buffers and restoring the what 
> field is enough. This will achieve good results in flatten/unflatten but also 
> generates bigger flattened messages.
> 
> Message4 Format:
> 
> <HeaderBuffer>
>     <MessageHeader>
>         <Format>
>         <Flags>
>         <ReplyInfo>
>         ...
>         <Hashtable>
>     </MessageHeader>
> </HeaderBuffer>
> 
> <FieldsBuffer>
>     <FieldHeader>
>         <Type>
>         <Flags>
>         <NameLength>
>         <Size>
>         <Count>
>         ...
>         <Offset>
>     </FieldHeader>
>     <FieldHeader>
>     <FieldHeader>
>     <preallocated space>
> </FieldsBuffer>
> 
> <DataBuffer>
>     <FieldData>
>         <FieldName>
>         <DataItem1>
>         <DataItem2>
>         <DataItem3>
>     </FieldData>
>     <FieldData>
>         <FieldName>
>         <DataSize1>
>         <DataItem1>
>         <DataSize2>
>         <DataItem2>
>         <DataSize3>
>         <DataItem3>
>     </FieldData>
>     <preallocated space>
> </DataBuffer>
> 
> Message Header
> 
> The message header is fixed size. It contains a hash table (currently set to 
> 10 entries = 40 bytes). It is special in that respect, that the hash table 
> does not contain pointers to the fields, but the index into the FieldHeader 
> array in the FieldsBuffer. While this is slower, it has the advantage that it 
> can be flattened and unflattened with the message and still work without 
> rebuilding it with new pointers. General state information and reply infos 
> are saved in the header too. These could be stored as members, but I think 
> it's cleaner to just copy them around inside a buffer.
> 
> Field Headers
> 
> The FieldsBuffer contains an array of FieldHeaders. The FieldHeader itself is 
> fixed size, since the variable sized name is located in the data buffer and 
> not in the header. The FieldsBuffer is resized lazily and adding fields costs 
> only very few cycles. When removing a field that is not at the top though 
> this implementation is slow, as all the indexes need to be looked through and 
> decremented if they are bigger than the one of the removed header.
> 
> Field Data
> 
> Each fields data starts with the fieldname. Then continues either with an 
> array of fixed size items (for fixed size data types) or a sequence of { 
> ssize_t dataSize; uint8 data[]; } pairs. There is no direct index or offset 
> table for variable sized data, so looking through variable sized data blocks 
> does not scale very well. The DataBuffer is resized with a growing block size 
> (size * 2) up to a maximum of 10 pages (40KB) this should reduce performance 
> problems with very big messages (archiving) but still keep the waste of 
> memory low for small messages. These limits can of course be adjusted when 
> they proof to be inefficient.
> 
> Message4 is implemented as one class (BMessage) by the way. There is no 
> BMessageBody and BMessageField, as this produces just too much overhead in 
> the end. The two header types are implemented as structs. It is fairly simple.
> 
> 
> That is about the design of the format. As you can see, it differs in the 
> layout compared to R5 where you have a message header in front and then pairs 
> of field headers and field data.
> To provide unflatten compatibility with the R5 message format I would suggest 
> writing an R5 message reader like the one Axel wrote for Dano messages.
> We should also adopt the flatten prototypes of Dano, where you can choose the 
> output format. For this an R5 and maybe even a Dano message writer can be 
> implemented. An alternative to this could be to always produce an R5 
> compatible output for Flatten, but internally (for messaging) use the native 
> format. Since we are going to extend BMessage (by new flags or extra 
> information stored in the header for example) the R5 format will not work 
> very long. So going the way Be did when it introduced their new message 
> format in Dano would be a sensible choice I think.
> 
> The implementation of Message4 is pretty much finished. The only thing 
> missing is actual message delivery. This is because of all the BMessage 
> "friends" that are involved and are accessing private BMessage data directly. 
> They need to be adapted to the new message format in that respect, that they 
> cannot access information in BMessage members. I'm currently working on 
> extending the private data accessor and removing all the direct friends. As 
> soon as all private data is accessed through the accessor, messaging should 
> work. Expect the checkin of Message4 to follow in the next few days.
> 
> Please tell me what you think about this concept. How open is everyone 
> involved to just switching the BMessage implementation? What is necessary for 
> Message4 to be used as the default (single) BMessage implementation?
> 
> Benchmarks can be found there: http://haiku.mlotz.ch/messagespeed.html. Bear 
> in mind that Message2 only wins in flatten/unflatten sometimes because it is 
> broken and does not store all the fields. The rest of the implementations 
> should work ok.

Other related posts: