I have a couple of thoughts on this. I think that BMessages are absolutely critical and it is a good thing that someone so devoted and diligent is attacking them. :-D I honestly think that we need to add to the benchmarks. I would like to see something where you pick 10 or so different system messages (i.e. messages that the OS creates) and create numbers of those. Example: timer.start(); for (int i=0;i<10000;i++) { createQuitMessage(); createQuitRequestedMessage(); createWorkspaceChangedMessage(); ... } timer.stop(); timer.print(); Truthfully, I think that the initialization time for a BMessage will mean a WHOLE lot more than time to add fields because most of the messages out there don't use a whole lot of fields. But I could be wrong - that's what these tests are about. :-D Secondly - I am concerned about having a different Flatten/Unflatten format. Only because of people (like me) who would like to upgrade their R5 system to Haiku by copying a couple of directories over. Losing all of my settings would irritate me. :-) Isn't it possible to check the version of the flattened data and instantiate it correctly? Michael On 2005-10-31 at 12:36:26 [-0500], mmlr@xxxxxxxx wrote: > Hello Everyone > > As you may have seen, there are currently 3 BMessage implementations in the > Haiku tree. The original one was written by Erik Jaesler and was based > completely on Templates. While this was nice, it did not perform > exceptionally well and is, by now, pretty much defunct. From that base I > wrote Message2 and Message3 some three months ago. They mostly work (Flatten > is broken in Message2, that's the reason for the good benchmarks) and tries > to use the R5 Message format. I'd like to introduce another BMessage > implementation (Message4). The following shall list the most important > advantages/disadvantages of the four implementations. > > Message1: (original as of now) > + Uses std:: containers > + Supports R5 message format > - Does not scale well > - Slow > > Message2: > + Uses hash tables for lookups > + Supports R5 message format > + Simple > - Extensive amount of runtime variables > - Does not scale that well > - Slow > > Message3: > + Uses hash tables for lookups > + Uses one flat buffer > + Scales very well > + Supports (direct) unflattening of R5 message format > + Fast > - Flattening produces not 100% R5 message format output > - Has many runtime variables that need to be recreated when unflattening > - Complex > > Message4: > + Uses hash tables for lookups > + Uses three flat buffers for header, field table and data > + Scales good for most cases > + Faster, especially for unflattening > + Fairly straight forward > - Does not support R5 message format > - Flattened messages are bigger > > The idea of Message4 was to reduce the amount of runtime variables that need > to be stored on flatten or recreated on unflatten operations. An example of > this would be the hash table. By including the hash table in the flattened > message, the unflatten time is greatly reduced. The flatten concept is to > just write the three flat buffers to the output buffer without any further > processing (except for storing the public what field). For unflatten only > copying the input buffer into the right flat buffers and restoring the what > field is enough. This will achieve good results in flatten/unflatten but also > generates bigger flattened messages. > > Message4 Format: > > <HeaderBuffer> > <MessageHeader> > <Format> > <Flags> > <ReplyInfo> > ... > <Hashtable> > </MessageHeader> > </HeaderBuffer> > > <FieldsBuffer> > <FieldHeader> > <Type> > <Flags> > <NameLength> > <Size> > <Count> > ... > <Offset> > </FieldHeader> > <FieldHeader> > <FieldHeader> > <preallocated space> > </FieldsBuffer> > > <DataBuffer> > <FieldData> > <FieldName> > <DataItem1> > <DataItem2> > <DataItem3> > </FieldData> > <FieldData> > <FieldName> > <DataSize1> > <DataItem1> > <DataSize2> > <DataItem2> > <DataSize3> > <DataItem3> > </FieldData> > <preallocated space> > </DataBuffer> > > Message Header > > The message header is fixed size. It contains a hash table (currently set to > 10 entries = 40 bytes). It is special in that respect, that the hash table > does not contain pointers to the fields, but the index into the FieldHeader > array in the FieldsBuffer. While this is slower, it has the advantage that it > can be flattened and unflattened with the message and still work without > rebuilding it with new pointers. General state information and reply infos > are saved in the header too. These could be stored as members, but I think > it's cleaner to just copy them around inside a buffer. > > Field Headers > > The FieldsBuffer contains an array of FieldHeaders. The FieldHeader itself is > fixed size, since the variable sized name is located in the data buffer and > not in the header. The FieldsBuffer is resized lazily and adding fields costs > only very few cycles. When removing a field that is not at the top though > this implementation is slow, as all the indexes need to be looked through and > decremented if they are bigger than the one of the removed header. > > Field Data > > Each fields data starts with the fieldname. Then continues either with an > array of fixed size items (for fixed size data types) or a sequence of { > ssize_t dataSize; uint8 data[]; } pairs. There is no direct index or offset > table for variable sized data, so looking through variable sized data blocks > does not scale very well. The DataBuffer is resized with a growing block size > (size * 2) up to a maximum of 10 pages (40KB) this should reduce performance > problems with very big messages (archiving) but still keep the waste of > memory low for small messages. These limits can of course be adjusted when > they proof to be inefficient. > > Message4 is implemented as one class (BMessage) by the way. There is no > BMessageBody and BMessageField, as this produces just too much overhead in > the end. The two header types are implemented as structs. It is fairly simple. > > > That is about the design of the format. As you can see, it differs in the > layout compared to R5 where you have a message header in front and then pairs > of field headers and field data. > To provide unflatten compatibility with the R5 message format I would suggest > writing an R5 message reader like the one Axel wrote for Dano messages. > We should also adopt the flatten prototypes of Dano, where you can choose the > output format. For this an R5 and maybe even a Dano message writer can be > implemented. An alternative to this could be to always produce an R5 > compatible output for Flatten, but internally (for messaging) use the native > format. Since we are going to extend BMessage (by new flags or extra > information stored in the header for example) the R5 format will not work > very long. So going the way Be did when it introduced their new message > format in Dano would be a sensible choice I think. > > The implementation of Message4 is pretty much finished. The only thing > missing is actual message delivery. This is because of all the BMessage > "friends" that are involved and are accessing private BMessage data directly. > They need to be adapted to the new message format in that respect, that they > cannot access information in BMessage members. I'm currently working on > extending the private data accessor and removing all the direct friends. As > soon as all private data is accessed through the accessor, messaging should > work. Expect the checkin of Message4 to follow in the next few days. > > Please tell me what you think about this concept. How open is everyone > involved to just switching the BMessage implementation? What is necessary for > Message4 to be used as the default (single) BMessage implementation? > > Benchmarks can be found there: http://haiku.mlotz.ch/messagespeed.html. Bear > in mind that Message2 only wins in flatten/unflatten sometimes because it is > broken and does not store all the fields. The rest of the implementations > should work ok.