Hello all,

I have been working on the media_decoder app to better understand how the extractor and the decoder communicate in R5. This all started because I ran into a problem with the meta_data fields: they were not being copied. So I went to R5 to see whether, and how, they were copied. It turns out that all fields are copied verbatim, with one little gotcha: the _area fields seem to be treated specially when they are not set to B_BAD_VALUE. But this email isn't about that.

Initially we used the info field to pass header information from the extractor to the decoder. We abandoned that because our test app, media_decoder, was able to succeed without passing any info in. (There was also a related issue: the BeBook's description of the info field is impossible to implement in code.) So I switched to storing this information in the meta_data field, since the "format" is the only argument passed in to the BMediaDecoder. This works well for the ogg/vorbis pair, ogg/speex, etc., and seems to have no theoretical problems. The one minor issue is copying between address spaces, which looks solvable with the conveniently located meta_data_area field.

Well, I thought I had it all figured out, until I went to test the copying behavior on R5. Here's what I found out:

The format returned from EncodedFormat is fairly spartan. For mp3, it has the following fields set: type, output.frame_rate, output.channel_count, output.buffer_size, encoding, bitrate. For ogg, it has the following fields set: type, output.frame_rate, output.channel_count, output.format, output.byte_order, output.buffer_size, encoding. Notably missing from both lists are user_data, meta_data, or any other "miscellaneous" field where header information could be stored.
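Since EncodedFormat carries nothing beyond these basic fields, the extractor has to put the codec headers into meta_data itself. Below is a minimal sketch of one way to flatten them, assuming a hypothetical blob layout: a packet count followed by length-prefixed packets (for example, the three vorbis header packets). PackHeaders and UnpackHeaders are illustrative names, not part of any Be or OpenBeOS API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical blob layout (not an R5 or OpenBeOS format):
//   uint32 packet count, then for each packet a uint32 length and its bytes.
// Integers are stored in host byte order, which is fine as long as the blob
// is only read on the machine that wrote it.
using Packet = std::vector<uint8_t>;

static void AppendU32(std::vector<uint8_t>& out, uint32_t value) {
    uint8_t raw[sizeof(value)];
    std::memcpy(raw, &value, sizeof(value));
    out.insert(out.end(), raw, raw + sizeof(value));
}

// Reads a uint32 at `pos` and advances it; fails if fewer than 4 bytes remain.
static bool ReadU32(const std::vector<uint8_t>& in, size_t& pos, uint32_t& value) {
    if (in.size() - pos < sizeof(value))
        return false;
    std::memcpy(&value, in.data() + pos, sizeof(value));
    pos += sizeof(value);
    return true;
}

// Extractor side: flatten the header packets into one meta data blob.
std::vector<uint8_t> PackHeaders(const std::vector<Packet>& packets) {
    std::vector<uint8_t> blob;
    AppendU32(blob, static_cast<uint32_t>(packets.size()));
    for (const Packet& p : packets) {
        assert(p.size() <= UINT32_MAX);  // length must fit the 32-bit prefix
        AppendU32(blob, static_cast<uint32_t>(p.size()));
        blob.insert(blob.end(), p.begin(), p.end());
    }
    return blob;
}

// Decoder side: recover the packets. Returns false for a truncated or
// malformed blob, so the decoder can refuse the format up front.
bool UnpackHeaders(const std::vector<uint8_t>& blob, std::vector<Packet>& packets) {
    packets.clear();
    size_t pos = 0;
    uint32_t count = 0;
    if (!ReadU32(blob, pos, count))
        return false;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t length = 0;
        if (!ReadU32(blob, pos, length) || blob.size() - pos < length)
            return false;
        auto first = blob.begin() + static_cast<std::ptrdiff_t>(pos);
        packets.emplace_back(first, first + static_cast<std::ptrdiff_t>(length));
        pos += length;
    }
    return pos == blob.size();  // trailing bytes also count as malformed
}
```

A decoder built this way can unpack its headers when it is constructed and reject a truncated blob immediately, rather than discovering the problem during Decode.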
From my other observations, it's clear that the only field that really matters here is "encoding", which maps directly to a specific decoder. In media_decoder I used the format from EncodedFormat to create a BMediaDecoder. Because of the missing header information, the BMediaDecoder cannot be completely set up by this construction; in general it is not yet prepared to process encoded data. (Header-less formats don't suffer from this, but they are beside the point.)

The next step in media_decoder is to get the DecodedFormat from the track. This returns a format of type RAW_AUDIO, with some parameters set in its raw_audio part. Perhaps the one interesting thing in it is that deny_flags is set. Still, none of meta_data, user_data, etc., is set to anything interesting.

At this point I call SetOutputFormat on the BMediaDecoder. The decoder still lacks header information and cannot completely initialize. (A side note: GetNextChunk is not called during SetOutputFormat.) In my opinion, the R5 decoders are in a pretty bad position at this point. They have only the bare-bones information passed to them from the extractor through a handful of media_format fields, and yet they are asked to negotiate the output format for data whose headers they haven't even seen.

Moving on, the next step is to call Decode on the BMediaDecoder. Remember, we still haven't seen the headers for the encoded data. Finally, GetNextChunk is called on the BMediaDecoder. The media_header passed in is not the one that I provided in my call to Decode. In fact, it points to garbage data; it isn't even zero-initialized. My current implementation of GetNextChunk zeroes it immediately. It worked before I added the zeroing, and it works after too.

Next, I called BMediaTrack::ReadChunk(...). For mp3, this set the following media_header fields: type, file_pos, orig_size.
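The call order described above can be modeled with a small pull-model stand-in. This is a sketch, not BMediaDecoder itself: SketchDecoder, SketchHeader and TrackFeedingDecoder are hypothetical names, and SketchHeader mirrors only a few media_header fields. It shows that SetOutputFormat never pulls data, that the first GetNextChunk happens inside Decode, and a GetNextChunk override that zeroes the incoming header before filling in file_pos and orig_size, as ReadChunk did for mp3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for a few media_header fields (not the real struct).
struct SketchHeader {
    int32_t type;
    int64_t start_time;
    int64_t file_pos;
    int64_t orig_size;
    int32_t size_used;
};

// Hypothetical stand-in for BMediaDecoder's pull model: input is requested
// from the subclass only when Decode() actually needs data.
class SketchDecoder {
public:
    virtual ~SketchDecoder() = default;

    // Like the R5 observation: negotiating output does not pull a chunk.
    bool SetOutputFormat() { return true; }

    // Pulls one chunk and "decodes" it by copying it; returns bytes produced.
    size_t Decode(std::vector<uint8_t>& out) {
        SketchHeader scratch;  // left uninitialized, like the garbage header seen in R5
        const void* data = nullptr;
        size_t length = 0;
        if (!GetNextChunk(&data, &length, &scratch))
            return 0;
        const uint8_t* bytes = static_cast<const uint8_t*>(data);
        out.assign(bytes, bytes + length);
        return length;
    }

protected:
    virtual bool GetNextChunk(const void** data, size_t* length, SketchHeader* header) = 0;
};

// Feeds pre-read chunks, standing in for calls to BMediaTrack::ReadChunk().
class TrackFeedingDecoder : public SketchDecoder {
public:
    explicit TrackFeedingDecoder(std::vector<std::vector<uint8_t>> chunks)
        : fChunks(std::move(chunks)) {}

    int calls = 0;              // how many times GetNextChunk was invoked
    SketchHeader lastHeader{};  // header as filled in by the last successful call

protected:
    bool GetNextChunk(const void** data, size_t* length, SketchHeader* header) override {
        calls++;
        std::memset(header, 0, sizeof(*header));  // never trust the incoming contents
        if (fNext >= fChunks.size())
            return false;
        const std::vector<uint8_t>& chunk = fChunks[fNext++];
        header->file_pos = fFilePos;  // offset of the chunk's start
        header->orig_size = static_cast<int64_t>(chunk.size());
        fFilePos += header->orig_size;
        *data = chunk.data();
        *length = chunk.size();
        lastHeader = *header;
        return true;
    }

private:
    std::vector<std::vector<uint8_t>> fChunks;
    size_t fNext = 0;
    int64_t fFilePos = 0;
};
```

Because the decoder only pulls data inside Decode, anything that must be known before the first chunk (such as codec headers) has to arrive through the format.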
For ogg, ReadChunk set: type, start_time, file_pos, orig_size.

I returned the chunk data. When BMediaDecoder::Decode returned, it had modified the media_header that I provided. (Remember: this is a different one from the one I saw in my GetNextChunk method.) For mp3, the following fields were set: type, start_time, orig_size. This means that the mp3 decoder filled in the chunk's start_time during Decode, because the start_time was not available earlier, after ReadChunk. This probably contributed to the R5 mp3 sync problems. Also notable: file_pos often does not get incremented, even though start_time advances. This seems to be because file_pos always points to the start of the chunk that the samples are in.

For ogg, the following fields were set: size_used, start_time, file_pos, orig_size. A notable difference between ogg and mp3 is that ogg set size_used to the appropriate value. If I had used the size_used field from mp3, I would have dumped a whole bunch of nothing into my output file. (My code used buffer_size from the raw_audio format, but I think using size_used would have been okay for ogg too.) Like mp3, file_pos is not always incremented and seems to correspond to the start of the chunk the samples are in.

Okay, so at this point you should probably be asking: how did it work at all? Perhaps we can jump into the middle of an mp3 stream and figure things out, but vorbis certainly needs the codebooks from the header to perform any kind of sane decode. It turns out that the mp3 reader/decoder and the ogg reader/vorbis decoder use different strategies. I examined the chunks my GetNextChunk method returned for the mp3 track and for the ogg track. The ogg track seemed quite odd to me, as I was expecting to see ogg packets in it. There were none (!). In fact, the R5 ogg extractor completely decodes the vorbis stream and passes the raw decoded data to the "vorbis decoder".
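Since some decoders fill in size_used and others don't, an output writer can't rely on that field alone. Here is a hedged sketch of a defensive helper, assuming the frame count that Decode reports and the channel count and sample size from the negotiated raw audio format are available. BytesToWrite is a hypothetical name, not a Media Kit function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: how many bytes of decoded audio to write out.
//   frameCount     - frames reported by the decoder for this Decode() call
//   channels       - channel count from the negotiated raw audio format
//   bytesPerSample - sample size from the negotiated raw audio format
//   sizeUsed       - the header's size_used (0 if the decoder left it unset
//                    and the caller zeroed the header beforehand)
//   bufferSize     - size of the buffer that was handed to Decode()
size_t BytesToWrite(int64_t frameCount, uint32_t channels, uint32_t bytesPerSample,
                    int32_t sizeUsed, size_t bufferSize) {
    if (frameCount <= 0 || channels == 0 || bytesPerSample == 0)
        return 0;
    size_t bytes = static_cast<size_t>(frameCount) * channels * bytesPerSample;
    // The R5 ogg decoder filled in size_used; the mp3 one did not.
    if (sizeUsed > 0)
        bytes = std::min(bytes, static_cast<size_t>(sizeUsed));
    // Never write more than the decode buffer can hold.
    return std::min(bytes, bufferSize);
}
```

Taking the minimum of the available values is a conservative choice: neither an over-reported size_used nor a frame count larger than the buffer can make the writer read past the decoded data.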
The chunks that my GetNextChunk returned for ogg were the same as the chunks I got out of Decode, and the same as what I was writing into my file. The mp3 track, on the other hand, was the exact opposite: the chunks that came through were identical to the input file. (This may make more sense in light of the structure of mp3.) So the mp3 decoder was basically operating on an unmodified file stream. In both cases, then, no header communication was required between the two levels.

I think we can all agree that the ogg/vorbis strategy is not the path that we want. (We haven't taken it, and things are working okay so far.) Our meta_data-based approach is not as robust as placing header parsing and data decoding in the same module, but it regains the power and elegance of reusable extractors and decoders. (This is already demonstrated by our ability to do useful things with ogg files that contain some streams we can't yet decode.) Using meta_data will make our kit compatible with media_decoder once I fix media_format copying to preserve meta_data, as R5 does. It will also allow the codecs to give stronger and earlier feedback about stream properties and their ability to handle them.

Additional notes on other formats:

I checked how decoding works for an mpeg video/audio stream as well. The chunks for the audio stream contained only the audio bits, as one would expect. There was still no extra information passed through meta_data, etc. The ReadChunk method set type (to ENCODED_VIDEO!), file_pos and orig_size. The decoder also set type (to ENCODED_VIDEO!), start_time, file_pos and orig_size.

I checked how decoding works for an avi video/mp3 stream as well. The chunks for the audio stream contained only the audio bits, as one would expect. There was still no extra information passed through meta_data, etc.
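One way to preserve meta_data across copies is to deep-copy the blob, so that each copy owns its own bytes. Below is a minimal sketch within a single address space, using a hypothetical SketchFormat in place of media_format. It is not necessarily how R5 does it, and it does not model cross-team copies, which would still need something like the meta_data_area mechanism.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified stand-in for media_format: a couple of
// plain fields plus an owned meta data blob. Not the real R5/OpenBeOS layout.
class SketchFormat {
public:
    int32_t type = 0;
    uint32_t encoding = 0;

    SketchFormat() = default;
    SketchFormat(const SketchFormat& other) { *this = other; }

    // Copy every plain field verbatim, then deep-copy the meta data so the
    // copy stays valid after the original changes or goes away.
    SketchFormat& operator=(const SketchFormat& other) {
        if (this != &other) {
            type = other.type;
            encoding = other.encoding;
            SetMetaData(other.fMetaData.get(), other.fMetaDataSize);
        }
        return *this;
    }

    // Stores a private copy of `size` bytes; null/0 clears the meta data.
    void SetMetaData(const void* data, size_t size) {
        if (data == nullptr || size == 0) {
            fMetaData.reset();
            fMetaDataSize = 0;
            return;
        }
        std::unique_ptr<uint8_t[]> copy(new uint8_t[size]);
        std::memcpy(copy.get(), data, size);
        fMetaData = std::move(copy);
        fMetaDataSize = size;
    }

    const void* MetaData() const { return fMetaData.get(); }
    size_t MetaDataSize() const { return fMetaDataSize; }

private:
    std::unique_ptr<uint8_t[]> fMetaData;
    size_t fMetaDataSize = 0;
};
```

Keeping the plain fields as a verbatim copy and special-casing only the blob keeps the rest of the struct's copy semantics unchanged.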
The ReadChunk method set start_time to an apparently useless value (approx. -9.223e20), and set file_pos and orig_size. The file_pos was not the start of the chunk; it may have been the end of the chunk, but it was not equal to the data's location in the file plus orig_size either. The decoder set type and start_time. It did not set file_pos or orig_size.

The version of media_decoder in CVS now finds the first audio stream in a file and decodes it.

Andrew

P.S. For those of you reading this who may not know our goals with respect to codecs: we are not planning a binary compatible solution for the plugins. We _are_ planning a binary and source compatible solution for the public APIs, in accordance with the general OpenBeOS principle. This exposition is for informative purposes only, and development will proceed according to our best judgement of how to implement the behavior specified and exhibited by the public APIs. In pursuing this goal, we may or may not choose to implement it in a way that resembles the R5 implementation. This is even more true since I haven't seen the R5 implementation. :-D