[openbeosmediakit] decoder discoveries

From: "Andrew Bachmann" <shatty@xxxxxxxxxxxxx>
To: "openbeos media kit" <openbeosmediakit@xxxxxxxxxxxxx>
Date: Wed, 21 Jan 2004 01:10:02 -0800 PST

Hello all,

I have been working on the media_decoder app to better understand how the
communication between the extractor and decoder occurs in R5.  This all
started because I ran into an issue with the meta data fields (they were
not being copied).  So I went to R5 to see how, or if, they were copied.
It turns out all fields are copied verbatim, with one little gotcha: the
_area fields seem to be treated specially if they are not set to
B_BAD_VALUE.  But this email isn't about that.

Initially we used the info field to pass header information to the decoder
from the extractor.  We abandoned that because our test case app
"media_decoder" was able to succeed without passing any info in.  (There
was also a related issue with the BeBook description of the info field,
which is code-wise impossible.)

So, I changed over to using the meta data field to store this information,
since the "format" is the only argument passed in to the BMediaDecoder.
This works well for the ogg/vorbis pair, ogg/speex, etc., and seems to have
no theoretical problems, apart from the minor issue of copying between
address spaces, which looks solvable using the conveniently located
meta_data_area variable.
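
To make the idea concrete, here is a minimal sketch of header bytes riding
along with a format and surviving a copy.  SketchFormat, attach_header and
header_matches are hypothetical names invented for this illustration; this
is not the real media_format, and the cross-address-space case (the
meta_data_area part) is not modelled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the part of media_format discussed here.
// It is NOT the real BeOS struct; it only models an opaque header blob
// that travels with the format.
struct SketchFormat {
    int encoding = 0;                      // selects the decoder
    std::vector<unsigned char> meta_data;  // codec headers, opaque to the kit
};

// Extractor side: attach the raw codec header bytes to the format.
inline void attach_header(SketchFormat& format, const void* header,
                          std::size_t size) {
    assert(header != nullptr || size == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(header);
    format.meta_data.assign(bytes, bytes + size);
}

// Decoder side: check that the header arrived intact in this copy.
inline bool header_matches(const SketchFormat& format, const void* expected,
                           std::size_t size) {
    return format.meta_data.size() == size &&
           (size == 0 ||
            std::memcmp(format.meta_data.data(), expected, size) == 0);
}
```

Because the copy owns its own bytes, the decoder can keep using them after
the extractor's format has gone away, which is the property the format copy
has to preserve.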

Well, I thought I had it all figured out when I went to test the copying
behavior on R5.  Here's what I found out:

The format returned from EncodedFormat is fairly spartan.  In the case of
mp3, it has the following fields set: type, output.frame_rate,
output.channel_count, output.buffer_size, encoding, bitrate.  In the case
of ogg, it has the following fields set: type, output.frame_rate,
output.channel_count, output.format, output.byte_order, output.buffer_size,
encoding.

Notably missing from the above lists are: user_data, meta_data, or any
other "miscellaneous" field where header information can be stored.  From
my other observations, it's clear that the only thing that matters here is
really the "encoding" field, which maps directly to some specific decoder.
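
As a rough picture of "encoding maps directly to a decoder", a lookup table
is enough.  Everything here (SketchDecoderRegistry, the encoding ids, the
decoder names) is made up for illustration; how R5 actually does the
mapping is not something I have seen.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical encoding ids; the real values live in the media kit headers.
enum SketchEncoding { kSketchMp3 = 1, kSketchVorbis = 2 };

// Maps an encoding id to the name of the decoder that handles it.
class SketchDecoderRegistry {
public:
    void Register(int encoding, const std::string& decoderName) {
        assert(!decoderName.empty());
        fDecoders[encoding] = decoderName;
    }

    // Returns the decoder name, or an empty string if none is registered.
    std::string Find(int encoding) const {
        std::map<int, std::string>::const_iterator it =
            fDecoders.find(encoding);
        return it == fDecoders.end() ? std::string() : it->second;
    }

private:
    std::map<int, std::string> fDecoders;
};
```

Note that nothing but the encoding id takes part in the lookup, which
matches the observation that the other fields do not influence which
decoder is chosen.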

In media_decoder I used the format from EncodedFormat to create a
BMediaDecoder.  Because of this lack of header information, it is
impossible for the BMediaDecoder to be completely set up after this
construction.  That is to say, it is generally unprepared to start
processing encoded data.  (Header-less formats don't suffer from this, but
they are beside the point.)

The next step in media_decoder is to get the DecodedFormat from the track.
This returns a format of RAW_AUDIO type, with some parameters set in the
raw_audio part.  Perhaps the one interesting thing in it is the deny_flags,
which are set.  Still, none of meta_data, user_data, etc. are set to any
interesting values.

At this point I call SetOutputFormat on the BMediaDecoder.  The decoder
still lacks header information and cannot completely initialize.  (A side
note: GetNextChunk is not called during SetOutputFormat.)  In my opinion
the R5 decoders are in a pretty bad state at this point.  They have only
the bare-bones information passed to them from the extractor through a
handful of fields in the media_format, and yet they are asked to negotiate
the output format for data that they haven't even seen the headers for.

But moving on, the next step is to call Decode on the BMediaDecoder.
Remember, we still haven't seen the headers for the encoded data yet.
Finally, GetNextChunk is called on the BMediaDecoder.

The input media_header is not the media_header that I provided in my call
to Decode.  In fact, the media_header is a pointer to a bunch of garbage
data.  The media_header isn't even initialized to zeros.  In my current
implementation of GetNextChunk I zero it immediately.  It worked before I
added the zeroing, and it works after too.
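
The zeroing looks roughly like the sketch below.  SketchHeader,
SketchDecoder and ZeroingDecoder are stand-ins written for this example;
they only model the observed behavior (Decode hands GetNextChunk a
different, uninitialized header) and are not the real media_header or
BMediaDecoder.  Which fields get carried back to the caller's header is
also just an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for media_header; field names follow this post.
struct SketchHeader {
    int type;
    long long start_time;
    long long file_pos;
    std::size_t orig_size;
    std::size_t size_used;
};

// Minimal model of a decoder base class that asks its subclass for input.
class SketchDecoder {
public:
    virtual ~SketchDecoder() {}

    // Models Decode(): the header given to GetNextChunk is a separate
    // scratch header filled with junk, not the caller's header.
    bool Decode(SketchHeader* callerHeader) {
        assert(callerHeader != nullptr);
        SketchHeader scratch;
        std::memset(&scratch, 0xCC, sizeof(scratch));  // simulate garbage
        const void* chunk = nullptr;
        std::size_t size = 0;
        if (!GetNextChunk(&chunk, &size, &scratch))
            return false;
        callerHeader->type = scratch.type;
        callerHeader->orig_size = scratch.orig_size;
        return true;
    }

protected:
    virtual bool GetNextChunk(const void** chunk, std::size_t* size,
                              SketchHeader* header) = 0;
};

class ZeroingDecoder : public SketchDecoder {
protected:
    bool GetNextChunk(const void** chunk, std::size_t* size,
                      SketchHeader* header) override {
        // Never trust the incoming contents: clear every field first.
        std::memset(header, 0, sizeof(*header));
        static const unsigned char kData[4] = {1, 2, 3, 4};
        *chunk = kData;
        *size = sizeof(kData);
        header->orig_size = sizeof(kData);
        return true;
    }
};
```

Without the memset, any field the callback does not set (type in this
sketch) would leak junk back to the caller.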

Next I called BMediaTrack::ReadChunk(...).  For mp3, this set the following
fields on the media_header: type, file_pos, orig_size.  For ogg, it set:
type, start_time, file_pos, orig_size.  I returned the chunk data.

When BMediaDecoder::Decode returned, it had modified the media_header that
I provided.  (Remember: this is a different one from the one I saw in my
GetNextChunk method.)

For mp3 the following fields were set: type, start_time, orig_size.  This
means that the mp3 decoder added the start_time for the chunk during
decode, because the start_time was not available earlier, after ReadChunk.
This probably contributed to the R5 mp3 sync problems.  Perhaps also
notable is that file_pos often does not get incremented, even though we
advance through start_time.  This seems to be because file_pos always
points to the start of the chunk that the samples are in.

For ogg the following fields were set: size_used, start_time, file_pos,
orig_size.  A notable difference between ogg and mp3 is that ogg provided
the size_used field set to the appropriate value.  If I had used the
size_used field from mp3, I would have dumped a whole bunch of nothing into
my output file.  (My code used the buffer_size from the raw_audio format,
but using size_used should have been okay too, I think.)  Like mp3,
file_pos is not always incremented and seems to be related to the start of
the chunk the samples are in.
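
One defensive way to handle the size_used difference is sketched below.
bytes_to_write is a hypothetical helper, not part of media_decoder, and it
assumes that an unset size_used shows up as zero, which I did not verify
for every decoder.

```cpp
#include <cassert>
#include <cstddef>

// Chooses how many bytes of a decoded buffer to write out.
// size_used   : value reported in the header after decoding
// buffer_size : buffer size taken from the negotiated raw_audio format
inline std::size_t bytes_to_write(std::size_t size_used,
                                  std::size_t buffer_size) {
    assert(buffer_size > 0);
    // Fall back to the full buffer when size_used is unset or implausible.
    if (size_used == 0 || size_used > buffer_size)
        return buffer_size;
    return size_used;
}
```

With this, an ogg-style header that reports size_used is honored, while an
mp3-style header that leaves it unset still produces a full buffer of
output.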

Okay, so at this point you should probably be asking: how did it work?
Perhaps we can jump into the middle of an mp3 stream and figure things out,
but vorbis certainly needs codebooks from the header in order to perform
any kind of sane decode.  It turns out that the mp3 reader/decoder and the
ogg reader/vorbis decoder used different strategies.  I examined the chunks
returned from the GetNextChunk method for the mp3 track and the ogg track.

The ogg track seemed quite odd to me, as I was expecting to see ogg packets
in it.  There were none. (!)  In fact, the implementation of the ogg
extractor in R5 is to completely decode the vorbis stream and pass the raw
decoded data to the "vorbis decoder".  The chunks that I got in
GetNextChunk were the same as the chunks that I got out of Decode, and the
same as what I was writing into my file.

The mp3 track, on the other hand, was the exact opposite.  The chunks that
came through were identical to the input file.  (This may make more sense
in light of the structure of mp3.)  So, the mp3 decoder was basically
operating on an unmodified file stream.

So, in both these cases there was no header communication required between
the two levels.  I think that we may all agree that the ogg/vorbis strategy
is not the path that we want.  (We haven't taken it and it's working okay
so far.)  The approach that we have used, which involves the meta data, is
not as invulnerable as placing header parsing and data decoding in the same
module, but it does get us back the power and elegance of re-usability of
the extractors and decoders.  (This is already demonstrated by our ability
to do useful things with ogg files that have some streams we can't yet
decode.)  Using the meta_data will make our kit compatible with
media_decoder once I fix the copying of the media_format to preserve
meta_data on copying, as R5 does.  It will also allow the codecs to give
stronger and earlier feedback about stream properties and their ability to
handle them.

Additional notes on other formats:

I checked how the decoding works for an mpeg video/audio stream as well.
The chunks for the audio stream contained only the audio bits, as one would
expect.  There was still no extra information passed through meta_data,
etc.  The ReadChunk method set type (to ENCODED_VIDEO !), file_pos,
orig_size.  The decoder also set type (to ENCODED_VIDEO !), start_time,
file_pos, orig_size.

I checked how the decoding works for an avi video/mp3 stream as well.  The
chunks for the audio stream contained only the audio bits, as one would
expect.  There was still no extra information passed through meta_data,
etc.  The ReadChunk method set start_time to some apparently useless thing
(approx -9.223e20), and set file_pos, orig_size.  The file_pos was not the
start of the chunk.  It may have been the end of the chunk, but it was not
equal to the location of the data in the file + orig_size either.  The
decoder set type, start_time.  It did not set file_pos or orig_size.

The version of media_decoder in cvs now finds the first audio stream in a file 
and decodes it.

Andrew

P.S. For those of you reading this who may not know what our goals are with
respect to codecs: we are not planning a binary compatible solution for the
plugins.  We _are_ planning a binary and source compatible solution for the
public APIs, in accordance with the general OpenBeOS principle.  This
exposition is for informative purposes only, and development will proceed
according to our best judgement of how to implement the behavior specified
and exhibited by the public APIs.  To the extent that we fulfill this goal
we may or may not choose to implement it in a way that resembles the R5
implementation.  This is even more true since I haven't seen the R5
implementation. :-D