
[openbeosmediakit] decoder discoveries
- From: "Andrew Bachmann" <shatty@xxxxxxxxxxxxx>
- To: "openbeos media kit" <openbeosmediakit@xxxxxxxxxxxxx>
- Date: Wed, 21 Jan 2004 01:10:02 -0800 PST
Hello all,
I have been working on the media_decoder app to better understand how the communication between the extractor and decoder occurs in R5. This started because I ran into a problem with the meta data fields (they were not being copied). So, I went to R5 to see how/if they were copied. Well, it turns out all fields are copied verbatim, with one little gotcha: the _area fields seem to be treated specially if they are not set to B_BAD_VALUE. But this email isn't about that.
Initially we used the info field to pass header information to the decoder from the extractor. We abandoned that because our test case app "media_decoder" was able to succeed without passing any info in. (Also, there was a related issue with the BeBook description of the info field, which is code-wise impossible.)
So, I changed over to using the meta data field to store this information, since the "format" is the only argument passed in to the BMediaDecoder. This works well for the ogg/vorbis pair, ogg/speex, etc., and seems to have no theoretical problems associated with it, apart from the minor issue of copying between address spaces, which looks solvable with the conveniently located meta_data_area variable.
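A minimal sketch of what "preserving meta data on copy" means, assuming a simplified stand-in for the format. MockFormat, set_meta_data, clear_meta_data and copy_format are hypothetical names for illustration only, not the real media_format layout or Media Kit calls, and the sketch only covers copies within one address space (it does not model meta_data_area).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for media_format: one plain field plus a
// variable-sized header blob owned by the struct.
struct MockFormat {
    int32_t encoding = 0;
    void*   meta_data = nullptr;
    int32_t meta_data_size = 0;
};

// Release any header blob held by the format.
void clear_meta_data(MockFormat& format)
{
    std::free(format.meta_data);
    format.meta_data = nullptr;
    format.meta_data_size = 0;
}

// Store a private copy of `size` bytes of header data in the format.
// Returns false if the allocation fails; the format is then unchanged.
bool set_meta_data(MockFormat& format, const void* data, int32_t size)
{
    assert(size >= 0);
    void* copy = nullptr;
    if (size > 0) {
        assert(data != nullptr);
        copy = std::malloc(static_cast<std::size_t>(size));
        if (copy == nullptr)
            return false;
        std::memcpy(copy, data, static_cast<std::size_t>(size));
    }
    clear_meta_data(format);
    format.meta_data = copy;
    format.meta_data_size = size;
    return true;
}

// Copy plain fields verbatim and deep-copy the header blob, so the
// destination does not alias the source's memory.
bool copy_format(MockFormat& dst, const MockFormat& src)
{
    if (&dst == &src)
        return true;
    dst.encoding = src.encoding;
    return set_meta_data(dst, src.meta_data, src.meta_data_size);
}
```

With this shape, a decoder that receives a copied format still sees the same header bytes the extractor stored, and each copy must be released with clear_meta_data.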
Well, I thought I had it all figured out when I went to test the copying behavior on R5. Here's what I found out:
The format returned from EncodedFormat is fairly spartan. In the case of mp3, it has the following fields set: type, output.frame_rate, output.channel_count, output.buffer_size, encoding, bitrate. In the case of ogg, it has the following fields set: type, output.frame_rate, output.channel_count, output.format, output.byte_order, output.buffer_size, encoding.
Notably missing from the above lists are user_data, meta_data, and any other "miscellaneous" field where header information could be stored. From my other observations, it's clear that the only thing that really matters here is the "encoding" field, which maps directly to some specific decoder.
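To illustrate the observation that only the encoding selects a decoder, here is a small sketch of a lookup keyed on that one field. The encoding ids, the MockEncodedFormat struct and find_decoder are all hypothetical; the real Media Kit values and its plugin lookup are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical encoding ids, made up for this sketch.
enum MockEncoding : int32_t { kMockMp3 = 1, kMockVorbis = 2 };

// A few of the fields the post lists as set by EncodedFormat.
struct MockEncodedFormat {
    float   frame_rate = 0.0f;
    int32_t channel_count = 0;
    int32_t encoding = 0;
};

// Pick a decoder by encoding alone; every other field is ignored.
// Returns an empty string when no decoder is registered.
std::string find_decoder(const MockEncodedFormat& format)
{
    static const std::map<int32_t, std::string> kDecoders = {
        {kMockMp3, "mock mp3 decoder"},
        {kMockVorbis, "mock vorbis decoder"},
    };
    auto it = kDecoders.find(format.encoding);
    return it != kDecoders.end() ? it->second : std::string();
}
```

Note that nothing in such a lookup gives the chosen decoder any header data, which is the gap described in the following paragraphs.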
In media_decoder I used the format from EncodedFormat to create a BMediaDecoder. Because of this lack of header information, it is impossible for the BMediaDecoder to be completely set up after this construction. That is to say, it is generally unprepared to start processing encoded data. (Header-less formats don't suffer from this, but they are beside the point.)
The next step in media_decoder is to get the DecodedFormat from the track. This returns a format of RAW_AUDIO type, with some of the raw_audio parameters set. Perhaps the one interesting thing in it is the deny_flags, which are set. Still, none of meta_data, user_data, etc., are set to any interesting values.
At this point I call SetOutputFormat on the BMediaDecoder. The decoder still lacks header information and cannot completely initialize. (A side note: GetNextChunk is not called during SetOutputFormat.) In my opinion the R5 decoders are in a pretty bad state at this point. They have only the bare-bones information passed to them from the extractor through a handful of fields in the media_format, and yet they are asked to negotiate the output format for data that they haven't even seen the headers for.
But moving on, the next step is to call Decode on the BMediaDecoder. Remember, we still haven't seen the headers for the encoded data yet. Finally, GetNextChunk is called on the BMediaDecoder.
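The order of events described so far can be sketched with a mock that records a trace. MockDecoder and trace_media_decoder_steps are hypothetical and only model the observed call order (construction, SetOutputFormat, then Decode, which triggers GetNextChunk); they do not reproduce the real BMediaDecoder signatures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mock that logs which step ran, in order.
class MockDecoder {
public:
    MockDecoder() { fTrace.push_back("BMediaDecoder(format)"); }
    virtual ~MockDecoder() = default;

    // Output negotiation only: no chunk is requested here, matching
    // the observation that GetNextChunk is not called at this stage.
    void SetOutputFormat() { fTrace.push_back("SetOutputFormat"); }

    // The first request for encoded data happens inside Decode.
    void Decode()
    {
        fTrace.push_back("Decode");
        GetNextChunk();
    }

    const std::vector<std::string>& Trace() const { return fTrace; }

protected:
    // Hook the application overrides to supply encoded chunks.
    virtual void GetNextChunk() { fTrace.push_back("GetNextChunk"); }

    std::vector<std::string> fTrace;
};

// Run the steps in the order media_decoder performs them.
std::vector<std::string> trace_media_decoder_steps()
{
    MockDecoder decoder;
    decoder.SetOutputFormat();
    decoder.Decode();
    return decoder.Trace();
}
```

The trace makes the problem visible: the output format is negotiated before the decoder has been handed a single byte of the stream.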
The media_header passed in to GetNextChunk is not the media_header that I provided in my call to Decode. In fact, it points to a bunch of garbage data; it isn't even initialized to zeros. In my current implementation of GetNextChunk I zero it immediately. It worked before I added the zeroing, and it works after too.
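A sketch of the defensive zeroing, assuming a cut-down header struct. MockHeader, simulate_garbage, reset_header and mock_get_next_chunk are hypothetical names; the real media_header has more fields, and the real GetNextChunk has a different signature.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical subset of media_header fields mentioned in the post.
struct MockHeader {
    int32_t type;
    int32_t size_used;
    int64_t start_time;
    int64_t file_pos;
    int32_t orig_size;
};

// Fill the header with a non-zero byte pattern to imitate the
// uninitialized header the callback receives.
void simulate_garbage(MockHeader& header)
{
    std::memset(&header, 0xAB, sizeof(header));
}

// Zero every field so stale values cannot be mistaken for real ones.
void reset_header(MockHeader& header)
{
    std::memset(&header, 0, sizeof(header));
}

// Mock chunk callback: zero first, then fill in only what is known.
void mock_get_next_chunk(MockHeader* header, int64_t filePos,
                         int32_t chunkSize)
{
    assert(header != nullptr);
    reset_header(*header);
    header->file_pos = filePos;
    header->orig_size = chunkSize;
}
```

Zeroing first means any field the callback leaves alone reads as 0 instead of whatever happened to be in memory.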
Next I called BMediaTrack::ReadChunk(....). For mp3, this set the following fields on the media_header: type, file_pos, orig_size. For ogg, it set: type, start_time, file_pos, orig_size. I returned the chunk data.
When BMediaDecoder::Decode returned, it had modified the media_header that I provided. (Remember: this is a different one from the one I saw in my GetNextChunk method.) For mp3 the following fields were set: type, start_time, orig_size. This means that the mp3 decoder added the start_time for the chunk during decode, because the start_time was not available earlier, after ReadChunk. This probably contributed to the R5 mp3 sync problems. Perhaps also notable is that the file_pos often does not get incremented, even though we advance through start_time. This seems to be because the file_pos always points to the start of the chunk that the samples are in.
For ogg the following fields were set: size_used, start_time, file_pos, orig_size. A notable difference between ogg and mp3 is that ogg set the size_used field to the appropriate value. If I had used the size_used field from mp3, I would have dumped a whole bunch of nothing into my output file. (My code used the buffer_size from the raw_audio format, but using size_used should have been okay too, I think.) Like mp3, file_pos is not always incremented and seems to be related to the start of the chunk the samples are in.
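One possible way to cope with size_used being set by one decoder and not the other is to prefer it when it looks valid and fall back to buffer_size otherwise. This is a sketch of that policy only, not what media_decoder does (it always used buffer_size); MockHeader and bytes_to_write are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical header with just the field of interest.
struct MockHeader {
    int32_t size_used = 0;
};

// Decide how many decoded bytes to write to the output file.
// Use size_used when the decoder filled it in with a plausible value;
// otherwise fall back to the raw_audio buffer_size.
std::size_t bytes_to_write(const MockHeader& header, std::size_t bufferSize)
{
    if (header.size_used > 0) {
        std::size_t used = static_cast<std::size_t>(header.size_used);
        if (used <= bufferSize)
            return used;
    }
    return bufferSize;
}
```

With a header whose size_used is 0 (the mp3 case above) this returns the full buffer_size, so nothing is silently dropped; with a header that reports a smaller size_used (the ogg case) only that many bytes are written.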
Okay, so at this point you should probably be asking: how did it work? Perhaps we can jump into the middle of an mp3 stream and figure things out, but vorbis certainly needs codebooks from the header in order to perform any kind of sane decode. It turns out that the mp3 reader/decoder and the ogg reader/vorbis decoder used different strategies. I examined the chunks returned from the NextChunk method for the mp3 track and the ogg track.
The ogg track seemed quite odd to me, as I was expecting to see ogg packets in it. There were none. (!) In fact, the ogg extractor in R5 completely decodes the vorbis stream and passes the raw decoded data to the "vorbis decoder". The chunks that I got in NextChunk were the same as the chunks that I got out of decode, and the same as what I was writing into my file.
The mp3 track, on the other hand, was the exact opposite. The chunks that came through were identical to the input file. (This may make more sense in light of the structure of mp3.) So, the mp3 decoder was basically operating on an unmodified file stream.
So, in both these cases there was no header communication required between the two levels. I think that we may all agree that the ogg/vorbis strategy is not the path that we want. (We haven't taken it and it's working okay so far.) The approach that we have used, which relies on the meta data, is not as invulnerable as placing header parsing and data decoding in the same module, but it does get us back the power and elegance of re-usability of the extractors and decoders. (This is already demonstrated by our ability to do useful things with ogg files that have some streams we can't yet decode.) Using the meta_data will make our kit compatible with media_decoder once I fix the copying of the media_format to preserve meta_data, as R5 does. It will also allow the codecs to give stronger and earlier feedback about stream properties and their ability to handle them.
Additional notes on other formats:
I checked how the decoding works for an mpeg video/audio stream as well. The chunks for the audio stream contained only the audio bits, as one would expect. There was still no extra information passed through meta_data, etc. The ReadChunk method set type (to ENCODED_VIDEO !), file_pos, orig_size. The decoder also set type (to ENCODED_VIDEO !), start_time, file_pos, orig_size.
I checked how the decoding works for an avi video/mp3 stream as well. The chunks for the audio stream contained only the audio bits, as one would expect. There was still no extra information passed through meta_data, etc. The ReadChunk method set start_time to some apparently useless thing (approx -9.223e20), and set file_pos, orig_size. The file_pos was not the start of the chunk; it may have been the end of the chunk, but it was not equal to the location of the data in the file + orig_size either. The decoder set type, start_time. It did not set file_pos or orig_size.
The version of media_decoder in cvs now finds the first audio stream in a file and decodes it.
Andrew
P.S. For those of you reading this who may not know what our goals are with respect to codecs: we are not planning a binary compatible solution for the plugins. We _are_ planning a binary and source compatible solution for the public APIs, in accordance with the general openbeos principle. This exposition is for informative purposes only, and development will proceed according to our best judgement of how to implement the behavior specified and exhibited by the public APIs. To the extent that we fulfill this goal we may or may not choose to implement it in a way that resembles the R5 implementation. This is even more true since I haven't seen the R5 implementation. :-D