[opendtv] Re: PR: Analog Devices' JPEG2000 IC Enables Wireless High-Definition Video Distribution in the Home
- From: Craig Birkmaier <craig@xxxxxxxxx>
- To: opendtv@xxxxxxxxxxxxx
- Date: Fri, 26 Aug 2005 08:51:06 -0400
At 3:50 PM -0400 8/25/05, Manfredi, Albert E wrote:
>
>It would indeed be interesting to see comparisons of
>compression efficiencies. However, I don't think a
>wavelet-based MPEG algorithm would use block-based
>coding at all.
Bert is technically correct that there are alternatives to block
based motion compensated prediction. A variety of interframe
techniques have been attempted with wavelets, but they do not provide
the performance improvement that can be realized with motion
compensated prediction.
>MPEG now uses blocks, because that's what the DCT or
>integer transform produce. So if you start out with
>blocks at the I frames, you use a block-based
>approach for the B and P frames.
No, MPEG uses blocks because they provide the lowest computational
complexity for an affordable motion compensated prediction scheme. In
truth, the first two MPEG standards did not do anything that
approaches a true motion compensated prediction technique. They are
just crude block matching routines, which make no attempt to actually
track moving objects. If a block that is unrelated to a moving object
provides a closer match, it will be used rather than the block that
actually contains the pixels in that object. MPEG-4 part 2 video and
the follow-on H.264 (part 10) essentially improve the block matching
algorithms, providing much finer (sub-pixel) positioning of blocks to
better match the actual frame from which the prediction is subtracted.
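To make the "crude block matching" point concrete, here is a minimal Python sketch of an exhaustive search using the sum of absolute differences (SAD). The function names, the 4 x 4 block size and the +/- 2 pixel search range are assumptions chosen for illustration; nothing here is taken from an MPEG reference encoder, and real encoders use faster search patterns and sub-pixel refinement.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of cur at
    (cx, cy) and the n x n block of ref at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def best_match(cur, ref, cx, cy, n=4, search=2):
    """Exhaustive search: return (dx, dy, cost) for the reference block with
    the lowest SAD within +/- search pixels. The search only compares pixel
    values; it has no notion of which object the pixels belong to."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best


def make_frame(x0, y0, size=12):
    """Toy frame: black background with a bright 4 x 4 square at (x0, y0)."""
    frame = [[0] * size for _ in range(size)]
    for j in range(4):
        for i in range(4):
            frame[y0 + j][x0 + i] = 200
    return frame


ref_frame = make_frame(2, 4)   # square at x = 2 in the reference frame
cur_frame = make_frame(4, 4)   # square has moved 2 pixels to the right
vector = best_match(cur_frame, ref_frame, 4, 4)
print(vector)  # (-2, 0, 0): the best match lies 2 pixels to the left, cost 0
```

In this toy case the lowest-cost block happens to be the moved object, but the search would just as happily pick any unrelated block with a lower SAD.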
There are other ways to do motion compensated prediction. You could
do it on a pixel by pixel basis, or attempt to identify actual
objects, then track their motion through the frame. Even with more
advanced techniques the resulting predictions may still have errors,
but the whole point of making predictions is to minimize the
differences from the actual frame - it is the difference information
that is encoded and takes up the largest number of bits. The blocks
used for those predictions come from frames that are already in the
decoder's memory, so the only overhead for B frames is the motion
vectors (and the encoded differences).
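A small Python sketch of why the prediction matters: the encoder sends a motion vector plus the difference block, and the decoder rebuilds the block from the reference frame it already holds. The frame contents, the helper names and the use of squared-error "energy" as a stand-in for bit cost are all assumptions made for illustration.

```python
def predict_block(ref, x, y, dx, dy, n):
    """Fetch the n x n block of the reference frame displaced by (dx, dy)."""
    return [[ref[y + dy + j][x + dx + i] for i in range(n)] for j in range(n)]


def residual(cur, pred, x, y, n):
    """Difference between the actual block at (x, y) and its prediction."""
    return [[cur[y + j][x + i] - pred[j][i] for i in range(n)] for j in range(n)]


def energy(block):
    """Sum of squared values, a crude stand-in for the bits needed."""
    return sum(v * v for row in block for v in row)


SIZE, N = 8, 4
# Reference frame: a ramp. Current frame: the same ramp shifted one pixel
# to the right and brightened by 1, so no prediction can be perfect.
ref = [[10 * x + y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[y][max(x - 1, 0)] + 1 for x in range(SIZE)] for y in range(SIZE)]

x0, y0 = 2, 2
pred_mc = predict_block(ref, x0, y0, -1, 0, N)    # motion vector (-1, 0)
pred_zero = predict_block(ref, x0, y0, 0, 0, N)   # no motion compensation
res_mc = residual(cur, pred_mc, x0, y0, N)
res_zero = residual(cur, pred_zero, x0, y0, N)

# The decoder rebuilds the block from its own copy of ref plus the residual.
rebuilt = [[pred_mc[j][i] + res_mc[j][i] for i in range(N)] for j in range(N)]

print(energy(res_mc), energy(res_zero))  # 16 1296
```

The better prediction leaves a residual with far less energy to encode, yet both residuals reconstruct the block exactly if they are sent without loss.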
Wavelets can be used to encode the differences, but this is not very
efficient, as the entire concept behind wavelets is to run the
transform (filters) on the entire image, producing a series of
wavelets that represent decreasing frequency information. Running the
wavelet transform at an 8 x 8 or 16 x 16 block size is less efficient
than using a localized transform such as the DCT, and adds another
process to a wavelet based interframe algorithm.
>If instead your I frames are based on frequency windows,
>you create the predictions or interpolations based on
>frequency windows, not blocks. You simply adjust the
>strategy based on the structure of the I frame.
If you can actually do this Bert, you could be a rich man. The
problem is that you do not get the kind of information out of the
wavelet transform that you need. Each decomposition produces a
wavelet that contains a lower frequency basis for the ENTIRE image.
Consider a simple example such as a 640 x 480 image:
The original image has 640 x 480 samples. The first wavelet transform
is run in the horizontal domain, producing a sub-band that is 320 x
480, concentrating the lower frequency horizontal information into
that sub-band. The second pass is run in the vertical domain, which
produces a sub-band that is 320 x 240, with equal H & V detail. With a
bit of subtraction, we are left with three sub-bands
that represent the differences between the new 320 x 240 image and
the original 640 x 480 image; one contains vertical detail
differences, one contains horizontal detail differences, and the
third diagonal detail differences. You can continue to run the
wavelet transform concentrating the low frequency information into
smaller and smaller rasters. The good news is that each decimation
produces a properly filtered image for that raster size. Thus you can
reconstruct the image to any resolution (up to the original),
ignoring the higher frequency sub-band data.
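The 640 x 480 example can be sketched in a few lines of Python. This uses the simple Haar averaging/differencing filter as an assumption, purely because it is the shortest wavelet to write down; practical codecs such as JPEG2000 use longer filters, and the synthetic test image and function names are likewise illustrative.

```python
def haar_pairs(seq):
    """One Haar step on a 1-D sequence: (pair averages, pair differences)."""
    low = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return low, high


def horizontal_pass(img):
    """Filter every row, halving the width of both outputs."""
    pairs = [haar_pairs(row) for row in img]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def transpose(img):
    return [list(col) for col in zip(*img)]


def vertical_pass(img):
    """Filter every column, halving the height of both outputs."""
    low_t, high_t = horizontal_pass(transpose(img))
    return transpose(low_t), transpose(high_t)


def shape(img):
    return (len(img[0]), len(img))  # (width, height)


WIDTH, HEIGHT = 640, 480
image = [[(3 * x + 5 * y) % 256 for x in range(WIDTH)] for y in range(HEIGHT)]

low_h, high_h = horizontal_pass(image)   # each 320 x 480
ll, lh = vertical_pass(low_h)            # each 320 x 240
hl, hh = vertical_pass(high_h)           # each 320 x 240

print(shape(low_h))                              # (320, 480)
print(shape(ll), shape(lh), shape(hl), shape(hh))  # four times (320, 240)

# With this filter, the low-pass sample plus its three detail samples
# gives back the original top-left pixel exactly.
restored = ll[0][0] + lh[0][0] + hl[0][0] + hh[0][0]
```

Here `ll` is the half-resolution image, and `lh`, `hl` and `hh` are the three difference sub-bands. Running `horizontal_pass` and `vertical_pass` again on `ll` gives the next, smaller raster.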
Unfortunately, these decimations do not provide much that is useful
for motion compensated prediction. The low frequency sub-bands may
contain all of the objects, but all of the needed edge information is
contained in the highest frequency sub-band. So you are back to using
all of the image data to make useful predictions.
What wavelets can do in the temporal domain is to
concentrate/differentiate the areas of the image where there is
little or no movement versus those where there is movement. You can
think of this as a three dimensional cube with H and V spatial
frequencies on two axes and time on the third axis. Within this cube
you will see the concentration of the low frequency basis for each
frame and the changes over time along the Z (time) axis. To my
knowledge, there has only been limited success using these changes
between frames to make useful motion compensated predictions.
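As a rough illustration of what a transform along the time axis does and does not give you, the following Python sketch applies a single Haar step to a pair of frames. The 6 x 6 frames, the moving 2 x 2 object and the function names are assumptions made up for this example.

```python
def temporal_haar(frame_a, frame_b):
    """One Haar step along the time axis for a pair of frames:
    returns (temporal average, temporal difference)."""
    low = [[(a + b) / 2 for a, b in zip(ra, rb)]
           for ra, rb in zip(frame_a, frame_b)]
    high = [[(a - b) / 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(frame_a, frame_b)]
    return low, high


def make_frame(obj_x, size=6):
    """Static ramp background with a bright 2 x 2 object at column obj_x."""
    frame = [[10 + x for x in range(size)] for _ in range(size)]
    for y in (1, 2):
        for x in (obj_x, obj_x + 1):
            frame[y][x] = 200
    return frame


frame_a = make_frame(1)
frame_b = make_frame(3)   # the object has moved 2 pixels to the right
low, high = temporal_haar(frame_a, frame_b)

# Positions where the temporal high band is non-zero, as (x, y) pairs.
changed = sorted(
    (x, y) for y, row in enumerate(high) for x, v in enumerate(row) if v != 0
)
print(changed)
# [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (4, 1), (4, 2)]
```

The temporal high band is zero wherever nothing changed and non-zero at both the old and the new object positions. It separates moving from static areas, but it carries no motion vector telling you where the object went.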
>For example, perhaps the low frequency images show less
>variability than the higher frequency images, and can
>more easily be predicted or interpolated, or in any
>event don't need to be transmitted as often as the
>higher frequency components of the frame.
Perhaps you are correct. However, as they say, the devil is in the
details. To be sure it will be easier to do the motion compensated
predictions on one of the lower frequency basis images, but the
result will be a prediction with no high frequency edge information.
Why do you think the MPEG camp has been adding things like sub pixel
positioning to newer algorithms? Answer: because most of the image
detail that we need to see is in those high frequency edges. The
major difference between wavelets and block based frequency
transforms is actually related to the failure modes.
If we did not need to quantize any high frequency coefficients for a
DCT based algorithm, or to quantize the highest frequency sub-bands
for the wavelet algorithm we could always reconstruct the original
image. But the reality is that we do need to quantize this stuff in
order to realize any compression efficiency. It is the artifacts of
quantization that provide the most meaningful differences between the
algorithms. What follows is an excerpt from an article I published in
1994 as part of a Videography series on video compression.
> A technical report authored by John Huffman, William Zettler and
>David Linden of Aware, compares the image defects introduced by the
>Fourier-based (DCT) and wavelet-based compression processes.
> "Fourier-based spectral techniques tend to produce aliasing, or
>periodic error, since the frequency spectrum itself is distorted,"
>they write, "whereas wavelet methods tend to produce random noise.
>Random noise is far less offensive to the human visual system than
>aliasing noise."
> Huffman elaborated on this weakness in DCT-based compression
>systems in a paper presented to the Fall SMPTE conference:
> "Fourier techniques such as the DCT can exploit the general
>low-frequency characteristics of an image. They do so, however, at
>the expense of the edge information, which is spread out across the
>frequency spectrum. A hard edge becomes much more complicated when
>represented by its spectrum due to the localized infinite frequency
>of the edge itself. An uncompensated quantization error in one band
>will reverberate throughout the spatial domain in which it has been
>constrained to operate--causing the 'ringing' so familiar in imagery
>compressed with these methods."
> Translating for the less technically inclined, what Huffman said
>is that the DCT does not introduce artifacts--it's the quantization
>of DCT coefficients that introduces errors within the blocks of the
>reconstructed image. Or in really graphic terms, if you try to code
>natural images that contain many high frequency edges, or synthetic
>imagery such as text, graphics and computer animation, quantization
>errors may come back to byte you.
> The presence of high frequency transitions within a DCT coding
>block--such as those that occur in text and graphics--influences all
>of the coefficients within that block. To properly decompress the
>image without artifacts each coefficient must be restored to its
>original value. Unfortunately, many coefficients are modified in the
>quantization process. The result is the periodic disturbance of the
>pixels around the high frequency transition, sometimes referred to
>as mosquito noise.
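The effect described in the excerpt can be reproduced with a one-dimensional toy case in Python: an 8-sample hard edge, an orthonormal 8-point DCT, and a quantizer whose step sizes are hypothetical values picked so that the four highest-frequency coefficients round to zero. This is a sketch under those assumptions, not the quantization matrix of any real codec.

```python
import math

N = 8


def scale(k):
    """Normalisation factor that makes the DCT orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(x):
    """8-point DCT-II."""
    return [
        scale(k) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                       for n in range(N))
        for k in range(N)
    ]


def idct(X):
    """Inverse of dct()."""
    return [
        sum(scale(k) * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for k in range(N))
        for n in range(N)
    ]


def quantize(X, steps):
    """Uniform quantizer: round each coefficient to a multiple of its step."""
    return [round(c / s) * s for c, s in zip(X, steps)]


edge = [0, 0, 0, 255, 255, 255, 255, 255]   # hard edge between samples 2 and 3
steps = [1, 1, 1, 1, 200, 200, 200, 200]    # hypothetical step sizes

coeffs = dct(edge)
lossless = idct(coeffs)                     # no quantization
lossy = idct(quantize(coeffs, steps))       # coarse high-frequency steps

error = [round(r - o, 1) for r, o in zip(lossy, edge)]
print(error)
```

Without quantization the edge comes back intact. With the coarse steps, every sample in the block is disturbed, including the flat samples at both ends that are nowhere near the transition.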
So in the end, we have two major issues:
- The influence of the transform on resulting image quality when the
  algorithm is stressed;
- The accuracy of the interframe predictions, which determines just how
  much data must be encoded to represent the differences between the
  predictions and the actual frames.
When there is too much difference information, something has to give.
This is when we start to see problems:
- With block based DCT algorithms we tend to see edge distortions, which
  ultimately break down into macroblocking;
- With wavelets we tend to see modulation of resolution, i.e. the
  image will get softer or sharper, depending on how much of the
  sub-band information must be quantized away.
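The softening behaviour can be sketched with a one-dimensional Haar transform in Python. Haar is an assumption chosen for brevity; with the longer filters used in practice the error spreads over a few more samples, but it stays near the edge. The sample values and function names are illustrative.

```python
def haar_forward(x):
    """One Haar step: (pair averages, pair differences)."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def haar_inverse(low, high):
    """Rebuild the samples from averages and differences."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


edge = [0, 0, 0, 255, 255, 255, 255, 255]
low, high = haar_forward(edge)

exact = haar_inverse(low, high)               # detail band kept
soft = haar_inverse(low, [0.0] * len(high))   # detail band quantized away

touched = [i for i, (a, b) in enumerate(zip(soft, edge)) if a != b]
print(soft)     # [0.0, 0.0, 127.5, 127.5, 255.0, 255.0, 255.0, 255.0]
print(touched)  # [2, 3]
```

Throwing away the high frequency sub-band turns the hard edge into a half-height step, so the image gets softer, but the flat areas on either side are left alone.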
In ANY case, I'm still waiting for Bert to come up with a more
efficient (and/or affordable) method of motion compensated prediction
than block based motion compensated predictions.
>So an efficient moving image algorithm which uses
>the DWT as its basis can be created. One which does not
>simply repeat the entire DWT every single time.
>
Go for it Bert...
you could change the world of video encoding if you can figure this one out!
Regards
Craig
P.S. There have been successful combinations of DWT and block based
prediction. The problem is that this adds significant overhead. If
you are going to use block based prediction, it is generally more
cost effective to use block based frequency transforms.