[opendtv] Re: PR: Analog Devices' JPEG2000 IC Enables Wireless High-Definition Video Distribution in the Home

At 3:50 PM -0400 8/25/05, Manfredi, Albert E wrote:
>
>It would indeed be interesting to see comparisons of
>compression efficiencies. However, I don't think a
>wavelet-based MPEG algorithm would use block base
>coding at all.

Bert is technically correct that there are alternatives to block 
based motion compensated prediction. A variety of interframe 
techniques have been attempted with wavelets, but they do not provide 
the performance improvement that can be realized with motion 
compensated prediction.


>MPEG now uses blocks, because that's what the DCT or
>integer transform produce. So if you start out with
>blocks at the I frames, you use a block-based
>approach for the B and P frames.

No, MPEG uses blocks because they provide the lowest computational 
complexity for an affordable motion compensated prediction scheme. In 
truth, the first two MPEG standards did not do anything that 
approaches a true motion compensated prediction technique.  They are 
just crude block matching routines, which make no attempt to actually 
track moving objects. If a block that is unrelated to a moving object 
provides a closer match, it will be used rather than the block that 
actually contains the pixels in that object. MPEG-4 part 2 video and 
the follow-on H.264 (part 10) essentially improve the block matching 
algorithms, providing much finer (sub-pixel) positioning of blocks to 
better match the actual frame from which the prediction is subtracted.
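
To make the "crude block matching" point concrete, here is a minimal
sketch of an exhaustive integer-pel block search, written in Python
purely for illustration. The function names, the 4 x 4 block size, the
+/- 2 pixel search window and the SAD (sum of absolute differences)
cost are all my assumptions; the MPEG standards only define the
decoder, so this is not taken from any of them. Note that the search
simply keeps whichever candidate has the lowest error. Nothing in it
knows anything about objects.

```python
def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between the n x n block of `ref` at
    (rx, ry) and the n x n block of `cur` at (cx, cy)."""
    return sum(
        abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
        for j in range(n)
        for i in range(n)
    )


def best_match(ref, cur, cx, cy, n=4, search=2):
    """Full search: try every integer offset within +/- `search` pixels
    and return (dx, dy, sad) for the candidate with the smallest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
                continue  # candidate block falls outside the reference
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def frame_with_square(x0, y0, size=8, n=4):
    """Toy frame: flat background (10) with a bright n x n square (200)."""
    return [[200 if x0 <= x < x0 + n and y0 <= y < y0 + n else 10
             for x in range(size)] for y in range(size)]


ref = frame_with_square(1, 2)   # reference frame
cur = frame_with_square(3, 2)   # the square has moved two pixels right

dx, dy, cost = best_match(ref, cur, cx=3, cy=2)
print(dx, dy, cost)  # -2 0 0
```

In this toy case the lowest-error block happens to be the one that
contains the object, so the vector (-2, 0) looks like real motion. With
natural images there is no such guarantee.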

There are other ways to do motion compensated prediction. You could 
do it on a pixel by pixel basis, or attempt to identify actual 
objects, then track their motion through the frame. Even with more 
advanced techniques the resulting predictions may still have errors, 
but the whole point of making predictions is to minimize the 
differences from the actual frame - it is the difference information 
that is encoded and takes up the largest number of bits. The blocks 
used for those predictions come from frames that are already in the 
decoder's memory, so the only overhead for B frames is the motion 
vectors (and the encoded differences).
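
As a rough illustration of why the quality of the prediction matters,
the sketch below compares the difference data left over with and
without a motion vector. The frames, the block position, the vector
and the use of squared error as a stand-in for coding cost are all
made up for the example.

```python
def predict_block(ref, x, y, dx, dy, n):
    """Fetch the n x n reference block displaced by the vector (dx, dy)."""
    return [[ref[y + dy + j][x + dx + i] for i in range(n)]
            for j in range(n)]


def residual(cur, pred, x, y, n):
    """Difference between the actual block at (x, y) and its prediction."""
    return [[cur[y + j][x + i] - pred[j][i] for i in range(n)]
            for j in range(n)]


def energy(block):
    """Sum of squared values, a crude stand-in for coding cost."""
    return sum(v * v for row in block for v in row)


# Hypothetical 8 x 8 frames: a horizontal ramp that shifts one pixel to
# the right and gets slightly brighter (+1) between the two frames.
ref = [[10 * x for x in range(8)] for _ in range(8)]
cur = [[10 * (x - 1) + 1 if x >= 1 else 1 for x in range(8)]
       for _ in range(8)]

x, y, n = 2, 2, 4
no_mc = residual(cur, predict_block(ref, x, y, 0, 0, n), x, y, n)
with_mc = residual(cur, predict_block(ref, x, y, -1, 0, n), x, y, n)

print(energy(no_mc))    # 1296: every sample is off by 9
print(energy(with_mc))  # 16: only the brightness change is left
```

The prediction is still not perfect (the brightness change survives),
but the amount of difference data that has to be coded drops sharply.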

Wavelets can be used to encode the differences, but this is not very 
efficient, as the entire concept behind wavelets is to run the 
transform (filters) on the entire image, producing a series of 
wavelets that represent decreasing frequency information. Running the 
wavelet transform at an 8 x 8 or 16 x 16 block size is less efficient 
than using a localized transform such as the DCT, and adds another 
process to a wavelet based interframe algorithm.

>If instead your I frames are based on frequency windows,
>you create the predictions or interpolations based on
>frequency windows, not blocks. You simply adjust the
>strategy based on the structure of the I frame.

If you can actually do this, Bert, you could be a rich man. The 
problem is that you do not get the kind of information out of the 
wavelet transform that you need. Each decomposition produces a 
wavelet that contains a lower frequency basis for the ENTIRE image. 
Consider a simple example such as a 640 x 480 image:

The original image has 640 x 480 samples. The first wavelet transform 
is run in the horizontal domain, producing a sub-band that is 320 x 
480, concentrating the lower frequency horizontal information into 
that sub-band. The second pass is run in the vertical domain, which 
produces a sub-band that is 320 x 240 with equal H & V detail. With 
a bit of subtraction, we are left with three sub-bands 
that represent the differences between the new 320 x 240 image and 
the original 640 x 480 image; one contains vertical detail 
differences, one contains horizontal detail differences, and the 
third diagonal detail differences. You can continue to run the 
wavelet transform concentrating the low frequency information into 
smaller and smaller rasters. The good news is that each decimation 
produces a properly filtered image for that raster size. Thus you can 
reconstruct the image to any resolution (up to the original), 
ignoring the higher frequency sub-band data.
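
The raster sizes in this example can be checked with a small sketch. I
am assuming the simplest possible wavelet here (Haar: pairwise averages
and differences) because it fits in a few lines; JPEG2000 itself uses
longer 5/3 and 9/7 filters. The function names and the synthetic test
image are mine.

```python
def haar_rows(img):
    """One horizontal Haar pass over every row.

    Returns (low, high); each has half the width of `img`."""
    low = [[(r[i] + r[i + 1]) / 2 for i in range(0, len(r), 2)]
           for r in img]
    high = [[(r[i] - r[i + 1]) / 2 for i in range(0, len(r), 2)]
            for r in img]
    return low, high


def transpose(img):
    return [list(col) for col in zip(*img)]


def haar_cols(img):
    """One vertical Haar pass, done as a horizontal pass on the transpose."""
    low_t, high_t = haar_rows(transpose(img))
    return transpose(low_t), transpose(high_t)


def haar_level(img):
    """One 2-D level: a half-size low-pass image plus three detail bands."""
    low, high = haar_rows(img)   # horizontal pass halves the width
    ll, lh = haar_cols(low)      # low-pass H, then low/high-pass V
    hl, hh = haar_cols(high)     # high-pass H, then low/high-pass V
    return ll, (lh, hl, hh)


def size(img):
    return (len(img[0]), len(img))   # (width, height)


# Synthetic 640 x 480 test image (a diagonal ramp), just to have data.
image = [[(x + y) % 256 for x in range(640)] for y in range(480)]

sizes = []
band = image
for _ in range(3):
    band, details = haar_level(band)
    sizes.append(size(band))

print(sizes)  # [(320, 240), (160, 120), (80, 60)]
```

Each pass leaves a properly filtered low-pass image at half the
previous width and height, plus three detail bands of the same size.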

Unfortunately, these decimations do not provide much that is useful 
for motion compensated prediction. The low frequency sub-bands may 
contain all of the objects, but all of the needed edge information is 
contained in the highest frequency sub-band. So you are back to using 
all of the image data to make useful predictions.

What wavelets can do in the temporal domain is to 
concentrate/differentiate the areas of the image where there is 
little or no movement versus those where there is movement. You can 
think of this as a three dimensional cube with H and V spatial 
frequencies on two axes and time on the third axis. Within this cube 
you will see the concentration of the low frequency basis for each 
frame and the changes over time along the Z (time) axis. To my 
knowledge, there has only been limited success using these changes 
between frames to make useful motion compensated predictions.
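
A minimal sketch of the idea, assuming a Haar filter run along the time
axis over a single pair of frames (the frame contents and function
names are invented for the example):

```python
def temporal_haar(f0, f1):
    """Haar transform along the time axis for one pair of frames."""
    low = [[(a + b) / 2 for a, b in zip(r0, r1)]
           for r0, r1 in zip(f0, f1)]
    high = [[(b - a) / 2 for a, b in zip(r0, r1)]
            for r0, r1 in zip(f0, f1)]
    return low, high


def frame(obj_x, obj_y, size=8):
    """Flat background (10) with a bright 2 x 2 object (200)."""
    return [[200 if obj_x <= x < obj_x + 2 and obj_y <= y < obj_y + 2
             else 10
             for x in range(size)] for y in range(size)]


f0 = frame(1, 1)
f1 = frame(2, 1)   # the object has moved one pixel to the right

low, high = temporal_haar(f0, f1)
moving = [(x, y) for y, row in enumerate(high)
          for x, v in enumerate(row) if v != 0]
print(moving)  # [(1, 1), (3, 1), (1, 2), (3, 2)]
```

The temporal high band is zero wherever nothing changed and non-zero
along the leading and trailing edges of the object. That tells you
where there is movement, but it does not hand you a displacement
vector that a predictor could use.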

>For example, perhaps the low frequency images show less
>variability than the higher frequency images, and can
>more easily be predicted or interpolated, or in any
>event don't need to be transmitted as often as the
>higher frequency components of the frame.

Perhaps you are correct. However, as they say, the devil is in the 
details. To be sure it will be easier to do the motion compensated 
predictions on one of the lower frequency basis images, but the 
result will be a prediction with no high frequency edge information.
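
A one-dimensional sketch shows what goes missing. Again I am assuming
a one-level Haar transform for brevity, and the sample values are
arbitrary:

```python
def haar_forward(signal):
    """One 1-D Haar level: pairwise averages (low) and
    half-differences (high)."""
    low = [(signal[i] + signal[i + 1]) / 2
           for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2
            for i in range(0, len(signal), 2)]
    return low, high


def haar_inverse(low, high):
    """Exact inverse of haar_forward."""
    out = []
    for l, h in zip(low, high):
        out.extend([l + h, l - h])
    return out


# A hard edge between samples 2 and 3.
edge = [10, 10, 10, 200, 200, 200, 200, 200]
low, high = haar_forward(edge)

full = haar_inverse(low, high)                # uses both bands
coarse = haar_inverse(low, [0] * len(high))   # low band only

print(high)    # [0.0, -95.0, 0.0, 0.0]
print(full)    # [10.0, 10.0, 10.0, 200.0, 200.0, 200.0, 200.0, 200.0]
print(coarse)  # [10.0, 10.0, 105.0, 105.0, 200.0, 200.0, 200.0, 200.0]
```

The single non-zero high band coefficient carries the exact position
of the edge. A prediction built from the low band alone turns the hard
edge into a smeared transition.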

Why do you think the MPEG camp has been adding things like sub pixel 
positioning to newer algorithms? Answer: because most of the image 
detail that we need to see is in those high frequency edges.  The 
major difference between wavelets and block based frequency 
transforms actually is related to the failure modes.

If we did not need to quantize any high frequency coefficients for a 
DCT based algorithm, or to quantize the highest frequency sub-bands 
for the wavelet algorithm we could always reconstruct the original 
image. But the reality is that we do need to quantize this stuff in 
order to realize any compression efficiency. It is the artifacts of 
quantization that provide the most meaningful differences between the 
algorithms. What follows is an excerpt from an article I published in 
1994 as part of a Videography series on video compression.

>   A technical report authored by John Huffman, William Zettler and 
>David Linden of Aware, compares the image defects introduced by the 
>Fourier-based (DCT) and wavelet-based compression processes.
>   "Fourier-based spectral techniques tend to produce aliasing, or 
>periodic error, since the frequency spectrum itself is distorted," 
>they write, "whereas wavelet methods tend to produce random noise. 
>Random noise is far less offensive to the human visual system than 
>aliasing noise."
>   Huffman elaborated on this weakness in DCT-based compression 
>systems in a paper presented to the Fall SMPTE conference:
>   "Fourier techniques such as the DCT can exploit the general 
>low-frequency characteristics of an image. They do so, however, at 
>the expense of the edge information, which is spread out across the 
>frequency spectrum. A hard edge becomes much more complicated when 
>represented by its spectrum due to the localized infinite frequency 
>of the edge itself. An uncompensated quantization error in one band 
>will reverberate throughout the spatial domain in which it has been 
>constrained to operate--causing the 'ringing' so familiar in imagery 
>compressed with these methods."
>   Translating for the less technically inclined, what Huffman said 
>is that the DCT does not introduce artifacts--it's the quantization 
>of DCT coefficients that introduces errors within the blocks of the 
>reconstructed image. Or in really graphic terms, if you try to code 
>natural images that contain many high frequency edges, or synthetic 
>imagery such as text, graphics and computer animation, quantization 
>errors may come back to *byte* you.
>   The presence of high frequency transitions within a DCT coding 
>block--such as those that occur in text and graphics--influences all 
>of the coefficients within that block. To properly decompress the 
>image without artifacts each coefficient must be restored to its 
>original value. Unfortunately, many coefficients are modified in the 
>quantization process. The result is the periodic disturbance of the 
>pixels around the high frequency transition, sometimes referred to 
>as mosquito noise.
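
The effect described in the excerpt is easy to reproduce in one
dimension. The sketch below is only an illustration: the 8-sample
block, the sample values and the uniform step of 50 are arbitrary
choices of mine, and real encoders use per-coefficient quantization
tables rather than a single step size.

```python
import math

N = 8  # block length


def dct(x):
    """Orthonormal 1-D DCT-II of an N-sample block."""
    out = []
    for k in range(N):
        scale = math.sqrt((1 if k == 0 else 2) / N)
        out.append(scale * sum(
            x[n] * math.cos(math.pi * (n + 0.5) * k / N)
            for n in range(N)))
    return out


def idct(c):
    """Inverse of dct() (an orthonormal DCT-III)."""
    out = []
    for n in range(N):
        out.append(sum(
            math.sqrt((1 if k == 0 else 2) / N) * c[k]
            * math.cos(math.pi * (n + 0.5) * k / N)
            for k in range(N)))
    return out


def quantize(c, step):
    """Uniform quantizer: snap each coefficient to a multiple of `step`."""
    return [step * round(v / step) for v in c]


edge = [10, 10, 10, 10, 200, 200, 200, 200]   # hard edge inside a block

exact = idct(dct(edge))
lossy = idct(quantize(dct(edge), step=50))

print([round(v, 1) for v in exact])   # the original samples come back
print([round(v, 1) for v in lossy])   # the flat areas now ripple
```

Without quantization the block is restored exactly (to within floating
point error). Once the coefficients are quantized, the samples on
either side of the edge are no longer flat, which is the ringing the
excerpt refers to.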

So in the end, we have two major issues:

The influence of the transform on resulting image quality when the 
algorithm is stressed;

The accuracy of the interframe predictions, which determines just how 
much data must be encoded to represent the differences between the 
predictions and the actual frames.

When there is too much difference information something has to give - 
this is when we start to see problems:

With block based DCT algorithms we tend to see edge distortions which 
ultimately break down into macroblocking.

With wavelets we tend to see modulation of resolution - i.e. the 
image will get softer or sharper, depending on how much of the 
sub-band information must be quantized away.

In ANY case, I'm still waiting for Bert to come up with a more 
efficient (and/or affordable) method of motion compensated prediction 
than the block based approach.


>So an efficient moving image algorithm which uses
>the DWT as its basis can be created. One which does not
>simply repeat the entire DWT every single time.
>

Go for it Bert...

You could change the world of video encoding if you can figure this one out!

Regards
Craig

P.S. There have been successful combinations of DWT and block based 
prediction. The problem is that this adds significant overhead. If 
you are going to use block based prediction, it is generally more 
cost effective to use block based frequency transforms.

 
 