[opendtv] Re: 20060117 Mark's (Almost) Monday Memo

From: Craig Birkmaier <craig@xxxxxxxxx>
To: opendtv@xxxxxxxxxxxxx
Date: Fri, 20 Jan 2006 09:51:44 -0500

At 8:23 AM -0500 1/19/06, John Shutt wrote:
>Is there a Moore's Law regarding codec efficiency, or is there a theoretical
>limit?  I mean it seems to be impossible to represent an entire 1920x1080
>frame with a single bit (unless the entire screen is monotone), so there
>must be a theoretical limit as to how much you can compress an image and
>still have it be a practical display.

An excellent question, John. With your permission, I may incorporate 
it into my column on video compression for the March issue of BE.

And I am looking forward to other responses: this could be an 
interesting thread!

Moore's law is certainly a factor, as it provides some indication of 
what we can expect in terms of computational resources for video 
compression algorithms in the future. Equally important, it can help 
us predict the resources that will be available in low cost consumer 
appliances in the future.

As a starting point, it is important to look at both ends of the 
problem. That is, how does the ongoing progression in computation 
power impact the encoding of content, and how does it impact the 
decoding of content.

The philosophy behind most compression codecs today is that the 
encoder can be very complex, but it needs to produce a bitstream that 
can be decoded by devices with much lower complexity. It is helpful 
to look at the overall resources and complexity of modern set-top 
boxes and image processing engines in integrated appliances. This 
extends well beyond the resources that are available for video 
decoding.

For example, a few years ago most of these products had very limited 
support for local graphics and run time engines for Java and other 
applications needed to deliver enhanced services. Today we are seeing 
the same graphics engines (GPUs) that are designed into PCs, making 
their way into STBs and integrated receivers. The point I am trying 
to make is that the problem is much larger than just encoding audio 
and video streams. We are moving into an era where the "receiver" 
will be used for localization and customization of the content that 
we view; thus the decoder and local image processing complexity will 
increase significantly. As this happens it opens up new possibilities 
for the ways in which content is encoded.

MPEG-4 provides an excellent example. Not part 2 (the original video 
codec), or part 10 (AVC/H.264), but rather the entire specification. 
We have discussed recently the notion of picture elements not being 
included (e.g. the ball in a football match). But we have not spent 
much time talking about the fact that the MPEG-4 spec can achieve 
huge gains in compression efficiency by dealing with picture objects 
that are composited in the receiver.

This aspect of MPEG-4 has not been exploited, in part because of the 
computational complexity for a receiver, and in part because of the 
complexity of extracting the objects from a "flattened composition," 
otherwise known as a finished linear video program. But much of what 
is needed to exploit the object composition model for MPEG-4 already 
exists in the production systems we use today. IF we keep track of 
all of the program elements (video, audio, graphics, 3D, etc) and use 
the metadata created by the NLE/compositing systems, we have 
virtually everything needed to produce an MPEG-4 composition. This can 
be as simple as telling a receiver to fade the video stream to 
black, or to cross dissolve to a new video stream - both of these 
simple production techniques wreak havoc with pixel-based video 
encoding systems.
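
To illustrate why a fade is hard on pixel-based prediction, here is a 
minimal Python sketch. It is a toy model, not MPEG-4 syntax: the 
"frames" are short lists of made-up luma samples, and the names 
fade, residual_energy, source and gains are hypothetical. It compares 
predicting each frame from the previous one against letting the 
receiver apply a single gain value to a source it already holds.

```python
# Toy illustration only: 1-D "frames" of invented luma samples.
# Nothing here is real MPEG-4 scene-description syntax.

def residual_energy(predicted, actual):
    """Sum of absolute differences between a prediction and a frame."""
    return sum(abs(p - a) for p, a in zip(predicted, actual))

def fade(frame, gain):
    """Scale every sample by gain (1.0 = full picture, 0.0 = black)."""
    return [round(s * gain) for s in frame]

source = [16, 64, 128, 200, 235, 180, 90, 40]  # hypothetical samples
gains = [1.0, 0.75, 0.5, 0.25, 0.0]            # five-frame fade to black

frames = [fade(source, g) for g in gains]

# Pixel-based view: predict each frame as a copy of the previous one.
# Every sample changes during the fade, so every frame leaves a residual.
pixel_residuals = [
    residual_energy(frames[i - 1], frames[i]) for i in range(1, len(frames))
]

# Composition view: the receiver keeps the source and is sent only the
# gain for each frame, then performs the fade itself.
composed_residuals = [
    residual_energy(fade(source, g), f)
    for g, f in zip(gains[1:], frames[1:])
]

print("previous-frame prediction residuals:", pixel_residuals)
print("receiver-side fade residuals:       ", composed_residuals)
```

In this toy case the receiver-side fade leaves nothing to correct, 
because the fade was produced by the same operation the receiver is 
told to perform. Real content would not be this clean, but the 
contrast is the point.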

Jeroen has provided many glimpses inside the work being done by 
Philips to enhance the presentation of video on modern display 
systems. I'm certain he could tell us many tales about Natural 
Motion, and computational complexity behind frame rate conversions. 
So a good way to look at the problem of video encoding is to 
consider all of the potential paths that exist to predict what future 
frames will look like. Prediction is the biggest leverage we have in 
terms of gaining compression efficiency.

Mark Schubin will tell you that even with virtually unlimited 
computation resources, we still have a difficult time building a 
"transparent" video standards converter, and de-interlacing 
algorithms still have a difficult time predicting what the 
information lost to interlaced acquisition looks like.

MPEG-2 and AVC are still VERY CRUDE in terms of the prediction 
routines that are used for motion compensated prediction. The reality 
behind AVC is that it simply provides better granularity for many of 
the block matching tools used in MPEG-2. For example, we have more 
control over the positioning of blocks and the precision of motion 
vectors. We have better ways of representing and quantizing the 
information inside the blocks. And we have new tools to mask errors.

With few exceptions, we have not even begun to explore real motion 
compensated prediction in compression algorithms, and for good reason 
- computational complexity. MPEG-2 and MPEG-4 do not identify and 
track objects - they just try to find the most efficient block 
matches, which may (or may not) have any relationship to the actual 
objects and motion vectors. Perhaps the next big step will be to do 
real motion compensated predictions, but this approach is incredibly 
complex. We capture images on a 2D image plane, but the objects exist 
in 3D space. Thus simple issues like an object moving closer to or 
away from the camera make good motion compensated prediction more 
difficult. Now add plastic deformations and reflections into the mix 
and the calculations go through the roof. How do you predict what a 
running back looks like in 3D, or how he is deformed by a 250 pound 
linebacker? How do you deal with reflections from 3D objects and 
surfaces, when the information in the reflections is also changing?
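
The block matching described above can be sketched as an exhaustive 
search that minimizes the sum of absolute differences (SAD). This is 
a toy Python illustration, not the search used by any real MPEG-2 or 
AVC encoder; the function names, the 8x8 frame and the patch values 
are all invented for the example.

```python
# Toy exhaustive block matching with a SAD cost. All names and values
# are hypothetical; real encoders use far more elaborate searches.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )

def get_block(frame, top, left, size):
    """Cut a size x size block out of a frame (a list of rows)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def best_match(ref, cur, top, left, size, search):
    """Return (cost, dy, dx) of the best match in ref for a block of cur."""
    target = get_block(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block falls outside the frame
            cost = sad(get_block(ref, y, x, size), target)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

def make_frame(size, patch_top, patch_left, patch_size=2,
               background=16, value=200):
    """Flat background with one bright square patch."""
    frame = [[background] * size for _ in range(size)]
    for y in range(patch_top, patch_top + patch_size):
        for x in range(patch_left, patch_left + patch_size):
            frame[y][x] = value
    return frame

ref = make_frame(8, 2, 2)   # patch at row 2, column 2
cur = make_frame(8, 3, 4)   # same patch, one row down, two columns right

cost, dy, dx = best_match(ref, cur, top=3, left=4, size=2, search=2)
print("cost:", cost, "vector (dy, dx):", (dy, dx))
```

The search recovers the displacement here only because the patch 
moved as a rigid 2D translation. Nothing in it knows that the patch 
is an object; with zoom, deformation or changing reflections, the 
lowest-cost block need not correspond to the true motion at all.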

In short, as with Moore's Law, we are nowhere near the theoretical 
limits for improvement. The reality is that each step in the Moore's 
Law progression enables us to add refinements to compression 
algorithms, and to add more resources in the decoder to enable new 
ways to create and encode content.

So the real issue is how to build evolution into the standards that 
we use to deliver digital services to the masses. We are now seeing 
that many DTV deployments - based on standards that are now a decade 
or more old - are unable to deal with extensibility. If we add AVC to 
ATSC or DVB, the existing deployed receivers must be replaced to work 
with the new services.

As we move to more programmable receivers, we may be able to extend 
their useful life by several years, but periodic upgrades are going 
to be a fact of life, which is probably the MOST COMPELLING argument 
for keeping the receiver/image processor separate from an expensive 
big screen monitor. Obviously we will also have integrated products, 
and these products may have a very limited run before they are 
upgraded. Consider what Apple has done with the iPod - old iPods are 
not rendered obsolete, but new capabilities are constantly being 
added, providing consumers with an incentive to upgrade.

In the emerging digital world, it is not the standards that are the 
primary drivers - it is the services that can be delivered that are 
the driving force, and the perceived value of these services by the 
consumers. In the early days of the PC revolution one could justify 
replacing the CPU every 18-24 months based on productivity alone. Now 
a PC can remain useful for 5 years or more; to motivate upgrades, new 
PCs must deliver new services - hence the interest in the family room.

>If so, then how far away from that theoretical limit is MPEG4/AVC?  Is
>MPEG4/AVC to the point that it really could be a standard that could last
>for 20 years?

How long a standard lasts is not the real issue here. JPEG was 
standardized around 1989. It has undergone several updates, and the 
JPEG-2000 standard has almost nothing in common with the original 
algorithm. But we will expect appliances to deal with the original 
JPEG standard for many decades, even as we use newer algorithms to 
encode still images in the future.

The real issue is extensibility - building upon what has come before.

The problem comes when we deploy closed systems with no provisions 
for extensibility, as has been the case for virtually all of the 
first generation of DTV standards. This problem is mitigated in part 
by keeping the volatile components cheap and separate from the less 
volatile ones. I doubt that many people in the U.K. will be upset about 
buying a new Freeview receiver when they buy an HDTV and want to view 
HD content. The old box will continue to function on the old TV, at 
least until the decision is made to stop using MPEG-2.

>
>Personally, I have no quarrel with Europe's "problem" about obsolete MPEG2
>receivers.  They rolled out digital using very inexpensive boxes, and can
>slowly starve them of bits to make room for AVC simulcasts in HD.  Just as
>DVB-T allows an almost continuous sliding scale of bitrates vs. robustness,
>there is an inherent sliding scale of SD quality vs. HD quality and/or
>number of HD services.

Yup! They have taken a pragmatic approach, even as they have made 
concessions to the broadcast community. Most broadcasters in Europe 
had upgraded to digital SD before the launch of DTV. They understood 
that the public would see a significant improvement in picture 
quality without having to force everyone to buy a new TV. And now 
they are prepared to take advantage of the cost reductions and 
improvements in technology as they launch HDTV services.

Unfortunately, in the U.S., for political reasons, the broadcasters 
attempted to leapfrog a generation. The result has been a very slow 
start and a system that is already out of date. This was ENTIRELY 
predictable.

I think it is foolish for anyone in the television business to think 
in terms of locking down technology for extended periods of time. The 
right approach is to design in extensibility, and decide when the 
time has come to start over again. My guess is that we can expect no 
more than 10-15 years from a product before it will be more efficient 
(cheaper) to start over.

>
>Only those who need HD will have to replace their tuners, and in many cases
>the tuner will be built into the display or else what is another $300US on
>top of a $2,000US HD display?

Why $300? Bert tells us that we can build complete ATSC receivers 
with HD for less than $100.  It should be even less for DVB receivers.

>
>Australia got the worst of it by allowing SD only boxes to be sold, but
>still demanding that MPEG2 HD also be used.  Perhaps their HD penetration is
>so low that they could allow an HD switch to MPEG4 and compensate those few
>HD adopters.  Then they would be back in harmony with the Old Country.

No, we got the worst of it by trying to deploy HDTV too soon. If you 
have a receiver or an HD capable monitor that is more than a year 
old, it probably will not work (for HD) in a few years.  But take 
heart, it will still deliver 480P quality.

Regards
Craig
 
 