[opendtv] Re: Math of oversampling

  • From: Craig Birkmaier <craig@xxxxxxxxx>
  • To: opendtv@xxxxxxxxxxxxx
  • Date: Fri, 29 Apr 2005 05:51:44 -0400

At 9:14 PM -0400 4/28/05, Tom Barry wrote:
>I think we've already once had something of this conversation on
>AVS.  I still tend to believe we see images in a blend of two (or
>more) different modes.  One would be some sort of edge and/or
>shape perception which is probably not frequency based.  But the
>other is our perception of texture, and that one likely does rely
>on frequency.  It really seems to me our perception of realistic
>sharpness depends upon both.

It is at times like this that I take a bit of pride in the fact that 
the OpenDTV list keeps going, and going, and going. The discussions 
of the past few days have been first rate - I hope everyone is 
learning as much as I have.

Tom is definitely on the right track with respect to the presence of 
specialized receptors in our foveal vision that are tuned to specific 
types of stimuli. The following is from the SMPTE Task Force Report on 
Digital Imaging, which I helped author in 1992.

3.2.2 Human Visual Processing
Much of the research in visual science today is focused on the 
processing of data acquired by the image receptors. A variety of 
specialized analyzers in the eye process data from small localized 
regions and accumulate the results into channels which are processed 
by the brain to create an integrated view of the physical 
environment.

There is evidence that the brain directs the activity of the image 
receptors for processes such as establishing white balance and light 
sensitivity levels. Simple localized analyzers are used to enhance 
the data transmitted back to the brain. Some of these analyzers are 
sensitive to a particular edge orientation; there are sufficient 
analyzers at each location to represent a full set of edge 
orientations. Additional tuned analyzers cover portions of the range 
of human sensitivity for spatial frequency, spatial position, 
temporal frequency, direction of motion, and binocular disparity.

The data processed by these analyzers moves to the brain through two 
types of channels: a set of fast responding channels with relatively 
transient responses to stimuli, and a set of slower channels with 
relatively sustained responses to stimuli. Transient channels process 
the output of analyzers that are tuned for low spatial and high 
temporal frequency stimuli. Sustained channels process the output of 
analyzers that are tuned for high spatial and low temporal frequency 
stimuli.

So yes, we do have a variety of image receptors that are tuned for 
the acquisition of various components of an image. One of the 
interesting findings of the research that I studied when writing the 
report is that we LEARN how to see different edge orientations. There 
was a study done with rats where they were raised in an environment 
that was devoid of certain edge orientations - I think there were 
vertical lines in the environment, but no horizontal lines.  When 
these rats were later introduced into an environment with edges in 
all orientations, they could not see the edges that they had NOT 
learned to see; they would run into things because they could not see 
them.

It is important to note that there is a huge difference between the 
perception of still images and moving images. With still images we 
have plenty of time to analyze the image and to perceive fine 
details. With moving images the amount of information can overwhelm 
the human visual system. We tend to pick out the most important 
information and use directed eye movements to acquire high 
resolution views of portions of a high resolution image. This is 
especially true for images that cover a large portion of our field 
of view, such as a large HDTV display - we cannot sample all of the 
information in the 
image, and are forced to track motion and acquire high resolution 
views of a portion of the image while ignoring most of the detail in 
other portions of the image. The only problem is that we cannot 
predict what a viewer will look at, so we need to have approximately 
the same level of detail everywhere.

So while it is interesting to study the images that Tom and Jeroen 
have created for us, they only tell us part of the story. It takes 
several hundred milliseconds to acquire a high resolution view of any 
image, as the foveal receptors dart around to acquire a high 
resolution view. Thus in a motion imaging system it is likely that 
several temporal samples are contributing to the perception of 
sharpness. Detail from several frames can add to improve the 
perception of sharpness. One of the early compression systems for 
desktop computers simply threw away about 75% of the image samples. 
But it did this by moving the sampling points around in a four pixel 
region - over a four frame time period all of the sample points in 
the original image were presented. This actually worked quite well, 
enabling the perception of more image detail than simply down 
sampling to a frame 1/4 the size. On the other hand, objects that are 
moving need some motion blur - especially at lower frame rates like 
24P - in order to fool the human visual system into seeing continuous 
motion. So a motion imaging system must deal with many issues as it 
attempts to fool the human visual system into seeing sharp, high 
resolution moving images.
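
To make the rotating-sample idea concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: I have not named the system, and the function names, the 2x2 block layout and the order in which the four positions are visited are all hypothetical. Each frame keeps one pixel out of every 2x2 block (25% of the samples), and the kept position rotates so that four consecutive frames cover every pixel. For a static scene the four frames can be merged back into the full image; with real moving pictures it is the viewer's eye that does the merging, and moving objects would not line up this neatly.

```python
# Hypothetical sketch: the phase order below is an assumption, not the
# pattern used by any particular product.
PHASES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (row, col) offset inside each 2x2 block


def subsample(frame, t):
    """Keep one pixel per 2x2 block; the kept position depends on frame index t."""
    dy, dx = PHASES[t % 4]
    return [row[dx::2] for row in frame[dy::2]]


def accumulate(subframes, height, width):
    """Merge four consecutive subsampled frames of a static scene into one image."""
    out = [[None] * width for _ in range(height)]
    for t, sub in enumerate(subframes):
        dy, dx = PHASES[t % 4]
        for i, row in enumerate(sub):
            for j, value in enumerate(row):
                out[dy + 2 * i][dx + 2 * j] = value
    return out


# A static 4x4 test image with a unique value at every pixel.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# Four frames, each carrying only a quarter of the samples.
subs = [subsample(image, t) for t in range(4)]

# Over four frame times every original sample has been presented once.
rebuilt = accumulate(subs, 4, 4)

# Plain downsampling to a quarter-size frame always keeps the same position,
# so the other three pixels in each block are never shown.
fixed = subsample(image, 0)
```

Here rebuilt matches the original image exactly, while fixed never carries more than the same quarter of the samples, which is the difference I was describing between rotating the sampling points and simply down sampling.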

Regards
Craig





 
 