[accessibleimage] An antidote to CSUN
- From: "Will Pearson" <will-pearson@xxxxxxxxxxxxx>
- To: <uvip@xxxxxxxxxxxxxxx>, <accessibleimage@xxxxxxxxxxxxx>
- Date: Sun, 9 Apr 2006 22:03:16 +0100
Hi,
Now that all the hype of CSUN is behind us, I thought it time to begin to
explore the more serious questions, the sort that are rarely touched on at
CSUN. The first question I felt worthy of an attempt at an answer is whether
using a screen reader can ever be as efficient as using sight. There's been
plenty of speculation on the topic, usually resulting in the answer that if
<insert application vendor or platform vendor> waved their magic wand, using a
screen reader would be as efficient as sight. However, after spending several
years considering this, and other human computer interaction issues related to
screen reader use, I take a different view. My justification, whilst not
exhaustive, is below.
The first area where screen readers appear to fall short is in their ability to
communicate semantics. Communication is all about communicating thoughts,
concepts, states, etc., and communication between an interface for a piece of
software and a user is no different in this respect. The main problem is that
screen readers, through their use of speech and Braille, both of which are
serialised forms of communication, use fewer physical variables to encode
semantic content than sight does. There are roughly six variables that can be
used to encode semantic content, and these are:
* The position of something on the X, Y and Z axes (three variables)
* The position of something in time
* The frequency of the physical wave, represented by things like color, pitch,
etc.
* The amplitude of the physical wave, or how strong it is
Using a computer with sight typically takes advantage of five of these
variables, whilst screen readers typically only use two. So, it will take
longer to communicate the same semantic content using a screen reader than it
will using sight. To some extent this has supporting evidence from psychological
studies in which the listening and reading speeds of the same person were
compared. These studies found that the same individual could read something
faster than they could listen to it. There are differences between
individuals, which can account for why some screen reader users can listen to
things faster than some people can read things, but within the same individual
the evidence seems to indicate that listening to things is slower.
This serialisation of semantic content, brought about by the smaller capacity
of speech, also has implications for memory utilisation and cognitive workload.
Studies involving Functional Magnetic Resonance Imaging of the cortex have
shown greater activity in the cortical regions of the brain when listening to
speech than when reading something. Not only is there activity on the left
side of the cortex, in regions such as Broca's Area and Wernicke's Area, which
is present for both reading and listening, but listening to speech also
produces activity in the right side of the cortex, which is thought to be
related to contextual priming. In addition to the extra neurological activity
associated with language processing, there is also a higher demand on
short-term working memory. As speech is temporary (one moment it is there, the
next it is not), someone listening to speech has to remember more than someone
reading something. It is not as easy to move back to a previously listened-to
word or sentence as it is to move back to a previously read word or sentence.
Navigating by listening often involves listening to words, deciding whether
they are the ones that are sought after, and if not, navigating some more and
repeating the process.
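
As a rough illustration of that listen, decide, move on loop, here is a
minimal Python sketch. The function name, the list of headings and the target
are all made up for the example, and it models nothing more than the counting
of items that have to be heard before the wanted one turns up.

```python
def serial_search(items, is_target):
    """Step through items one at a time, as when arrowing through speech output.

    Returns (index, items_heard); index is None if nothing matches.
    """
    heard = 0
    for index, item in enumerate(items):
        heard += 1  # each item has to be listened to before it can be judged
        if is_target(item):
            return index, heard
    return None, heard


# Hypothetical list of items announced in order by a screen reader.
headings = ["Home", "Products", "Support", "Contact us", "Site map"]
index, heard = serial_search(headings, lambda text: text == "Contact us")
```

The number of items heard grows with how far down the list the target sits,
and every one of them has to be held in mind long enough to be rejected.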
Another consideration is the distinction between programmatic focus, the
mechanism used to shift attention with a screen reader, and visual attention.
Screen readers utilise a mechanism of programmatic focus to shift the user's
attention between user interface elements. This means that a user's attention
is only focused on a single point at once, something further compounded by a
screen reader's use of serialised output. Whilst visual attention is usually
focused on a single object, it can shrink and grow, similar to a zoom lens, to
encompass more or less of an object. This ability to shift attention from a
word to a paragraph and then onto the entire document provides a number of
benefits for people reading documents. The most obvious benefit is the ability
to not only navigate by word or line, but to navigate around the document based
on more granular objects, such as paragraphs, tables, images, etc. Whilst
similar functionality is available in some screen readers for a limited set of
scenarios, this functionality is not as flexible as the visual mechanism used
to shift attention. The visual mechanism can group granular objects together,
such as a table preceded by a diagram, and can jump to those with very little
requirement for processing. In addition to granular navigation, attention can
also be shifted based on physical features, such as color or location, which
requires just the elements with those physical features to be searched, as
suggested by Treisman's Feature Integration Theory. As far as I am aware, no
equivalent functionality to this exists in a screen reader. One key difference
between programmatic and visual attention is that programmatic attention can only
be moved to fixed points, whilst visual attention can be moved to any point or
object. The final difference worth mentioning is that attention is not just
limited to a single point in the visual field. Whilst there are overt,
endogenous, mechanisms to control visual attention through moving the point of
fixation, attention can also be focused in the periphery of the visual field,
through covert, endogenous, mechanisms. This is a useful point, as it means
that sighted people can detect changes in the state of something that occur
away from their current point of fixation without the cognitive work involved
in moving the point of fixation.
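
To put a rough number on the feature-based search point, here is a toy Python
sketch. It is not a model of perception: the Element class, the page contents
and the colours are invented for the example, and it simply assumes that
picking out elements by colour costs nothing, which stands in for the parallel
stage that Feature Integration Theory describes. It then compares how many
elements have to be examined one by one in each case.

```python
from dataclasses import dataclass


@dataclass
class Element:
    label: str
    color: str


def inspected_serially(elements, label):
    """Count elements examined when stepping through them in order."""
    for count, element in enumerate(elements, start=1):
        if element.label == label:
            return count
    return len(elements)


def inspected_with_feature(elements, label, color):
    """Count elements examined when only those of a given color are searched.

    Assumption: selecting by color is free, standing in for the parallel
    stage of visual search; only the remaining candidates are stepped through.
    """
    candidates = [e for e in elements if e.color == color]
    return inspected_serially(candidates, label)


# Hypothetical page: six elements, two of them red.
page = [
    Element("Intro", "black"),
    Element("Warning", "red"),
    Element("Details", "black"),
    Element("Notes", "black"),
    Element("Error", "red"),
    Element("Footer", "black"),
]
serial_cost = inspected_serially(page, "Error")
feature_cost = inspected_with_feature(page, "Error", "red")
```

With these made-up numbers the serial walk examines five elements and the
colour-restricted search examines two; the gap widens as the page grows and the
share of elements with the distinguishing feature shrinks.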
So, I, for one, am beginning to form the opinion that screen readers are not
physically capable of delivering the same levels of efficiency as sight can.
This isn't to say that blind people cannot gain the same level of efficiency,
just that it looks likely that they are unable to do this using a screen
reader. What is more, this is not the fault of a particular
application or platform vendor, as is often claimed, but more a problem with
the core concept of a screen reader, a concept that requires everything to be
serialised.
Will