[accessibleimage] An antidote to CSUN

Hi,

Now that all the hype of CSUN is behind us, I thought it time to begin to 
explore the more serious questions, the sort that are rarely touched on at 
CSUN.  The first question I felt worthy of an attempt at an answer is whether 
using a screen reader can ever be as efficient as using sight.  There has been 
plenty of speculation on the topic, usually resulting in the answer that if 
<insert application vendor or platform vendor> waved their magic wand, using a 
screen reader would be as efficient as sight.  However, after spending several 
years considering this, and other human-computer interaction issues related to 
screen reader use, I take a different view.  My justification, whilst not 
exhaustive, is below.

The first area where screen readers appear to fall short is in their ability to 
communicate semantics.  Communication is all about conveying thoughts, 
concepts, states, etc., and communication between a software interface and its 
user is no different in this respect.  The main problem is that screen readers, 
through their use of speech and Braille, both of which are serialised forms of 
communication, use fewer physical variables to encode semantic content than 
sight does.  There are roughly six variables that can be used to encode 
semantic content, and these are:
* The position of something on the X, Y and Z axes
* The position of something in time
* The frequency of the physical wave, represented by things like color, pitch, 
etc.
* The amplitude of the physical wave, or how strong it is
Using a computer with sight typically takes advantage of five of these 
variables, whilst screen readers typically use only two.  As a result, it will 
take longer to communicate the same semantic content using a screen reader than 
using sight.  There is some supporting evidence for this from psychological 
studies in which the listening and reading speeds of the same person were 
compared.  These studies found that the same individual could read something 
faster than they could listen to it.  Differences between individuals can 
account for why some screen reader users can listen to things faster than some 
people can read them, but within the same individual the evidence seems to 
indicate that listening is slower.

This serialisation of semantic content, brought about by the smaller capacity 
of speech, also has implications for memory utilisation and cognitive workload.  
Studies using functional magnetic resonance imaging (fMRI) of the cortex have 
shown greater cortical activity when listening to speech than when reading.  
Not only is there activity on the left side of the cortex, in regions such as 
Broca's area and Wernicke's area, which is present for both reading and 
listening, but listening to speech also produces activity on the right side of 
the cortex, which is thought to be related to contextual priming.  In addition 
to the extra neurological activity associated with language processing, there 
is also a higher demand on short-term working memory.  As speech is transient 
(one moment it is there, the next it is not), someone listening to speech has 
to remember more than someone reading.  It is harder to move back to a 
previously heard word or sentence than to a previously read one.  Navigating by 
listening often involves listening to words, deciding whether they are the ones 
being sought, and if not, navigating some more and repeating the process.

Another consideration is the distinction between programmatic focus, the 
mechanism used to shift attention with a screen reader, and visual attention.  
Screen readers use programmatic focus to shift the user's attention between 
user interface elements.  This means that a user's attention is focused on only 
a single point at a time, a limitation further compounded by a screen reader's 
use of serialised output.  Whilst visual attention is usually focused on a 
single object, it can shrink and grow, similar to a zoom lens, to encompass 
more or less of an object.  This ability to shift attention from a word to a 
paragraph and then onto the entire document provides a number of benefits for 
people reading documents.  The most obvious benefit is the ability to navigate 
not only by word or line, but also by larger structural units, such as 
paragraphs, tables, images, etc.  Whilst similar functionality is available in 
some screen readers for a limited set of scenarios, it is not as flexible as 
the visual mechanism used to shift attention.  The visual mechanism can group 
structural units together, such as a table preceded by a diagram, and can jump 
to those with very little processing.  In addition to structural navigation, 
attention can also be shifted based on physical features, such as color or 
location, which requires only the elements with those physical features to be 
searched, as suggested by Treisman's Feature Integration Theory.  As far as I 
am aware, no equivalent functionality exists in a screen reader.  One key 
difference between programmatic and visual attention is that programmatic 
attention can only be moved to fixed points, whilst visual attention can be 
moved to any point or object.  The final difference worth mentioning is that 
attention is not limited to a single point in the visual field.  Whilst there 
are overt, endogenous mechanisms to control visual attention through moving the 
point of fixation, attention can also be focused in the periphery of the visual 
field through covert, endogenous mechanisms.  This is a useful point, as it 
means that sighted people can detect changes in the state of something that 
occur away from their current point of fixation without the cognitive work 
involved in moving the point of fixation.

So, I, for one, am beginning to form the opinion that screen readers are not 
physically capable of delivering the same levels of efficiency as sight.  
This isn't to say that blind people cannot achieve the same level of 
efficiency, just that it looks unlikely that they can do so using a screen 
reader.  What is more, this is not the fault of a particular application or 
platform vendor, as is often claimed, but rather a problem with the core 
concept of a screen reader, a concept that requires everything to be 
serialised.

Will
