Blind Confidential (Blog)
Do Screen Reader Developers Have The Skills To Design The Future?
By Chris Hofstader, Friday, September 21, 2007

One of my research projects involves designing a user interface for delivering mathematical equations through an auditory system. I, therefore, find myself thinking a lot about human short term memory and the amount of information a typical student can retain while listening to their computer speak an equation. I also need to concern myself with techniques to deliver this information in an unambiguous manner, a task equal in difficulty to the short term memory issues and possibly more important, as my users can review the equation with cursor keys to refresh their memory but would struggle to calculate the correct answers to problem sets if our system cannot properly disambiguate the information. I have spent a lot of time over the past few years thinking about and working on models to improve the efficiency with which a person can use devices that expose their interface through audio. This obviously includes screen readers and other programs that people with vision impairment employ to more easily perform various tasks. I also concern myself with other applications for speech and audio user interfaces; namely, I study the application of auditory user interface concepts on mainstream devices and look for ways to leverage the market size of consumer electronics as applied to access technology. Finally, my research includes looking into auditory interfaces for people who may have a temporary disability (motorists can safely use only one hand and no vision when driving, military personnel cannot take their eyes off of a target or their hands off of their weapon, etc.).
As I've documented throughout the history of Blind Confidential, I struggle badly with repetitive stress injuries and have started calling my form of RSI "Screen Reader Syndrome" because of the disproportionately large number of keystrokes that a person with vision impairment must use to achieve the same goal as a sighted person using the same software packages. I have a very high level of respect for the software engineers who write screen reader software. I have met most, if not all, of the lead technical people at the various vision-related software businesses and all have impressed me with their intellect and dedication to the work they do. Doug Geoffray and his team have built a really solid code base and continue to deliver relatively interesting upgrades on a steady schedule. Mike Hill, of Dolphin, has always impressed me as one of the smartest guys with whom I have discussed technical issues. Matt Campbell certainly deserves the title of hottest newcomer to the biz as he continuously creates interesting solutions to very difficult problems. Of all of the people working on different solutions for people with vision impairment, I know the least about Willy Walker, the Sun Microsystems lead developer on the orca screen reader, but I do find his answers to questions and the other information he sends to the orca mailing list to be very useful. I've only tried NVDA and Thunder a couple of times and don't know any of the folks involved in their development, so I will withhold comment on them. I have met Travis at Apple, who works on VoiceOver, and he also seems like a very smart guy. Of all of the people in the biz, I know Glen Gordon much better than the others, as we talked on a near daily basis for six years. In the nearly 30 years since I started working on software professionally, I have enjoyed the privilege of working with a lot of really smart people on all sorts of interesting problems.
Glen Gordon stands at the top of my list of truly great hackers, along with Richard Stallman and many other really smart folks. While Glen, Doug, Mike, Willy, Travis and Matt all have excellent technical skills, do they and their teams have the skills necessary to take the audio user interface paradigm to the next level, one in which people with vision impairment can use software with a level of efficiency similar to that of our sighted peers? If we explore the skills most necessary to build the current generation of screen readers, we find two major skill sets: really low level operating system hacks, and taking information from an API or DOM and organizing and presenting it in a manner that a person with vision impairment can use effectively. Peter Korn would argue that the operating system hacks introduce a level of instability into the screen reader and the entire system, and he may well be right. At the same time, gathering information from an API or DOM will miss information that an application developer neglected to code properly to conform to the API or DOM upon which the screen reader relies. Thus, the low level techniques might produce instability but can often deliver information unavailable to an API driven solution; meanwhile, screen readers that rely on an API can provide really excellent information, including contexts and relationships that do not lend themselves well to screen "scraping" techniques. Obviously, both systems have their strengths and their problems. As far as I know, all of the Windows based screen access programs use a hybrid of operating system hacks and API/DOM to collect information, while orca and VoiceOver both rely entirely on API and DOM for their data. In my six years at HJ/FS, I hired quite a number of people into software engineering jobs to work on JAWS, MAGic and our other programs.
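The hybrid strategy described above can be sketched in a few lines. This is purely my illustration, not the architecture of any real screen reader: the function names, the dictionary-shaped "accessibility tree," and the "OSM" lookup are all invented stand-ins for the two data sources the paragraph contrasts.

```python
def read_control(control, api_tree, osm):
    """Return the best available text for a UI control.

    Prefer the accessibility API, which can supply role, name and
    relationships that never appear on screen; fall back to text the
    off-screen model (OSM) scraped from low-level display output.
    """
    node = api_tree.get(control)
    if node is not None and node.get("name"):
        # API path: rich, semantic, but only if the app implemented it.
        return f"{node['role']}: {node['name']}"
    scraped = osm.get(control)
    if scraped:
        # OSM path: no semantic context, but it catches applications
        # that never coded to the accessibility API at all.
        return scraped
    return "unlabeled control"

# A well-behaved control versus a legacy one the API knows nothing about:
api_tree = {"ok_btn": {"role": "button", "name": "OK"}}
osm = {"ok_btn": "OK", "legacy_field": "Customer Name:"}

print(read_control("ok_btn", api_tree, osm))       # → button: OK
print(read_control("legacy_field", api_tree, osm))  # → Customer Name:
```

The trade-off in the text falls out directly: the first branch can fail when developers neglect the API, and the second branch delivers text without context, which is why the Windows products hedge by doing both.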
In virtually all cases, we looked for people who had at least some low level hacking experience because JAWS, like its Windows counterparts, uses a lot of operating system hacks to collect data with which it populates its off screen model (OSM), and MAGic, like all Windows magnifiers, must do some very delicate bit twiddling at the operating system level. Thus, we looked for programmers with a bit of silicon under their fingernails and a solid understanding of Windows drivers and low level graphical functionality. The last large step forward to improve the efficiency with which a screen reader user can hear information came with the introduction of the Speech and Sounds Manager in the JAWS 5.xx series. By using the Speech and Sounds Manager, one can cut down on the number of syllables one needs to hear while also hearing a sound simultaneously with the text read by the synthesizer, which, depending upon the application in use, can save a substantial amount of the time required to achieve a given goal. I know that Serotek System Access uses some sound augmentations when in a browser, that HPR did some of this in the past, and I've heard people tell me of some now defunct screen readers doing a bit of this as well. To my knowledge, though, no one has implemented a system nearly as comprehensive as the one in JAWS, which one can use in many areas of computer usage to deliver more than a single dimension of semantic information at any given instant. Before Speech and Sounds Manager, JAWS defined the state of the art with its incredible collection of information augmentations gathered from the various DOMs in the Office suites and other applications that exposed a rich API. In most cases, these added data items did not appear anywhere on the screen but contained very useful information for users of these applications.
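The core idea behind a Speech and Sounds Manager can be shown with a toy model. This is my own sketch of the concept, not the JAWS implementation: the attribute names, earcon file names and the announce function are all invented for illustration. Attributes that have an assigned earcon are conveyed as a concurrent sound rather than a spoken word, so the synthesizer has fewer syllables to say.

```python
# Hypothetical mapping from semantic attributes to short earcons.
EARCONS = {"link": "chime.wav", "bold": "tick.wav", "heading": "drum.wav"}

def announce(text, attributes):
    """Return (speech, sounds) for one announcement.

    Attributes with an earcon are moved out of the speech stream and
    played simultaneously with the text; any attribute without an
    earcon falls back to being spoken the old way.
    """
    sounds = [EARCONS[a] for a in attributes if a in EARCONS]
    spoken = [a for a in attributes if a not in EARCONS]
    speech = " ".join(spoken + [text])
    return speech, sounds

# Without earcons a link reads as "link Contact us"; with them the
# synthesizer says only "Contact us" while a chime plays at the same time.
print(announce("Contact us", ["link"]))   # → ('Contact us', ['chime.wav'])
print(announce("Summary", ["italic"]))    # no earcon assigned, so spoken
```

The saving compounds: one earcon replaces one or more spoken syllables on every link, heading or formatted word, and because the sound plays in parallel with the text, the user hears two semantic dimensions in the time a serial announcement would deliver one.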
For example, in the time prior to JAWS' adding DOM support and information augmentation to its support for Microsoft Excel, a person with a vision impairment could open and even edit Excel files but, especially when trying to read an Excel worksheet that someone else had made, they had to spend a lot of time poking around just to find which cells had data and what the row and column headers might say to identify what the value in a cell might mean. All of these initial augmentations were delivered in a textual format read by the speech synthesizer. Thus, JAWS users could learn more from, and work more efficiently with, spreadsheets and other interesting applications. These augmentations provided a screen reader user with a lot of extra semantic information about the document of interest. They cut down on the amount of time and the number of keystrokes a user had to spend while working with said document, as the augmentations gave the user a way of ignoring information of no interest and of finding the items of greatest interest within a specific task. In the years that have followed, most of the DOM-based methods of improving efficiency by delivering additional meaning to the user, along with the quick keys method of navigating a web page more rapidly than had previously been possible, have been imitated by most other screen readers on all platforms. The Speech and Sounds Manager remains the only major method of increasing the amount of semantically interesting information delivered in any given amount of time that resides entirely in JAWS. Unfortunately, I have not seen any truly innovative user interface improvements in any screen reader release since the JAWS 5.xx series. Certainly, Window-Eyes and System Access have added a large number of new features in each of their releases but, for the most part, they have been catching up to the 2003 releases of JAWS.
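A toy model of the Excel augmentation described above (my illustration, not FS code): when the user lands on a cell, the screen reader fetches the row and column headers from the spreadsheet's object model and folds them into the announcement, so a bare number gains its meaning without the user having to navigate up to row 1 and over to column A.

```python
def announce_cell(grid, row, col, header_row=0, header_col=0):
    """Compose a spoken announcement for one cell: its value plus the
    row and column headers that give the value meaning. The grid is a
    stand-in for what a real screen reader would pull from the DOM."""
    value = grid[row][col]
    row_header = grid[row][header_col]
    col_header = grid[header_row][col]
    return f"{value}, {row_header}, {col_header}"

grid = [
    ["",        "Q1",   "Q2"],
    ["Revenue", "1200", "1350"],
    ["Costs",   "800",  "910"],
]

# Landing on row 1, column 2 speaks the value and both headers at once:
print(announce_cell(grid, 1, 2))  # → 1350, Revenue, Q2
```

Before this kind of augmentation, hearing "1350" told the user nothing; afterwards, a single keystroke delivers the value and its context in one announcement, which is exactly the keystroke-saving the paragraph describes.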
Meanwhile, FS hasn't done much to raise the bar that its competitors must reach to catch up in the past three or four years. In terms of innovation, FS seems to include incremental new features of little interest and the other screen reader vendors, on Windows, GNU/Linux and Macintosh, seem hell bent on creating a JAWS clone. In conversations both Will Pearson and I have had with people at various screen reader companies, the notion of increasing the number of semantic dimensions delivered to a screen reader user in a single instant has been called a "gimmick" and some individuals have told us that, "it can't be important, none of our users have asked for it." Many years ago, when HJ still made JAWS, we commissioned a market research project to help us determine what our users actually wanted. One of the results most difficult for us to understand was the line that said that less than 2% of blind computer users wanted to use Excel. I recall discussing this with Eric Damery and we concluded that blind users would use Excel if it worked reasonably well with JAWS. Thus, although the market research told us that no one cared about a spreadsheet, we hired a contractor to write scripts for Excel, I worked closely with the contractor on features and such and today, about eight years later, many people who use JAWS and most other screen readers also use a spreadsheet. Thus, the argument that "no one has requested a given feature" continues to be baseless as the majority of screen reader users don't know they want something until it shows up in their AT. It's a classic chicken and egg problem. What user interface structures might help improve the efficiency with which a blink can interact with their computer? 
A number of different theorists and researchers could provide a lengthy list of ideas, ranging from concepts like synthetic vision to 3D audio to a method with which a screen reader user can quickly move their attention from one conceptual group to another (the method a sighted person employs unconsciously by moving their gaze). There are a fairly large number of other ideas bouncing around the research community, but absolutely none of the screen reader vendors seem to spend any time or effort seeking the next major step forward for their users. At this time, I cannot blame these companies for their lack of enthusiasm for finding a more efficient user experience. Many of the products out there spend most of their time trying to catch up to or jump past JAWS and, perhaps more to the point, none of these companies have people with the design skills to invent a model that will improve user efficiency. Thus, the titular question of this article: do the screen reader vendors have people with the skills necessary to move the state of the art forward? I think not. I do think that all of the screen reader vendors act in good faith and believe they make the right decisions regarding user interface but, unfortunately, they do not have anyone on their staffs dedicated to studying such problems, suggesting and designing new UI metaphors, and improving the efficiency of absorbing information delivered by a screen reader. The missing skills can be a bit obscure. The first necessary skill would be in human computer interaction (HCI), with a strong background in non-visual interfaces. It would also be valuable to have people who understand cognitive psychology, learning theory, psycho-linguistics and other disciplines that can be applied to defining the next step in audio user interface design. Such people do exist, and many have computer programming in their skill set, as they tend to demonstrate their models in software simulations.
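One way to make the "move attention between conceptual groups" idea concrete is a sketch like the following. Nothing here comes from a shipping product: the class, the group names and the single-command jump are my own invention, meant only to show how a page pre-chunked into named regions could let one keystroke play the role of a gaze shift.

```python
class AttentionModel:
    """A page divided into named conceptual groups; one command moves
    the review cursor between whole groups rather than line by line."""

    def __init__(self, groups):
        self.groups = groups  # ordered list of (name, items) pairs
        self.focus = 0        # index of the group currently attended to

    def next_group(self):
        """Shift attention to the next group, announcing a brief
        summary instead of the group's full contents."""
        self.focus = (self.focus + 1) % len(self.groups)
        name, items = self.groups[self.focus]
        return f"{name}, {len(items)} items"

page = AttentionModel([
    ("Navigation", ["Home", "News", "Contact"]),
    ("Main article", ["heading", "first paragraph"]),
    ("Search", ["edit box", "button"]),
])
print(page.next_group())  # → Main article, 2 items
print(page.next_group())  # → Search, 2 items
```

The efficiency claim is in the summary line: a sighted reader dismisses a navigation bar in one glance, and a grouped model lets a listener dismiss it in one short announcement instead of arrowing through every link.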
Today, the only groups I am aware of who are exploring multi-dimensional audio interfaces for use by people with vision impairment are people like David Greenwood who make really cool audio-only real time action games. Shades of Doom, Greenwood's most famous title, plays up to 32 simultaneous sounds, and a user can understand what is going on, react to the situation, kill the mutants and save the world from the mad scientist. Obviously, the information delivered by an action/adventure game differs substantially from that delivered by a screen reader in a word processor, but Greenwood's work and that of the other audio game hackers proves that blinks can understand much more information than the single syllable or pause produced by a speech synthesizer. Will the screen reader vendors try to move the state of the art forward? I certainly hope so. Audio user interfaces will start to appear in mainstream products. People with a number of smart appliances, blind or otherwise, will not want to look at a display every time they want to change the state of a device in their house. These people will want to issue verbal commands and receive audio feedback. They will also expect their systems to function very efficiently, as a smart home and smart appliances that take longer than their predecessors to operate will be rejected out of hand. The screen reader companies do have a lot of knowledge about blind users and their needs and, in my opinion, if they added people to their staffs who could help them develop systems that deliver richer information, they would find themselves on the cutting edge of design for non-visual interfaces, for both people with disabilities and the mainstream consumer.
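Part of why dozens of concurrent game sounds stay intelligible is that each source occupies its own position in the stereo field, so location itself carries information. The constant-power pan law below is a standard audio technique; the game scenario in the comments is invented, and this is in no way Greenwood's actual engine.

```python
import math

def pan_gains(azimuth):
    """Map an azimuth in [-1.0 (hard left), +1.0 (hard right)] to
    (left, right) channel gains using a constant-power pan law, so a
    source keeps the same perceived loudness anywhere in the field."""
    angle = (azimuth + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)

# A mutant hard left and a door slightly right land in distinct
# positions, letting the listener track both sounds at once:
left, right = pan_gains(-1.0)
print(round(left, 3), round(right, 3))  # → 1.0 0.0
left, right = pan_gains(0.5)
print(round(left, 3), round(right, 3))  # → 0.383 0.924
```

A screen reader could exploit the same channel: if an error tone always arrives from the left and a link earcon from the right, position becomes one more simultaneous semantic dimension on top of speech, which is precisely the unused capacity the games demonstrate.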
-- End posted by BlindChristian at 9:59 AM
http://blindconfidential.blogspot.com/2007/09/do-screen-reader-developers-have-skills.html