Blind Confidential (Blog)
Do Screen Reader Developers Have The Skills To Design The Future?
By Chris Hofstader, Friday, September 21, 2007

One of my research projects involves designing a user interface for delivering mathematical equations through an auditory system. I, therefore, find myself thinking a lot about human short term memory and the amount of information a typical student can retain while listening to their computer speak an equation. I also need to concern myself with techniques to deliver this information in an unambiguous manner, a task equal in difficulty to the short term memory issues and possibly more important, as my users can review the equation with cursor keys to refresh their memory but would struggle to calculate the correct answers to problem sets if our system cannot properly disambiguate the information. I have spent a lot of time over the past few years thinking about and working on models to improve the efficiency with which a person can use devices that expose their interface through audio. This obviously includes screen readers and other programs that people with vision impairment employ to more easily perform various tasks. I also concern myself with other applications for speech and audio user interfaces; namely, I study the application of auditory user interface concepts on mainstream devices and look for ways to leverage the market size of consumer electronics as applied to access technology. Finally, my research includes looking into auditory interfaces for people who may have a temporary disability (motorists can safely use only one hand and no vision when driving, military personnel cannot take their eyes off of a target or their hands off of their weapon, etc.).
As I've documented throughout the history of Blind Confidential, I struggle badly with repetitive stress injuries and have started calling my form of RSI "Screen Reader Syndrome" because of the disproportionately large number of keystrokes that a person with vision impairment must use to achieve the same goal as a sighted person using the same software packages. I have a very high level of respect for the software engineers who write screen reader software. I have met most, if not all, of the lead technical people at the various vision-related software businesses and all have impressed me with their intellect and dedication to the work they do. Doug Geoffray and his team have built a really solid code base and continue to deliver relatively interesting upgrades on a steady schedule. Mike Hill, of Dolphin, has always impressed me as one of the smartest guys with whom I have discussed technical issues. Matt Campbell certainly deserves the title of hottest newcomer to the biz as he continuously creates interesting solutions to very difficult problems. Of all of the people working on different solutions for people with vision impairment, I know the least about Willy Walker, the Sun Microsystems lead developer on the orca screen reader, but I do find his answers to questions and the other information he sends to the orca mailing list to be very useful. I've only tried NVDA and Thunder a couple of times and don't know any of the folks involved in their development, so I will withhold comment on them. I have met Travis at Apple, who works on VoiceOver, and he also seems like a very smart guy. Of all of the people in the biz, I know Glen Gordon much better than the others, as we talked on a near daily basis for six years. In the nearly 30 years since I started working on software professionally, I have enjoyed the privilege of working with a lot of really smart people on all sorts of interesting problems.
Glen Gordon stands at the top of my list of truly great hackers, along with Richard Stallman and many other really smart folks. While Glen, Doug, Mike, Willy, Travis and Matt all have excellent technical skills, do they and their teams have the skills necessary to take the audio user interface paradigm to the next level, one in which people with vision impairment can use software with a level of efficiency similar to that of our sighted peers? If we explore the skills most necessary to build the current generation of screen readers, we find two major skill sets: really low level operating system hacks, and taking information from an API or DOM and organizing and presenting it in a manner that a person with vision impairment can use effectively. Peter Korn would argue that the operating system hacks introduce a level of instability into the screen reader and the entire system, and he may well be right. At the same time, gathering information from an API or DOM will miss information that an application developer neglected to code properly to conform to the API or DOM upon which the screen reader relies. Thus, the low level techniques might produce instability but can often deliver information unavailable to an API driven solution; meanwhile, screen readers that rely on an API can provide really excellent information, including contexts and relationships that do not lend themselves well to screen "scraping" techniques. Obviously, both systems have their strengths and their problems. As far as I know, all of the Windows based screen access programs use a hybrid of operating system hacks and API/DOM to collect information, while orca and VoiceOver both rely entirely on API and DOM for their data. In my six years at HJ/FS, I hired quite a number of people into software engineering jobs to work on JAWS, MAGic and our other programs.
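The hybrid strategy described above can be sketched in a few lines. This is purely my illustration, not the architecture of any real screen reader: the function names, the dictionary-shaped "accessibility tree," and the "OSM" lookup are all invented stand-ins for the two data sources the paragraph contrasts.

```python
def read_control(control, api_tree, osm):
    """Return the best available text for a UI control.

    Prefer the accessibility API, which can supply role, name and
    relationships that never appear on screen; fall back to text the
    off-screen model (OSM) scraped from low-level display output.
    """
    node = api_tree.get(control)
    if node is not None and node.get("name"):
        # API path: rich, semantic, but only if the app implemented it.
        return f"{node['role']}: {node['name']}"
    scraped = osm.get(control)
    if scraped:
        # OSM path: no semantic context, but it catches applications
        # that never coded to the accessibility API at all.
        return scraped
    return "unlabeled control"

# A well-behaved control versus a legacy one the API knows nothing about:
api_tree = {"ok_btn": {"role": "button", "name": "OK"}}
osm = {"ok_btn": "OK", "legacy_field": "Customer Name:"}

print(read_control("ok_btn", api_tree, osm))       # → button: OK
print(read_control("legacy_field", api_tree, osm))  # → Customer Name:
```

The trade-off in the text falls out directly: the first branch can fail when developers neglect the API, and the second branch delivers text without context, which is why the Windows products hedge by doing both.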
In virtually all cases, we looked for people who had at least some low level hacking experience because JAWS, like its Windows counterparts, uses a lot of operating system hacks to collect data with which it populates its off screen model (OSM), and MAGic, like all Windows magnifiers, must do some very delicate bit twiddling at the operating system level. Thus, we looked for programmers with a bit of silicon under their fingernails and a solid understanding of Windows drivers and low level graphical functionality. The last large step forward to improve the efficiency with which a screen reader user can hear information came with the introduction of the Speech and Sounds Manager in the JAWS 5.xx series. By using the Speech and Sounds Manager, one can cut down on the number of syllables one needs to hear while also hearing a sound simultaneously with the text read by the synthesizer, which, depending upon the application in use, can save a substantial amount of the time required to achieve a given goal. I know that Serotek System Access uses some sound augmentations when in a browser, that HPR did some of this in the past, and I've heard people tell me of some now defunct screen readers doing a bit of this as well. To my knowledge, though, no one has implemented a system nearly as comprehensive as the one in JAWS, which one can use in many areas of computer usage to deliver more than a single dimension of semantic information at any given instant. Before Speech and Sounds Manager, JAWS defined the state of the art with its incredible collection of information augmentations gathered from the various DOMs in the Office suites and other applications that exposed a rich API. In most cases, these added data items did not appear anywhere on the screen but contained very useful information for users of these applications.
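The core idea behind a Speech and Sounds Manager can be shown with a toy model. This is my own sketch of the concept, not the JAWS implementation: the attribute names, earcon file names and the announce function are all invented for illustration. Attributes that have an assigned earcon are conveyed as a concurrent sound rather than a spoken word, so the synthesizer has fewer syllables to say.

```python
# Hypothetical mapping from semantic attributes to short earcons.
EARCONS = {"link": "chime.wav", "bold": "tick.wav", "heading": "drum.wav"}

def announce(text, attributes):
    """Return (speech, sounds) for one announcement.

    Attributes with an earcon are moved out of the speech stream and
    played simultaneously with the text; any attribute without an
    earcon falls back to being spoken the old way.
    """
    sounds = [EARCONS[a] for a in attributes if a in EARCONS]
    spoken = [a for a in attributes if a not in EARCONS]
    speech = " ".join(spoken + [text])
    return speech, sounds

# Without earcons a link reads as "link Contact us"; with them the
# synthesizer says only "Contact us" while a chime plays at the same time.
print(announce("Contact us", ["link"]))   # → ('Contact us', ['chime.wav'])
print(announce("Summary", ["italic"]))    # no earcon assigned, so spoken
```

The saving compounds: one earcon replaces one or more spoken syllables on every link, heading or formatted word, and because the sound plays in parallel with the text, the user hears two semantic dimensions in the time a serial announcement would deliver one.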
For example, in the time prior to JAWS' adding DOM support and information augmentation to its support for Microsoft Excel, a person with a vision impairment could open and even edit Excel files but, especially when trying to read an Excel worksheet that someone else had made, they had to spend a lot of time poking around just to find which cells had data and what the row and column headers might say to identify what the value in a cell might mean. All of these initial augmentations were delivered in a textual format read by the speech synthesizer. Thus, JAWS users could learn more from, and work more efficiently with, spreadsheets and other interesting applications. These augmentations provided a screen reader user with a lot of extra semantic information about the document of interest. They cut down on the amount of time and the number of keystrokes a user had to spend while working with said document, as the augmentations gave the user a way of ignoring information of no interest and of finding the items of greatest interest within a specific task. In the years that have followed, most of the DOM-based methods of improving efficiency by delivering additional meaning to the user, along with the quick keys method of navigating a web page more rapidly than had previously been possible, have been imitated by most other screen readers on all platforms. The Speech and Sounds Manager remains the only major method of increasing the amount of semantically interesting information delivered in any given amount of time that resides entirely in JAWS. Unfortunately, I have not seen any truly innovative user interface improvements in any screen reader release since the JAWS 5.xx series. Certainly, Window-Eyes and System Access have added a large number of new features in each of their releases but, for the most part, they have been catching up to the 2003 releases of JAWS.
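A toy model of the Excel augmentation described above (my illustration, not FS code): when the user lands on a cell, the screen reader fetches the row and column headers from the spreadsheet's object model and folds them into the announcement, so a bare number gains its meaning without the user having to navigate up to row 1 and over to column A.

```python
def announce_cell(grid, row, col, header_row=0, header_col=0):
    """Compose a spoken announcement for one cell: its value plus the
    row and column headers that give the value meaning. The grid is a
    stand-in for what a real screen reader would pull from the DOM."""
    value = grid[row][col]
    row_header = grid[row][header_col]
    col_header = grid[header_row][col]
    return f"{value}, {row_header}, {col_header}"

grid = [
    ["",        "Q1",   "Q2"],
    ["Revenue", "1200", "1350"],
    ["Costs",   "800",  "910"],
]

# Landing on row 1, column 2 speaks the value and both headers at once:
print(announce_cell(grid, 1, 2))  # → 1350, Revenue, Q2
```

Before this kind of augmentation, hearing "1350" told the user nothing; afterwards, a single keystroke delivers the value and its context in one announcement, which is exactly the keystroke-saving the paragraph describes.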
Meanwhile, FS hasn't done much to raise the bar that its competitors must reach to catch up in the past three or four years. In terms of innovation, FS seems to include incremental new features of little interest and the other screen reader vendors, on Windows, GNU/Linux and Macintosh, seem hell bent on creating a JAWS clone. In conversations both Will Pearson and I have had with people at various screen reader companies, the notion of increasing the number of semantic dimensions delivered to a screen reader user in a single instant has been called a "gimmick" and some individuals have told us that, "it can't be important, none of our users have asked for it." Many years ago, when HJ still made JAWS, we commissioned a market research project to help us determine what our users actually wanted. One of the results most difficult for us to understand was the line that said that less than 2% of blind computer users wanted to use Excel. I recall discussing this with Eric Damery and we concluded that blind users would use Excel if it worked reasonably well with JAWS. Thus, although the market research told us that no one cared about a spreadsheet, we hired a contractor to write scripts for Excel, I worked closely with the contractor on features and such and today, about eight years later, many people who use JAWS and most other screen readers also use a spreadsheet. Thus, the argument that "no one has requested a given feature" continues to be baseless as the majority of screen reader users don't know they want something until it shows up in their AT. It's a classic chicken and egg problem. What user interface structures might help improve the efficiency with which a blink can interact with their computer? 
A number of different theorists and researchers could provide a lengthy list of ideas, ranging from concepts like synthetic vision to 3D audio to a method with which a screen reader user can quickly move their attention from one conceptual group to another (the method a sighted person employs unconsciously by moving their gaze). There are a fairly large number of other ideas bouncing around the research community, but absolutely none of the screen reader vendors seem to spend any time or effort seeking the next major step forward for their users. At this time, I cannot blame these companies for their lack of enthusiasm for finding a more efficient user experience. Many of the products out there spend most of their time trying to catch up to or jump past JAWS and, perhaps more to the point, none of these companies have people with the design skills to invent a model that will improve user efficiency. Thus, the titular question of this article: do the screen reader vendors have people with the skills necessary to move the state of the art forward? I think not. I do think that all of the screen reader vendors act in good faith and believe they make the right decisions regarding user interface but, unfortunately, they do not have anyone on their staffs dedicated to studying such problems, suggesting and designing new UI metaphors, and improving the efficiency of absorbing information delivered by a screen reader. The missing skills can be a bit obscure. The first necessary skill would be in human computer interaction (HCI), with a strong background in non-visual interfaces. It would also be valuable to have people who understand cognitive psychology, learning theory, psycho-linguistics and other disciplines that can be applied to defining the next step in audio user interface design. Such people do exist, and many have computer programming in their skill set, as they tend to demonstrate their models in software simulations.
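One way to make the "move attention between conceptual groups" idea concrete is a sketch like the following. Nothing here comes from a shipping product: the class, the group names and the single-command jump are my own invention, meant only to show how a page pre-chunked into named regions could let one keystroke play the role of a gaze shift.

```python
class AttentionModel:
    """A page divided into named conceptual groups; one command moves
    the review cursor between whole groups rather than line by line."""

    def __init__(self, groups):
        self.groups = groups  # ordered list of (name, items) pairs
        self.focus = 0        # index of the group currently attended to

    def next_group(self):
        """Shift attention to the next group, announcing a brief
        summary instead of the group's full contents."""
        self.focus = (self.focus + 1) % len(self.groups)
        name, items = self.groups[self.focus]
        return f"{name}, {len(items)} items"

page = AttentionModel([
    ("Navigation", ["Home", "News", "Contact"]),
    ("Main article", ["heading", "first paragraph"]),
    ("Search", ["edit box", "button"]),
])
print(page.next_group())  # → Main article, 2 items
print(page.next_group())  # → Search, 2 items
```

The efficiency claim is in the summary line: a sighted reader dismisses a navigation bar in one glance, and a grouped model lets a listener dismiss it in one short announcement instead of arrowing through every link.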
Today, the only groups I am aware of who are exploring multi-dimensional audio interfaces for use by people with vision impairment are people like David Greenwood who make really cool audio-only real time action games. Shades of Doom, Greenwood's most famous title, plays up to 32 simultaneous sounds, and a user can understand what is going on, react to the situation, kill the mutants and save the world from the mad scientist. Obviously, the information delivered by an action/adventure game differs substantially from that delivered by a screen reader in a word processor, but Greenwood's work and that of the other audio game hackers proves that blinks can understand much more information than the single syllable or pause produced by a speech synthesizer. Will the screen reader vendors try to move the state of the art forward? I certainly hope so. Audio user interfaces will start to appear in mainstream products. People with a number of smart appliances, blind or otherwise, will not want to look at a display every time they want to change the state of a device in their house. These people will want to issue verbal commands and receive audio feedback. They will also expect their systems to function very efficiently, as a smart home and smart appliances that take longer than their predecessors to operate will be rejected out of hand. The screen reader companies do have a lot of knowledge about blind users and their needs and, in my opinion, if they added people to their staffs who could help them develop systems that deliver richer information, they would find themselves on the cutting edge of design for non-visual interfaces, for both people with disabilities and the mainstream consumer.
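Part of why dozens of concurrent game sounds stay intelligible is that each source occupies its own position in the stereo field, so location itself carries information. The constant-power pan law below is a standard audio technique; the game scenario in the comments is invented, and this is in no way Greenwood's actual engine.

```python
import math

def pan_gains(azimuth):
    """Map an azimuth in [-1.0 (hard left), +1.0 (hard right)] to
    (left, right) channel gains using a constant-power pan law, so a
    source keeps the same perceived loudness anywhere in the field."""
    angle = (azimuth + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)

# A mutant hard left and a door slightly right land in distinct
# positions, letting the listener track both sounds at once:
left, right = pan_gains(-1.0)
print(round(left, 3), round(right, 3))  # → 1.0 0.0
left, right = pan_gains(0.5)
print(round(left, 3), round(right, 3))  # → 0.383 0.924
```

A screen reader could exploit the same channel: if an error tone always arrives from the left and a link earcon from the right, position becomes one more simultaneous semantic dimension on top of speech, which is precisely the unused capacity the games demonstrate.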
-- End posted by BlindChristian at 9:59 AM
http://blindconfidential.blogspot.com/2007/09/do-screen-reader-developers-have-skills.html