[ossrp-control] Re: What Is A Screen Reader?

Hi Jamal,

I agree.  My original statement wasn't intended as advocating for an auditory 
interface that exactly mimicked the visual interface, which is 
psycho-acoustically impossible.  It would be useful to come as close as we can 
in some situations, such as for drag and drop interactions, but on the whole it 
isn't necessary.  My contention was that the semantics encoded in a display can 
often convey additional information to the user beyond simple text-to-speech 
substitution.  Yes, this is sometimes available programmatically, but at the 
moment there are two main problems with a programmatic approach.  Firstly, the 
full semantics are not often available, and secondly, creating programmatic 
access methods for each COM automation interface, and a corresponding auditory 
interface, is a resource-hungry activity that not even the commercial AT 
vendors have the resources to take full advantage of.

So, whilst programmatic access should be a primary means of extracting 
semantics, there need to be fallback strategies to extract semantics encoded 
in a visual display should the desired semantics not be available 
programmatically.
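
As a rough illustration of that fallback idea, here is a minimal Python sketch. 
It is not a real screen reader API: the names Semantics, extract_semantics, 
from_object_model and from_visual_display are all hypothetical, and the two 
extractors are stand-ins backed by hard-coded tables rather than real COM or 
screen-analysis code.  It only shows the ordering: try the programmatic source 
first, and fall back to the visual display when that yields nothing.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Semantics:
    """Hypothetical record of what a control means to the user."""
    role: str    # e.g. "button"
    name: str    # e.g. "OK"
    source: str  # which strategy produced this record


# An extractor takes a control identifier and returns semantics, or None
# when it cannot determine any.
Extractor = Callable[[str], Optional[Semantics]]


def extract_semantics(control_id: str,
                      extractors: Sequence[Extractor]) -> Optional[Semantics]:
    """Try each extractor in priority order; return the first result found."""
    for extractor in extractors:
        result = extractor(control_id)
        if result is not None:
            return result
    return None


def from_object_model(control_id: str) -> Optional[Semantics]:
    # Stand-in for programmatic access (assumption: a lookup table here,
    # an automation interface in a real system).
    known = {"btn_ok": Semantics("button", "OK", "object model")}
    return known.get(control_id)


def from_visual_display(control_id: str) -> Optional[Semantics]:
    # Stand-in for analysing what was drawn on screen (assumption: a
    # lookup table here, shape/colour/position analysis in a real system).
    drawn = {
        "btn_ok": Semantics("button", "OK", "visual display"),
        "icon_7": Semantics("button", "unlabelled icon", "visual display"),
    }
    return drawn.get(control_id)


strategies = [from_object_model, from_visual_display]
print(extract_semantics("btn_ok", strategies))  # found programmatically
print(extract_semantics("icon_7", strategies))  # found only via the display
```

Keeping the strategies in an ordered list means further fallbacks could be 
added later without changing the caller.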

Increasing the amount of semantics conveyed to the user should increase both 
accuracy and efficiency.  It can enable things like true and full replication 
of visual skim reading, something that hasn't yet been implemented in any 
screen reader, a reduction of the trial and error approach to working out what 
actions a control may invoke, and other benefits.

Semantics are at the core of communication, and should the intended semantics 
not be conveyed accurately, or not conveyed at all, then communication is 
either absent, inefficient, or error prone.

Will
  ----- Original Message ----- 
  From: Jamal Mazrui 
  To: ossrp-control@xxxxxxxxxxxxx ; uvip@xxxxxxxxxxxxxxx 
  Sent: Monday, May 02, 2005 3:15 PM
  Subject: [ossrp-control] Re: What Is A Screen Reader?


  I like the definition with a couple of qualifications--ones that may have 
been intended but omitted for the purpose of simplification and brevity.

  The visual display is not the only source of semantic meaning for a screen 
reader.  Often, this has been the case, thus necessitating an off-screen 
model that intercepts and interprets drawing of text to the screen, 
capturing it when it still is characters rather than a picture of text.  
Fortunately, however, graphical operating systems have increasingly made use of 
underlying object models with interfaces that may be tapped by screen readers 
and other applications.  

  Under Windows, these have mostly been COM (Component Object Model) 
technologies, e.g., the building blocks of Microsoft Word, Excel, and Internet 
Explorer.  In such cases, the underlying semantic meaning may be more abstract 
than its visual representation, and a screen reader can choose to represent its 
meaning in ways that are better optimized for modes of perception other than 
sight, e.g., audio or braille.  This is, as I understand it, the primary 
mechanism by which the open source screen reader will seek to retrieve semantic 
meaning (not COM necessarily, but an underlying, probably .NET-based object 
model).  A good analogy here is HTML and cascading style sheets, where 
underlying content and physical presentation are separated as much as 
possible, thus permitting rendering by various types of browsers and devices.

  A second qualification to the definition below is that some visual 
presentation is essentially decorative in nature and does not convey semantic 
meaning that a screen reader needs to render.  A screen reader should err on 
the side of making all possible information available in case it does contain 
useful meaning to the blind user, but the goal should not be comprehensive 
mimicking of visual expression for the sake of a supposed equality that is 
thereby established.  Unless one is specifically involved in graphical arts, 
which would be difficult to ever make truly nonvisually accessible, the purpose 
of a computer is primarily functional rather than expressive in nature.  As 
such, the goal of a screen reader, in my opinion, should be competitive 
productivity by all reasonable, nonvisual means.  This sometimes involves 
strategies that maximize use of semantic meaning by other modalities in 
significantly different ways than more indirect attempts to convey the visual 
interface that sighted users experience.

  Regards,
  Jamal
  -----Original Message-----
  From: ossrp-control-bounce@xxxxxxxxxxxxx 
[mailto:ossrp-control-bounce@xxxxxxxxxxxxx] On Behalf Of Will Pearson
  Sent: Saturday, April 30, 2005 9:59 PM
  To: ossrp-control@xxxxxxxxxxxxx; uvip@xxxxxxxxxxxxxxx
  Subject: [ossrp-control] What Is A Screen Reader?


  Hi,

  I thought I'd share my, rather academic, view of what a screen reader is.  It 
offers a little glimmer into what screen readers could potentially do, and some 
of the pitfalls that the current crop of screen readers have fallen into.  All 
this is from the viewpoint of human computer interaction, psychology and 
communications theory.

  OK.  So, what is a screen reader?  Well, it's actually a lot more than people 
often assume it is.  It's not just something that grabs the text from the 
screen and reads it to you, or at least it shouldn't be.  It is in fact the 
interface by which user and machine communicate semantic meaning, relating to 
thoughts, concepts, actions and states.

  So, how did I arrive at this view?  As some of you may know, I've been 
researching into semantics and their role in software interfaces for a while 
now.  During this time, it's become apparent that software interfaces are just 
intended to communicate semantic meaning, but as we're not capable of 
extrasensory perception and telepathy with the computer, we need some way to 
indicate our thoughts, concepts, actions, etc. to the computer, and vice versa. 
 The way this is visually done is by placing elements on the screen, such as 
icons, buttons, etc. and having their shape, colour, position on screen and 
relationships to one another act as encoding channels by which the semantic 
meaning is conveyed.  Users then just point to an object, conveying the 
semantics of which element they would like to interact with, and either click 
it or select an action to perform on it from a menu.  All this is just a form 
of physical encoding of the semantic meaning between user and machine and vice 
versa.

  So, as a screen reader is a replacement for the visual interface, its role 
is simply to act as an interface between user and machine and convey the 
semantic meaning generated by the machine.  However, there's a nasty twist, and 
that is that a screen reader has to get the semantic meaning that it is to 
communicate to the user from somewhere.  As the screen reader has no access to 
the internals of the machine, its only available source of semantics that the 
machine wishes to convey is the visual interface, which uses encoding 
techniques such as colour, shape, position and spatial relationship to convey 
its semantics.  So, a screen reader should really be about extracting the 
semantics from the visual display and encoding them in a non-visual form 
suitable for a blind user, and this is where current screen readers fall down.  
To maintain accurate and efficient communication with the user, all the 
semantics that are conveyed visually need to be conveyed to the user.  This 
includes things like spatial positioning and spatial relationships between 
interface elements, things that are currently lost to the user when they are 
using one of the current screen readers.  If this were to happen, then the 
number of errors, and the corresponding back-tracks and reissuing of commands that go 
along with errors, would decrease, and screen reader users would be more 
efficient beasts.

  I haven't gone into design specifics, as they're for another day, and these 
can dramatically affect efficiency as well, but those are my thoughts on what a 
screen reader should be doing.  By focusing on the semantics, it's likely 
that the use of semantic translation could increase access in all those 
difficult accessibility problem areas.

  Will
