[ossrp-control] Re: What Is A Screen Reader?

Hi Will,

I see what you're saying about some of the semantics being encoded in the spatial relationships of controls. I totally agree that this information needs to be exposed to the user. What I disagree with is trying to expose this information by mapping spatial relationships on the screen to spatial relationships in audio.

For instance, consider your example of a button being near to a list on the screen. What this spatial positioning implies is that the button may have some effect on the list. What an audio display should do is make it easy for the user to understand this cause-effect relationship (if it truly exists) and to bring it about. So, while the user is browsing the list, it should be very easy for the user to trigger that button and hear how it has changed the list. If the application does not expose this relationship, then it is up to the audio interface to do so somehow (e.g. with a script for the application).

Also, consider drag and drop. Drag and drop is considered a benefit of GUIs because it allows a user to quickly associate object and action (e.g. file plus recycling bin equals delete the file, document plus text editor equals open the document, selected text plus point in document equals move the text). Since the user can click anywhere on the screen to choose an object and move the mouse just a short distance to the desired action (or vice versa), drag and drop is efficient.

But in audio, is a spatial drag and drop really what's desired? From looking at the research literature, it is pretty clear that picking up and moving one sound object on top of another is not an easy task for people to do. I can't think of many situations where a person has everyday experience picking up one buzzing object and placing it on top of another one floating in space without feeling or seeing either one.

Fortunately, dragging and dropping is not what's important. The heart of the operation, the semantics of drag and drop, is the association of object to action, not the moving of some object across a space (visual or audio) to another object. In audio, this operation might be manifested more naturally as referencing. We use references fluently all the time in conversation with one another (e.g. "Hey Will. Here's a box and there's a table to your left. Put that there.") Wouldn't this be an easier way of associating objects and actions in a computer audio display? My guess is, choosing an object then, sometime later, choosing an action to act on that object will prove much less frustrating than trying to align two sounds in a virtual sound space.
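To make the "put that there" idea a little more concrete, here is a rough Python sketch of choosing an object first and an action later. Every name in it (AudioWorkspace, refer_to, act) is made up for illustration and is not taken from Clique or any existing screen reader, and a plain list of strings stands in for speech output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AudioWorkspace:
    """Hypothetical model: the user refers to an object, then names an action later."""
    # Maps an action name to a function that takes the object and returns a spoken result.
    actions: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    referent: Optional[str] = None                     # what "that" currently refers to
    spoken: List[str] = field(default_factory=list)    # stand-in for speech output

    def say(self, text: str) -> None:
        self.spoken.append(text)

    def refer_to(self, obj: str) -> None:
        # Step 1: remember the object; nothing has to be moved through space.
        self.referent = obj
        self.say(f"Remembered {obj}")

    def act(self, action_name: str) -> None:
        # Step 2 (possibly much later): apply an action to the remembered object.
        if self.referent is None:
            self.say("Nothing is selected")
            return
        if action_name not in self.actions:
            self.say(f"Unknown action {action_name}")
            return
        self.say(self.actions[action_name](self.referent))
        self.referent = None  # the reference is used up once the action runs
```

A session might call refer_to("report.txt") while browsing a file list and act("delete") some time afterwards. The point of the design is that the association between object and action lives in a remembered reference, not in the position of two sounds.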

Your point about trying to cram ever more information into the speech stream is also well taken. There are definitely other ways of communicating information to the user, such as audio icons or earcons, that screen readers have not explored. Using space to communicate some types of information to the user is certainly appropriate. But I'm not sure that mapping semantic relationships to spatial relationships is the best way to go in audio.

This is a good discussion. Don't think of what I'm saying as attacking your ideas. I'm just offering some counterpoint to your comments.

Pete

Will Pearson wrote:

Hi Pete,

An interesting point. I think that, for some tasks, the semantic
information encoded in the physical appearance of a screen or document isn't
necessary to make it accessible, i.e. whether someone can interact with it,
but it can help in making it more usable, i.e. quicker to use, more
accurate, etc.

A web page, like most other forms of document, uses spatial relationships
between text blocks to convey that multiple text blocks are members of the
same larger grouping. This is one of the facilitators of visual skim
reading, whereby someone can read the initial part of a text grouping, find
out it's not what they were wanting, and then jump to the next grouping
block. The semantics of spatial alignment are used to indicate columns and
rows, which is how we identify information as belonging to the same column
or row, which can make searching, or skipping, of tabular data much more
efficient and easier.

There are also examples of the spatial relationships between GUI controls
being used to convey semantic information about the actions of those
controls. For example, buttons placed next to a list box may indicate that
those buttons perform an action on the list box, maybe selecting it to
display different content, or that they perform an action on the selected
index of the list box. This semantic information may not be encoded in the
label associated with a control.

Yes, all this semantic information can be encoded in text.  However,
increasing the size of a text buffer, given that text is serial in nature,
will increase the time taken to convey the semantic information to the user.
Ultimately, this will mean that, at least in terms of efficiency, blind
users are not as much in parity with sighted users as they may have been.

I agree that for quite a few tasks spatial semantics aren't needed, as the
dev who designed the visual interface or input interface didn't take
advantage of encoding semantic meaning through spatial positioning and
relationships.  However, where it is used it helps to make a user more
accurate and more efficient, avoiding the often trial and error approach
taken when a user hasn't got the full semantics about the nature of a
control.

Will
----- Original Message -----
From: "Peter Parente" <parente@xxxxxxxxxx>
To: <ossrp-control@xxxxxxxxxxxxx>
Sent: Monday, May 02, 2005 4:04 PM
Subject: [ossrp-control] Re: What Is A Screen Reader?





Hello everyone,

I'm new here and interested in ways of making existing applications
accessible in audio, both for users with and without vision. I noticed
the thread about "what is a screen reader" and have to agree with both
Jamal's and Darrell's thoughts. There are certainly times when a user
needs to know exactly how information is being presented on the screen
(e.g. designing GUIs, doing document layout) and other times when the
visual representation is not important to the task (e.g. sending an
email, reading a web page, managing files). Interestingly, most work on
screen reading has been invested in the former area (i.e. trying to
convey exactly what's on the screen) even though many of the common
tasks users perform on computers today are not dependent on a visual
interface.

On a related note, people on this list might be interested in reading
some information about my dissertation project, Clique
(http://www.cs.unc.edu/~parente). It's an audio display system that
concentrates on making the latter set of tasks, those that are not
intimately tied to vision, accessible and usable in audio. It's been in
development for only about 5 months now, so it's certainly rough around
the edges.

I just learned about OSAT, so please don't think of Clique as a
competitive project, but rather a complementary venture. I'd welcome any
feedback you'd like to give.

Regards,
Pete
To post to the list, send a message to:
ossrp-control@xxxxxxxxxxxxx
To unsubscribe, send a message to:
ossrp-control-request@xxxxxxxxxxxxx
and set the subject field of the message to "unsubscribe" (without the quotes)