[jawsscripts] Re: Getting the real ANSI/Unicode code for the current character?

  • From: "Octavian Rasnita" <orasnita@xxxxxxxxx>
  • To: <jawsscripts@xxxxxxxxxxxxx>
  • Date: Sun, 25 May 2014 01:15:41 +0300

That problem could be fixed by configuration. For example, the user could
choose whether hanzi/kanji are read as Japanese or Chinese, or whether
Cyrillic text is read as Russian or Bulgarian, etc.

But yes, even Latin characters with the diacritics used in European
languages are not spoken well by Jaws.
Jaws adds confusion by reading chars like "ș" as "s" unless the user
configures each of those chars to be read differently, so that the letters
can be told apart. Even with those settings, Jaws treats such chars as a
kind of punctuation mark that splits words, so those words sound a little
strange.

But as I said, unfortunately we can't expect very good Unicode support
from screen readers, especially from Jaws, which earns a big part of its
profit from the US government.

--Octavian

----- Original Message ----- 
From: "Soronel Haetir" <soronel.haetir@xxxxxxxxx>
To: <jawsscripts@xxxxxxxxxxxxx>
Sent: Saturday, May 24, 2014 7:03 PM
Subject: [jawsscripts] Re: Getting the real ANSI/Unicode code for the 
current character?


>I can fully understand why the synthesizers don't have such support.
> For the most part what script is being used doesn't say very much
> about what language the text is in.  The synthesizer developers could
> of course set defaults and say that a particular synthesizer will
> treat all western European characters as English and Cyrillic
> characters as Russian for example but I could easily see that
> producing just as much confusion as the current situation of needing
> to invoke different synthesizers to deal with different scripts and
> often even ranges within the same script (there are plenty of western
> European characters the typical English synthesizer won't handle for
> example).
>
> But I see the situation with the lack of supplemental character
> support being quite a bit worse really.  At least if jaws could handle
> those characters then you could add them to a .spl file so you could
> find out what they are through char by char navigation.  As things
> stand right now jaws is simply incapable of telling you what char a
> supplemental character is.
>
> On 5/23/14, Octavian Rasnita <orasnita@xxxxxxxxx> wrote:
>> Yep, plus Jaws ultimately doesn't offer good Unicode support also because
>> most voice synthesizers it uses (or maybe all) offer very limited Unicode
>> support.
>> So even if Jaws were able to understand all Unicode chars, if it passes
>> them to Eloquence, then Eloquence will not know how to speak them, so
>> we will hear a nice question mark or just silence.
>> It would be nice if English Eloquence were able to read
>> Cyrillic/Greek/hiragana/katakana/kanji/Arabic chars.
>>
>> Unicode is a standard that allows several scripts, used by several
>> languages, to appear in the same piece of text. I think that nowadays
>> screen readers should also be able to offer accessibility to that kind
>> of text.
>>
>> Being able to find the real ANSI or Unicode code for a certain character
>> might help us a little, but Jaws can read only what the editors offer,
>> so...
>>
>> But I guess we can't ask for extraordinarily good Unicode support from
>> Jaws if even some popular programming languages don't offer good enough
>> support.
>>
>> Perl has very good support. It has supported Unicode 6.3 for some time.
>> Java also has good support, but only Java 7 supports Unicode 6.0.
>> Python 2.7 doesn't have very good Unicode support. Python 3 has improved
>> this a little but very few people use Python 3.
>> PHP has made some improvements but it still doesn't have very good
>> Unicode support, and neither does Ruby.
>> DotNet 4.5 supports Unicode 6.0, but only under Windows 8. Under earlier
>> versions of Windows it supports only Unicode 5.0.
>> Beyond this, offering good support doesn't mean just supporting the
>> newest Unicode version, but also offering helpful features for working
>> with Unicode in functions, libraries, regular expressions etc.
>>
>> It was recommended to use hex editors to read Unicode codes, but
>> unfortunately these editors are not very accessible. Even TextPad, which
>> is very accessible as a text editor, is not very accessible when used as
>> a hex editor: the left/right arrow keys move the cursor from the code of
>> one char to the code of the next/previous char, but only the first digit
>> of the current code is read, so we need to read it using the Jaws cursor.
>> Strangely, I found something similar in the other hex editors I tried in
>> the past.
>>
>> So we should create our own hex dumpers that print the chars we analyse.
>> Or do you know a better solution (an accessible hex editor, or something
>> else)?
>>
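A home-made dumper of the kind suggested above could print one line per character, so that a screen reader can read the full code with the normal line-reading keys instead of the Jaws cursor. The following is only a minimal Python sketch; the function name describe_chars and the output layout are illustrative assumptions, not something proposed in the thread.

```python
import unicodedata


def describe_chars(text, encoding="utf-8"):
    """Return one descriptive line per character of `text`.

    Each line holds the code point in hex and decimal, the bytes the
    character occupies in `encoding`, and its official Unicode name.
    """
    lines = []
    for ch in text:
        # Fall back to a placeholder for characters without a name.
        name = unicodedata.name(ch, "unnamed character")
        # bytes.hex(" ") separates the bytes with spaces (Python 3.8+).
        encoded = ch.encode(encoding, errors="replace").hex(" ").upper()
        lines.append(f"U+{ord(ch):04X} decimal {ord(ch)} bytes {encoded} {name}")
    return lines


for line in describe_chars("a\u0163"):
    print(line)
```

Because every character gets its own line of plain text, the output can be read in any accessible editor or console.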
>> --Octavian
>>
>> ----- Original Message -----
>> From: "Soronel Haetir" <soronel.haetir@xxxxxxxxx>
>> To: <jawsscripts@xxxxxxxxxxxxx>
>> Sent: Wednesday, May 21, 2014 9:33 PM
>> Subject: [jawsscripts] Re: Getting the real ANSI/Unicode code for the
>> current character?
>>
>>
>>> Note that there actually is at least one real problem with jaws'
>>> ability to get the character value at the cursor position.  Jaws is
>>> simply unable to deal with Unicode supplemental characters (those with
>>> values between U+10000 and U+10ffff).  If you try to get the character
>>> value for such a character jaws will always return 0.  I have informed
>>> FS about this but didn't get any response back beyond the automated
>>> notice.  Even putting entries in the .spl file doesn't enable jaws to
>>> do anything with them.
>>>
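The thread does not say why Jaws returns 0 for these characters. The Python sketch below only illustrates what makes them special in UTF-16 based software: a code point above U+FFFF does not fit in one 16-bit unit and is stored as a pair of surrogates. The helper name utf16_units is an illustrative assumption.

```python
def utf16_units(ch):
    """Return the 16-bit code units that UTF-16 uses for one character."""
    data = ch.encode("utf-16-be")  # big-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp_char = "\u0163"             # fits in a single 16-bit unit
supplemental = "\U0001D11E"     # MUSICAL SYMBOL G CLEF, above U+FFFF

print([hex(unit) for unit in utf16_units(bmp_char)])
print([hex(unit) for unit in utf16_units(supplemental)])
```

Code that reads only one 16-bit unit at the cursor would see half of such a pair, which is not a valid character on its own.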
>>> On 5/20/14, Octavian Rasnita <orasnita@xxxxxxxxx> wrote:
>>>> UTF-8 doesn't need the BOM at all. Notepad always adds the BOM, while
>>>> other editors like TextPad let the user choose whether to use a BOM or
>>>> not. UTF-16 needs a BOM to distinguish between Little Endian and Big
>>>> Endian.
>>>>
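Checking the first bytes of a file for a BOM can be sketched in Python with the constants from the standard codecs module. The function name sniff_bom and the returned labels are illustrative assumptions. A missing BOM proves nothing about the encoding, and a UTF-16 Little Endian file whose text starts with a NUL character begins with the same four bytes as a UTF-32 Little Endian BOM, so the result is a hint only.

```python
import codecs

# Longer BOMs are checked first, because the UTF-32 LE BOM (FF FE 00 00)
# starts with the same two bytes as the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return a label for the BOM at the start of `data`, or None."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return None


print(sniff_bom("\u0163".encode("utf-8-sig")))  # UTF-8 with BOM, as Notepad saves it
print(sniff_bom("\u0163".encode("utf-8")))      # UTF-8 without BOM
```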
>>>> But the final result, the text displayed by the editor, is, as Soronel
>>>> said, converted to whatever Unicode format the editor prefers, so Jaws
>>>> has no chance of finding out what the original encoding of the file
>>>> was before it was loaded in the editor.
>>>>
>>>> --Octavian
>>>>
>>>> ----- Original Message -----
>>>> From: "Jamal Mazrui" <Jamal.Mazrui@xxxxxxx>
>>>> To: <jawsscripts@xxxxxxxxxxxxx>
>>>> Sent: Tuesday, May 20, 2014 11:42 PM
>>>> Subject: [jawsscripts] Re: Getting the real ANSI/Unicode code for the
>>>> current character?
>>>>
>>>>
>>>>>A text file can indicate its encoding via a byte order mark.
>>>>> http://en.wikipedia.org/wiki/Byte_order_mark
>>>>>
>>>>> -----Original Message-----
>>>>> From: jawsscripts-bounce@xxxxxxxxxxxxx
>>>>> [mailto:jawsscripts-bounce@xxxxxxxxxxxxx] On Behalf Of Soronel Haetir
>>>>> Sent: Monday, May 19, 2014 4:38 PM
>>>>> To: jawsscripts@xxxxxxxxxxxxx
>>>>> Subject: [jawsscripts] Re: Getting the real ANSI/Unicode code for the
>>>>> current character?
>>>>>
>>>>> The encoding of a plain text file is determined by means other than
>>>>> the file itself.  For instance by the user telling the editor what
>>>>> encoding to treat the file as.  There are statistical tests that can
>>>>> be used to detect various Unicode encodings but those tests are never
>>>>> exact; it is always possible to produce both false positives and
>>>>> false negatives, whether you are testing for a file being in some
>>>>> Unicode format or for it not being in any such format.
>>>>>
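The ambiguity described above can be shown with a very small Python sketch. It is not a statistical detector, only a strict-decoding check, and the function name plausible_encodings and the candidate list (including Windows-1250 as the "ANSI" code page) are assumptions made for the example.

```python
def plausible_encodings(data, candidates=("ascii", "utf-8", "cp1250")):
    """Return the candidate encodings under which `data` decodes without error."""
    accepted = []
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        accepted.append(encoding)
    return accepted


two_bytes = b"\xc5\xa3"  # the UTF-8 form of U+0163
print(plausible_encodings(two_bytes))
print(two_bytes.decode("utf-8"), two_bytes.decode("cp1250"))
```

The same two bytes decode cleanly both as UTF-8 (one character) and as Windows-1250 (two different characters), so the bytes alone cannot settle which encoding was meant.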
>>>>> Non-plain text files (such as rich text, MS Word, HTML and lots of
>>>>> others) either have a standard encoding or the ability to specify
>>>>> internally what encoding the file uses.
>>>>>
>>>>> Note that my search for character 355 (which is apparently the
>>>>> character in quotes) indicates that it is 'Latin small letter t with
>>>>> cedilla'.  If the ANSI file is saved using a code page that actually
>>>>> has that character as a one-byte value and you then load it into
>>>>> Notepad I would expect it to then say '355' because I would expect
>>>>> Notepad to translate the file to Unicode while working on the text
>>>>> and only translate it back to the desired code page at the time it is
>>>>> saved.  I did not bother performing the calculation but I would not
>>>>> be at all surprised if 0x163 (the hex value for 355) requires two
>>>>> bytes.
>>>>>
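The calculation left open above can be done in a few lines of Python. The sketch assumes that the "ANSI" code page is Windows-1250 (cp1250), which the thread never names; in that code page the character has a one-byte value, while the same code point takes two bytes in UTF-8.

```python
# Assumption: "ANSI" here means Windows-1250; the thread does not say
# which code page was actually used.
ch = "\u0163"  # LATIN SMALL LETTER T WITH CEDILLA

code_point = ord(ch)              # the value an editor holds in memory
ansi_bytes = ch.encode("cp1250")  # contents of the 1-byte "ANSI" file
utf8_bytes = ch.encode("utf-8")   # contents of the 2-byte UTF-8 file

print(code_point, hex(code_point))        # 355 0x163
print(ansi_bytes.hex(), len(ansi_bytes))  # fe 1
print(utf8_bytes.hex(), len(utf8_bytes))  # c5a3 2
```

Both files decode to the same code point, which is consistent with hearing "Character 355" for either file once it is loaded in an editor.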
>>>>> If you want to see what is actually saved on disk use a hex dump
>>>>> program.  When using a text editor the contents on screen have only a
>>>>> weak relationship with the actual disk contents.
>>>>>
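A hex dump of the bytes on disk needs very little code. The Python sketch below is one possible shape for it; the function names hex_dump and hex_dump_file and the line layout are illustrative assumptions, and the demo writes its own temporary file rather than relying on any file from the thread.

```python
import os
import tempfile


def hex_dump(data, width=8):
    """Return one line per `width` bytes: an offset followed by hex values."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        lines.append(f"{offset:04X}: {hex_part}")
    return lines


def hex_dump_file(path):
    # Binary mode ("rb") returns the bytes exactly as stored on disk.
    with open(path, "rb") as handle:
        return hex_dump(handle.read())


# Demo: save the character as UTF-8 in a temporary file, then dump it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("\u0163".encode("utf-8"))
demo_lines = hex_dump_file(tmp.name)
os.remove(tmp.name)
print("\n".join(demo_lines))
```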
>>>>> On 5/19/14, Jonathan C. Cohn <joncohn@xxxxxxx> wrote:
>>>>>> Ah, are you sure they are not the same? ASCII, which I believe is
>>>>>> essentially ANSI, does not have this character in its mappings. How
>>>>>> does a text editor actually determine the encoding used by a specific
>>>>>> file? Is this in the directory metadata, or is it what UNIX would
>>>>>> call a “magic” number at the beginning of the file?
>>>>>>
>>>>>> Best wishes,
>>>>>>
>>>>>> Jonathan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On May 19, 2014, at 3:08 AM, Octavian Rasnita <orasnita@xxxxxxxxx>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Do you know if there is a function in Jaws scripts that can help us
>>>>>>> create a script which speaks the real ANSI/Unicode code for the
>>>>>>> current character?
>>>>>>>
>>>>>>> The feature offered by Jaws when using NumPad 5 pressed 3 times
>>>>>>> quickly is broken and it doesn't give right results.
>>>>>>>
>>>>>>> For example, I created 2 text files, and both of them contain just
>>>>>>> the character "ţ".
>>>>>>> One of the files is ANSI encoded and it has a size of 1 byte, and
>>>>>>> the second is UTF-8 encoded and it has a size of 2 bytes.
>>>>>>>
>>>>>>> The problem is that when I load both files in an editor, Notepad or
>>>>>>> TextPad, and I press NumPad 5 three times quickly on that character,
>>>>>>> Jaws speaks "Character 355" in both cases, even though the two files
>>>>>>> store the character with different codes.
>>>>>>> It is obvious that an ANSI character cannot have a code above 255,
>>>>>>> but Jaws doesn't care too much about these kinds of details. :-)
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> --Octavian
>>>>>>>
>>>>>>> __________
>>>>>>>
>>>>>>> View the list's information and change your settings at
>>>>>>> //www.freelists.org/list/jawsscripts
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Soronel Haetir
>>>>> soronel.haetir@xxxxxxxxx
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Soronel Haetir
>>> soronel.haetir@xxxxxxxxx
>>>
>>
>>
>>
>
>
> -- 
> Soronel Haetir
> soronel.haetir@xxxxxxxxx
> 

