[liblouis-liblouisxml] Re: [liblouis] r715 committed - the last batch of files converted to utf-8.

  • From: "Vic Beckley" <vic.beckley3@xxxxxxxxx>
  • To: <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Tue, 3 Jul 2012 17:33:55 -0400

John,

The unicodedefs.cti is very helpful. I found that the trademark symbol is
actually \x2122. Even before converting the cy-cy-g1.utb table to UTF-8 in
revision 714, the trademark symbol in that file is coming back as \zFFFF99.
This would be \x0099. According to the unicodedefs.cti file and what I found
on the web, that character is not used. Do you think this is an error in the
table or in lou_translate when it is converting to \xhhhh format? Could
there be more than one Unicode code point used for the trademark sign?
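One possible explanation, offered only as an assumption and not confirmed anywhere in this thread: before revision 714 the cy-cy-g1.utb file may have been saved in Windows-1252 (Python's "cp1252" codec), where the single byte 0x99 is the trademark sign. If a C program reads that byte into a signed char and widens it, the value sign-extends, so the high bits fill with ones and only the low byte is 0x99. That resembles the \zFFFF99 above, although this sketch does not reproduce liblouis's exact output format and does not call liblouis at all.

```python
# Hypothesis only: the pre-UTF-8 table stored the trademark sign as the
# single Windows-1252 (cp1252) byte 0x99.

TRADEMARK = "\u2122"  # Unicode assigns the trademark sign one code point
LEGACY_BYTE = 0x99    # assumed byte value in the old cp1252 file


def as_signed_char(byte: int) -> int:
    """Interpret a value 0..255 the way a signed 8-bit C char would."""
    return byte - 256 if byte >= 128 else byte


# Decoded with the (assumed) legacy encoding, 0x99 is the trademark sign.
decoded = bytes([LEGACY_BYTE]).decode("cp1252")
print(f"cp1252 0x99 -> U+{ord(decoded):04X}")  # cp1252 0x99 -> U+2122
print(decoded == TRADEMARK)                    # True

# Treated as a raw signed char and widened to 32 bits, the same byte
# sign-extends: the high bits become ones and only the low byte is 0x99.
signed_value = as_signed_char(LEGACY_BYTE)
widened = signed_value & 0xFFFFFFFF
print(f"signed: {signed_value}, widened: 0x{widened:08X}")  # signed: -103, widened: 0xFFFFFF99
```

If this guess is right, the answer to the last question would be no: the trademark sign has a single code point, U+2122, and 0x99 is a byte from a legacy encoding rather than a second code point.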


Best regards from Ohio, U.S.A.,

Vic
E-mail: vic.beckley3@xxxxxxxxx


-----Original Message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of John J.
Boyer
Sent: Tuesday, July 03, 2012 4:13 PM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: [liblouis] r715 committed - the last
batch of files converted to utf-8.

To clarify, tables are "source" files and should be human-readable, just
as program source code is human-readable. When you need a non-ASCII
character in Java, for example, you use the \uhhhh escape. liblouis
just uses x instead of u.
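As a rough illustration of that convention, the Python helper below rewrites every non-ASCII character in a string as a \xhhhh escape, much as a Java programmer would write \uhhhh. The name to_louis_escapes is hypothetical and not part of liblouis, and the sketch assumes every character fits in four hex digits; anything above U+FFFF is rejected rather than guessed at.

```python
def to_louis_escapes(text: str) -> str:
    """Replace each non-ASCII character with a \\xhhhh escape (hypothetical helper)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)  # plain ASCII stays readable as-is
        elif cp <= 0xFFFF:
            parts.append(f"\\x{cp:04x}")  # backslash, x, four hex digits
        else:
            # Assumption: this sketch only handles four-digit escapes.
            raise ValueError(f"U+{cp:X} needs more than four hex digits")
    return "".join(parts)


print(to_louis_escapes("Acme\u2122"))  # Acme\x2122
```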

I have not seen anyone except Mesar advocating for UTF-8 in the tables. 

There is a liblouis table called unicode.cti which can be used to find 
the hex values of Unicode characters. It contains comments which explain 
verbally what each character is.
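For anyone without the table at hand, Python's standard unicodedata module offers the same kind of information: the official name for a code point, and the code point for a name. This is only a stand-in for reading the comments in unicode.cti, not a way of querying that file, and the describe function is a hypothetical helper written for this example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, \\xhhhh escape, and Unicode name of ch."""
    cp = ord(ch)
    return f"U+{cp:04X}  \\x{cp:04x}  {unicodedata.name(ch, '<unnamed>')}"


# From a character to its code point and name.
print(describe("\u2122"))  # U+2122  \x2122  TRADE MARK SIGN

# From a name back to the character.
print(describe(unicodedata.lookup("TRADE MARK SIGN")))

# U+0099 is a C1 control character with no name of its own.
print(describe("\x99"))  # U+0099  \x0099  <unnamed>
```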

John

On Tue, Jul 03, 2012 at 06:29:28PM +0100, Mesar Hameed wrote:
> Hi Vic,
> 
> On Tue 03/07/12,12:16, Vic Beckley wrote:
> > So in my example of the trademark symbol, which shows up as a^ in UTF-8,
> 
> No, it should not show up as "a" followed by a caret symbol; it should
> simply be the trademark symbol itself.
> Open the file with Notepad++.
> 
> 
> > how would you correctly write this using the \xhhhh format? If you were
> > writing the table and wanted to define this symbol using UTF-8,
> > how would you find out what it was?
> 
> Just a small correction: \xhhhh, sometimes also written as U+hhhh, is
> called the Unicode code point of the symbol.
> How it is then stored on the computer is called the encoding.
> 
> So UTF-7, UTF-8, UTF-16 and UTF-32 are all different computer formats for
> encoding Unicode, and they differ in how many bytes
> are used for the minimal representation of each code point.
> For a more detailed explanation, please have a look at the Wikipedia
> articles on "Unicode", "UTF-8", etc.
> 
> So, to your question:
> if you use a screen reader, it probably has a shortcut key
> for telling you the code point of the character your cursor is
> currently on.
> 
> For NVDA and Orca, in desktop layout, this is numpad 2 pressed three
> times quickly.
> For example, Orca tells me: trademark, 2122.
> If you are using a braille display and the character is not defined in
> your current table, you will see \x2122.
> 
> If you were a sighted table writer, you would probably have to go to the
> online Unicode standard and look through the long list of characters for
> the \xhhhh representation of the character you wanted.
> 
> hope this helps,
> Mesar
> For a description of the software, to download it and links to
> project pages go to http://www.abilitiessoft.com
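Mesar's distinction between a code point and its encoding can be checked directly. The Python sketch below encodes the single code point U+2122 under several encodings, then decodes the UTF-8 bytes as if they were Windows-1252. That wrong decoding is an assumption about what happened in Vic's editor, not something established in the thread, but it yields "â„¢", and a screen reader announcing the first character as "a circumflex" would fit the "a^" Vic describes. The bytes.hex separator argument needs Python 3.8 or later.

```python
TRADEMARK = "\u2122"  # one code point, U+2122

# The same code point stored under different encodings.
for encoding in ("utf-8", "utf-16-be", "utf-32-be", "cp1252"):
    data = TRADEMARK.encode(encoding)
    print(f"{encoding:>9}: {data.hex(' ')}")
# utf-8: e2 84 a2 / utf-16-be: 21 22 / utf-32-be: 00 00 21 22 / cp1252: 99

# Assumption: an editor that decodes UTF-8 bytes as Windows-1252
# shows three unrelated characters instead of one trademark sign.
mojibake = TRADEMARK.encode("utf-8").decode("cp1252")
print(mojibake)  # â„¢
```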

-- 
John J. Boyer; President, Chief Software Developer
Abilitiessoft, Inc.
http://www.abilitiessoft.com
Madison, Wisconsin USA
Developing software for people with disabilities
