[liblouis-liblouisxml] Re: Conflicting priorities

  • From: "John J. Boyer" <john.boyer@xxxxxxxxxxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Tue, 14 Jan 2014 23:25:36 -0600

The reason liblouis uses 16-bit or 32-bit unsigned integers internally is 
speed of processing, that is, performance. liblouisutdml uses UTF-8 and 
converts it for liblouis.
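
To illustrate the kind of conversion involved, here is a minimal sketch in C of decoding UTF-8 into fixed-width 32-bit values. It is not the actual liblouisutdml code; the name utf8_to_u32 is made up for this example, and uint32_t merely stands in for a 32-bit character type. To keep it short it does not reject overlong forms or surrogate values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of liblouis or liblouisutdml.
 * Decodes NUL-terminated UTF-8 in `in` into 32-bit code points in `out`.
 * Returns the number of code points written, or -1 on a malformed
 * sequence or if `out` is too small. Overlong forms and surrogates
 * are not rejected, to keep the sketch short. */
int utf8_to_u32(const char *in, uint32_t *out, size_t out_cap)
{
    assert(in != NULL && out != NULL);
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    while (*p != 0) {
        uint32_t cp;
        int extra; /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1; /* stray continuation byte or invalid lead byte */
        p++;

        while (extra-- > 0) {
            /* A NUL here also fails this check, so we never read past the end. */
            if ((*p & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }

        if (n >= out_cap) return -1; /* output buffer full */
        out[n++] = cp;
    }
    return (int)n;
}
```

Once the text is in this form, every element of the array is one code point, so indexing and table lookups need no further decoding.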

John

On Tue, Jan 14, 2014 at 09:57:23PM +0100, Mesar Hameed wrote:
> Hi,
> 
> As far as I remember, the standard just defines that all char arrays are
> null terminated, so that strlen and friends can be used safely (without
> running into unassigned memory).
> There are two linked problems, storage and data representation.
> strlen on a null terminated ascii string will give you the number of
> characters in that string.
> strlen on a utf-8 or other encoded unicode string will return number of
> bytes used, but not how many actual printed characters there are in the
> string.
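
The byte count versus character count difference can be shown with a small C sketch. The helper name utf8_codepoints is invented for this example, and it assumes the input is already valid UTF-8. It counts code points by skipping continuation bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts Unicode code points in a NUL-terminated
 * UTF-8 string by skipping continuation bytes (those of the form 10xxxxxx).
 * Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++; /* lead byte or plain ASCII byte starts a new code point */
    }
    return count;
}
```

For the string "caf\xC3\xA9" (the word "café" in UTF-8), strlen returns 5 while utf8_codepoints returns 4. Note that a code point count is still not always the number of printed characters, since a combining accent is a separate code point.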
> 
> What usually happens in robust code is that a program or a library decides 
> to operate
> internally with a fixed format, for example utf-8 encoded strings, so
> when new functionality has to be added or a bug squashed, everyone knows
> what arguments each function has, and in what format.
> For input and output, conversion from alternative encodings to the internal
> representation should be done at the earliest and latest stage
> respectively, before it's passed along to any of the internal code.
> 
> If/when we might do such a change, I would probably vote for utf-8 for
> all internal code.
> 
> thanks,
> Mesar
> On Tue 14/01/14,12:41, Keith Creasy wrote:
> > Hello all.
> > 
> > In my more recent C programming, actually C++, I've moved toward using the 
> > ANSI String rather than throwing chars around. Would this be a good move 
> > for when we get to refactoring LibLouis and LibLouisUTDML after the March 
> > release? Maybe that would eliminate some of the confusion and tedium of 
> > various character representations. 
> > 
> > Keith
> > 
> > 
> > -----Original Message-----
> > From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx 
> > [mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Michael 
> > Whapples
> > Sent: Tuesday, January 14, 2014 4:48 AM
> > To: liblouis-liblouisxml@xxxxxxxxxxxxx
> > Subject: [liblouis-liblouisxml] Re: Conflicting priorities
> > 
> > UTF-8, UTF-16 and UTF-32 are byte encodings of unicode characters. They 
> > define how you will store a given unicode character in bytes. This is used 
> > for low level operations like storing and transmitting unicode strings.
> > 
> > Aaron is talking about unicode characters which combine with other 
> > characters (eg. an accent sign which applies to the previous character) vs 
> > the single accented character unicode character. All of these are dealing 
> > with characters, you may store them in any encoding.
> > 
> > If liblouis does no normalisation (which I suspect it does not) then you 
> > would have to define the character combinations in the tables.
> > 
> > If liblouis did do normalisation then there would only be one 
> > representation to define in the tables.
> > 
> > PS. I wish you would stop referring to liblouis using UTF-16, this is 
> > factually incorrect as the 16-bit unicode liblouis can be compiled for is a 
> > fixed width encoding, so limited to which characters it can represent, and 
> > so is UCS-2. UTF-16 is a variable width encoding and is capable of 
> > representing the full unicode set.
> > 
> > Michael Whapples
> > On 14/01/2014 00:54, John J. Boyer wrote:
> > > I'm still confused about what kind of Unicode Aaron is talking about. 
> > > liblouis itself uses either UTF-16 or UTF-32 depending on how it is 
> > > compiled. It does not recognize a letter followed by the accent Unicode 
> > > value, although this could be handled with a translation table. 
> > > liblouisutdml requires UTF-8.
> > >
> > > John
> > >
> > > On Mon, Jan 13, 2014 at 03:23:57PM -0800, John Gardner wrote:
> > >> John, I sure am interested in math.  And I'd like to have the choices 
> > >> of graphics placement, enlargement, etc that I've described.  
> > >> ViewPlus embossers do the graphics to dots transformation though, and 
> > >> my expectation is that other embosser manufacturers are gonna have to 
> > >> do the same.  Since they won't, then my recommendation is that the 
> > >> tactile graphics be either placed at the end so they can be split off 
> > >> and put into whatever crappy software that embosser manufacturers 
> > >> make, or better still, be split into a completely separate folder.  I 
> > >> think this should be done in utdml, and I am working on a proposal 
> > >> for improving it, so my preference is to postpone that project for the 
> > >> moment.
> > >>
> > >> I'm not very interested in back translation, but getting emphasis 
> > >> right and getting other math languages working is a priority for me.  
> > >> Is this your question?
> > >>
> > >> You missed the point about those spaces.  White space is supposed to 
> > >> be ignored in token content.  However I have discovered many 
> > >> instances where regular spaces are used in mtext tokens.  I may be 
> > >> wrong, but I think this is wrong.  However in LEAN, I have filtered 
> > >> the white space out of tokens except for the mtext ones.  I am 
> > >> finding a lot of usage of extra spaces in mi and mn elements, 
> > >> presumably for readability.  And the Nemeth is leaving in those 
> > >> spaces, so the equations are wrong.  They aren't wrong if I make them 
> > >> from LEAN.
> > >>
> > >> John G
> > >>
> > >> -----Original Message-----
> > >> From: John J. Boyer [mailto:john.boyer@xxxxxxxxxxxxxxxxx]
> > >> Sent: Monday, January 13, 2014 2:25 PM
> > >> To: John Gardner
> > >> Subject: Conflicting priorities
> > >>
> > >> John,
> > >>
> > >> APH is most interested in emphasis for BrailleBlaster and in 
> > >> back-translation for the Braille Plus 18. They are interested in 
> > >> Nemeth also. Personally, my real interest is math and tactile 
> > >> graphics. I'm guessing that this is also your greatest interest.
> > >>
> > >> I hope you have solved the problem of getting the wrong Unicode values.
> > >> If you tell me the Unicode value for the unwanted space I can modify 
> > >> the nemeth.cti table and send it to you as an attachment.
> > >>
> > >> John
> > >>
> > >> --
> > >> John J. Boyer; President, Chief Software Developer Abilitiessoft, Inc.
> > >> http://www.abilitiessoft.com
> > >> Madison, Wisconsin USA
> > >> Developing software for people with disabilities
> > >>
> > 
> > For a description of the software, to download it and links to project 
> > pages go to http://www.abilitiessoft.com



-- 
John J. Boyer; President, Chief Software Developer
Abilitiessoft, Inc.
http://www.abilitiessoft.com
Madison, Wisconsin USA
Developing software for people with disabilities

