Hello all.

In my more recent C programming, actually C++, I've moved toward using the ANSI String rather than throwing chars around. Would this be a good move for when we get to refactoring LibLouis and LibLouisUTDML after the March release? Maybe that would eliminate some of the confusion and tedium of various character representations.

Keith

-----Original Message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx [mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Michael Whapples
Sent: Tuesday, January 14, 2014 4:48 AM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Conflicting priorities

UTF-8, UTF-16 and UTF-32 are byte encodings of unicode characters. They define how you will store a given unicode character in bytes. This is used for low level operations like storing and transmitting unicode strings.

Aaron is talking about unicode characters which combine with other characters (eg. an accent sign which applies to the previous character) vs the single accented character unicode character. All of these are dealing with characters, you may store them in any encoding. If liblouis does no normalisation (which I suspect it does not) then you would have to define the character combinations in the tables. If liblouis did do normalisation then there would only be one representation to define in the tables.

PS. I wish you would stop referring to liblouis using UTF-16, this is factually incorrect as the 16-bit unicode liblouis can be compiled for is a fixed width encoding, so limited to which characters it can represent, and so is UCS-2. UTF-16 is a variable width encoding and is capable of representing the full unicode set.

Michael Whapples

On 14/01/2014 00:54, John J. Boyer wrote:
> I'm still confused about what kind of Unicode Aaron is talking about.
> liblouis itself uses either UTF-16 or UTF-32 depending on how it is compiled.
> it does not recognize a letter followed by the accent Unicode value, although
> this could be handled with a translation table. liblouisutdml requires UTF-8.
>
> John
>
> On Mon, Jan 13, 2014 at 03:23:57PM -0800, John Gardner wrote:
>> John, I sure am interested in math. And I'd like to have the choices
>> of graphics placement, enlargement, etc that I've described.
>> ViewPlus embossers do the graphics to dots transformation though, and
>> my expectation is that other embosser manufacturers are gonna have to
>> do the same. Since they won't, then my recommendation is that the
>> tactile graphics be either placed at the end so they can be split off
>> and put into whatever crappy software that embosser manufacturers
>> make, or better still, be split into a completely separate folder. I
>> think this should be done in utdml, and I am working on a proposal
>> for improving it, so my preference is to postpone that project for the
>> moment.
>>
>> I'm not very interested in back translation, but getting emphasis
>> right and getting other math languages working is a priority for me.
>> Is this your question?
>>
>> You missed the point about those spaces. White space is supposed to
>> be ignored in token content. However I have discovered many
>> instances where regular spaces are used in mtext tokens. I may be
>> wrong, but I think this is wrong. However in LEAN, I have filtered
>> the white space out of tokens except for the mtext ones. I am
>> finding a lot of usage of extra spaces in mi and mn elements,
>> presumably for readability. And the Nemeth is leaving in those
>> spaces, so the equations are wrong. They aren't wrong if I make them from
>> LEAN.
>>
>> John G
>>
>> -----Original Message-----
>> From: John J. Boyer [mailto:john.boyer@xxxxxxxxxxxxxxxxx]
>> Sent: Monday, January 13, 2014 2:25 PM
>> To: John Gardner
>> Subject: Conflicting priorities
>>
>> John,
>>
>> APH is most interested in emphasis for BrailleBlaster and in
>> back-translation for the Braille Plus 18. They are interested in
>> Nemeth also. Personally, my real interest is math and tactile
>> graphics. I'm guessing that this is also your greatest interest.
>>
>> I hope you have solved the problem of getting the wrong Unicode values.
>> If you tell me the Unicode value for the unwanted space I can modify
>> the nemeth.cti table and send it to you as an attachment.
>>
>> John
>>
>> --
>> John J. Boyer; President, Chief Software Developer Abilitiessoft, Inc.
>> http://www.abilitiessoft.com
>> Madison, Wisconsin USA
>> Developing software for people with disabilities
>> For a description of the software, to download it and links to project pages go to http://www.abilitiessoft.com
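
For anyone following the thread, here is a small sketch of the distinctions Michael draws in his reply: a combining sequence versus a precomposed character, what normalisation does to them, and why a fixed-width 16-bit store (UCS-2) is not the same thing as UTF-16. It uses Python's standard unicodedata module purely for brevity and has nothing to do with liblouis internals; the variable names are made up for illustration.

```python
import unicodedata

# Two ways to write an accented "e": one precomposed code point,
# or a plain "e" followed by a combining accent.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

# Without normalisation these are different code point sequences,
# so a table would need an entry for each form.
same_before = precomposed == decomposed

# NFC composes the pair into the single code point, leaving one form to define.
same_after = unicodedata.normalize("NFC", decomposed) == precomposed

# The encoding is a separate question: the same character stored three ways.
sizes = {enc: len(precomposed.encode(enc))
         for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# A character outside the Basic Multilingual Plane needs two 16-bit units
# (a surrogate pair) in UTF-16, so a fixed-width 16-bit store cannot hold it.
clef = "\U0001D11E"      # MUSICAL SYMBOL G CLEF
utf16_units = len(clef.encode("utf-16-le")) // 2
fits_in_ucs2 = ord(clef) <= 0xFFFF

print(same_before, same_after, sizes, utf16_units, fits_in_ucs2)
```

The first two comparisons show the table question (one entry or two), while the last two lines show the UCS-2 limit: any code point above U+FFFF simply has no single 16-bit representation.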