The reason liblouis uses 16-bit or 32-bit unsigned integers internally is
speed of processing, that is, performance. liblouisutdml uses UTF-8 and
converts it for liblouis.

John

On Tue, Jan 14, 2014 at 09:57:23PM +0100, Mesar Hameed wrote:
> Hi,
>
> As far as I remember, the standard just defines that all char arrays are
> null terminated, so that strlen and friends can be used safely (without
> running into unassigned memory).
> There are two linked problems, storage and data representation.
> strlen on a null terminated ascii string will give you the number of
> characters in that string.
> strlen on a utf-8 or other encoded unicode string will return the number of
> bytes used, but not how many actual printed characters there are in the
> string.
>
> What usually happens for robust code is that a program or a library decides
> to operate
> internally with a fixed format, for example utf-8 encoded strings, so
> when new functionality has to be added or a bug squashed, everyone knows
> what arguments each function has, and in what format.
> For input and output, conversion from alternative encodings to the internal
> representation should be done at the earliest and latest stage
> respectively, before it's passed along to any of the internal code.
>
> If/when we might do such a change, I would probably vote for utf-8 for
> all internal code.
>
> thanks,
> Mesar
> On Tue 14/01/14,12:41, Keith Creasy wrote:
> > Hello all.
> >
> > In my more recent C programming, actually C++, I've moved toward using the
> > ANSI String rather than throwing chars around. Would this be a good move
> > for when we get to refactoring LibLouis and LibLouisUTDML after the March
> > release? Maybe that would eliminate some of the confusion and tedium of
> > various character representations.
> >
> > Keith
> >
> >
> > -----Original Message-----
> > From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
> > [mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Michael
> > Whapples
> > Sent: Tuesday, January 14, 2014 4:48 AM
> > To: liblouis-liblouisxml@xxxxxxxxxxxxx
> > Subject: [liblouis-liblouisxml] Re: Conflicting priorities
> >
> > UTF-8, UTF-16 and UTF-32 are byte encodings of unicode characters. They
> > define how you will store a given unicode character in bytes. This is used
> > for low level operations like storing and transmitting unicode strings.
> >
> > Aaron is talking about unicode characters which combine with other
> > characters (eg. an accent sign which applies to the previous character) vs
> > the single accented unicode character. All of these are dealing
> > with characters, you may store them in any encoding.
> >
> > If liblouis does no normalisation (which I suspect it does not) then you
> > would have to define the character combinations in the tables.
> >
> > If liblouis did do normalisation then there would only be one
> > representation to define in the tables.
> >
> > PS. I wish you would stop referring to liblouis using UTF-16, this is
> > factually incorrect as the 16-bit unicode liblouis can be compiled for is a
> > fixed width encoding, so limited in which characters it can represent, and
> > so is UCS-2. UTF-16 is a variable width encoding and is capable of
> > representing the full unicode set.
> >
> > Michael Whapples
> > On 14/01/2014 00:54, John J. Boyer wrote:
> > > I'm still confused about what kind of Unicode Aaron is talking about.
> > > liblouis itself uses either UTF-16 or UTF-32 depending on how it is
> > > compiled. It does not recognize a letter followed by the accent Unicode
> > > value, although this could be handled with a translation table.
> > > liblouisutdml requires UTF-8.
> > >
> > > John
> > >
> > > On Mon, Jan 13, 2014 at 03:23:57PM -0800, John Gardner wrote:
> > >> John, I sure am interested in math. And I'd like to have the choices
> > >> of graphics placement, enlargement, etc that I've described.
> > >> ViewPlus embossers do the graphics to dots transformation though, and
> > >> my expectation is that other embosser manufacturers are gonna have to
> > >> do the same. Since they won't, then my recommendation is that the
> > >> tactile graphics be either placed at the end so they can be split off
> > >> and put into whatever crappy software that embosser manufacturers
> > >> make, or better still, be split into a completely separate folder. I
> > >> think this should be done in utdml, and I am working on a proposal
> > >> for improving it, so my preference is to postpone that project for the
> > >> moment.
> > >>
> > >> I'm not very interested in back translation, but getting emphasis
> > >> right and getting other math languages working is a priority for me.
> > >> Is this your question?
> > >>
> > >> You missed the point about those spaces. White space is supposed to
> > >> be ignored in token content. However I have discovered many
> > >> instances where regular spaces are used in mtext tokens. I may be
> > >> wrong, but I think this is wrong. However in LEAN, I have filtered
> > >> the white space out of tokens except for the mtext ones. I am
> > >> finding a lot of usage of extra spaces in mi and mn elements,
> > >> presumably for readability. And the Nemeth is leaving in those
> > >> spaces, so the equations are wrong. They aren't wrong if I make them
> > >> from LEAN.
> > >>
> > >> John G
> > >>
> > >> -----Original Message-----
> > >> From: John J. Boyer [mailto:john.boyer@xxxxxxxxxxxxxxxxx]
> > >> Sent: Monday, January 13, 2014 2:25 PM
> > >> To: John Gardner
> > >> Subject: Conflicting priorities
> > >>
> > >> John,
> > >>
> > >> APH is most interested in emphasis for BrailleBlaster and in
> > >> back-translation for the Braille Plus 18. They are interested in
> > >> Nemeth also. Personally, my real interest is math and tactile
> > >> graphics. I'm guessing that this is also your greatest interest.
> > >>
> > >> I hope you have solved the problem of getting the wrong Unicode values.
> > >> If you tell me the Unicode value for the unwanted space I can modify
> > >> the nemeth.cti table and send it to you as an attachment.
> > >>
> > >> John
> > >>
> > >> --
> > >> John J. Boyer; President, Chief Software Developer Abilitiessoft, Inc.
> > >> http://www.abilitiessoft.com
> > >> Madison, Wisconsin USA
> > >> Developing software for people with disabilities
> > >>
> >
> > For a description of the software, to download it and links to project
> > pages go to http://www.abilitiessoft.com
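
As a footnote to the thread: Mesar's point that strlen counts bytes rather than characters, and Michael's point that UCS-2 is fixed width while UTF-16 is not, can be sketched in a few lines of C. This is only an illustration under stated assumptions, not liblouis code. The function names utf8_code_points, utf16_units and fits_in_ucs2 are hypothetical, and the input is assumed to be valid, null terminated UTF-8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a null terminated
 * UTF-8 string. Assumes valid UTF-8. Continuation bytes have the form
 * 10xxxxxx; every other byte starts a new code point. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

/* Hypothetical helper: number of 16-bit units UTF-16 needs for one code
 * point. 1 inside the Basic Multilingual Plane, 2 (a surrogate pair)
 * above it, 0 for values that are not valid Unicode scalar values. */
int utf16_units(uint32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return cp <= 0xFFFF ? 1 : 2;
}

/* Hypothetical helper: UCS-2 stores exactly one 16-bit unit per
 * character, so it can only hold code points that need a single unit. */
int fits_in_ucs2(uint32_t cp)
{
    return utf16_units(cp) == 1;
}
```

With these definitions, strlen("caf\xC3\xA9") is 5 (bytes) while utf8_code_points("caf\xC3\xA9") is 4. The decomposed sequence "e\xCC\x81" (e followed by U+0301 COMBINING ACUTE ACCENT) is 3 bytes and 2 code points yet prints as one character, so counting code points alone does not settle the combining-character question; that needs normalisation or table entries as discussed above. U+1D400 (MATHEMATICAL BOLD CAPITAL A) needs two UTF-16 units, so fits_in_ucs2(0x1D400) is 0.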