[liblouis-liblouisxml] Re: [liblouis] r715 committed - the last batch of files converted to utf-8.

  • From: Mesar Hameed <mesar.hameed@xxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Tue, 3 Jul 2012 15:44:04 +0100

Hi John B,

We are in the following situation:

1. To proceed with straightforward multilanguage support, tables should use
UTF-8.

2. Christian pointed out that users of Latin-1-based languages find it very ugly to
convert umlauts, graves, etc. to \xhhhh format, since an always or word opcode
would then take as its operand a word that may contain several \xhhhh escapes.
Note that this ugliness is already forced on non-Latin languages.

3. We need opcodes that accept UTF-8 arguments, not just ASCII.
I fully understand your point about text editors, but on the other hand, we are
more adaptable than the person who wishes to add support for
their language or to correct a dot representation.
For them, finding and understanding the \xhhhh notation is probably harder than
it needs to be.
It would be simpler for them to type:
letter <letter> <dots>
where they can physically see their letter on the screen.


4. For a release, we either revert my conversions, force the \xhhhh notation on
all languages (ugly, and a lot of manual work), or implement UTF-8
support in operands.
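To make the cost in point 2 concrete, here is a minimal Python sketch of the conversion that forcing \xhhhh on Latin-1 tables would involve. The helper name `to_escapes` is hypothetical, and it assumes the four-hex-digit \xhhhh form, so it only handles characters up to U+FFFF and rejects anything larger rather than guessing at another notation.

```python
def to_escapes(word: str) -> str:
    """Hypothetical helper: rewrite every character above 127 as \\xhhhh.

    Assumes the four-hex-digit form, so only code points up to U+FFFF
    are handled; anything larger raises ValueError.
    """
    out = []
    for ch in word:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\x{cp:04x}")  # e.g. 'ä' (U+00E4) becomes \x00e4
        else:
            raise ValueError(f"U+{cp:X} needs more than four hex digits")
    return "".join(out)


for word in ["Mädchen", "über", "à"]:
    print(word, "->", to_escapes(word))
```

A word such as Mädchen turns into M\x00e4dchen, which is the readability cost described above; with UTF-8 operands the table could keep the word exactly as typed.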

Personally, I would vote for implementing UTF-8 support, since it has to be done
in any case.
Also, there is little sense in forcing Latin-1 tables to rewrite their
nice-looking word/always opcodes, since they would have to be rewritten
again as soon as UTF-8 support was implemented.
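A rough sketch of what UTF-8 support in operands could look like, assuming (hypothetically) that the table compiler reads each rule as raw bytes, decodes it as UTF-8, and only then splits it on whitespace. The function names `parse_rule` and `is_valid_utf8` are illustrative, not part of liblouis, and the dot pattern in the second rule is illustrative only. The sketch also checks the point debated further down the thread: bytes 0 to 127 decode to the same characters under UTF-8 as under ASCII, so existing ASCII tables would be unaffected, while a lone codepage byte from 128 to 255 is not valid UTF-8 on its own.

```python
def parse_rule(raw: bytes) -> list[str]:
    """Hypothetical: decode one table line as UTF-8, then split it into fields.

    Decoding happens before splitting, so a multi-byte character such as
    'ä' stays a single character in the operand.
    """
    line = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid input
    return line.split()


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Bytes 0-127 decode identically under ASCII and UTF-8.
ascii_compatible = all(
    bytes([b]).decode("ascii") == bytes([b]).decode("utf-8") for b in range(128)
)

print(ascii_compatible)                            # True
print(parse_rule(b"letter a 1"))                   # ['letter', 'a', '1']
# Dot pattern below is illustrative only.
print(parse_rule("letter ä 345".encode("utf-8")))  # ['letter', 'ä', '345']
print(is_valid_utf8(b"\xe4"))                      # False: Latin-1 'ä' alone
```

The design choice here is to decode the whole line first, so operand handling works on characters rather than bytes and ASCII-only rules behave exactly as before.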

All this while keeping in mind John Gardner's email.

What do you feel is best to do?

Thanks,
Mesar
On Mon 02/07/12,13:50, John J. Boyer wrote:
> It is certainly OK with me. Let's get the tables working with what we
> have now by using the \xhhhh notation.
> 
> John B.
> 
> On Mon, Jul 02, 2012 at 10:08:45AM -0700, John Gardner wrote:
> > Hello Mesar, on behalf of the BrailleBlaster steering committee, I
> > understand and appreciate your concern about UTF8.  Clearly we should have
> > started that way.
> > I have a suggestion for how to settle this issue.  First of all, let's make
> > the tables work with liblouis today.  My suggestion is to put everything
> > beyond 127 into the \x notation.  At some later time, I would very much like
> > to convert to using UTF8.  Vic Beckley has begun working part-time for
> > ViewPlus on liblouis items specific to the company, and he is presently
> > waiting on us to give him a new list of things to do.  In the meantime, I
> > have asked him to work with you to do this improvement.
> > 
> > Okay with all concerned?  Thanks!!!
> > 
> > John Gardner
> > 
> > 
> > -----Original Message-----
> > From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
> > [mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Mesar Hameed
> > Sent: Monday, July 02, 2012 9:53 AM
> > To: liblouis-liblouisxml@xxxxxxxxxxxxx
> > Subject: [liblouis-liblouisxml] Re: [liblouis] r715 committed - the last
> > batch of files converted to utf-8.
> > 
> > 128 to 255 is the range for codepages, which are now deprecated due to their
> > inherent problems.
> > For example, we can't write German+Russian texts, or Swedish+Greek, etc.
> > Unicode will make sure that we are able to support all combinations of
> > languages at once.
> > 
> > I believe that it can be done, but in a roundabout and error-prone way.
> > 
> > Thanks,
> > Mesar
> > On Mon 02/07/12,10:16, John J. Boyer wrote:
> > > My mistake. I meant to say 0 to 255. 
> > > 
> > > John
> > > 
> > > On Mon, Jul 02, 2012 at 04:10:03PM +0100, Mesar Hameed wrote:
> > > > Hi John,
> > > > 
> > > > On Mon 02/07/12,09:52, John J. Boyer wrote:
> > > > > UTF-8 in the opcode arguments would be a bad idea. Since the
> > > > > beginning, the character argument has accepted characters from 0 to
> > > > > 127 as valid.
> > > > > UTF-8 conflicts with this.
> > > > 
> > > > According to the standard, 0 to 127 is exactly the same for ASCII and for
> > > > UTF-8, so this is not a conflict.
> > > > 
> > > > Wikipedia reiterates this:
> > > > 
> > > > "The first 128 characters of Unicode, which correspond one-to-one
> > > > with ASCII, are encoded using a single octet with the same binary value
> > > > as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well."
> > > > 
> > > > from: http://en.wikipedia.org/wiki/UTF-8
> > > > 
> > > > I could dig up a more reputable source if it's required.
> > > > 
> > > > Thanks,
> > > > Mesar
> > > > For a description of the software, to download it and links to 
> > > > project pages go to http://www.abilitiessoft.com
> > > 
> > > --
> > > John J. Boyer; President, Chief Software Developer Abilitiessoft, Inc.
> > > http://www.abilitiessoft.com
> > > Madison, Wisconsin USA
> > > Developing software for people with disabilities
> > > 
> 