[liblouis-liblouisxml] Re: Question about space characters (was: Re: Re: Standardized table headers)

  • From: "John J. Boyer" <johnjboyer@xxxxxxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Mon, 16 Feb 2009 10:23:45 -0600

Lars,

One possibility would be to use a table list. First of all, the blank 
character should always be the first one defined, with a dot pattern of 0. 
You could make up two small tables. One would consist of the space 
character followed by the newline and carriage return characters, all set 
to 0. The other would consist of the space character followed by the 
newline and carriage return characters set to virtual dot patterns, say 
1a and 1d respectively. You could then use one of them with your 
No-No-g2.ctb table like this:
crlf.utb,No-No-g2.ctb
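
As a rough sketch only, the two small tables might look something like 
the following. It assumes the space opcode and the \s, \n and \r 
character escapes. The file name nocrlf.utb is made up for illustration, 
and treating crlf.utb as the table that keeps the virtual dot patterns 
is also an assumption.

```
# nocrlf.utb (hypothetical name): newline and carriage return become blanks
space \s 0
space \n 0
space \r 0

# crlf.utb (assumed contents): newline and carriage return keep their own
# virtual dot patterns, so they can still be told apart after translation
space \s 0
space \n 1a
space \r 1d
```

A program that wants line breaks preserved would then put crlf.utb in 
front of the language table in the table list, as above, and one that 
wants them turned into blanks would put the other small table there 
instead.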

John

On Mon, Feb 16, 2009 at 02:48:02PM +0100, Lars Bjørndal wrote:
> Hi, John!
> 
> You wrote:
> 
> > Eitan,
> >
> > there is supposed to be a standard header consisting of liblouis: plus a
> > description. It would be fine with me if you put this on all the tables. 
> > Note that the tables adopted from brltty need some work to put them in
> > standard form other than just the headers. The upper and lower cases for
> > letters should be defined together using the uplow opcode, so that
> > liblouis can associate them. Look at chardefs.cti to see how this is
> > done. I suppose most languages have the Latin letters, so this block of
> > opcodes could just be pasted or included in most tables. Also the space
> > or blank should be the first character defined with a dot pattern of 0. 
> > Line feed and carriage return should be defined after it, also with a
> > dot pattern of 0. Tab can be defined as dot 9. For liblouisxml use, the
> > escape character, hex 1b, should be defined as dots 1b (just a
> > mnemonic). The unbreakable space, hex a0, should be defined as dot a. All
> > these standard definitions could be put together and included in most
> > tables. 
> 
> This is a good point to discuss how to handle new line characters etc.
> I would prefer to use one table for all projects that use liblouis.
> As you know, the HTCom project from Handy Tech uses liblouis to
> translate and backtranslate grade 2 for many languages, including
> Norwegian. It seems that they send the whole string (the entire file,
> or a big part of it) through the liblouis dll at once while
> translating/backtranslating. When you are working with xml/html, it
> seems reasonable to convert new line characters into spaces. Working
> with plain text, however, it may, at least in some situations, be
> better to keep the line feed characters. So what do you think the best
> solution is to this problem? Maybe the best would be to tell the
> programmers who implement liblouis to translate one line at a time, or
> should we instead use a separate table for this purpose?
> 
> Thanks
> 
> Lars
> 
> > On Sat, Feb 14, 2009 at 04:17:16PM +0200, Eitan Isaacson wrote:
> >> Hi,
> >> 
> >> I don't read every single message on the list, so this might have
> >> already been discussed. But wouldn't it be cool if there was a
> >> standard string in the header of all table files that described the
> >> table in plain English? I see a lot of tables, but not all have a
> >> "liblouis:" line with a description. If all tables had this, it would
> >> allow us to automatically generate a human readable list of tables in
> >> a user interface. The language would also have to be consistent,
> >> sometimes I see caps, and other times not.
> >> 
> >> Should I just go ahead and edit all the tables?
> >> 
> >> Cheers,
> >>   Eitan.
> For a description of the software and to download it go to
> http://www.jjb-software.com

-- 
My websites:
http://www.godtouches.org
http://www.jjb-software.com
Location: Madison, WI, USA
