[liblouis-liblouisxml] Re: liblouis Documentation in texinfo

  • From: Christian Egli <christian.egli@xxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Thu, 13 Nov 2008 12:19:53 +0100

Hi John

On Wed, 2008-11-12 at 16:35 -0600, John J. Boyer wrote:
> Thanks for your work. I am looking into making texinfo documentation for 
> liblouisxml also. 

That's fantastic. I've already started converting the liblouisxml guide
to texinfo. I just haven't posted it because it is not quite finished. I
attach my current version.

> How did you create the texinfo version of the liblouis 
> guide? 

I basically just copied and pasted the content of the html version from
the browser into a text editor and started doing the markup (I've done
that before for other manuals so I could draw from that experience).

> I'll probably be writing to you offlist for help with texinfo and 
> maybe autotools.

Sure, the automake integration should be similar to the one in liblouis,
in fact I just attached the changed Makefile.am. For completeness' sake I
also added an entry in the Changelog file (also attached).

> The documentation was originally created in xhtml so that it could be
> translated with formatting by liblouisxml. 

Ah OK, now I understand. Is the html or the text produced by texinfo
fully accessible?

Thanks
Christian
-- 
Christian Egli
Swiss Library for the Blind and Visually Impaired
Grubenstrasse 12, CH-8045 Zürich, Switzerland
\input texinfo
@c %**start of header
@setfilename liblouisxml-guide.info
@include version.texi
@settitle Liblouisxml Programmer's and User's Guide

@dircategory Misc
@direntry
* Liblouisxml: (liblouisxml). An xml to Braille Translation Library.
@end direntry

@c Version and Contact Info
@set MAINTAINERSITE @uref{http://www.jjb-software.com/liblouisxml-guide.html,maintainers webpage}
@set AUTHOR John J. Boyer
@set MAINTAINER John J. Boyer
@set MAINTAINEREMAIL @email{john.boyer@xxxxxxxxxxxxxxxx}
@set MAINTAINERCONTACT @uref{mailto:john.boyer@xxxxxxxxxxxxxxxx,contact the maintainer}
@c %**end of header
@finalout

@c Macro definitions

@c Setting.
@macro setting{name, args}
@tindex \name\
@item \name\ \args\
@end macro

@copying
This manual is for liblouisxml (version @value{VERSION},
@value{UPDATED}), an xml to Braille Translation Library.

This file may contain code borrowed from the Linux screenreader
@acronym{BRLTTY}, Copyright @copyright{} 1999-2006 by the
@acronym{BRLTTY} Team.

Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc.
@uref{www.viewplus.com} and JJB Software, Inc.
@uref{www.jjb-software.com}.

@quotation
This file is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser (or library) General Public License
(LGPL) as published by the Free Software Foundation; either version 3,
or (at your option) any later version.

This file is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser (or Library) General Public License LGPL for more details.

You should have received a copy of the GNU Lesser (or Library) General
Public License (LGPL) along with this program; see the file COPYING.
If not, write to the Free Software Foundation, 51 Franklin Street,
Fifth Floor, Boston, MA 02110-1301, USA.
@end quotation
@end copying

@titlepage
@title  Liblouisxml Programmer's and User's Guide

@subtitle Release @value{VERSION}
@author by John J. Boyer

@c The following two commands start the copyright page.
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@c Output the table of contents at the beginning.
@contents

@ifnottex
@node Top, Introduction, (dir), (dir)
@top Liblouisxml Programmer's and User's Guide

@insertcopying
@end ifnottex

@menu
* Introduction::                
* Programming with liblouisxml::  
* Transcribing with the xml2brl program::  
* Customization Configuring liblouisxml::  
* Connecting with the xml Document - Semantic-Action Files::  
* Implementing Braille Mathematics Codes::  
* Settings Index::              
* Function Index::              

@detailmenu
 --- The Detailed Node Listing ---

Programming with liblouisxml

* License::                     
* Overview::                    
* Files and Paths::             
* lbx_version::                 
* lbx_initialize::              
* lbx_translateString::         
* lbx_translateFile::           
* lbx_translateTextFile::       
* lbx_backTranslateFile::       
* lbx_free::                    

Transcribing with the xml2brl program

* Transcribing Microsoft Word Files with msword2brl::  

Customization: Configuring liblouisxml

* outputFormat::                
* translation::                 
* xml::                         
* style::                       

@end detailmenu
@end menu

@node Introduction, Programming with liblouisxml, Top, Top
@chapter Introduction

liblouisxml is a software component which can be incorporated into
software packages to provide the capability of translating any file in
the computer lingua franca xml format into properly transcribed
braille. This includes translation into grade two, if desired,
mathematical codes, etc. It also includes formatting according to a
built-in style sheet which can be modified by the user. The first
program into which liblouisxml has been incorporated is
@command{xml2brl}. This program will translate an xml or text file
into an embosser-ready braille file. It is not necessary to know xml,
because MSWord and other word processors can export files in this
format. If the word processor has been used correctly,
@command{xml2brl} will produce an excellent braille file.

There is a Mac GUI application incorporating liblouisxml called louis.
For a link to it go to @uref{www.jjb-software.com/downloads}. A
similar Windows application is in the works.

Computer programmers who wish to use liblouisxml in their software can
find the information they need in the section Programming with
liblouisxml (@pxref{Programming with liblouisxml}). Those who wish to
change the output generated by liblouisxml should read the section
Configuring liblouisxml (@pxref{Customization Configuring
liblouisxml}). If you encounter a type of xml file with which
liblouisxml is not familiar, you can learn how to tell it how to
process that file
by reading Connecting with the xml document: Semantic-Action Files
(@pxref{Connecting with the xml Document - Semantic-Action Files}).
Finally, if you wish to implement a new braille mathematics code read
Implementing Braille Mathematics Codes (@pxref{Implementing Braille
Mathematics Codes}).

You will also find it advantageous to be acquainted with the companion
library liblouis, which is a braille translator and back-translator
(@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's and
User's Guide}).

@node Programming with liblouisxml, Transcribing with the xml2brl program, Introduction, Top
@chapter Programming with liblouisxml

@menu
* License::                     
* Overview::                    
* Files and Paths::             
* lbx_version::                 
* lbx_initialize::              
* lbx_translateString::         
* lbx_translateFile::           
* lbx_translateTextFile::       
* lbx_backTranslateFile::       
* lbx_free::                    
@end menu

@node License, Overview, Programming with liblouisxml, Programming with liblouisxml
@section License

liblouisxml xml to Braille Translation Library

This file may contain code borrowed from the Linux screenreader
BRLTTY, Copyright (C) 1999-2006 by the BRLTTY Team.

Copyright (C) 2004-2007
ViewPlus Technologies, Inc. www.viewplus.com
and
JJB Software, Inc. www.jjb-software.com
All rights reserved

This file is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.

In addition to the permissions and restrictions contained in the GNU
General Public License (GPL), the copyright holders grant two explicit
permissions and impose one explicit restriction. The permissions are:

1) Using, copying, merging, publishing, distributing, sublicensing,
and/or selling copies of this software that are either compiled or
loaded as part of and/or linked into other code is not bound by the
GPL.

2) Modifying copies of this software as needed in order to facilitate
compiling and/or linking with other code is not bound by the GPL.

The restriction is:

3. The translation, semantic-action and configuration tables that are
read at run-time are considered part of this code and are under the
terms of the GPL. Any changes to these tables and any additional
tables that are created for use by this code must be made publicly
available.

All other uses, including modifications not required for compiling or
linking and distribution of code which is not linked into a combined
executable, are bound by the terms of the GPL.

This file is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; see the file COPYING. If not, write to the
Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301, USA.

@node Overview, Files and Paths, License, Programming with liblouisxml
@section Overview

liblouisxml is an "extensible renderer," designed to translate a wide
variety of xml documents into braille, but with a special emphasis on
technical material. The overall operation of liblouisxml is controlled
by a configuration file. The way in which a particular type of xml
document is to be rendered is specified by a semantic-action file for
that document type. Braille translation is done by the liblouis
braille translation and back-translation library (@pxref{Top, ,
Overview, liblouis-guide, Liblouis Programmer's and User's Guide}).
Its operation, in turn, is controlled by translation table files. All
these files are plain text and can be created and edited in any text
editor. Configuration settings can also be specified on the command
line of the console-mode transcription program @command{xml2brl}.

The general operation of liblouisxml is as follows. It uses the
libxml2 library to construct a parse tree of the xml document. After
the parse tree is constructed, a function called
@code{examine_document} looks it over and determines whether math
translation tables, etc. are needed. @code{examine_document} also
constructs a prototype semantic-action file, if one does not exist
already. When it is finished, another function, called
@code{transcribe_document}, does the actual braille transcription. It
calls @code{transcribe_math} to handle MathML subtrees,
@code{transcribe_chemistry} for chemical formula subtrees,
@code{transcribe_graphic} for SVG graphics, etc. Entities are
translated to Unicode, if they are not already. Sequences of symbols
indicate superscripts, return to the baseline, subscripts, start and
end of fractions, etc. The Braille translator and back-translator
library liblouis is used to do the braille translation.

The @code{transcribe_math} function works in conjunction with the
latest version of liblouis and a special math translation table to
transcribe most mathematical expressions into fairly good Nemeth Code.
Much refinement is still necessary. Other braille mathematical codes
can be handled by modifying the translation table.

The functions which are not needed at the moment, such as
@code{transcribe_chemistry}, are only skeletons. However, I hope that
@code{transcribe_graphic} can be expanded in the near future to use
the graphics capability of the Tiger tactile graphics embossers.

The latest versions of liblouisxml and liblouis can be downloaded from
@uref{www.jjb-software.com}. Note that liblouisxml will only work with
the latest version of liblouis.

liblouisxml can be compiled to use either 16-bit or 32-bit Unicode
internally. This is inherited from liblouis, so liblouis must be
compiled first and then liblouisxml. Wherever 16 bits are mentioned in
this document, read 32 if you have compiled the library for 32 bits.

@node Files and Paths, lbx_version, Overview, Programming with liblouisxml
@section Files and Paths

As stated in the previous section, liblouisxml uses three kinds of
files: configuration files, semantic-action files, and liblouis
translation tables. The first two are discussed later in this
documentation. liblouis translation tables are discussed in the
liblouis guide (@pxref{Top, , Overview, liblouis-guide, Liblouis
Programmer's and User's Guide}) which is distributed with liblouis.
These files can be placed on various paths, which are determined at
compile time. One of these paths should be to the @file{lbx_files}
directory provided by liblouisxml, which contains the principal
configuration file (@file{canonical.cfg}) and the semantic-action
files. Another should be to the tables directory in the liblouis
distribution. Note that liblouisxml also generates some files, all of
which are placed in the current directory. These files are new
prototype semantic-action files, additions to old semantic-action
files, temporary files, and log files. The first two can be used to
extend the capability of liblouisxml to process xml documents. The
latter two are useful for debugging.

Paths are set by changing a few lines of code in the @file{paths.c}
module. If you are preparing liblouisxml for Windows a function which
finds the name of the "Program Files" directory for your locale is
called automatically. You can then modify the line containing the term
@samp{yourSubDir} as needed.

If you are preparing liblouisxml for a Unix-type system look for the
line that says @samp{Set Unix Paths}. The following three lines
establish a path to the @file{lbx_files} directory in your home
directory. As distributed, this directory contains the semantic-action
files and some configuration files. You can choose to copy the tables
from the liblouis distribution into it as well, or you can modify the
following three lines to point to the actual location of the tables.
You can also choose to place both the @file{lbx_files} and the tables
directory in @file{/etc}.

The function @code{addPath} takes care of adding a path to liblouisxml
properly. You can specify many more than two paths.

@node lbx_version, lbx_initialize, Files and Paths, Programming with liblouisxml
@section lbx_version

@findex lbx_version
@example
char *lbx_version (void)
@end example

This function returns a pointer to a character string containing the
version of liblouisxml, plus other information such as the release
date and perhaps notable changes.

@node lbx_initialize, lbx_translateString, lbx_version, Programming with liblouisxml
@section lbx_initialize

@findex lbx_initialize
@example
void * lbx_initialize (
     const char *const configFilelist, 
     const char *const logFileName, 
     const char *const settingsString)
@end example

This function initializes the libxml2 library, runs
@file{canonical.cfg}, and processes the configuration settings given
in @code{settingsString} and the configuration files given in
@code{configFilelist}. The latter is a list of configuration file
names separated by commas. If its first character is a comma, it is
taken to be a string containing configuration settings and is
processed like the @code{settingsString} string. Such a string must
conform to the format of a configuration file. Newlines should be
represented with ASCII 10. If @code{logFileName} is not @code{NULL}, a
log file is produced in the current directory. If it is @code{NULL},
any messages are printed on stderr. The function returns a pointer to
the @code{UserData} structure. This pointer is of type @code{void *}
and must be cast to @code{(UserData *)} in the calling program. To
access the information in this structure you must include
@file{louisxml.h}. This function is used by @command{xml2brl}.
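
As an illustration of the settings-string format described above, here
is a minimal sketch in C that assembles such a string. It is not part
of liblouisxml: the helper name @code{build_settings_string} is
hypothetical, and the tab before each setting is only there for
readability, as in a configuration file. The result would be a
candidate for the @code{settingsString} argument.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a liblouisxml function): writes a section
   name followed by one "name value" setting per line into buf.
   Lines are separated by ASCII 10 ('\n').
   Returns the number of characters written, or -1 if buf is too small. */
int build_settings_string(char *buf, size_t bufsize, const char *section,
                          const char *const *settings, size_t count)
{
    assert(buf != NULL && section != NULL && bufsize > 0);

    int n = snprintf(buf, bufsize, "%s\n", section);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        n = snprintf(buf + used, bufsize - used, "\t%s\n", settings[i]);
        if (n < 0 || (size_t)n >= bufsize - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```
@end verbatim

Called with the section @code{outputFormat} and the two settings
@code{cellsPerLine 32} and @code{linesPerPage 25}, the helper produces
three lines, each terminated by a newline character.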

@node lbx_translateString, lbx_translateFile, lbx_initialize, Programming with liblouisxml
@section lbx_translateString

@findex lbx_translateString
@example
int lbx_translateString (
    const char *const configfilelist, 
    char * inbuf, 
    widechar *outbuf, 
    int *outlen, 
    unsigned int mode)
@end example

This function takes a well-formed xml expression in @code{inbuf} and
translates it into a string of 16-bit (or 32-bit if this has been
specified in liblouis) braille characters in @code{outbuf}. The xml
expression must be immediately followed by a zero or null byte.
Leading whitespace is ignored. If the expression does not then begin
with the characters @samp{<?xml}, an xml header is added. If it does
not begin with @samp{<} at all, it is assumed to be a text string and
is translated accordingly. The header is specified by the xmlHeader
line in the
configuration file. If no such line is present, a default header
specifying UTF-8 encoding is used. The @code{mode} parameter specifies
whether you want the library to be initialized. If it is 0 everything
is reset, the @file{canonical.cfg} file is processed and the
configuration file and/or string (see previous section) are processed.
If @code{mode} is 1 liblouisxml simply prepares to handle a new document. For
more on the @code{mode} parameter see the next section.
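
The way the start of @code{inbuf} is interpreted can be sketched as
follows. This is only an illustration of the rule stated above, not
the library's own code; the function @code{classify_input} and the
enumeration names are hypothetical.

@verbatim
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical names, used only for this illustration. */
enum input_kind {
    INPUT_TEXT,            /* does not start with '<': plain text  */
    INPUT_XML_NO_HEADER,   /* starts with '<' but not with "<?xml" */
    INPUT_XML_WITH_HEADER  /* starts with "<?xml"                  */
};

/* Skip leading whitespace, then look at the first characters. */
enum input_kind classify_input(const char *inbuf)
{
    assert(inbuf != NULL);
    while (*inbuf != '\0' && isspace((unsigned char)*inbuf))
        inbuf++;
    if (strncmp(inbuf, "<?xml", 5) == 0)
        return INPUT_XML_WITH_HEADER;
    if (*inbuf == '<')
        return INPUT_XML_NO_HEADER;   /* a header would be added */
    return INPUT_TEXT;
}
```
@end verbatim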

Which 16-bit character in @code{outbuf} represents which dot pattern
is indicated in the liblouis translation tables. The
@code{configfilelist} parameter points to a configuration file or
string. Among other things, this file specifies translation tables. It
is these tables which control just how the translation is made,
whether in Grade 2, Grade 1, the Nemeth Code of Braille Mathematics or
something else.

Note that the @code{outlen} parameter is a pointer to an integer.
When the function is called, this integer contains the maximum output
length. When it returns, it is set to the actual length used. The
function returns 1 if no errors were encountered and a negative number
if a complete translation could not be done.
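
The in/out use of @code{outlen} can be tried without the library using
a stand-in function. The sketch below is not liblouisxml code:
@code{demo_translate} is a hypothetical function that merely copies
characters, and the @code{widechar} typedef assumes a 16-bit build. It
only shows how the caller sets the capacity before the call and reads
the length used afterwards.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short widechar;   /* assumption: 16-bit build */

/* Hypothetical stand-in, NOT lbx_translateString: copies inbuf into
   outbuf using the same length convention. On entry *outlen is the
   capacity of outbuf; on return it is the number of cells used.
   Returns 1 on success, a negative number if the output did not fit. */
int demo_translate(const char *inbuf, widechar *outbuf, int *outlen)
{
    assert(inbuf != NULL && outbuf != NULL && outlen != NULL);
    int capacity = *outlen;
    int used = 0;
    int status = 1;

    for (size_t i = 0; inbuf[i] != '\0'; i++) {
        if (used == capacity) {      /* no room left: incomplete */
            status = -1;
            break;
        }
        outbuf[used++] = (widechar)(unsigned char)inbuf[i];
    }
    *outlen = used;
    return status;
}
```
@end verbatim

A caller would declare the output buffer, store its size in an
integer, pass the address of that integer, and afterwards treat only
the first @code{*outlen} cells as valid.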

@node lbx_translateFile, lbx_translateTextFile, lbx_translateString, Programming with liblouisxml
@section lbx_translateFile

@findex lbx_translateFile
@example
int lbx_translateFile (
    char *configfilelist, 
    char *inputFileName, 
    char *outputFileName, 
    unsigned int mode)
@end example

This function accepts a well-formed xml document in
@code{inputFileName} and produces a braille translation in
@code{outputFileName}. As for @code{lbx_translateString}, the
@code{mode} parameter specifies whether the library is to be
initialized with new configuration information or simply prepared to
handle a new document. In addition, the @code{mode} parameter can
specify that a document is in html, not xhtml. @file{liblouisxml.h}
contains an enumeration type with the values @code{dontInit} and
@code{htmlDoc}. These can be combined with an or (@samp{|}) operator. The
input file is assumed to be encoded in UTF-8, unless otherwise
specified in the xml header. The encoding of the output file may be
UTF-8, UTF-16, UTF-32 or Ascii8. This is specified by the
@code{outputEncoding} line in the configuration file,
@code{configfilelist}. The function returns 1 if the translation was
successful.
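
Combining the two @code{mode} values with the or operator can be
sketched as below. The numeric values are assumptions made for this
illustration (the real ones are defined in @file{liblouisxml.h}), and
all names prefixed with @code{demo_} are hypothetical.

@verbatim
```c
#include <assert.h>

/* Assumed values for illustration only; consult liblouisxml.h for
   the real enumeration. The demo_ prefix marks them as stand-ins. */
enum demo_mode {
    demo_dontInit = 1,   /* keep configuration, just start a new document */
    demo_htmlDoc  = 2    /* input is html rather than xhtml */
};

/* Build a mode word from two yes/no choices. */
unsigned int demo_make_mode(int keep_config, int is_html)
{
    assert((demo_dontInit & demo_htmlDoc) == 0);  /* flags must not overlap */
    unsigned int mode = 0;
    if (keep_config)
        mode |= demo_dontInit;
    if (is_html)
        mode |= demo_htmlDoc;
    return mode;
}

/* Test a single flag inside a combined mode word. */
int demo_mode_has(unsigned int mode, enum demo_mode flag)
{
    return (mode & (unsigned int)flag) != 0;
}
```
@end verbatim

A mode of 0 therefore means full initialization of an xml document,
while the combination of both flags means an html document processed
with the configuration already loaded.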

@node lbx_translateTextFile, lbx_backTranslateFile, lbx_translateFile, Programming with liblouisxml
@section lbx_translateTextFile

@findex lbx_translateTextFile
@example
int lbx_translateTextFile (
    char *configfilelist, 
    char *inputFileName, 
    char *outputFileName, 
    unsigned int mode)
@end example

This function accepts a text file in @code{inputFileName} and produces
a braille translation in @code{outputFileName}. The input file is
assumed to be encoded in Ascii8. Blank lines indicate the divisions
between paragraphs. Two blank lines cause a blank line between
paragraphs (or headers). The output file may be in UTF-8, UTF-16, or
Ascii8, as specified by the @code{outputEncoding} line in the
configuration file, @code{configfilelist}. As for
@code{lbx_translateString}, the @code{mode} parameter specifies
whether complete initialization is to be done or simply initialization
for a new document.
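
The blank-line convention for text input can be illustrated with a
small scanner. This is a sketch of the rule as stated above, not the
library's implementation; @code{scan_text} and @code{struct
text_layout} are hypothetical names.

@verbatim
```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type for this illustration. */
struct text_layout {
    int paragraphs;    /* paragraphs (or headers) found           */
    int blank_lines;   /* blank lines requested between paragraphs */
};

/* One or more blank lines end a paragraph; two or more also request
   a blank line before the paragraph that follows. */
struct text_layout scan_text(const char *text)
{
    assert(text != NULL);
    struct text_layout result = { 0, 0 };
    int blank_run = 1;          /* start of file acts like a break */
    const char *line = text;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");
        int is_blank = (strspn(line, " \t\r") == len);

        if (is_blank) {
            blank_run++;
        } else {
            if (blank_run > 0) {
                result.paragraphs++;
                if (blank_run >= 2 && result.paragraphs > 1)
                    result.blank_lines++;
            }
            blank_run = 0;
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return result;
}
```
@end verbatim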

@node lbx_backTranslateFile, lbx_free, lbx_translateTextFile, Programming with liblouisxml
@section lbx_backTranslateFile

@findex lbx_backTranslateFile
@example
int lbx_backTranslateFile (
    char *configfilelist, 
    char *inputFileName, 
    char *outputFileName, 
    unsigned int mode)
@end example

This function accepts a braille file in @code{inputFileName} and
produces a back-translation in @code{outputFileName}. The input file
is assumed to be encoded in Ascii8. The output file is in either plain
text or html, according to the setting of @code{backFormat} in the
configuration file. Html files are encoded in UTF-8. In plain text,
blank lines are inserted between paragraphs. The output file may be in
UTF-8, UTF-16, or Ascii8, as specified by the @code{outputEncoding}
line in the configuration file, @code{configfilelist}. The @code{mode}
parameter specifies whether or not the library is to be initialized
with new configuration information, as described in the section on
@code{lbx_translateString} (@pxref{lbx_translateString}).

@node lbx_free,  , lbx_backTranslateFile, Programming with liblouisxml
@section lbx_free

@findex lbx_free
@example
void lbx_free (void)
@end example

This function should be called at the end of the application to free
all memory allocated by liblouisxml and liblouis. If you wish to
change configuration files during your application, use a @code{mode}
parameter of 0 on the function call using the new configuration
information.

@node Transcribing with the xml2brl program, Customization Configuring liblouisxml, Programming with liblouisxml, Top
@chapter Transcribing with the xml2brl program

At the moment, actual transcription with liblouisxml is done with the
command-line (or console) program @command{xml2brl}. The line to type
is:

@example
xml2brl [OPTIONS] [-f config-file] [infile] [outfile]
@end example

The brackets indicate that something is optional. You will see that
nothing is required except the program name itself, @command{xml2brl}.
The various optional parts control how the program will behave, as
follows:

@table @option

@item -h 
This option causes @command{xml2brl} to print a help message
describing usage and exit.

@item -l 
This option will cause @command{xml2brl} and liblouisxml to print
error messages to @file{xml2brl.log} instead of stderr. The file will
be in the current directory. This option is particularly useful if
@command{xml2brl} is called by a GUI script or Web application.

@item -f configfile 
This specifies the configuration file which tells @command{xml2brl}
how to do the transcription. (It may be a list of file names separated
by commas.) This file specifies such things as the number of cells per
line, the number of lines per page, the translation tables to be used,
how paragraphs and headings are to be formatted, etc. If this part of
the command line is omitted, @command{xml2brl} assumes that the
configuration file is named @file{default.cfg} and is in the current
directory. If the configuration file name contains a pathname
@command{xml2brl} will consider this as a path on which to look for
files that it needs (@pxref{Files and Paths}).

@item -Csetting=value 
This option enables you to specify configuration settings on the
command line instead of changing the configuration file. You can use
as many @option{-C} options as you wish. Any settings can be specified
except those having to do with styles. The settings may be in any
order. They override any settings in @file{canonical.cfg} or in the
configuration file used by @command{xml2brl}.

@item -b 
Back-translate. The input file must be a braille file, such as
@file{.brf}. The output file is a back-translation of this file. It
may be in either plain-text or xhtml (html), according to the setting
of backFormat in the outputFormat section of the configuration file.
Html files will contain page numbers and emphasis. To get good html,
the liblouis table must have the entry @samp{space \e 1b} so that it
will pass through escape characters. The @file{html.sem} file must
also contain the line @samp{pagenum pagenum}. Text output files simply
have a blank line between paragraphs. Encoding of text files is
controlled by the outputEncoding setting. Html files are always in
UTF-8.

@item -r 
Reformat. The input file must be a braille file, such as @file{.brf}.
The output is a braille file formatted according to the configuration
file. It is advisable to set backFormat to html, since this will
preserve print page numbers and emphasis. This program can be useful
for changing the line length and page length of a braille file, for
example, from 40 to 32 cells. It is also an excellent way to check the
accuracy of liblouis tables. The original page numbers at the tops and
bottoms of pages are discarded, and new ones are generated.

@item -p 
Poorly formatted input translation. Infile is any text file such as may
have been obtained by extracting the text in a pdf file. The input
file may also be an xml or html file which is so poorly formatted that
better braille can be obtained by ignoring the formatting.
@command{xml2brl} tries to guess paragraph breaks. The output is
generally reasonably formatted, that is, with reasonable paragraph
breaks.

@item -t 
The document is an h(t)ml file, not xhtml. This option is useful with
files downloaded from the Web in source form. Without it, the program
will first try to parse the file as an xml document, producing lots of
error messages. It will then try the html parser. With this option, it
goes directly to the html parser. See also the formatFor configuration
file setting (@pxref{formatFor setting}), which enables you to format
the braille output for viewing in a browser.

@item infile 
This is the name of the input file containing the material to be
transcribed. The file may be either an xml file or a text file. The
@option{-b}, @option{-r} and @option{-p} options discussed above
provide for other types of files and processing. Typical xml files are
those provided by @uref{www.bookshare.org} or those derived from a
word processor by saving in xml format. If a text file is used
paragraphs and headings should be separated by blank lines. In such a
file there is no way to distinguish between paragraphs and headings,
so they will all be formatted as paragraphs, as specified by the
configuration file. However, if you want a blank line in the braille
transcription use two consecutive blank lines in the text file.

@item outfile 
This is the name of the output file. It will be transcribed as
specified by the configuration file and the configuration settings.
The following paragraphs provide more information on both the input
and output files.

@end table

@command{xml2brl} is set up so that it can be used in a "pipe". To do
this, omit both infile and outfile. Input is then taken from the
standard input unit.

The first file name encountered (a word not preceded by a minus sign)
is taken to be the input file and the second to be the output file. If
you wish input to be taken from stdin and still want to specify an
output file use two minus signs (@samp{--}) for the input file.

If only the program name is typed @command{xml2brl} assumes that the
configuration file is @file{default.cfg}, input is from the standard
input unit, and output is to the standard output unit.
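
The rule for picking the input and output file names can be sketched
in C as follows. This is an illustration only and not the argument
parser of @command{xml2brl}: the names @code{pick_files} and
@code{struct file_args} are hypothetical, and the sketch assumes that
@option{-f} is the only option whose value is a separate word.

@verbatim
```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type: NULL means standard input or output. */
struct file_args {
    const char *infile;
    const char *outfile;
};

/* The first word not starting with a minus sign is the input file,
   the second is the output file. "--" fills the next slot without
   naming a file. */
struct file_args pick_files(int argc, char *const argv[])
{
    assert(argc >= 1 && argv != NULL);
    struct file_args files = { NULL, NULL };
    int slot = 0;                       /* 0: infile, 1: outfile */

    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (strcmp(arg, "--") == 0) {
            slot++;                     /* placeholder: keep NULL */
        } else if (strcmp(arg, "-f") == 0) {
            i++;                        /* skip the config file name */
        } else if (arg[0] == '-') {
            continue;                   /* some other option */
        } else if (slot == 0) {
            files.infile = arg;
            slot++;
        } else if (slot == 1) {
            files.outfile = arg;
            slot++;
        }
    }
    return files;
}
```
@end verbatim

With the arguments @samp{-- out.brf} the sketch leaves the input file
unset, meaning standard input, and takes @file{out.brf} as the output
file.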

@menu
* Transcribing Microsoft Word Files with msword2brl::  
@end menu

@node Transcribing Microsoft Word Files with msword2brl,  , Transcribing with the xml2brl program, Transcribing with the xml2brl program
@section Transcribing Microsoft Word Files with msword2brl

To use @command{msword2brl}, type:

@example
msword2brl infile outfile
@end example

Infile must be a Microsoft Word file. The script first calls the
@command{antiword} program, so you must have this installed on your
machine. @command{antiword} is called with @option{-x db}, which
causes the output to be in docbook format. This is piped to
@command{xml2brl}. The output file from @command{xml2brl} contains
much of the formatting, including emphasis, of the word file.

@node Customization Configuring liblouisxml, Connecting with the xml Document - Semantic-Action Files, Transcribing with the xml2brl program, Top
@chapter Customization: Configuring liblouisxml

The operation of liblouisxml is controlled by two types of files:
semantic-action files and configuration files. The former are
discussed in the section Connecting with the xml Document -
Semantic-action Files (@pxref{Connecting with the xml Document -
Semantic-Action Files}). The latter are discussed in this section. A
third type of file, braille translation tables, is discussed in the
liblouis documentation (@pxref{Top, , Overview, liblouis-guide,
Liblouis Programmer's and User's Guide}). Another section of the
present document which may be of interest is Implementing Braille
Mathematical Codes (@pxref{Implementing Braille Mathematics Codes}).

liblouisxml (with liblouis) can be used as the braille transcription
component in any number of applications with different overall
purposes and user interfaces. However, as of now the principal
application is @command{xml2brl}, which is a console application for
Mac and Linux. (There is also a Mac GUI application called louis.) The
information below therefore applies to @command{xml2brl} as much as to
liblouisxml.

Before discussing configuration files in detail it is worth noting
that the application program has access to the information in the
configuration files by calling the liblouisxml function
@code{lbx_initialize}. This function returns a pointer to a data
structure containing the configuration information.

@command{xml2brl} uses the configuration file @file{default.cfg}
unless a different one is specified via the @option{-f} command-line
option. The configuration file name may include a full path. In this
case, liblouisxml will consider this to be the user path. (This can be
changed at compile time, @pxref{Files and Paths}.) If just a file name
(or list) is given, liblouisxml will consider the current directory as
the user path.

The configuration "file" specified with the @option{-f} option need
not be a single filename. It can be several file names separated by
commas. Only the first filename may have a path component. This path
is taken as the user path, as discussed in the previous paragraph.
This file-list feature is also found in liblouis. It enables you to
combine configuration files on the command line. For example, a file
list may consist of one file specifying the output format used in your
establishment, a comma, and then the name of a stylesheet.
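
How a user path might be derived from such a file list is sketched
below. This is not liblouisxml code: @code{user_path_from_list} is a
hypothetical helper, it assumes Unix-style @samp{/} separators, and it
uses @file{.} to stand for the current directory.

@verbatim
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the directory part of the FIRST name in
   a comma-separated file list into path. If that name has no
   directory part, path becomes "." (the current directory). */
void user_path_from_list(const char *filelist, char *path, size_t pathsize)
{
    assert(filelist != NULL && path != NULL && pathsize > 0);

    size_t first_len = strcspn(filelist, ",");  /* length of first name */
    const char *last_slash = NULL;
    for (size_t i = 0; i < first_len; i++)
        if (filelist[i] == '/')
            last_slash = filelist + i;

    if (last_slash == NULL)
        snprintf(path, pathsize, ".");
    else if (last_slash == filelist)
        snprintf(path, pathsize, "/");
    else
        snprintf(path, pathsize, "%.*s",
                 (int)(last_slash - filelist), filelist);
}
```
@end verbatim

For the list @samp{/home/me/cfg/format.cfg,styles.cfg} the sketch
yields @file{/home/me/cfg}; for @samp{default.cfg} it yields @file{.}.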

After the path, if any, has been evaluated, but before reading any of
the files, liblouisxml reads in a file called @file{canonical.cfg}.
This file specifies values for all possible settings. It is needed to
complete the initialization of the program. You may alter the values
in the distribution @file{canonical.cfg}, but you should not delete
any settings. If a configuration file read in later contains a
particular setting name, the value specified simply replaces the one
specified in @file{canonical.cfg}.

As you will see by looking at @file{canonical.cfg}, it contains four
main sections: outputFormat, translation, xml and styles. In addition,
a configuration file can contain an include entry. This causes the
file named on that line to be read in at the point where the line
occurs. The sections need not follow each other in any particular
order, nor is the order of settings within each section important. In
this document and in the @file{canonical.cfg} file, where section and
setting names consist of more than one word, the first letter of each
word following the initial one is capitalized. This is merely for
readability. The case of the letters in these names is ignored by the
program. Section and setting names may not contain spaces.

Here, then, is an explanation of each section and setting in the
@file{canonical.cfg} file. When you look at this file you will see
that the section names start at the left margin, while the settings
are indented one tab stop. This is done for readability. It has no
effect on the meaning of the lines. You will also see lines beginning
with a number sign (@samp{#}), which are comments. Blank lines can
also be used anywhere in a configuration file. In general, a section
name is a single word or combination of unspaced words. However, each
style has a section of its own, so the word @samp{style} is followed
by the name of the style. Setting lines begin with the name of the
setting, followed by at least one space or tab, followed by the value
of the setting. A few settings have two values.
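
The layout rules above can be summarized in a short fragment. This is
a sketch, not an excerpt from the distributed @file{canonical.cfg};
the included file name @file{housestyles.cfg} is hypothetical and the
values are examples.

@example
# A comment line; blank lines may appear anywhere.

# Read another file in at this point.
include housestyles.cfg

outputFormat
    cellsPerLine 40
    interpoint no

# A style section: the word style followed by the style name.
style para
    firstLineIndent 2
@end example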

@menu
* outputFormat::                
* translation::                 
* xml::                         
* style::                       
@end menu

@node outputFormat, translation, Customization Configuring liblouisxml, Customization Configuring liblouisxml
@section outputFormat

This section specifies the format of the output file (or string, if no
file name is given).

@table @code

@setting{cellsPerLine, 40}
The number of cells in a braille line.

@setting{linesPerPage, 25}
The number of lines on a braille page.

@setting{interpoint, no}
Whether or not the output will be used to produce interpoint braille.
This affects the placement of page numbers and may affect other things
in the future. The only two values recognized are @samp{yes} and
@samp{no}.

@setting{lineEnd, \\r\\n} 
This specifies the control characters to be placed at the end of each
output line. These characters vary from one intended use of the output
to another. Most embossers require the carriage-return and line-feed
combination specified above. However, a braille display may work best
with just one or the other. Any valid control characters can be
specified.

@setting{pageEnd, \\f} 
The control character to be given at the end of a page. Here it is a
form-feed character, but it can be something else if needed.

@setting{fileEnd, ^z}
The control character to be placed at the end of the file, here a
control-z.

@setting{printPages, yes}
Whether or not to show print page numbers if they are given in the xml
input. The two valid values are @samp{yes} and @samp{no}.

@setting{braillePages, yes}
Whether or not to format the output into pages. Here the value is
@samp{yes}, for use with an embosser. However the user of a braille
display may wish to specify @samp{no}, so as not to be bothered with
page numbers and form-feed characters. If @samp{no} is specified the lines
will still be of the length given in cellsPerLine, but the value of
linesPerPage will be ignored.

@setting{paragraphs, yes}
Whether or not to format the output into paragraphs, using appropriate
styles. If @samp{no} is specified, what would be a paragraph is output
simply as one long line. Applications that wish to do their own
formatting may specify @samp{no}.

@setting{beginningPageNumber, 1}
This is the number to be placed on the first Braille page if
braillePages is yes. This is useful when producing multiple Braille
volumes.

@setting{printPageNumberAt, top}
If print page numbers are given in the xml input file they will be
placed at the top of each braille page in the right-hand corner. A
page separator line will also be produced on the braille page where
the print page break actually occurs. You may also specify
@samp{bottom} for this setting.

@setting{braillePageNumberAt, bottom}
The braille page number will be placed in the bottom right-hand corner
of each page. If @samp{interpoint yes} has been specified, only odd pages will
receive page numbers. If you specify @samp{top} for this setting then
@samp{bottom} must be specified for printPageNumberAt.

@setting{hyphenate, no}
If @samp{yes} is specified words will be hyphenated at the ends of
lines if a hyphenation table is available. In contracted English
Braille hyphenation is not generally used, but it can save
considerable space. The hyphenation table is specified as part of the
table list in the literaryTextTable setting of the translation
section.

@setting{outputEncoding, ascii8}
This specifies that the output is to be in the form of 8-bit ASCII
characters. This is generally used if the output is intended directly
for a braille embosser or display. The other values of encoding are
@samp{UTF8}, @samp{UTF16} and @samp{UTF32}. These are useful if the
application will process the output further, such as for generating
displays of braille dots on a screen.

@setting{inputTextEncoding, ascii8}
This setting is used to specify the encoding of an input text file.
The valid values are @samp{UTF8} and @samp{ascii8}.

@anchor{formatFor setting}
@setting{formatFor, textDevice}
This setting specifies the type of device the output is intended for.
@samp{textDevice} is any device that accepts plain text, including
embossers. You can also specify @samp{browser}. In this case the
output will be formatted for viewing in a browser. If the input file
contains links, they will be preserved and can be used in the normal
way. The text will be translated into braille with the correct line
length. Math and computer material will be translated appropriately.
These files work well in lynx and Internet Explorer, not so well in
elinks and Firefox.

@setting{backFormat, plain}
This setting specifies the format of back-translated files.
@samp{plain} specifies plain-text, while @samp{html} specifies xhtml.
The latter is always encoded in UTF-8. Plain-text files can be encoded
in ascii8, UTF-8 or UTF-16. Html is strongly recommended, since it
will preserve print page numbering and emphasis.

@setting{backLineLength, 70}
This setting specifies the length of lines in back-translated files,
whether in plain-text or html. This is mainly for human readability.
Lines may sometimes be somewhat longer.

@setting{interline, no}
This setting specifies whether interlining is desired. If it is set to
@samp{yes}, the first line in the output will be a braille
translation, the next line will be its back-translation according to
the interlineBackTable. Back-translation is used instead of simply
presenting the print original because a braille line may contain
additional information, such as leading blanks, print or braille page
numbers, print page separator lines, etc.

@end table 
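
As an illustration of how these settings combine, a user reading on a
braille display rather than embossing might use an outputFormat
section such as the one below. It relies only on settings described
above; the particular values are assumptions about one possible setup,
not tested recommendations.

@example
# Sketch of a profile for a braille display (values are examples).
outputFormat
    cellsPerLine 40
    braillePages no
    lineEnd \n
    printPages no
@end example

With braillePages set to @samp{no}, lines keep the length given by
cellsPerLine, while linesPerPage is ignored.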

@node translation, xml, outputFormat, Customization Configuring liblouisxml
@section translation

This section specifies the liblouis translation tables to be used for
various purposes.

@table @code

@setting{literaryTextTable, en-us-g2.ctb} 
The table used for producing literary braille. This may be either
contracted or uncontracted.

@setting{uncontractedTable, en-us-g1.ctb}
The table used for producing uncontracted or Grade One braille. This
setting appears to be superfluous and may be eliminated in the future.

@setting{compbrailleTable, en-us-compbrl.ctb}
The table used for producing large amounts of output in computer
braille, such as computer programs. The computer braille table is
usually combined with one of the two tables above.

@setting{mathtextTable, en-us-mathtext.ctb}
This table specifies how the non-mathematical parts of math books are
to be translated. In many cases it will be the same as
literaryTextTable or uncontractedTable. For books translated with the
Nemeth Code it is different, because this code requires modification
of standard Grade Two.

@setting{mathexprTable, nemeth.ctb}
This is the table used to translate mathematical expressions.

@setting{editTable, edittable.ctb}
When the output includes both mathematics and text there may be errors
where one type of translation directly follows another. The editTable
removes these errors.

@setting{interlineBackTable, en-us-interline.ctb}
This setting specifies the table to be used for back-translation when
interlining is turned on. It must be tailored for this purpose, since
an ordinary forward-translation table may contain entries that do not
handle the additional information in braille lines correctly.

@end table
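
For example, a setup that produces uncontracted literary braille could
point literaryTextTable at the Grade One table named above. This is a
sketch that reuses only table names already mentioned in this section;
whether it suits a given document is for the user to check.

@example
# Sketch: literary text in uncontracted braille.
translation
    literaryTextTable en-us-g1.ctb
    interlineBackTable en-us-interline.ctb
@end example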

@node xml, style, translation, Customization Configuring liblouisxml
@section xml

This section provides various information for the processing of xml files.

@table @code

@setting{semanticFiles, *\,nemeth.sem}
This setting gives a list of semantic-action files. These files are
read in the sequence given in the list. Here the first member of the
list is an asterisk (@samp{*}). This means that the corresponding file
is to be named by taking the root element of the document and
appending @samp{.sem}. This asterisk member may occur anywhere in the
list.

@setting{xmlheader, <?xml version='1.0' encoding='UTF8' standalone='yes'?>}
This line gives the xml header to be added to strings produced by
programs like @command{Mathtype} that lack one.

@setting{entity, nbsp ^1}
This line defines an entity or substitution in an xml file. It is one
of those that has two values. The first is the thing to be replaced,
and the second is the replacement. As many entity lines as necessary
can be used. The information they contain is added to the information
provided by xmlHeader. In @file{canonical.cfg} this line is commented
out, because specifying it at this point would prevent the user from
specifying his own xmlheader.

@setting{internetAccess, yes}
The computer has an internet connection and liblouisxml may obtain
information necessary for the processing of this file from the
Internet. If this setting is @samp{no} liblouisxml will not try to use
the internet. The necessary information may, however, be provided on
the local machine in the form of a "dtd" file.

@setting{newEntries, yes}
liblouisxml may create a new semantic-action file (beginning with
@file{new_}) for a document with an unknown root element or a file
(beginning with @file{appended_}) containing new entries for an
existing semantic-action file. Both kinds of files are placed in the
current directory. If this setting is @samp{no} liblouisxml will not
create a file of new entries and if it encounters a document with an
unknown root element it will issue an error message. Setting
newEntries to @samp{no} may be useful if users should not be bothered
with the minutiae of semantic-action files.

@end table
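
A machine without network access, used by people who should not have
to deal with semantic-action files, might carry an xml section like
this sketch. All three settings are described above; the combination
is only an illustration.

@example
# Sketch: offline use, no new semantic-action files written.
xml
    semanticFiles *,nemeth.sem
    internetAccess no
    newEntries no
@end example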

@node style,  , xml, Customization Configuring liblouisxml
@section style

The following sections all deal with styles. Each style has its own
section. Style section names are unlike other section names in that
they consist of the word style, followed by a space, followed by a
style name. More styles may be added as the software develops, and
some may be dropped.

@subsection style document

This section specifies the style of the whole document. The settings
given in it are applied to all other styles. If a section for another
style is given, the settings in it replace those from the document
style for that section. Because the settings in the document style
apply to all other styles, if a document style section is given it
must precede the sections for all other styles.

@table @code

@setting{linesBefore, 0}

This setting gives the number of blank lines which should be left
before the text to which this style applies. It is set to a non-zero
value for some header styles.

@setting{linesAfter, 0}

The number of blank lines which should be left after the text to which
this style applies.

@setting{leftMargin, 0}

The number of cells by which the left margin of all lines in the text
should be indented. Used for hanging indents, among other things.

@setting{firstLineIndent, 0}

The number of cells by which the first line is to be indented relative
to leftMargin. firstLineIndent may be negative. If the result is less
than 0 it will be set to 0.

@setting{translate, contracted}

This setting is currently inactive. It may be used in the future. This
setting tells how text in this style should be translated. Possible
values are @samp{contracted}, @samp{uncontracted}, @samp{compbrl},
@samp{mathtext} and @samp{mathexpr}.

@setting{skipNumberLines, no}

If this setting is @samp{yes} the top and bottom lines on the page
will be skipped if they contain braille or print page numbers. This is
useful in some of the mathematical and graphical styles.

@setting{format, leftJustified}

The format setting controls how the text in the style will be
formatted. Valid values are @samp{leftJustified},
@samp{rightJustified}, @samp{centered}, @samp{computerCoded},
@samp{alignColumnsLeft}, @samp{alignColumnsRight}, @samp{listColumns}
and @samp{listLines}. The first three are self-explanatory.
@samp{computerCoded} is used for computer programs and similar
material. The next three are used for tabular material.
@samp{alignColumnsLeft} causes the left ends of columns to be aligned.
@samp{alignColumnsRight} causes the right ends of columns to be
aligned. @samp{listColumns} causes columns to be placed one after the
other, separated by whatever separation character has been specified
in the semantic-action file, followed by a space. An escape character
(hex 1b) must also be specified to indicate the end of the column. Two
escape characters must be specified to indicate the end of a row.
Indentation of the lines in a row is controlled by the leftMargin and
firstLineIndent settings. @samp{listLines} is similar except that it
lists lines, as in poetry stanzas. The semantic-action file must
specify two escape characters to indicate the end of a line.

@setting{newPageBefore, no}

If this setting is @samp{yes}, the text will begin on a new page. This
is useful for certain mathematical and graphical styles. Page numbers
are handled properly.

@setting{newPageAfter, no}

If this setting is @samp{yes} any remaining space on the page after
the material covered by this style is handled is left blank, except
for page numbers.

@setting{rightHandPage, no}

If this setting is @samp{yes} and interpoint is @samp{yes}, the material
covered by this style will start on a right-hand page. This may cause
a left-hand page to be left blank except for page numbers. If
interpoint is @samp{no} this setting is equivalent to newPageBefore.

@end table
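
To illustrate how the document style and other styles interact, and
how leftMargin and firstLineIndent add up, consider this sketch. The
values for the caption and list styles are the ones given later in
this chapter; the layout of the fragment itself is an assumption.

@example
# The document style comes first; its settings apply to all styles.
style document
    leftMargin 0
    firstLineIndent 0

# caption: first line indented 4 + 2 = 6 cells, later lines 4 cells.
style caption
    leftMargin 4
    firstLineIndent 2

# list: first line indented 2 - 2 = 0 cells, later lines 2 cells.
style list
    leftMargin 2
    firstLineIndent -2
@end example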

@subsection style arith

This style is used for arithmetic examples in elementary math books.
On recognizing this style, the translator formats the material in a
special way. This style has no settings different from those of the
document style at the moment. Nevertheless, the line @samp{style
arith} must be included in @file{canonical.cfg} so that it will be set
up properly.

@subsection style attribution

This style is used for an attribution following a quotation.

@table @code

@setting{format, rightJustified}

@end table

@subsection style biblio

This style is used for bibliographies. Settings will be added later.

@subsection style caption

This style is used for picture captions.

@table @code

@setting{leftMargin, 4}

@setting{firstLineIndent, 2}

Note that the first line is actually indented six cells.

@end table

@subsection style code

This style is used for computer programs.

@table @code

@setting{skipNumberLines, yes}

@setting{linesBefore, 1}

@setting{linesAfter, 1}

@setting{format, computerCoded}

@end table

@subsection style contents

This is for entries in a table of contents.

@subsection style dedication

This style is for the dedication of a book.

@table @code

@setting{newPageBefore, yes}

@setting{newPageAfter, yes}

@setting{center, yes}

@end table

@subsection style directions

This is for giving directions for exercises.

@subsection style dispmath

This is for showing mathematics that is set off from the text.

@table @code

@setting{leftMargin, 2}

@end table

@subsection style disptext

This is for text that is set off from the rest of the text.

@table @code

@setting{leftMargin, 2}

@setting{firstLineIndent, 2}

@end table

@subsection style exercise1

This is the first level in a set of exercises where there are sublevels.

@table @code

@setting{leftMargin, 2}

@setting{firstLineIndent, -2}

@end table

@subsection style exercise2

This is for the second level of exercises, such as exercise a following 
exercise 1.

@table @code

@setting{leftMargin, 4}

@setting{firstLineIndent, -2}

@end table

@subsection style exercise3

This is for the third level of exercises.

@table @code

@setting{leftMargin, 6}

@setting{firstLineIndent, -2}

@end table

@subsection style glossary

This is for a glossary.

@table @code

@setting{firstLineIndent, 2}

@end table

@subsection style graph

This style reserves space for a graph or other tactile material.

@table @code

@setting{skipNumberLines, yes}

@end table

@subsection style graphLabel

This style reserves space for the label of a graph.

@subsection style heading1

This style is used for main headings, such as chapter titles.

@table @code

@setting{linesBefore, 1}

@setting{center, yes}

@setting{linesAfter, 1}

@end table

@subsection style heading2

The first level of subheadings after the main heading.

@table @code

@setting{linesBefore, 1}

@setting{firstLineIndent, 4}

@end table

@subsection style heading3

The third level of headings.

@table @code

@setting{firstLineIndent, 4}

@end table

@subsection style heading4

The fourth and final level of headings.

@table @code

@setting{firstLineIndent, 4}

@end table

@subsection style indexx

This style is used for indexes. The extra @samp{x} is not an error. It
is there to prevent conflict with names elsewhere in the software.

@subsection style list

This is for the individual items in a list.

@table @code

@setting{firstLineIndent, -2}

@setting{leftMargin, 2}

@end table

@subsection style matrix

This style causes its contents to be formatted in a way suitable for
the representation of matrices.

@table @code

@setting{format, alignColumnsLeft}

@end table

@subsection style music

This style is used for braille music.

@table @code

@setting{skipNumberLines, yes}

@end table

@subsection style note

This style is used for footnotes.

@subsection style para

Paragraph. This is ordinary body text.

@table @code

@setting{firstLineIndent, 2}

@end table

@subsection style quotation

This style is used for quotations that are set off from the rest of
the text.

@table @code

@setting{linesBefore, 1}

@setting{linesAfter, 1}

@end table

@subsection style section

This style is used for a section with a section number.

@table @code

@setting{firstLineIndent, 4}

@end table

@subsection style spatial

This style is used for mathematical material that is arranged
spatially, such as large fractions.

@subsection style stanza

This style is used for stanzas in poetry.

@table @code

@setting{linesBefore, 1}

@setting{linesAfter, 1}

@setting{format, listLines}

@end table

@subsection style style1

This and the subsequent numbered styles can be used by the user for
any purpose.
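
For example, a user who wants right-justified text set off by blank
lines could configure style1 as in this sketch. The choice of settings
is purely illustrative. Presumably the style takes effect only where a
semantic-action file maps an element to it.

@example
# Hypothetical use of a user style.
style style1
    linesBefore 1
    linesAfter 1
    format rightJustified
@end example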

@subsection style style2

@subsection style style3

@subsection style style4

@subsection style style5

@subsection style subsection

This style is used for subsections with a subsection number.

@table @code

@setting{firstLineIndent, 4}

@end table

@subsection style table

This style is used for ordinary tables.

@subsection style titlepage

This style is used to begin a title page.

@table @code

@setting{newPageAfter, yes}

@end table

@subsection style trnote

This style is used for transcriber's notes which are set off from the
text.

@subsection style volume

This style is used to indicate the beginning of a braille volume.

@node Connecting with the xml Document - Semantic-Action Files, Implementing Braille Mathematics Codes, Customization Configuring liblouisxml, Top
@chapter Connecting with the xml Document - Semantic-Action Files

When liblouisxml (or @command{xml2brl}) processes an xml document, it
needs to be told how to use the information in that document to
produce a properly translated and formatted braille document. These
instructions are provided by a semantic-action file, so called because
it explains the meaning, or semantics, of the various specifications
in the xml document. To understand how this works, it is necessary to
have a basic knowledge of the organization of an xml document.

An xml document is organized like a book, but with much finer detail.
First there is the title of the whole book. Then there are various
sections, such as author, copyright, table of contents, dedication,
acknowledgments, preface, various chapters, bibliography, index, and
so on. Each chapter may be divided into sections, and these in turn
can be divided into subsections, subsubsections, etc. In a book the
parts have names or titles distinguished by capitalization, type
fonts, spacing, and so forth. In an xml document the names of the
parts are enclosed in angle brackets (@samp{<>}). For example, if
liblouisxml encounters @code{<html>} at the beginning of a document,
it knows it is dealing with a document that conforms to the standards
of the extensible hypertext markup language (xhtml) - at least we hope
it does.
When you see a book, you know it's a book. The computer can know only
by being told. Something enclosed in angle brackets is called an
"element" (more properly, a "tag") in xml parlance. (There may be more
between the angle brackets than just the name of the element. More on
this later.) The first "element" in a document thus tells liblouisxml
what kind of document it is dealing with. This element is called the
"root element" because the document is visualized as branching out
from it like a tree. Some examples of root elements are @code{<html>},
@code{<math>}, @code{<book>}, @code{<dtbook3>} and
@code{<wordDocument>}. Whenever liblouisxml encounters a root element
that it doesn't know about it creates a new file called a
semantic-action file. The name of this file is formed by stripping the
angle brackets from the root element and adding a period plus the
letters @samp{sem}. If you look in a directory containing
semantic-action files you will see names like @file{html.sem},
@file{dtbook3.sem}, @file{math.sem}, and so on.

Sometimes it is advantageous to preempt the creation of a
semantic-action file for a new root element. For example, an article
written according to the docbook specification may have the root
element @code{<article>}. However, the specification itself has the
root element @code{<book>}. In this case you can specify the
@file{book.sem} file in the configuration file by writing, in the xml
section:

@example
semanticFiles book.sem
@end example

You will note that this setting uses the plural of "file". This is
because you can actually specify a list of file names separated by
commas. You might want to do this to specify the semantic-action file
for the particular braille mathematical code to be used. For example:

@example
semanticFiles book.sem,ukmath.sem
@end example

As you will see in the next section, different braille style
conventions and different braille mathematical codes may require
different semantic-action files.

liblouisxml records the names of all elements found in the document in
the semantic-action file. The document has a multitude of elements,
which can be thought of as describing the headings of various parts of
the document. One element is used to denote a chapter heading. Another
is used to denote a paragraph, still another to denote text in bold
type, and so on. In other words, the elements take the place of the
capitalization, changes in type font, spacing, etc. in a book.
However, the computer still does not know what to do when it
encounters an element. The semantic-action file tells it that.

Consider @file{html.sem}. A copy is included as part of this
documentation with the name @file{example_sem}. It may differ from the
file that liblouisxml is currently using. You will see that it begins
with some lines about copyrights. Each line begins with a number sign
(@samp{#}). This indicates that it is a "comment", intended for the
human reader and ignored by the computer. Then there is a blank
line. Finally, there are two other comments explaining that the file
must be edited to get proper output. This is because a human being
must tell the computer what to do with each element. The semantic
files for common types of documents have already been edited, so you
generally don't have to worry about this. But if you encounter a new
type of document or wish to specify special handling for styles or
mathematics you may have to edit the semantic-action file or send it
to the maintainer for editing. In any case the rest of this section is
essential for understanding how liblouisxml handles documents and for
making changes if the way it does so is not correct.

After another blank line you will see a table consisting of two, and
sometimes three, columns. The first column contains a word which tells
the computer to do something. For example, the first entry in the
table is: @samp{include nemeth.sem}. This tells liblouisxml to include
the information in the @file{nemeth.sem} file when it is deciphering
an html (actually xhtml) document (it may be preferable to use the
semanticFiles setting in the configuration file rather than an
include).

The second row of the table is:

@example
no hr 
@end example

@samp{hr} is an element with the angle brackets removed. It means
nothing in itself. However, the first column contains the word
@samp{no}. This tells liblouisxml "no do", that is, do nothing.

After a few more lines with @samp{no} in the first column, we see one
that says:

@example
softreturn br 
@end example

This means that when the element @code{<br>} is encountered,
liblouisxml is to do a soft return, that is, start a new line without
starting a new paragraph.

The next line says:

@example
heading1 h1
@end example

This tells liblouisxml that when it encounters the element @code{<h1>}
it is to format the text which follows as a first-level braille
heading, that is, the text will be centered and preceded and followed
by blank lines. (You can change this by changing the definition of the
heading1 style).

The next line says:

@example
italicx em
@end example

This tells liblouisxml that when it encounters the element @code{<em>}
it is to enclose the text which follows in braille italic indicators.
The @samp{x} at the end of the semantic action name is there to
prevent conflicts with names elsewhere in the software. Just where the
italic indicators will be placed is controlled by the liblouis
translation table in use.

The next line says:

@example
skip style
@end example

This tells liblouisxml to simply skip ahead until it encounters the
element @code{</style>}. Nothing in between will have any effect on
the braille output. Note the slash (@samp{/}) before the @samp{style}.
This means the end of whatever the @code{<style>} element was
referring to. Actually, it was referring to specifications of how
things should be printed. If liblouisxml had not been told to skip
these specifications, the braille output would have contained a lot of
gobbledygook.

The next line says:

@example
italicx strong
@end example

This tells liblouisxml to also use the italic braille indicators for the
text between the @code{<strong>} and @code{</strong>} elements.

After a few more lines with @samp{no} in the first column we come to
the line:

@example
document html 
@end example

This tells liblouisxml that everything between @code{<html>} and
@code{</html>} is an entire document. @code{<html>} was the root
element of this document, so this is logical.

After another @samp{no} line we come to:

@example
para p
@end example

liblouisxml will consider everything between @code{<p>} and
@code{</p>} to be a normal body text paragraph.

The next line is:

@example
heading1 title
@end example

This causes the title of the document to also be treated as a braille
level 1 heading.

Next we have the line:

@example
list li
@end example

The xhtml @code{<li>} and @code{</li>} pair of elements is used to
enclose an item in a list. liblouisxml will format this with its own
list style. That is, the first line will begin at the left margin and
subsequent lines will be indented two cells.

Next we have:

@example
table table
@end example

You will note that the names of actions and elements are often
identical. This is because they are both mnemonic. In any case, this
line tells liblouisxml to format the table contained in the xhtml
document according to the table formatting rules it has been given for
braille output.

Next we have the line:

@example
heading2 h2
@end example

This means that the text between @code{<h2>} and @code{</h2>} is to be
formatted according to the liblouisxml style heading2. A blank line
will be left before the heading and the first line will be indented
four cells.

After a few more lines we come to:

@example
no table,cellpadding
@end example

Note the comma in the second column. This divides the column into two
subcolumns. The first is the table element name. The second is called
an "attribute" in xml. It gives further instructions about the
material enclosed between the starting and ending "tags" of the
element (@code{<table>} and @code{</table>}). Full information requires
three subcolumns. The third is called the value and gives the actual
information. The attribute is merely the name of the information.

Much further down we find:

@example
no table,border,0
@end example

Here the element is table, the attribute is border and the value is 0.
If liblouisxml were to interpret this, it would mean that the table
was to have a border of 0 width. It is not told to do so because
tables in braille do not have borders.
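
Putting the pieces together, a semantic-action file for a small,
hypothetical document type might look like the sketch below. The
actions in the first column are ones discussed above; the element
names in the second column (@samp{article}, @samp{emphasis},
@samp{listitem} and so on) are invented for illustration and do not
come from any file shipped with liblouisxml.

@example
# Hypothetical semantic-action file, for illustration only.
document article
heading1 title
heading2 subtitle
para para
list listitem
italicx emphasis
softreturn linebreak
skip stylesheet
no figure,width
@end example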

Now let's look at the file which is included at the beginning of the
@file{html.sem} file. This is @file{nemeth.sem}. As with
@file{html.sem}, a copy is included in the documentation directory
with the name @file{example_nemeth.sem}, but it is not necessarily
the one that liblouisxml is currently using. It illustrates several
more things about how liblouisxml uses semantic-action files.

The first thing you will notice is that for quite a few lines the
first and second columns are identical. This is because the MathML
element and attribute names are part of a standard, and it was
simplest to use the element names for the semantic actions as well.

The first line of real interest is:

@example
math math
@end example

Every mathematical expression begins with the element @code{<math>}
(which may have attributes and values), and ends with @code{</math>}.
This is therefore the root element of a mathematical expression.
However, mathematical expressions are usually part of a document, so
it is not given the semantic action document. The math semantic action
causes liblouisxml to carry out special interpretation actions. These
will become clearer as we continue to look at the @file{nemeth.sem}
file. You will note that this line has three columns. The meaning of
the third column is discussed below.

After another uninteresting line we come to two that illustrate
several more facts about semantic-action files:

@example
mfrac mfrac ^?,/,^#
mfrac mfrac,linethickness,0 ^(,^;%,^)
@end example

Like the math entry above, the first line has three columns. While the
first two columns must always be present, the third column is
optional. Here, it is also divided into subcolumns by commas. The
element @code{<mfrac>} indicates a fraction. A fraction has two parts,
a numerator and a denominator. In xml, we call these parts children of
@code{<mfrac>}. They may be represented in various ways, which need
not concern us here. What is of real importance is that the third
column tells liblouisxml to put the characters @samp{^?} before the
numerator, @samp{/} between the numerator and denominator, and
@samp{^#} after the denominator. Later on, liblouis will translate
these characters into the proper representation of a fraction in the
Nemeth Code of Braille Mathematics. (For other mathematical codes,
@pxref{Implementing Braille Mathematics Codes}).

The second line is of even greater interest. The first column is again
@samp{mfrac}, but this line is for the binomial coefficient. The
second column contains three subcolumns: an element name, an attribute
name and an attribute value. The attribute @code{linethickness}
specifies the thickness of the line separating the numerator and
denominator. Here it is 0, so there is no line. This is how the
binomial coefficient is represented in print. The third column tells
how to represent it in braille. liblouisxml will supply @samp{^(},
upper number, @samp{^;%}, lower number, @samp{^)} to liblouis, which
will then produce the proper braille representation for the binomial
coefficient.
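
The column layout described above can be sketched in a few lines of
Python. This is only an illustration, not liblouisxml's own parser. It
assumes that columns are separated by whitespace and that the second
column is split on commas, as in the examples shown. The function name
@code{parse_sem_line} is hypothetical.

```python
def parse_sem_line(line):
    """Split one semantic-action line into its columns (illustrative only)."""
    # Assumption: columns are separated by runs of whitespace and the
    # third column, when present, runs to the end of the line.
    parts = line.split(None, 2)
    if len(parts) < 2:
        raise ValueError("a semantic-action line needs at least two columns")
    action = parts[0]
    # The second column may hold element,attribute,value subcolumns.
    target = parts[1].split(",")
    element = target[0]
    attribute = target[1] if len(target) > 1 else None
    value = target[2] if len(target) > 2 else None
    # The optional third column is returned unparsed.
    inserts = parts[2].strip() if len(parts) > 2 else None
    return action, element, attribute, value, inserts


print(parse_sem_line("mfrac mfrac,linethickness,0 ^(,^;%,^)"))
```

For the binomial-coefficient line this returns the semantic action
@samp{mfrac}, the element @samp{mfrac}, the attribute
@samp{linethickness}, the value @samp{0} and the raw third column.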

Returning to the line for the math element, its third column,
@samp{\eb,\*\ee}, contains a backslash followed by an asterisk. The
backslash is an escape character which gives a special meaning to the
character which follows it. Here the asterisk means that what follows
is to be placed at the very end of the mathematical expression, no
matter how complex it is.

For further discussion of how the third column is used,
@pxref{Implementing Braille Mathematics Codes}. The third column is
not limited to mathematics. It can be used to add characters to
anything enclosed by an xml tag.

Here is a complete list of the semantic actions which liblouisxml
recognizes. Many of them are also the names of styles. These are
listed first, preceded by an asterisk. For a discussion of these,
@pxref{Customization Configuring liblouisxml}.

@table @code

@item * arith
@item * attribution
@item * biblio
@item * blanklinebefore
@item * caption
@item * code
@item * contents
@item * dedication
@item * directions
@item * dispmath
@item * disptext
@item * document
@item * exercise1
@item * exercise2
@item * exercise3
@item * glossary
@item * graph
@item * graphlabel
@item * heading1
@item * heading2
@item * heading3
@item * heading4
@item * indexx
@item * list
@item * matrix
@item * music
@item * note
@item * para
@item * quotation
@item * section
@item * spatial
@item * stanza
@item * style1
@item * style2
@item * style3
@item * style4
@item * style5
@item * subsection
@item * table
@item * titlepage
@item * trnote
@item * volume
@item acknowledge
@item allcaps
@item author
@item blankline
@item bodymatter
@item boldx
@item booktitle
@item boxline
@item cdata
@item center
@item chemistry
@item contracted
@item copyright
@item endnotes
@item footer
@item frontmatter
@item graphic
@item italicx
@item jacket
@item line
@item linkto
@item maction
@item maligngroup
@item malignmark
@item math
@item menclose
@item merror
@item mfenced
@item mfrac
@item mglyph
@item mi
@item mlabeledtr
@item mmultiscripts
@item mn
@item mo
@item mover
@item mpadded
@item mphantom
@item mprescripts
@item mroot
@item mrow
@item ms
@item mspace
@item msqrt
@item mstyle
@item msub
@item msubsup
@item msup
@item mtd
@item mtext
@item mtr
@item munder
@item munderover
@item newpage
@item no
@item noindent
@item none
@item preface
@item rearmatter
@item rightalign
@item righthandpage
@item runninghead
@item semantics
@item skip
@item softreturn
@item specsym
@item tblbody
@item tblcol
@item tblhead
@item tblrow
@item tnpage
@item transcriber
@item uncontracted

@end table

@node Implementing Braille Mathematics Codes, Settings Index, Connecting with the xml Document - Semantic-Action Files, Top
@chapter Implementing Braille Mathematics Codes

The Nemeth Code of Braille Mathematical and Science Notation has been
implemented. Other braille mathematics codes can be implemented by
following the same pattern. The Nemeth Code implementation is
discussed as an example below.

Four tables are used to translate xml documents containing a mixture
of text and mathematics into the Nemeth code. They can be found in the
subdirectory @file{lbx_files} of the liblouisxml directory. First, the
semantic-action file @file{nemeth.sem} is used to interpret the
mathematical portions of the xml document (The text portions are
interpreted by another semantic-action file which will not be
discussed here). After the math and text have been interpreted, two
liblouis tables, @file{nemeth.ctb} and @file{en-mathtext.ctb} are used
to translate them. Each piece of mathematics or text is translated
separately and the pieces are strung together with blanks between
them. This results in inaccuracies where mathematics meets text. The
fourth table, also a liblouis table, is used to remove these
inaccuracies. It is called @file{edittable.ctb}, and it does things
like removing the multi-purpose indicator before a blank, inserting
the punctuation indicator before a punctuation mark following a math
expression, and removing extra spaces.

The general format and use of semantic-action files were discussed in
the previous section (@pxref{Connecting with the xml Document -
Semantic-Action Files}). In this section we shall concentrate on the
optional third column, which is used heavily in @file{nemeth.sem}. The
first two columns can be generated by liblouisxml, although a person
must then edit them. The third column must always be written by
hand.

As previously stated, the third column tells liblouisxml what
characters to insert to inform liblouis how to translate the math
expression. Look at the following line:

@example
mfrac mfrac ^?,/,^#
@end example

You will see that the third column contains two commas. This means
that it has three subcolumns. A fraction has a numerator and a
denominator. These are called children of the mfrac element. The first
subcolumn specifies the characters that liblouisxml should place in
front of the numerator. The second subcolumn gives the characters to
be placed between the numerator and denominator. Finally, the third
subcolumn gives the characters to place after the denominator. The
first subcolumn contains a caret followed by a question mark. In
computer braille the question mark has the same dot pattern as the
Nemeth start-fraction indicator. The caret lets liblouis tell the
indicator apart from an ordinary question mark, which has that same
dot pattern. The second subcolumn contains a slash but no caret,
because there is no danger of confusion where the slash is concerned.
The third
subcolumn does contain a caret, and it also contains a number sign,
which corresponds to the Nemeth end-fraction indicator. When
liblouisxml encounters the MathML representation of the fraction
one-half it produces the following string of characters:
@samp{^?1/2^#}. liblouis then removes the carets to get @samp{?1/2#}.

As another example, consider the entry in @file{nemeth.sem} for a
subscript.

@example
msub msub ,^;,^"
@end example

Here the first subcolumn is blank, because nothing is to be placed
before the subscripted symbol. The second subcolumn contains a caret
and a semicolon (in computer braille). This corresponds to the Nemeth
subscript indicator. The third subcolumn contains a caret and a
quotation mark, corresponding to the Nemeth baseline indicator.
liblouisxml translates the MathML expression for x subscript i into
@samp{x^;i^"}. liblouis subsequently produces @samp{x;i"}. There are
other steps if the subscript is numeric. These are handled by pass2
opcodes in the liblouis translation table, @file{nemeth.ctb}.

You will notice that the entries in @file{nemeth.sem} have various
numbers of subcolumns in the third column. In general, the characters
given in the first subcolumn are placed before the first child of the
element given in the second column. The characters in the second
subcolumn are placed before the second child, and so on, until the
characters given in the last subcolumn are placed after the last
child.
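
The placement rule just described can be sketched in Python. This is
an illustration of the rule as stated here, not code taken from
liblouisxml. The function name @code{wrap_children} is hypothetical,
and the children are assumed to be already rendered as plain strings.

```python
def wrap_children(subcolumns, children):
    """Interleave third-column subcolumns with an element's children."""
    pieces = []
    for index, child in enumerate(children):
        # Subcolumn i goes in front of child i; the last subcolumn is
        # held back because it belongs after the last child.
        if index < len(subcolumns) - 1:
            pieces.append(subcolumns[index])
        pieces.append(child)
    # The last subcolumn goes after the last child.
    if subcolumns:
        pieces.append(subcolumns[-1])
    return "".join(pieces)


# The fraction one-half, using the mfrac entry from nemeth.sem.
print(wrap_children(["^?", "/", "^#"], ["1", "2"]))  # prints ^?1/2^#
```

The sketch assumes one more subcolumn than there are children, as in
the @code{mfrac} and @code{msub} entries. Elements with an
indeterminate number of children need the @samp{\*} sequence discussed
next.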

Sometimes an element or tag can have an indeterminate number of
children. This is true of @code{<math>} itself. Yet, it may be
necessary to place some characters after the very last element. Let us
look at the @code{<math>} entry.

@example
math math \eb,\*\ee
@end example

First let us discuss escape sequences starting with a backslash. These
are basically the same as in liblouis. The sequence @samp{\e} is
shorthand for the escape character, which would otherwise be
represented by @samp{\x001b}. The beginning of a math expression is
denoted by an escape character followed by the letter @samp{b} and the
end by an escape character followed by the letter @samp{e}. This
enables the editing table to do such things as drop the baseline
indicator at the end of a math expression and insert a number sign at
the beginning, if needed.

Not found in liblouis is the sequence @samp{\*}. This means to put
what follows after the very last child of the math element, no matter
how many there are.

As another example consider:

@example
mtd mtd \*\ec
@end example

@code{mtd} is the MathML tag for a table cell, that is, one entry in a
row. There may be many children of this tag. The entry says to put an
escape character (hex 1b), plus the letter @samp{c}, after the very
last of them.

As a final example consider:

@example
mtr mtr ^.^\,^(,\*^.^\,^)\er
@end example

@code{mtr} is the MathML tag for a row in a table, in this case a
matrix. Each row in a matrix must begin with the dot pattern
@samp{46-6-12356} and end with the dot pattern @samp{46-6-12456}. As
usual a caret is placed before the corresponding characters. Since dot
6 is a comma, it must be escaped. This is done by placing a backslash
before the comma. There are two subcolumns. The first contains the
characters to be placed at the beginning of each row. The second
starts with @samp{\*}, signifying that the characters following it
are to be placed at the end of everything in this row. A subcolumn
starting with @samp{\*} must be the last (or only) subcolumn.

Here this last subcolumn ends with an escape character and the letter
@samp{r}, signifying the end of a row.
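
The escape handling described in the last three examples can be
sketched in Python. This is an illustrative reading of the rules
above, not liblouisxml's parser. The function name
@code{parse_third_column} is hypothetical. Only @samp{\e}, @samp{\*}
and backslash-escaped literals such as an escaped comma are handled.
Hexadecimal sequences of the @samp{\x001b} form are left out.

```python
ESC = "\x1b"  # the escape character that \e stands for


def parse_third_column(text):
    """Return (subcolumns, after_last) for a third-column string."""
    subcolumns = []
    current = []
    after_last = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == "e":
                current.append(ESC)   # \e is shorthand for escape
            elif nxt == "*" and not current:
                after_last = True     # \* opens the final subcolumn
            else:
                current.append(nxt)   # e.g. an escaped comma stays literal
            i += 2
        elif ch == ",":
            # An unescaped comma ends the current subcolumn.
            subcolumns.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    subcolumns.append("".join(current))
    return subcolumns, after_last


row_start, row_end = parse_third_column("^.^\\,^(,\\*^.^\\,^)\\er")[0]
print(repr(row_start), repr(row_end))
```

For the @code{mtr} entry this yields two subcolumns, the second of
which ends with the escape character followed by @samp{r}, and the
flag reports that the last subcolumn goes after the very last child.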

So much for the semantic-action file. Even though the characters in
the third column were chosen to correspond with Nemeth characters,
they may not have to be changed for other math codes. liblouis can
replace them with anything needed.

This brings us to a consideration of the two tables used by liblouis
to translate mathematics texts. The first, @file{en-mathtext.ctb}, is
used to translate text appearing outside math expressions. It is
necessary because the Nemeth code requires modifications of Grade 2
braille. Other math codes may not have this requirement.

The table actually used to translate mathematics is @file{nemeth.ctb}.
It includes two other tables, @file{chardefs.cti} and
@file{nemethdefs.cti}. The first gives ordinary character definitions
and is included by all the other tables. Note, however, that the
unbreakable space, @samp{\x00a0}, is translated by dot 9. This is used
before and after the equal sign and other symbols in
@file{nemeth.ctb}. The second table contains character definitions for
special math symbols, most of which are Unicode characters greater
than @samp{\x00ff}. The Greek letters are here. So are symbols like
the integral sign.

Most of the entries in @file{nemeth.ctb} should be familiar from other
tables. The unfamiliar ones follow the comments @samp{# Semantic
pairs} and @samp{# pass2 corrections}. The first group simply replaces
each character preceded by a caret with the character itself. The
second group makes adjustments in the code generated directly from the
@file{nemeth.sem} file. The pass2 opcode is discussed in the liblouis
guide (@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's
and User's Guide}). Here are some comments on a few of the entries in
@file{nemeth.ctb}.

@example 
pass2 @@1456-1456 @@6-1456 
@end example 

Replaces double start-fraction indicators with the
start-complex-fraction indicator.

@example 
pass2 @@3456-3456 @@6-3456 
@end example 

Replaces double end-fraction indicators with the end-complex-fraction
indicator.

@example 
pass2 @@56[$d1-5]@@5 * 
@end example 

Removes the subscript and baseline indicators from numeric subscripts.

@example 
pass2 @@5-9 @@9 
@end example 

Removes the baseline or multipurpose indicator before an unbreakable
space generated by the translation of an equal sign, etc.

@example 
pass2 @@45-3-5 @@3 
@end example 

Replaces a superscript apostrophe with a simple prime symbol.

@example 
pass2 @@9[]$d @@3456 
@end example 

Puts a number sign before a digit preceded by a blank.

@example 
pass2 @@9-0 @@9 
@end example 

Removes a space following an unbreakable space.

We now come to the fourth and last table used for math translation,
the editing table, @file{edittable.ctb}. As explained at the
beginning, this table is used to remove inaccuracies where math
translation butts up against text translation. For example, the Nemeth
code puts numbers in the lower part of the cell. However, punctuation
marks are also in the lower part of the cell. So Nemeth puts a
punctuation indicator, dots @samp{456}, in front of any lower-cell
punctuation that immediately follows a mathematical expression. If
this occurs inside MathML it is handled by @file{nemeth.ctb}. However,
a MathML expression is often followed by a punctuation mark which is
the first part of text. liblouisxml puts a blank between math and
text, but this can result in a mathematical expression followed by a
blank and then, say, a period, dots @samp{256}. @file{edittable.ctb}
replaces the blank with the punctuation indicator.

When you look at @file{edittable.ctb} you will see that it begins with
an include of @file{chardefs.cti}. Most of the entries are ordinary,
but some are interesting. For example,

@example
always "\s 0 
@end example

replaces the baseline or multipurpose indicator followed by a space
with just a space.

@node  Settings Index, Function Index, Implementing Braille Mathematics Codes, Top
@unnumbered Settings Index

@printindex tp

@node  Function Index,  , Settings Index, Top
@unnumbered Function Index

@printindex fn

@bye



docdir = $(datadir)/liblouisxml/doc

doc_DATA = \
        copyright-notice \
        example_canonical.cfg \
        example_default.cfg \
        example_html.sem \
        example_math.sem \
        liblouisxml-guide.html \
        liblouisxml-guide.txt

EXTRA_DIST = \
        copyright-notice \
        example_canonical.cfg \
        example_default.cfg \
        example_html.sem \
        example_math.sem \
        liblouisxml-guide.html \
        liblouisxml-guide.txt

info_TEXINFOS = liblouisxml-guide.texi

SUFFIXES                = .txt

.texi.txt:
        $(MAKEINFO) --plaintext $< -o $@
2008-11-13  Christian Egli  <christian.egli@xxxxxxxx>

        * doc/liblouisxml-guide.texi: Added a texinfo version of the
        liblouisxml guide.

        * doc/Makefile.am: Integrated texinfo version of liblouisxml guide in
        automake process.

John J. Boyer, john.boyer@xxxxxxxxxxxxxxxx

Release liblouisxml-1.4.2, June 16, 2008

Converted to GNU autotools

A few bugs fixed

Release liblouisxml-1.4.1

linux-Makefile optimized for RedHat

If there are errors in a liblouis table you will now get only a single 
error report instead of myriads.

Release liblouisxml-1.4.0, May 12, 2008

doc directory renamed docs. Semantic-action and configuration files in 
this directory have been prefixed with example_ 

mode parameter added to functions in liblouisxml.h

Cdata sections can now be translated as text, computer code or ignored.

A function to find the true name of the "Program Files" directory in 
Windows has been written by Yuemei Sun of ViewPlus Technologies. It is 
in the paths.c module.

Paths for translation tables, semantic action files and configuration 
files can now be assigned more flexibly. See paths.c

xml2brl will now accept configuration settings on the command line, so 
it is unnecessary to change a configuration file.

It is now possible to pass a configuration string in memory to the 
library, not just the name of a configuration file.

For details on all these changes see docs/liblouisxml-guide.html
