[liblouis-liblouisxml] [liblouisxml commit] r7 - trunk/doc

  • From: codesite-noreply@xxxxxxxxxx
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Tue, 23 Dec 2008 13:31:37 +0000

Author: christian.egli@xxxxxxxxxxxxxx
Date: Tue Dec 23 05:25:15 2008
New Revision: 7

Added:
   trunk/doc/liblouisxml-guide.texi
Modified:
   trunk/doc/Makefile.am

Log:
Added the texinfo version of the liblouisxml-guide.


Modified: trunk/doc/Makefile.am
==============================================================================
--- trunk/doc/Makefile.am       (original)
+++ trunk/doc/Makefile.am       Tue Dec 23 05:25:15 2008
@@ -18,3 +18,9 @@
        liblouisxml-guide.html \
        liblouisxml-guide.txt

+info_TEXINFOS = liblouisxml-guide.texi
+
+SUFFIXES                = .txt
+
+.texi.txt:
+       $(MAKEINFO) --plaintext $< -o $@

Added: trunk/doc/liblouisxml-guide.texi
==============================================================================
--- (empty file)
+++ trunk/doc/liblouisxml-guide.texi    Tue Dec 23 05:25:15 2008
@@ -0,0 +1,2069 @@
+\input texinfo
+@c %**start of header
+@setfilename liblouisxml-guide.info
+@include version.texi
+@settitle Liblouisxml Programmer's and User's Guide
+
+@dircategory Misc
+@direntry
+* Liblouisxml: (liblouisxml). An xml to Braille Translation Library.
+@end direntry
+
+@c Version and Contact Info
+@set MAINTAINERSITE @uref{http://www.jjb-software.com/liblouisxml-guide.html,maintainer's webpage}
+@set AUTHOR John J. Boyer
+@set MAINTAINER John J. Boyer
+@set MAINTAINEREMAIL @email{john.boyer@xxxxxxxxxxxxxxxx}
+@set MAINTAINERCONTACT @uref{mailto:john.boyer@xxxxxxxxxxxxxxxx,contact the maintainer}
+@c %**end of header
+@finalout
+
+@c Macro definitions
+
+@c Opcode.
+@macro setting{name, args}
+@tindex \name\
+@item \name\ \args\
+@end macro
+
+@copying
+This manual is for liblouisxml (version @value{VERSION},
+@value{UPDATED}), an xml to Braille Translation Library.
+
+This file may contain code borrowed from the Linux screenreader
+@acronym{BRLTTY}, Copyright @copyright{} 1999-2006 by the
+@acronym{BRLTTY} Team.
+
+@noindent
+Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc.
+@uref{www.viewplus.com} and Copyright @copyright{} 2007,2008 JJB
+Software, Inc. @uref{www.jjb-software.com}.
+
+@quotation
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU Lesser (or library) General Public License
+(LGPL) as published by the Free Software Foundation; either version 3,
+or (at your option) any later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+Lesser (or Library) General Public License LGPL for more details.
+
+You should have received a copy of the GNU Lesser (or Library) General
+Public License (LGPL) along with this program; see the file COPYING.
+If not, write to the Free Software Foundation, 51 Franklin Street,
+Fifth Floor, Boston, MA 02110-1301, USA.
+@end quotation
+@end copying
+
+@titlepage
+@title  Liblouisxml Programmer's and User's Guide
+
+@subtitle Release @value{VERSION}
+@author by John J. Boyer
+
+@c The following two commands start the copyright page.
+@page
+@vskip 0pt plus 1filll
+@insertcopying
+@end titlepage
+
+@c Output the table of contents at the beginning.
+@contents
+
+@ifnottex
+@node Top, Introduction, (dir), (dir)
+@top Liblouisxml Programmer's and User's Guide
+
+@insertcopying
+@end ifnottex
+
+@menu
+* Introduction::
+* Programming with liblouisxml::
+* Transcribing with the xml2brl program::
+* Customization Configuring liblouisxml::
+* Connecting with the xml Document - Semantic-Action Files::
+* Implementing Braille Mathematics Codes::
+* Settings Index::
+* Function Index::
+* Program Index::
+
+@detailmenu
+ --- The Detailed Node Listing ---
+
+Programming with liblouisxml
+
+* License::
+* Overview::
+* Files and Paths::
+* lbx_version::
+* lbx_initialize::
+* lbx_translateString::
+* lbx_translateFile::
+* lbx_translateTextFile::
+* lbx_backTranslateFile::
+* lbx_free::
+
+Transcribing with the xml2brl program
+
+* Transcribing Microsoft Word Files with msword2brl::
+
+Customization: Configuring liblouisxml
+
+* outputFormat::
+* translation::
+* xml::
+* style::
+
+@end detailmenu
+@end menu
+
+@node Introduction, Programming with liblouisxml, Top, Top
+@chapter Introduction
+
+liblouisxml is a software component which can be incorporated into
+software packages to provide the capability of translating any file in
+the computer lingua franca xml format into properly transcribed
+braille. This includes translation into grade two, if desired,
+mathematical codes, etc. It also includes formatting according to a
+built-in style sheet which can be modified by the user. The first
+program into which liblouisxml has been incorporated is
+@command{xml2brl}. This program will translate an xml or text file
+into an embosser-ready braille file. It is not necessary to know xml,
+because MSWord and other word processors can export files in this
+format. If the word processor has been used correctly,
+@command{xml2brl} will produce an excellent braille file.
+
+There is a Mac GUI application incorporating liblouisxml called louis.
+For a link to it go to @uref{www.jjb-software.com/downloads}. A
+similar Windows application is in the works.
+
+Computer programmers who wish to use liblouisxml in their software can
+find the information they need in the section Programming with
+liblouisxml (@pxref{Programming with liblouisxml}). Those who wish to
+change the output generated by liblouisxml should read the section
+Configuring liblouisxml (@pxref{Customization Configuring
+liblouisxml}). If you encounter a type of xml file with which
+liblouisxml is not familiar, you can learn how to tell it how to
+process that file
+by reading Connecting with the xml document: Semantic-Action Files
+(@pxref{Connecting with the xml Document - Semantic-Action Files}).
+Finally, if you wish to implement a new braille mathematics code read
+Implementing Braille Mathematics Codes (@pxref{Implementing Braille
+Mathematics Codes}).
+
+You will also find it advantageous to be acquainted with the companion
+library liblouis, which is a braille translator and back-translator
+(@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's and
+User's Guide}).
+
+@node Programming with liblouisxml, Transcribing with the xml2brl program, Introduction, Top
+@chapter Programming with liblouisxml
+
+@menu
+* License::
+* Overview::
+* Files and Paths::
+* lbx_version::
+* lbx_initialize::
+* lbx_translateString::
+* lbx_translateFile::
+* lbx_translateTextFile::
+* lbx_backTranslateFile::
+* lbx_free::
+@end menu
+
+@node License, Overview, Programming with liblouisxml, Programming with liblouisxml
+@section License
+
+Liblouisxml may contain code borrowed from the Linux screenreader
+BRLTTY, Copyright @copyright{} 1999-2006 by the BRLTTY Team.
+
+@noindent
+Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc.
+@uref{www.viewplus.com}.
+
+@noindent
+Copyright @copyright{} 2007,2008 JJB Software, Inc.
+@uref{www.jjb-software.com}.
+
+Liblouisxml is free software: you can redistribute it and/or modify it
+under the terms of the GNU Lesser General Public License as published
+by the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version.
+
+Liblouisxml is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with Liblouisxml. If not, see
+@uref{http://www.gnu.org/licenses/}.
+
+@node Overview, Files and Paths, License, Programming with liblouisxml
+@section Overview
+
+liblouisxml is an "extensible renderer," designed to translate a wide
+variety of xml documents into braille, but with a special emphasis on
+technical material. The overall operation of liblouisxml is controlled
+by a configuration file. The way in which a particular type of xml
+document is to be rendered is specified by a semantic-action file for
+that document type. Braille translation is done by the liblouis
+braille translation and back-translation library (@pxref{Top, ,
+Overview, liblouis-guide, Liblouis Programmer's and User's Guide}).
+Its operation, in turn, is controlled by translation table files. All
+these files are plain text and can be created and edited in any text
+editor. Configuration settings can also be specified on the command
+line of the console-mode transcription program @command{xml2brl}.
+
+The general operation of liblouisxml is as follows. It uses the
+libxml2 library to construct a parse tree of the xml document. After
+the parse tree is constructed, a function called
+@code{examine_document} looks it over and determines whether math
+translation tables, etc. are needed. @code{examine_document} also
+constructs a prototype semantic-action file, if one does not exist
+already. When it is finished, another function, called
+@code{transcribe_document}, does the actual braille transcription. It
+calls @code{transcribe_math} to handle MathML subtrees,
+@code{transcribe_chemistry} for chemical formula subtrees,
+@code{transcribe_graphic} for SVG graphics, etc. Entities are
+translated to Unicode, if they are not already. Sequences of symbols
+indicate superscripts, return to the baseline, subscripts, start and
+end of fractions, etc. The Braille translator and back-translator
+library liblouis is used to do the braille translation.
+
+The @code{transcribe_math} function works in conjunction with the
+latest version of liblouis and a special math translation table to
+transcribe most mathematical expressions into fairly good Nemeth Code.
+Much refinement is still necessary. Other braille mathematical codes
+can be handled by modifying the translation table.
+
+The functions which are not needed at the moment, such as
+@code{transcribe_chemistry}, are only skeletons. However, I hope that
+@code{transcribe_graphic} can be expanded in the near future to use
+the graphics capability of the Tiger tactile graphics embossers.
+
+The latest versions of liblouisxml and liblouis can be downloaded from
+@uref{www.jjb-software.com}. Note that liblouisxml will only work with
+the latest version of liblouis.
+
+liblouisxml can be compiled to use either 16-bit or 32-bit Unicode
+internally. This is inherited from liblouis, so liblouis must be
+compiled first and then liblouisxml. Wherever 16 bits are mentioned in
+this document, read 32 if you have compiled the library for 32 bits.
+
+@node Files and Paths, lbx_version, Overview, Programming with liblouisxml
+@section Files and Paths
+
+As stated in the previous section, liblouisxml uses three kinds of
+files: configuration files, semantic-action files, and liblouis
+translation tables. The first two are discussed later in this
+documentation. liblouis translation tables are discussed in the
+liblouis guide (@pxref{Top, , Overview, liblouis-guide, Liblouis
+Programmer's and User's Guide}) which is distributed with liblouis.
+These files can be placed on various paths, which are determined at
+compile time. One of these paths should be to the @file{lbx_files}
+directory provided by liblouisxml, which contains the principal
+configuration file (@file{canonical.cfg}) and the semantic-action
+files. Another should be to the tables directory in the liblouis
+distribution. Note that liblouisxml also generates some files, all of
+which are placed in the current directory. These files are new
+prototype semantic-action files, additions to old semantic-action
+files, temporary files, and log files. The first two can be used to
+extend the capability of liblouisxml to process xml documents. The
+latter two are useful for debugging.
+
+Paths are set by changing a few lines of code in the @file{paths.c}
+module. If you are preparing liblouisxml for Windows a function which
+finds the name of the "Program Files" directory for your locale is
+called automatically. You can then modify the line containing the term
+@samp{yourSubDir} as needed.
+
+If you are preparing liblouisxml for a Unix-type system look for the
+line that says @samp{Set Unix Paths}. The following three lines
+establish a path to the @file{lbx_files} directory in your home
+directory. As distributed, this directory contains the semantic-action
+files and some configuration files. You can choose to copy the tables
+from the liblouis distribution into it as well, or you can modify the
+following three lines to point to the actual location of the tables.
+You can also choose to place both the @file{lbx_files} and the tables
+directory in @file{/etc}.
+
+The function @code{addPath} takes care of adding a path to liblouisxml
+properly. You can specify many more than two paths.
+
+@node lbx_version, lbx_initialize, Files and Paths, Programming with liblouisxml
+@section lbx_version
+
+@findex lbx_version
+@example
+char *lbx_version (void)
+@end example
+
+This function returns a pointer to a character string containing the
+version of liblouisxml, plus other information such as the release
+date and perhaps notable changes.
+
+@node lbx_initialize, lbx_translateString, lbx_version, Programming with liblouisxml
+@section lbx_initialize
+
+@findex lbx_initialize
+@example
+void * lbx_initialize (
+     const char *const configFilelist,
+     const char *const logFileName,
+     const char *const settingsString)
+@end example
+
+This function initializes the libxml2 library, runs
+@file{canonical.cfg} and processes the configuration settings given in
+@code{settingsString} and the configuration files given in
+@code{configFilelist}. The latter is a list of configuration file
+names separated by commas. If its first character is a comma it is
+taken to be a string containing configuration settings and is
+processed like the @code{settingsString} string. Such a string must
+conform to the format of a configuration file. Newlines should be
+represented with ASCII 10. If @code{logFileName} is not @code{NULL}, a
+log file is produced in the current directory. If it is @code{NULL},
+any messages are printed on stderr. The function returns a pointer to
+the @code{UserData} structure. This pointer has type @code{void *} and
+must be cast to @code{(UserData *)} in the calling program. To access
+the information in this structure you must include @file{louisxml.h}.
+This function is used by @command{xml2brl}.
+
+@node lbx_translateString, lbx_translateFile, lbx_initialize, Programming with liblouisxml
+@section lbx_translateString
+
+@findex lbx_translateString
+@example
+int lbx_translateString (
+    const char *const configfilelist,
+    char * inbuf,
+    widechar *outbuf,
+    int *outlen,
+    unsigned int mode)
+@end example
+
+This function takes a well-formed xml expression in @code{inbuf} and
+translates it into a string of 16-bit (or 32-bit if this has been
+specified in liblouis) braille characters in @code{outbuf}. The xml
+expression must be immediately followed by a zero or null byte.
+Leading whitespace is ignored. If the remaining text does not begin
+with @samp{<} it is assumed to be a text string and is translated
+accordingly. Otherwise, if it does not begin with the characters
+@samp{<?xml}, an xml header is added. The header is specified by the
+xmlHeader line in the configuration file. If no such line is present,
+a default header specifying UTF-8 encoding is used. The @code{mode}
+parameter specifies
+whether you want the library to be initialized. If it is 0 everything
+is reset, the @file{canonical.cfg} file is processed and the
+configuration file and/or string (see previous section) are processed.
+If @code{mode} is 1 liblouisxml simply prepares to handle a new document. For
+more on the @code{mode} parameter see the next section.
+
+Which 16-bit character in @code{outbuf} represents which dot pattern
+is indicated in the liblouis translation tables. The
+@code{configfilelist} parameter points to a configuration file or
+string. Among other things, this file specifies translation tables. It
+is these tables which control just how the translation is made,
+whether in Grade 2, Grade 1, the Nemeth Code of Braille Mathematics or
+something else.
+
+Note that the @code{outlen} parameter is a pointer to an integer.
+When the function is called, this integer contains the maximum output
+length. When it returns, it is set to the actual length used. The
+function returns 1 if no errors were encountered and a negative number
+if a complete translation could not be done.
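+
+The following is a minimal sketch of such a call, not a definitive
+implementation. It assumes that @file{liblouisxml.h} is on the include
+path and makes the @code{widechar} type from liblouis available, that
+a configuration file named @file{default.cfg} can be found, and that
+1024 output characters are enough. The input string and the
+@code{OUT_MAX} name are purely illustrative.
+
+@example
+#include <stdio.h>
+#include <liblouisxml.h>
+
+#define OUT_MAX 1024    /* assumed output capacity, in widechars */
+
+int
+main (void)
+@{
+  char inbuf[] = "<p>Hello, world</p>";   /* illustrative input */
+  widechar outbuf[OUT_MAX];
+  int outlen = OUT_MAX;   /* in: capacity; out: length actually used */
+
+  /* mode 0: full initialization, including canonical.cfg */
+  if (lbx_translateString ("default.cfg", inbuf, outbuf, &outlen, 0) != 1)
+    @{
+      fprintf (stderr, "translation failed\n");
+      lbx_free ();
+      return 1;
+    @}
+  printf ("%d braille characters produced\n", outlen);
+  lbx_free ();
+  return 0;
+@}
+@end example
+
+Because @code{outlen} is overwritten on return, it must be reset to
+the buffer capacity before each further call that reuses the same
+buffer.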
+
+@node lbx_translateFile, lbx_translateTextFile, lbx_translateString, Programming with liblouisxml
+@section lbx_translateFile
+
+@findex lbx_translateFile
+@example
+int lbx_translateFile (
+    char *configfilelist,
+    char *inputFileName,
+    char *outputFileName,
+    unsigned int mode)
+@end example
+
+This function accepts a well-formed xml document in
+@code{inputFileName} and produces a braille translation in
+@code{outputFileName}. As for @code{lbx_translateString}, the
+@code{mode} parameter specifies whether the library is to be
+initialized with new configuration information or simply prepared to
+handle a new document. In addition, the @code{mode} parameter can
+specify that a document is in html, not xhtml. @file{liblouisxml.h}
+contains an enumeration type with the values @code{dontInit} and
+@code{htmlDoc}. These can be combined with an or (@samp{|}) operator. The
+input file is assumed to be encoded in UTF-8, unless otherwise
+specified in the xml header. The encoding of the output file may be
+UTF-8, UTF-16, UTF-32 or Ascii-8. This is specified by the
+@code{outputEncoding} line in the configuration file,
+@code{configfilelist}. The function returns 1 if the translation was
+successful.
+
+@node lbx_translateTextFile, lbx_backTranslateFile, lbx_translateFile, Programming with liblouisxml
+@section lbx_translateTextFile
+
+@findex lbx_translateTextFile
+@example
+int lbx_translateTextFile (
+    char *configfilelist,
+    char *inputFileName,
+    char *outputFileName,
+    unsigned int mode)
+@end example
+
+This function accepts a text file in @code{inputFileName} and produces
+a braille translation in @code{outputFileName}. The input file is
+assumed to be encoded in Ascii8. Blank lines indicate the divisions
+between paragraphs. Two blank lines cause a blank line between
+paragraphs (or headers). The output file may be in UTF-8, UTF-16, or
+Ascii8, as specified by the @code{outputEncoding} line in the
+configuration file, @code{configfilelist}. As for
+@code{lbx_translateString}, the @code{mode} parameter specifies
+whether complete initialization is to be done or simply initialization
+for a new document.
+
+@node lbx_backTranslateFile, lbx_free, lbx_translateTextFile, Programming with liblouisxml
+@section lbx_backTranslateFile
+
+@findex lbx_backTranslateFile
+@example
+int lbx_backTranslateFile (
+    char *configfilelist,
+    char *inputFileName,
+    char *outputFileName,
+    unsigned int mode)
+@end example
+
+This function accepts a braille file in @code{inputFileName} and
+produces a back-translation in @code{outputFileName}. The input file
+is assumed to be encoded in Ascii8. The output file is in either plain
+text or html, according to the setting of @code{backFormat} in the
+configuration file. Html files are encoded in UTF8. In plain-text,
+blank lines are inserted between paragraphs. The output file may be in
+UTF-8, UTF-16, or Ascii8, as specified by the @code{outputEncoding}
+line in the configuration file, @code{configfilelist}. The @code{mode}
+parameter specifies whether or not the library is to be initialized
+with new configuration information, as described in the section on
+@code{lbx_translateString} (@pxref{lbx_translateString}).
+
+@node lbx_free,  , lbx_backTranslateFile, Programming with liblouisxml
+@section lbx_free
+
+@findex lbx_free
+@example
+void lbx_free (void)
+@end example
+
+This function should be called at the end of the application to free
+all memory allocated by liblouisxml and liblouis. If you wish to
+change configuration files during your application, use a @code{mode}
+parameter of 0 on the function call using the new configuration
+information.
+
+@node Transcribing with the xml2brl program, Customization Configuring liblouisxml, Programming with liblouisxml, Top
+@chapter Transcribing with the xml2brl program
+@pindex xml2brl
+
+At the moment, actual transcription with liblouisxml is done with the
+command-line (or console) program @command{xml2brl}. The line to type
+is:
+
+@example
+xml2brl [OPTIONS] [-f config-file] [infile] [outfile]
+@end example
+
+The brackets indicate that something is optional. You will see that
+nothing is required except the program name itself, @command{xml2brl}.
+The various optional parts control how the program will behave, as
+follows:
+
+@table @option
+
+@item -h
+This option causes @command{xml2brl} to print a help message
+describing usage and exit.
+
+@item -l
+This option will cause @command{xml2brl} and liblouisxml to print
+error messages to @file{xml2brl.log} instead of stderr. The file will
+be in the current directory. This option is particularly useful if
+@command{xml2brl} is called by a GUI script or Web application.
+
+@item -f configfile
+This specifies the configuration file which tells @command{xml2brl}
+how to do the transcription. (It may be a list of file names separated
+by commas.) This file specifies such things as the number of cells per
+line, the number of lines per page, the translation tables to be used,
+how paragraphs and headings are to be formatted, etc. If this part of
+the command line is omitted, @command{xml2brl} assumes that the
+configuration file is named @file{default.cfg} and is in the current
+directory. If the configuration file name contains a pathname,
+@command{xml2brl} will consider this as a path on which to look for
+files that it needs (@pxref{Files and Paths}).
+
+@item -Csetting=value
+This option enables you to specify configuration settings on the
+command line instead of changing the configuration file. You can use
+as many @option{-C} options as you wish. Any settings can be specified
+except those having to do with styles. The settings may be in any
+order. They override any settings in @file{canonical.cfg} or in the
+configuration file used by @command{xml2brl}.
+
+@item -b
+Back-translate. The input file must be a braille file, such as
+@file{.brf}. The output file is a back-translation of this file. It
+may be in either plain-text or xhtml (html), according to the setting
+of backFormat in the outputFormat section of the configuration file.
+Html files will contain page numbers and emphasis. To get good html,
+the liblouis table must have the entry @samp{space \e 1b} so that it
+will pass through escape characters. The @file{html.sem} file must
+also contain the line @samp{pagenum pagenum}. Text output files simply
+have a blank line between paragraphs. Encoding of text files is
+controlled by the outputEncoding setting. Html files are always in
+UTF-8.
+
+@item -r
+Reformat. The input file must be a braille file, such as @file{.brf}.
+The output is a braille file formatted according to the configuration
+file. It is advisable to set backFormat to html, since this will
+preserve print page numbers and emphasis. This program can be useful
+for changing the line length and page length of a braille file, for
+example, from 40 to 32 cells. It is also an excellent way to check the
+accuracy of liblouis tables. The original page numbers at the tops and
+bottoms of pages are discarded, and new ones are generated.
+
+@item -p
+Poorly formatted input translation. Infile is any text file such as may
+have been obtained by extracting the text in a pdf file. The input
+file may also be an xml or html file which is so poorly formatted that
+better braille can be obtained by ignoring the formatting.
+@command{xml2brl} tries to guess paragraph breaks. The output is
+generally reasonably formatted, that is, with reasonable paragraph
+breaks.
+
+@item -t
+The document is an html file, not xhtml. This option is useful with
+files downloaded from the Web in source form. Without it, the program
+will first try to parse the file as an xml document, producing lots of
+error messages. It will then try the html parser. With this option, it
+goes directly to the html parser. See also the formatFor configuration
+file setting (@pxref{formatFor setting}), which enables you to format
+the braille output for viewing in a browser.
+
+@item infile
+This is the name of the input file containing the material to be
+transcribed. The file may be either an xml file or a text file. The
+@option{-b}, @option{-r} and @option{-p} options discussed above
+provide for other types of files and processing. Typical xml files are
+those provided by @uref{www.bookshare.org} or those derived from a
+word processor by saving in xml format. If a text file is used,
+paragraphs and headings should be separated by blank lines. In such a
+file there is no way to distinguish between paragraphs and headings,
+so they will all be formatted as paragraphs, as specified by the
+configuration file. However, if you want a blank line in the braille
+transcription use two consecutive blank lines in the text file.
+
+@item outfile
+This is the name of the output file. It will be transcribed as
+specified by the configuration file and the configuration settings.
+The following paragraphs provide more information on both the input
+and output files.
+
+@end table
+
+@command{xml2brl} is set up so that it can be used in a "pipe". To do
+this, omit both infile and outfile. Input is then taken from the
+standard input unit and output is written to the standard output unit.
+
+The first file name encountered (a word not preceded by a minus sign)
+is taken to be the input file and the second to be the output file. If
+you wish input to be taken from stdin and still want to specify an
+output file use two minus signs (@samp{--}) for the input file.
+
+If only the program name is typed @command{xml2brl} assumes that the
+configuration file is @file{default.cfg}, input is from the standard
+input unit, and output is to the standard output unit.
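+
+The command lines below illustrate the options described above. They
+are a sketch only: the file names @file{mysettings.cfg},
+@file{book.xml}, @file{book.brl} and @file{book.txt} are hypothetical,
+and the lines without @option{-f} assume that @file{default.cfg} is in
+the current directory.
+
+@example
+xml2brl -f mysettings.cfg book.xml book.brl
+xml2brl -CcellsPerLine=32 book.xml book.brl
+xml2brl -b book.brl book.txt
+xml2brl < book.xml > book.brl
+xml2brl -- book.brl < book.xml
+@end example
+
+The first line names a configuration file, the second overrides one
+setting on the command line, and the third back-translates a braille
+file. The last two read from the standard input unit, the final one
+using @samp{--} in place of the input file so that an output file can
+still be named.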
+
+@menu
+* Transcribing Microsoft Word Files with msword2brl::
+@end menu
+
+@node Transcribing Microsoft Word Files with msword2brl, , Transcribing with the xml2brl program, Transcribing with the xml2brl program
+@section Transcribing Microsoft Word Files with msword2brl
+@pindex msword2brl
+
+@example
+msword2brl infile outfile
+@end example
+
+Infile must be a Microsoft Word file. The script first calls the
+@command{antiword} program, so you must have this installed on your
+machine. @command{antiword} is called with @option{-x db}, which
+causes the output to be in docbook format. This is piped to
+@command{xml2brl}. The output file from @command{xml2brl} contains
+much of the formatting, including emphasis, of the Word file.
+
+@node Customization Configuring liblouisxml, Connecting with the xml Document - Semantic-Action Files, Transcribing with the xml2brl program, Top
+@chapter Customization: Configuring liblouisxml
+
+The operation of liblouisxml is controlled by two types of files:
+semantic-action files and configuration files. The former are
+discussed in the section Connecting with the xml Document -
+Semantic-action Files (@pxref{Connecting with the xml Document -
+Semantic-Action Files}). The latter are discussed in this section. A
+third type of file, braille translation tables, is discussed in the
+liblouis documentation (@pxref{Top, , Overview, liblouis-guide,
+Liblouis Programmer's and User's Guide}). Another section of the
+present document which may be of interest is Implementing Braille
+Mathematics Codes (@pxref{Implementing Braille Mathematics Codes}).
+
+liblouisxml (with liblouis) can be used as the braille transcription
+component in any number of applications with different overall
+purposes and user interfaces. However, as of now the principal
+application is @command{xml2brl}, which is a console application for
+Mac and Linux. (There is also a Mac GUI application called louis.) The
+information below therefore applies to @command{xml2brl} as much as to
+liblouisxml.
+
+Before discussing configuration files in detail it is worth noting
+that the application program has access to the information in the
+configuration files by calling the liblouisxml function
+@code{lbx_initialize}. This function returns a pointer to a data
+structure containing the configuration information.
+
+@command{xml2brl} uses the configuration file @file{default.cfg}
+unless a different one is specified via the @option{-f} command-line
+option. The configuration file name may include a full path. In this
+case, liblouisxml will consider this to be the user path. This can be
+changed at compile time (@pxref{Files and Paths}). If just a file name
+(or list) is given, liblouisxml will consider the current directory as
+the user path.
+
+The configuration "file" specified with the @option{-f} option need
+not be a single filename. It can be several file names separated by
+commas. Only the first filename may have a path component. This path
+is taken as the user path, as discussed in the previous paragraph.
+This file-list feature is also found in liblouis. It enables you to
+combine configuration files on the command line. For example, a file
+list may consist of one file specifying the output format used in your
+establishment, a comma, and then the name of a stylesheet.
+
+After the path, if any, has been evaluated, but before reading any of
+the files, liblouisxml reads in a file called @file{canonical.cfg}.
+This file specifies values for all possible settings. It is needed to
+complete the initialization of the program. You may alter the values
+in the distribution @file{canonical.cfg}, but you should not delete
+any settings. If a configuration file read in later contains a
+particular setting name, the value specified simply replaces the one
+specified in @file{canonical.cfg}.
+
+As you will see by looking at @file{canonical.cfg}, it contains four
+main sections, outputFormat, translation, xml and styles. In addition,
+a configuration file can contain an include entry. This causes the
+file named on that line to be read in at the point where the line
+occurs. The sections need not follow each other in any particular
+order, nor is the order of settings within each section important. In
+this document and in the @file{canonical.cfg} file, where section and
+setting names consist of more than one word, the first letter of each
+word following the initial one is capitalized. This is merely for
+readability. The case of the letters in these names is ignored by the
+program. Section and setting names may not contain spaces.
+
+Here, then, is an explanation of each section and setting in the
+@file{canonical.cfg} file. When you look at this file you will see
+that the section names start at the left margin, while the settings
+are indented one tab stop. This is done for readability. It has no
+effect on the meaning of the lines. You will also see lines beginning
+with a number sign (@samp{#}), which are comments. Blank lines can
+also be used anywhere in a configuration file. In general, a section
+name is a single word or combination of unspaced words. However, each
+style has a section of its own, so the word @samp{style} is followed
+by the name of the style. Setting lines begin with the name of the
+setting, followed by at least one space or tab, followed by the value
+of the setting. A few settings have two values.
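+
+As an illustration of this layout, a small user configuration file
+might look like the sketch below. The values are hypothetical and the
+file is not part of the distribution. Only settings from the
+outputFormat section are shown, and any setting left out keeps the
+value given in @file{canonical.cfg}.
+
+@example
+# Hypothetical settings for a 32-cell braille display
+outputFormat
+    cellsPerLine 32
+    braillePages no
+    interpoint no
+@end example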
+
+@menu
+* outputFormat::
+* translation::
+* xml::
+* style::
+@end menu
+
+@node outputFormat, translation, Customization Configuring liblouisxml, Customization Configuring liblouisxml
+@section outputFormat
+
+This section specifies the format of the output file (or string, if no
+file name is given).
+
+@table @code
+
+@setting{cellsPerLine, 40}
+The number of cells in a braille line.
+
+@setting{linesPerPage, 25}
+The number of lines on a braille page.
+
+@setting{interpoint, no}
+Whether or not the output will be used to produce interpoint braille.
+This affects the placement of page numbers and may affect other things
+in the future. The only two values recognized are @samp{yes} and
+@samp{no}.
+
+@setting{lineEnd, \\r\\n}
+This specifies the control characters to be placed at the end of each
+output line. These characters vary from one intended use of the output
+to another. Most embossers require the carriage-return and line-feed
+combination specified above. However, a braille display may work best
+with just one or the other. Any valid control characters can be
+specified.
+
+@setting{pageEnd, \\f}
+The control character to be given at the end of a page. Here it is a
+form-feed character, but it can be something else if needed.
+
+@setting{fileEnd, ^z}
+The control character to be placed at the end of the file, here a
+control-z.
+
+@setting{printPages, yes}
+Whether or not to show print page numbers if they are given in the xml
+input. The two valid values are @samp{yes} and @samp{no}.
+
+@setting{braillePages, yes}
+Whether or not to format the output into pages. Here the value is
+@samp{yes}, for use with an embosser. However, the user of a braille
+display may wish to specify @samp{no}, so as not to be bothered with
+page numbers and form-feed characters. If @samp{no} is specified the
+lines will still be of the length given in cellsPerLine, but the value
+of linesPerPage will be ignored.
+
+@setting{paragraphs, yes}
+Whether or not to format the output into paragraphs, using appropriate
+styles. If @samp{no} is specified, what would be a paragraph is output
+simply as one long line. Applications that wish to do their own
+formatting may specify @samp{no}.
+
+@setting{beginningPageNumber, 1}
+This is the number to be placed on the first braille page if
+braillePages is @samp{yes}. This is useful when producing multiple
+braille volumes.
+
+@setting{printPageNumberAt, top}
+If print page numbers are given in the xml input file they will be
+placed at the top of each braille page in the right-hand corner. A
+page separator line will also be produced on the braille page where
+the print page break actually occurs. You may also specify
+@samp{bottom} for this setting.
+
+@setting{braillePageNumberAt, bottom}
+The braille page number will be placed in the bottom right-hand corner
+of each page. If @samp{interpoint yes} has been specified, only odd
+pages will receive page numbers. If you specify @samp{top} for this
+setting then @samp{bottom} must be specified for printPageNumberAt.
+
+@setting{hyphenate, no}
+If @samp{yes} is specified words will be hyphenated at the ends of
+lines if a hyphenation table is available. In contracted English
+Braille hyphenation is not generally used, but it can save
+considerable space. The hyphenation table is specified as part of the
+table list in the literaryTextTable setting of the translation
+section.
+
+@setting{outputEncoding, ascii8}
+This specifies that the output is to be in the form of 8-bit ASCII
+characters. This is generally used if the output is intended directly
+for a braille embosser or display. The other values of encoding are
+@samp{UTF8}, @samp{UTF16} and @samp{UTF32}. These are useful if the
+application will process the output further, such as for generating
+displays of braille dots on a screen.
+
+@setting{inputTextEncoding, ascii8}
+This setting is used to specify the encoding of an input text file.
+The valid values are @samp{UTF8} and @samp{ascii8}.
+
+@anchor{formatFor setting}
+@setting{formatFor, textDevice}
+This setting specifies the type of device the output is intended for.
+@samp{textDevice} is any device that accepts plain text, including
+embossers. You can also specify @samp{browser}. In this case the
+output will be formatted for viewing in a browser. If the input file
+contains links, they will be preserved and can be used in the normal
+way. The text will be translated into braille with the correct line
+length. Math and computer material will be translated appropriately.
+These files work well in lynx and Internet Explorer, not so well in
+elinks and Firefox.
+
+@setting{backFormat, plain}
+This setting specifies the format of back-translated files.
+@samp{plain} specifies plain-text, while @samp{html} specifies xhtml.
+The latter is always encoded in UTF-8. Plain-text files can be encoded
+in ascii8, UTF-8 or UTF-16. Html is strongly recommended, since it
+will preserve print page numbering and emphasis.
+
+@setting{backLineLength, 70}
+This setting specifies the length of lines in back-translated files,
+whether in plain-text or html. This is mainly for human readability.
+Lines may sometimes be somewhat longer.
+
+@setting{interline, no}
+This setting specifies whether interlining is desired. If it is set to
+@samp{yes}, the first line in the output will be a braille
+translation and the next line will be its back-translation according
+to the interlineBackTable. Back-translation is used instead of simply
+presenting the print original because a braille line may contain
+additional information, such as leading blanks, print or braille page
+numbers, print page separator lines, etc.
+
+@end table
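+
+As a sketch of how these settings combine, the following outputFormat
+section might suit a user reading the output on a braille display
+rather than embossing it. The choice of a bare line feed for lineEnd
+is an assumption; the appropriate value depends on the device.
+
+@example
+# No page numbers or form-feed characters; linesPerPage is ignored.
+# Assumed: this display works best with a line feed alone.
+outputFormat
+    cellsPerLine 40
+    braillePages no
+    lineEnd \n
+    hyphenate no
+@end example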
+
+@node translation, xml, outputFormat, Customization Configuring liblouisxml
+@section translation
+
+This section specifies the liblouis translation tables to be used for
+various purposes.
+
+@table @code
+
+@setting{literaryTextTable, en-us-g2.ctb}
+The table used for producing literary braille. This may be either
+contracted or uncontracted.
+
+@setting{uncontractedTable, en-us-g1.ctb}
+The table used for producing uncontracted or Grade One braille. This
+setting appears to be superfluous and may be eliminated in the future.
+
+@setting{compbrailleTable, en-us-compbrl.ctb}
+The table used for producing large amounts of output in computer
+braille, such as computer programs. The computer braille table is
+usually combined with one of the two tables above.
+
+@setting{mathtextTable, en-us-mathtext.ctb}
+This table specifies how the non-mathematical parts of math books are
+to be translated. In many cases it will be the same as
+literaryTextTable or uncontractedTable. For books translated with the
+Nemeth Code it is different, because this code requires modification
+of standard Grade Two.
+
+@setting{mathexprTable, nemeth.ctb}
+This is the table used to translate mathematical expressions.
+
+@setting{editTable, edittable.ctb}
+When the output includes both mathematics and text there may be errors
+where one type of translation directly follows another. The editTable
+removes these errors.
+
+@setting{interlineBackTable, en-us-interline.ctb}
+This setting specifies the table to be used for back-translation when
+interlining is turned on. It must be tailored for this purpose, since
+an ordinary forward-translation table may contain entries that do not
+handle the additional information in braille lines correctly.
+
+@end table
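+
+The hyphenate setting in the outputFormat section relies on a
+hyphenation table named in the table list of literaryTextTable. A
+minimal sketch of that combination follows. The hyphenation table
+name @file{hyph_en_US.dic} and the use of a comma to separate list
+members are assumptions; substitute whatever table your liblouis
+installation provides.
+
+@example
+# Assumed hyphenation table name and list separator.
+outputFormat
+    hyphenate yes
+
+translation
+    literaryTextTable en-us-g2.ctb,hyph_en_US.dic
+@end example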
+
+@node xml, style, translation, Customization Configuring liblouisxml
+@section xml
+
+This section provides various information for the processing of xml files.
+
+@table @code
+
+@setting{semanticFiles, *\,nemeth.sem}
+This setting gives a list of semantic-action files. These files are
+read in the sequence given in the list. Here the first member of the
+list is an asterisk (@samp{*}). This means that the corresponding file
+is to be named by taking the root element of the document and
+appending @samp{.sem}. This asterisk member may occur anywhere in the
+list.
+
+@setting{xmlheader, <?xml version='1.0' encoding='UTF8' standalone='yes'?>}
+This line gives the xml header to be added to strings produced by
+programs like @command{Mathtype} that lack one.
+
+@setting{entity, nbsp ^1}
+This line defines an entity or substitution in an xml file. It is one
+of the settings that take two values. The first is the thing to be
+replaced, and the second is the replacement. As many entity lines as
+necessary can be used. The information they contain is added to the
+information provided by xmlHeader. In @file{canonical.cfg} this line
+is commented out, because specifying it at this point would prevent
+users from specifying their own xmlHeader.
+
+@setting{internetAccess, yes}
+The computer has an internet connection and liblouisxml may obtain
+information necessary for the processing of this file from the
+Internet. If this setting is @samp{no} liblouisxml will not try to use
+the internet. The necessary information may, however, be provided on
+the local machine in the form of a "dtd" file.
+
+@setting{newEntries, yes}
+liblouisxml may create a new semantic-action file (beginning with
+@file{new_}) for a document with an unknown root element or a file
+(beginning with @file{appended_}) containing new entries for an
+existing semantic-action file. Both kinds of files are placed in the
+current directory. If this setting is @samp{no} liblouisxml will not
+create a file of new entries and if it encounters a document with an
+unknown root element it will issue an error message. Setting
+newEntries to @samp{no} may be useful if users should not be bothered
+with the minutiae of semantic-action files.
+
+@end table
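+
+For example, a machine without an internet connection whose users
+should not be shown generated semantic-action files might use an xml
+section along the following lines. This is a sketch; the file names
+are the ones used elsewhere in this guide.
+
+@example
+xml
+    semanticFiles *,nemeth.sem
+    internetAccess no
+    newEntries no
+@end example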
+
+@node style,  , xml, Customization Configuring liblouisxml
+@section style
+
+The following sections all deal with styles. Each style has its own
+section. Style section names are unlike other section names in that
+they consist of the word style, followed by a space, followed by a
+style name. More styles may be added as the software develops, and
+some may be dropped.
+
+@subsection style document
+
+This section specifies the style of the whole document. The settings
+given in it are applied to all other styles. If a section for another
+style is given, the settings in it replace those from the document
+style for that section. Because the settings in the document style
+apply to all other styles, if a document style section is given it
+must precede the sections for all other styles.
+
+@table @code
+
+@setting{linesBefore, 0}
+
+This setting gives the number of blank lines which should be left
+before the text to which this style applies. It is set to a non-zero
+value for some header styles.
+
+@setting{linesAfter, 0}
+
+The number of blank lines which should be left after the text to which
+this style applies.
+
+@setting{leftMargin, 0}
+
+The number of cells by which the left margin of all lines in the text
+should be indented. Used for hanging indents, among other things.
+
+@setting{firstLineIndent, 0}
+
+The number of cells by which the first line is to be indented relative
+to leftMargin. firstLineIndent may be negative. If the result is less
+than 0 it will be set to 0.
+
+@setting{translate, contracted}
+
+This setting is currently inactive. It may be used in the future. This
+setting tells how text in this style should be translated. Possible
+values are @samp{contracted}, @samp{uncontracted}, @samp{compbrl},
+@samp{mathtext} and @samp{mathexpr}.
+
+@setting{skipNumberLines, no}
+
+If this setting is @samp{yes} the top and bottom lines on the page
+will be skipped if they contain braille or print page numbers. This is
+useful in some of the mathematical and graphical styles.
+
+@setting{format, leftJustified}
+
+The format setting controls how the text in the style will be
+formatted. Valid values are @samp{leftJustified},
+@samp{rightJustified}, @samp{centered}, @samp{computerCoded},
+@samp{alignColumnsLeft}, @samp{alignColumnsRight}, @samp{listColumns}
+and @samp{listLines}. The first three are self-explanatory.
+@samp{computerCoded} is used for computer programs and similar
+material. The next three are used for tabular material.
+@samp{alignColumnsLeft} causes the left ends of columns to be aligned.
+@samp{alignColumnsRight} causes the right ends of columns to be
+aligned. @samp{listColumns} causes columns to be placed one after the
+other, separated by whatever separation character has been specified
+in the semantic-action file, followed by a space. An escape character
+(hex 1b) must also be specified to indicate the end of the column. Two
+escape characters must be specified to indicate the end of a row.
+Indentation of the lines in a row is controlled by the leftMargin and
+firstLineIndent settings. @samp{listLines} is similar except that it
+lists lines, as in poetry stanzas. The semantic-action file must
+specify two escape characters to indicate the end of a line.
+
+@setting{newPageBefore, no}
+
+If this setting is @samp{yes}, the text will begin on a new page. This
+is useful for certain mathematical and graphical styles. Page numbers
+are handled properly.
+
+@setting{newPageAfter, no}
+
+If this setting is @samp{yes} any remaining space on the page after
+the material covered by this style is handled is left blank, except
+for page numbers.
+
+@setting{rightHandPage, no}
+
+If this setting is @samp{yes} and interpoint is @samp{yes}, the
+material covered by this style will start on a right-hand page. This
+may cause a left-hand page to be left blank except for page numbers.
+If interpoint is @samp{no} this setting is equivalent to
+newPageBefore.
+
+@end table
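+
+The following sketch shows how the document style supplies defaults
+that a later style section overrides. The values for the document
+style are those listed above; the value for the para style is the one
+given for that style below.
+
+@example
+# Defaults for every style; must come before other style sections.
+style document
+    linesBefore 0
+    linesAfter 0
+    leftMargin 0
+    firstLineIndent 0
+    format leftJustified
+
+# Only firstLineIndent differs; everything else is inherited.
+style para
+    firstLineIndent 2
+@end example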
+
+@subsection style arith
+
+This style is used for arithmetic examples in elementary math books.
+On recognizing this style, the translator formats the material in a
+special way. This style has no settings different from those of the
+document style at the moment. Nevertheless, the line @samp{style
+arith} must be included in @file{canonical.cfg} so that it will be set
+up properly.
+
+@subsection style attribution
+
+This style is used for an attribution following a quotation.
+
+@table @code
+
+@setting{format, rightJustified}
+
+@end table
+
+@subsection style biblio
+
+This style is used for bibliographies. Settings will be added later.
+
+@subsection style caption
+
+This style is used for picture captions.
+
+@table @code
+
+@setting{leftMargin, 4}
+
+@setting{firstLineIndent, 2}
+
+Note that the first line is actually indented six cells.
+
+@end table
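+
+The arithmetic behind this note, and behind the hanging indents used
+by several of the styles below, can be written out as follows. The
+clamping to zero follows the description of firstLineIndent in the
+document style; the exercise1 and list values are those given for
+those styles below.
+
+@example
+first line  = max(0, leftMargin + firstLineIndent)
+other lines = leftMargin
+
+caption:    first line = 4 + 2 = 6      other lines = 4
+exercise1:  first line = 2 + (-2) = 0   other lines = 2
+list:       first line = 2 + (-2) = 0   other lines = 2
+@end example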
+
+@subsection style code
+
+This style is used for computer programs.
+
+@table @code
+
+@setting{skipNumberLines, yes}
+
+@setting{linesBefore, 1}
+
+@setting{linesAfter, 1}
+
+@setting{format, computerCoded}
+
+@end table
+
+@subsection style contents
+
+This is for entries in a table of contents.
+
+@subsection style dedication
+
+This style is for the dedication of a book.
+
+@table @code
+
+@setting{newPageBefore, yes}
+
+@setting{newPageAfter, yes}
+
+@setting{center, yes}
+
+@end table
+
+@subsection style directions
+
+This is for giving directions for exercises.
+
+@subsection style dispmath
+
+This is for showing mathematics that is set off from the text.
+
+@table @code
+
+@setting{leftMargin, 2}
+
+@end table
+
+@subsection style disptext
+
+This is for text that is set off from the rest of the text.
+
+@table @code
+
+@setting{leftMargin, 2}
+
+@setting{firstLineIndent, 2}
+
+@end table
+
+@subsection style exercise1
+
+This is the first level in a set of exercises where there are sublevels.
+
+@table @code
+
+@setting{leftMargin, 2}
+
+@setting{firstLineIndent, -2}
+
+@end table
+
+@subsection style exercise2
+
+This is for the second level of exercises, such as exercise a following exercise 1.
+
+@table @code
+
+@setting{leftMargin, 4}
+
+@setting{firstLineIndent, -2}
+
+@end table
+
+@subsection style exercise3
+
+This is for the third level of exercises.
+
+@table @code
+
+@setting{leftMargin, 6}
+
+@setting{firstLineIndent, -2}
+
+@end table
+
+@subsection style glossary
+
+This is for a glossary.
+
+@table @code
+
+@setting{firstLineIndent, 2}
+
+@end table
+
+@subsection style graph
+
+This style reserves space for a graph or other tactile material.
+
+@table @code
+
+@setting{skipNumberLines, yes}
+
+@end table
+
+@subsection style graphLabel
+
+This style reserves space for the label of a graph.
+
+@subsection style heading1
+
+This style is used for main headings, such as chapter titles.
+
+@table @code
+
+@setting{linesBefore, 1}
+
+@setting{center, yes}
+
+@setting{linesAfter, 1}
+
+@end table
+
+@subsection style heading2
+
+The first level of subheadings after the main heading.
+
+@table @code
+
+@setting{linesBefore, 1}
+
+@setting{firstLineIndent, 4}
+
+@end table
+
+@subsection style heading3
+
+The third level of headings.
+
+@table @code
+
+@setting{firstLineIndent, 4}
+
+@end table
+
+@subsection style heading4
+
+The fourth and final level of headings.
+
+@table @code
+
+@setting{firstLineIndent, 4}
+
+@end table
+
+@subsection style indexx
+
+This style is used for indexes. The extra @samp{x} is not an error. It
+is there to prevent conflict with names elsewhere in the software.
+
+@subsection style list
+
+This is for the individual items in a list.
+
+@table @code
+
+@setting{firstLineIndent, -2}
+
+@setting{leftMargin, 2}
+
+@end table
+
+@subsection style matrix
+
+This style causes its contents to be formatted in a way suitable for
+the representation of matrices.
+
+@table @code
+
+@setting{format, alignColumnsLeft}
+
+@end table
+
+@subsection style music
+
+This style is used for braille music.
+
+@table @code
+
+@setting{skipNumberLines, yes}
+
+@end table
+
+@subsection style note
+
+This style is used for footnotes.
+
+@subsection style para
+
+Paragraph. This is ordinary body text.
+
+@table @code
+
+@setting{firstLineIndent, 2}
+
+@end table
+
+@subsection style quotation
+
+This style is used for quotations that are set off from the rest of
+the text.
+
+@table @code
+
+@setting{linesBefore, 1}
+
+@setting{linesAfter, 1}
+
+@end table
+
+@subsection style section
+
+This style is used for a section with a section number.
+
+@table @code
+
+@setting{firstLineIndent, 4}
+
+@end table
+
+@subsection style spatial
+
+This style is used for mathematical material that is arranged
+spatially, such as large fractions.
+
+@subsection style stanza
+
+This style is used for stanzas in poetry.
+
+@table @code
+
+@setting{linesBefore, 1}
+
+@setting{linesAfter, 1}
+
+@setting{format, listLines}
+
+@end table
+
+@subsection style style1
+
+This and the subsequent numbered styles can be used by the user for
+any purpose.
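+
+As a sketch, a user might define style1 for a centered notice set off
+by blank lines. The purpose and the values chosen here are purely
+illustrative.
+
+@example
+style style1
+    linesBefore 1
+    linesAfter 1
+    format centered
+@end example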
+
+@subsection style style2
+
+@subsection style style3
+
+@subsection style style4
+
+@subsection style style5
+
+@subsection style subsection
+
+This style is used for subsections with a subsection number.
+
+@table @code
+
+@setting{firstLineIndent, 4}
+
+@end table
+
+@subsection style table
+
+This style is used for ordinary tables.
+
+@subsection style titlepage
+
+This style is used to begin a title page.
+
+@table @code
+
+@setting{newPageAfter, yes}
+
+@end table
+
+@subsection style trnote
+
+This style is used for transcriber's notes which are set off from the
+text.
+
+@subsection style volume
+
+This style is used to indicate the beginning of a braille volume.
+
+@node Connecting with the xml Document - Semantic-Action Files, Implementing Braille Mathematics Codes, Customization Configuring liblouisxml, Top
+@chapter Connecting with the xml Document - Semantic-Action Files
+
+When liblouisxml (or @command{xml2brl}) processes an xml document, it
+needs to be told how to use the information in that document to
+produce a properly translated and formatted braille document. These
+instructions are provided by a semantic-action file, so called because
+it explains the meaning, or semantics, of the various specifications
+in the xml document. To understand how this works, it is necessary to
+have a basic knowledge of the organization of an xml document.
+
+An xml document is organized like a book, but with much finer detail.
+First there is the title of the whole book. Then there are various
+sections, such as author, copyright, table of contents, dedication,
+acknowledgments, preface, various chapters, bibliography, index, and
+so on. Each chapter may be divided into sections, and these in turn
+can be divided into subsections, subsubsections, etc. In a book the
+parts have names or titles distinguished by capitalization, type
+fonts, spacing, and so forth. In an xml document the names of the
+parts are enclosed in angle brackets (@samp{<>}). For example, if
+liblouisxml encounters @code{<html>} at the beginning of a document,
+it knows it is dealing with a document that conforms to the standards
+of the extensible hypertext markup language (xhtml) - at least we hope
+it does.
+When you see a book, you know it's a book. The computer can know only
+by being told. Something enclosed in angle brackets is called an
+"element" (more properly, a "tag") in xml parlance. (There may be more
+between the angle brackets than just the name of the element. More on
+this later.) The first "element" in a document thus tells liblouisxml
+what kind of document it is dealing with. This element is called the
+"root element" because the document is visualized as branching out
+from it like a tree. Some examples of root elements are @code{<html>},
+@code{<math>}, @code{<book>}, @code{<dtbook3>} and
+@code{<wordDocument>}. Whenever liblouisxml encounters a root element
+that it doesn't know about it creates a new file called a
+semantic-action file. The name of this file is formed by stripping the
+angle brackets from the root element and adding a period plus the
+letters @samp{sem}. If you look in a directory containing
+semantic-action files you will see names like @file{html.sem},
+@file{dtbook3.sem}, @file{math.sem}, and so on.
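+
+Applying the naming rule as stated to the root elements mentioned
+above gives the following file names:
+
+@example
+<html>          html.sem
+<math>          math.sem
+<book>          book.sem
+<dtbook3>       dtbook3.sem
+<wordDocument>  wordDocument.sem
+@end example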
+
+Sometimes it is advantageous to preempt the creation of a
+semantic-action file for a new root element. For example, an article
+written according to the docbook specification may have the root
+element @code{<article>}. However, the specification itself has the
+root element @code{<book>}. In this case you can specify the
+@file{book.sem} file in the configuration file by writing, in the xml
+section:
+
+@example
+semanticFiles book.sem
+@end example
+
+You will note that this setting uses the plural of "file". This is
+because you can actually specify a list of file names separated by
+commas. You might want to do this to specify the semantic-action file
+for the particular braille mathematical code to be used. For example:
+
+@example
+semanticFiles book.sem,ukmath.sem
+@end example
+
+As you will see in the next section, different braille style
+conventions and different braille mathematical codes may require
+different semantic-action files.
+
+liblouisxml records the names of all elements found in the document in
+the semantic-action file. The document has a multitude of elements,
+which can be thought of as describing the headings of various parts of
+the document. One element is used to denote a chapter heading. Another
+is used to denote a paragraph, still another to denote text in bold
+type, and so on. In other words, the elements take the place of the
+capitalization, changes in type font, spacing, etc. in a book.
+However, the computer still does not know what to do when it
+encounters an element. The semantic-action file tells it that.
+
+Consider @file{html.sem}. A copy is included as part of this
+documentation with the name @file{example_sem}. It may differ from the
+file that liblouisxml is currently using. You will see that it begins
+with some lines about copyrights. Each line begins with a number sign
+(@samp{#}). This indicates that it is a "comment," intended for the
+human reader, which the computer should ignore. Then there is a blank
+line. Finally, there are two other comments explaining that the file
+must be edited to get proper output. This is because a human being
+must tell the computer what to do with each element. The semantic
+files for common types of documents have already been edited, so you
+generally don't have to worry about this. But if you encounter a new
+type of document or wish to specify special handling for styles or
+mathematics you may have to edit the semantic-action file or send it
+to the maintainer for editing. In any case the rest of this section is
+essential for understanding how liblouisxml handles documents and for
+making changes if the way it does so is not correct.
+
+After another blank line you will see a table consisting of two, and
+sometimes three, columns. The first column contains a word which tells
+the computer to do something. For example, the first entry in the
+table is: @samp{include nemeth.sem}. This tells liblouisxml to include
+the information in the @file{nemeth.sem} file when it is deciphering
+an html (actually xhtml) document (it may be preferable to use the
+semanticFiles setting in the configuration file rather than an
+include).
+
+The second row of the table is:
+
+@example
+no hr
+@end example
+
+@samp{hr} is an element with the angle brackets removed. It means
+nothing in itself. However, the first column contains the word
+@samp{no}. This tells liblouisxml "no do", that is, do nothing.
+
+After a few more lines with @samp{no} in the first column, we see one
+that says:
+
+@example
+softreturn br
+@end example
+
+This means that when the element @code{<br>} is encountered,
+liblouisxml is to do a soft return, that is, start a new line without
+starting a new paragraph.
+
+The next line says:
+
+@example
+heading1 h1
+@end example
+
+This tells liblouisxml that when it encounters the element @code{<h1>}
+it is to format the text which follows as a first-level braille
+heading, that is, the text will be centered and preceded and followed
+by blank lines. (You can change this by changing the definition of the
+heading1 style).
+
+The next line says:
+
+@example
+italicx em
+@end example
+
+This tells liblouisxml that when it encounters the element @code{<em>}
+it is to enclose the text which follows in braille italic indicators.
+The @samp{x} at the end of the semantic action name is there to
+prevent conflicts with names elsewhere in the software. Just where the
+italic indicators will be placed is controlled by the liblouis
+translation table in use.
+
+The next line says:
+
+@example
+skip style
+@end example
+
+This tells liblouisxml to simply skip ahead until it encounters the
+element @code{</style>}. Nothing in between will have any effect on
+the braille output. Note the slash (@samp{/}) before the @samp{style}.
+This means the end of whatever the @code{<style>} element was
+referring to. Actually, it was referring to specifications of how
+things should be printed. If liblouisxml had not been told to skip
+these specifications, the braille output would have contained a lot of
+gobbledygook.
+
+The next line says:
+
+@example
+italicx strong
+@end example
+
+This tells liblouisxml to also use the italic braille indicators for the
+text between the @code{<strong>} and @code{</strong>} elements.
+
+After a few more lines with @samp{no} in the first column we come to
+the line:
+
+@example
+document html
+@end example
+
+This tells liblouisxml that everything between @code{<html>} and
+@code{</html>} is an entire document. @code{<html>} was the root
+element of this document, so this is logical.
+
+After another @samp{no} line we come to:
+
+@example
+para p
+@end example
+
+liblouisxml will consider everything between @code{<p>} and
+@code{</p>} to be a normal body text paragraph.
+
+The next line is:
+
+@example
+heading1 title
+@end example
+
+This causes the title of the document to also be treated as a braille
+level 1 heading.
+
+Next we have the line:
+
+@example
+list li
+@end example
+
+The xhtml @code{<li>} and @code{</li>} pair of elements is used to
+enclose an item in a list. liblouisxml will format this with its own
+list style. That is, the first line will begin at the left margin and
+subsequent lines will be indented two cells.
+
+Next we have:
+
+@example
+table table
+@end example
+
+You will note that the names of actions and elements are often
+identical. This is because they are both mnemonic. In any case, this
+line tells liblouisxml to format the table contained in the xhtml
+document according to the table formatting rules it has been given for
+braille output.
+
+Next we have the line:
+
+@example
+heading2 h2
+@end example
+
+This means that the text between @code{<h2>} and @code{</h2>} is to be
+formatted according to the liblouisxml style heading2. A blank line
+will be left before the heading and the first line will be indented
+four cells.
+
+After a few more lines we come to:
+
+@example
+no table,cellpadding
+@end example
+
+Note the comma in the second column. This divides the column into two
+subcolumns. The first is the table element name. The second is called
+an "attribute" in xml. It gives further instructions about the
+material enclosed between the starting and ending "tags" of the
+element (@code{<table>} and @code{</table>}). Full information requires
+three subcolumns. The third is called the value and gives the actual
+information. The attribute is merely the name of the information.
+
+Much further down we find:
+
+@example
+no table,border,0
+@end example
+
+Here the element is table, the attribute is border and the value is 0.
+If liblouisxml were to interpret this, it would mean that the table
+was to have a border of 0 width. It is not told to do so because
+tables in braille do not have borders.
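+
+Gathering the lines discussed so far, a fragment of a semantic-action
+file has the following shape. This is an excerpt assembled from the
+examples above for illustration, not the complete @file{html.sem};
+the comment line is added here.
+
+@example
+# action   element[,attribute[,value]]
+include nemeth.sem
+no hr
+softreturn br
+heading1 h1
+italicx em
+skip style
+document html
+para p
+list li
+no table,cellpadding
+no table,border,0
+@end example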
+
+Now let's look at the file which is included at the beginning of the
+@file{html.sem} file. This is @file{nemeth.sem}. As with
+@file{html.sem}, a copy is included in the documentation directory
+with the name @file{example_nemeth.sem}, but it is not necessarily
+the one that liblouisxml is currently using. It illustrates several
+more things about how liblouisxml uses semantic-action files.
+
+The first thing you will notice is that for quite a few lines the
+first and second columns are identical. This is because the MathML
+element and attribute names are part of a standard, and it was
+simplest to use the element names for the semantic actions as well.
+
+The first line of real interest is:
+
+@example
+math math
+@end example
+
+Every mathematical expression begins with the element @code{<math>}
+(which may have attributes and values), and ends with @code{</math>}.
+This is therefore the root element of a mathematical expression.
+However, mathematical expressions are usually part of a document, so
+it is not given the semantic action document. The math semantic action
+causes liblouisxml to carry out special interpretation actions. These
+will become clearer as we continue to look at the @file{nemeth.sem}
+file. You will note that this line has three columns. The meaning of
+the third column is discussed below.
+
+After another uninteresting line we come to two that illustrate
+several more facts about semantic-action files:
+
+@example
+mfrac mfrac ^?,/,^#
+mfrac mfrac,linethickness,0 ^(,^;%,^)
+@end example
+
+Like the math entry above, the first line has three columns. While the
+first two columns must always be present, the third column is
+optional. Here, it is also divided into subcolumns by commas. The
+element @code{<mfrac>} indicates a fraction. A fraction has two parts,
+a numerator and a denominator. In xml, we call these parts children of
+@code{<mfrac>}. They may be represented in various ways, which need
+not concern us here. What is of real importance is that the third
+column tells liblouisxml to put the characters @samp{^?} before the
+numerator, @samp{/} between the numerator and denominator, and
+@samp{^#} after the denominator. Later on, liblouis will translate
+these characters into the proper representation of a fraction in the
+Nemeth Code of Braille Mathematics. (For other mathematical codes,
+@pxref{Implementing Braille Mathematics Codes}).
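+
+As a sketch, a MathML fraction such as the following would, on this
+description, reach liblouis as the string shown beneath it. The
+sample fraction and the resulting string are illustrative assumptions
+based on the first mfrac line above.
+
+@example
+<mfrac>
+  <mn>1</mn>
+  <mn>2</mn>
+</mfrac>
+
+^?1/2^#
+@end example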
+
+The second line is of even greater interest. The first column is again
+@samp{mfrac}, but this line is for the binomial coefficient. The
+second column contains three subcolumns, an element name, an attribute
+name and an attribute value. The attribute linethickness specifies the
+thickness of the line separating the numerator and denominator. Here
+it is 0, so there is no line. This is how the binomial coefficient is
+represented in print. The third column tells how to represent it in
+braille. liblouisxml will supply @samp{^(}, upper number, @samp{^;%},
+lower number, @samp{^)} to liblouis, which will then produce the
+proper braille representation for the binomial coefficient.
+
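+As a rough illustration, and assuming that the identifiers @samp{n}
+and @samp{k} pass through unchanged, a MathML binomial coefficient
+such as
+
+@example
+<mfrac linethickness="0"><mi>n</mi><mi>k</mi></mfrac>
+@end example
+
+would match the second entry and reach liblouis as something like
+
+@example
+^(n^;%k^)
+@end example
+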
+Returning to the line for the math element, we see that the third
+column contains a backslash followed by an asterisk. The backslash
+is an escape character which gives a special meaning to the character
+which follows it. Here the asterisk means that what follows is to be
+placed at the very end of the mathematical expression, no matter how
+complex it is.
+
+For further discussion of how the third column is used,
+@pxref{Implementing Braille Mathematics Codes}. The third column is
+not limited to mathematics. It can be used to add characters to
+anything enclosed by an xml tag.
+
+Here is a complete list of the semantic actions which liblouisxml
+recognizes. Many of them are also the names of styles. These are
+listed first, preceded by an asterisk. For a discussion of these,
+@pxref{Customization Configuring liblouisxml}.
+
+@table @code
+
+@item * arith
+@item * attribution
+@item * biblio
+@item * blanklinebefore
+@item * caption
+@item * code
+@item * contents
+@item * dedication
+@item * directions
+@item * dispmath
+@item * disptext
+@item * document
+@item * exercise1
+@item * exercise2
+@item * exercise3
+@item * glossary
+@item * graph
+@item * graphlabel
+@item * heading1
+@item * heading2
+@item * heading3
+@item * heading4
+@item * indexx
+@item * list
+@item * matrix
+@item * music
+@item * note
+@item * para
+@item * quotation
+@item * section
+@item * spatial
+@item * stanza
+@item * style1
+@item * style2
+@item * style3
+@item * style4
+@item * style5
+@item * subsection
+@item * table
+@item * titlepage
+@item * trnote
+@item * volume
+@item acknowledge
+@item allcaps
+@item author
+@item blankline
+@item bodymatter
+@item boldx
+@item booktitle
+@item boxline
+@item cdata
+@item center
+@item chemistry
+@item contracted
+@item copyright
+@item endnotes
+@item footer
+@item frontmatter
+@item graphic
+@item italicx
+@item jacket
+@item line
+@item linkto
+@item maction
+@item maligngroup
+@item malignmark
+@item math
+@item menclose
+@item merror
+@item mfenced
+@item mfrac
+@item mglyph
+@item mi
+@item mlabeledtr
+@item mmultiscripts
+@item mn
+@item mo
+@item mover
+@item mpadded
+@item mphantom
+@item mprescripts
+@item mroot
+@item mrow
+@item ms
+@item mspace
+@item msqrt
+@item mstyle
+@item msub
+@item msubsup
+@item msup
+@item mtd
+@item mtext
+@item mtr
+@item munder
+@item munderover
+@item newpage
+@item no
+@item noindent
+@item none
+@item preface
+@item rearmatter
+@item rightalign
+@item righthandpage
+@item runninghead
+@item semantics
+@item skip
+@item softreturn
+@item specsym
+@item tblbody
+@item tblcol
+@item tblhead
+@item tblrow
+@item tnpage
+@item transcriber
+@item uncontracted
+
+@end table
+
+@node Implementing Braille Mathematics Codes, Settings Index, Connecting with the xml Document - Semantic-Action Files, Top
+@chapter Implementing Braille Mathematics Codes
+
+The Nemeth Code of Braille Mathematical and Science Notation has been
+implemented. Other braille mathematics codes can be implemented by
+following the same pattern. The Nemeth Code implementation is
+discussed as an example below.
+
+Four tables are used to translate xml documents containing a mixture
+of text and mathematics into the Nemeth code. They can be found in the
+subdirectory @file{lbx_files} of the liblouisxml directory. First, the
+semantic-action file @file{nemeth.sem} is used to interpret the
+mathematical portions of the xml document. (The text portions are
+interpreted by another semantic-action file, which will not be
+discussed here.) After the math and text have been interpreted, two
+liblouis tables, @file{nemeth.ctb} and @file{en-mathtext.ctb}, are used
+to translate them. Each piece of mathematics or text is translated
+separately and the pieces are strung together with blanks between
+them. This results in inaccuracies where mathematics meets text. The
+fourth table, also a liblouis table, is used to remove these
+inaccuracies. It is called @file{edittable.ctb}, and it does things
+like removing the multi-purpose indicator before a blank, inserting
+the punctuation indicator before a punctuation mark following a math
+expression, and removing extra spaces.
+
+The general format and use of semantic-action files were discussed in
+the previous section (@pxref{Connecting with the xml Document -
+Semantic-Action Files}). In this section we shall concentrate on the
+optional third column, which is used extensively in @file{nemeth.sem}.
+While liblouisxml can generate the first two columns (which a person
+must then edit), the third column must always be written by a
+human.
+
+As previously stated, the third column tells liblouisxml what
+characters to insert to inform liblouis how to translate the math
+expression. Look at the following line:
+
+@example
+mfrac mfrac ^?,/,^#
+@end example
+
+You will see that the third column contains two commas. This means
+that it has three subcolumns. A fraction has a numerator and a
+denominator. These are called children of the mfrac element. The first
+subcolumn specifies the characters that liblouisxml should place in
+front of the numerator. The second subcolumn gives the characters to
+be placed between the numerator and denominator. Finally, the third
+subcolumn gives the characters to place after the denominator. You
+will see that the first subcolumn contains a caret followed by a
+question mark. In computer braille the question mark has the same dot
+pattern as the Nemeth start-fraction indicator. The caret marks it as
+an indicator, so that liblouis can tell it apart from an ordinary
+question mark in the text. The
+second subcolumn contains a slash but no caret. This is because there
+is no danger of confusion where the slash is concerned. The third
+subcolumn does contain a caret, and it also contains a number sign,
+which corresponds to the Nemeth end-fraction indicator. When
+liblouisxml encounters the MathML representation of the fraction
+one-half it produces the following string of characters:
+@samp{^?1/2^#}. liblouis then removes the carets to get @samp{?1/2#}.
+
+As another example, consider the entry in @file{nemeth.sem} for a
+subscript.
+
+@example
+msub msub ,^;,^"
+@end example
+
+Here the first subcolumn is blank, because nothing is to be placed
+before the subscripted symbol. The second subcolumn contains a caret
+and a semicolon (in computer braille). This corresponds to the Nemeth
+subscript indicator. The third subcolumn contains a caret and a
+quotation mark, corresponding to the Nemeth baseline indicator.
+liblouisxml translates the MathML expression for x subscript i into
+@samp{x^;i^"}. liblouis subsequently produces @samp{x;i"}. There are
+other steps if the subscript is numeric. These are handled by pass2
+opcodes in the liblouis translation table, @file{nemeth.ctb}.
+
+You will notice that the entries in @file{nemeth.sem} have various
+numbers of subcolumns in the third column. In general, the characters
+given in the first subcolumn are placed before the first child of the
+element given in the second column. The characters in the second
+subcolumn are placed before the second child, and so on, until the
+characters given in the last subcolumn are placed after the last
+child.
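+
+To illustrate the rule, take an element with three children, such as
+@code{<msubsup>} (base, subscript and superscript). In the following
+made-up entry the letters @samp{A} to @samp{D} are placeholders, not
+the characters actually used in @file{nemeth.sem}:
+
+@example
+msubsup msubsup A,B,C,D
+@end example
+
+With such an entry liblouisxml would emit @samp{A}, the base,
+@samp{B}, the subscript, @samp{C}, the superscript, and finally
+@samp{D}.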
+
+Sometimes an element or tag can have an indeterminate number of
+children. This is true of @code{<math>} itself. Yet, it may be
+necessary to place some characters after the very last element. Let us
+look at the @code{<math>} entry.
+
+@example
+math math \eb,\*\ee
+@end example
+
+First let us discuss escape sequences starting with a backslash. These
+are basically the same as in liblouis. The sequence @samp{\e} is
+shorthand for the escape character, which would otherwise be
+represented by @samp{\x001b}. The beginning of a math expression is
+denoted by an escape character followed by the letter @samp{b} and the end by
+an escape character followed by the letter @samp{e}. This enables the
+editing table to do such things as drop the baseline indicator at the
+end of a math expression and insert a number sign at the beginning, if
+needed.
+
+Not found in liblouis is the sequence @samp{\*}. This means to put
+what follows after the very last child of the math element, no matter
+how many there are.
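+
+Putting these pieces together, and treating it only as an
+illustrative sketch, the fraction one-half standing alone inside a
+@code{<math>} element would reach liblouis roughly as
+
+@example
+\eb^?1/2^#\ee
+@end example
+
+where @samp{\e} stands for the actual escape character rather than a
+literal backslash and letter.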
+
+As another example consider:
+
+@example
+mtd mtd \*\ec
+@end example
+
+@code{mtd} is the MathML tag for a table cell. There may be many
+children of this tag. The entry says to put an escape character (hex
+1b), plus the letter @samp{c}, after the very last of them.
+
+As a final example consider:
+
+@example
+mtr mtr ^.^\,^(,\*^.^\,^)\er
+@end example
+
+@code{mtr} is the MathML tag for a row in a table, in this case a
+matrix. Each row in a matrix must begin with the dot pattern
+@samp{46-6-12356} and end with the dot pattern @samp{46-6-23456}. As
+usual a caret is placed before the corresponding characters. Since dot
+6 is a comma, it must be escaped. This is done by placing a backslash
+before the comma. There are two subcolumns. The first contains the
+characters to be placed at the beginning of each row. The second
+starts with @samp{\*}, signifying that the characters following it
+are to be placed at the end of everything in this row. A subcolumn
+starting with @samp{\*} must be the last (or only) subcolumn.
+
+Here this last subcolumn ends with an escape character and the letter
+@samp{r}, signifying the end of a row.
+
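+As a sketch of how the @code{mtd} and @code{mtr} entries combine,
+assume a matrix row with two cells containing @samp{a} and @samp{b},
+and assume that the cell contents pass through unchanged. The row
+would then reach liblouis roughly as
+
+@example
+^.^,^(a\ecb\ec^.^,^)\er
+@end example
+
+Here @samp{\e} stands for the escape character, and the commas
+appear without the backslashes that protected them in the
+semantic-action file.
+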
+So much for the semantic-action file. Even though the characters in
+the third column were chosen to correspond with Nemeth characters,
+they may not have to be changed for other math codes. liblouis can
+replace them with anything needed.
+
+This brings us to a consideration of the two tables used by liblouis
+to translate mathematics texts. The first, @file{en-mathtext.ctb}, is
+used to translate text appearing outside math expressions. It is
+necessary because the Nemeth code requires modifications of Grade 2
+braille. Other math codes may not have this requirement.
+
+The table actually used to translate mathematics is @file{nemeth.ctb}.
+It includes two other tables, @file{chardefs.cti} and
+@file{nemethdefs.cti}. The first gives ordinary character definitions
+and is included by all the other tables. Note however, that the
+unbreakable space, @samp{\x00a0}, is translated by dot 9. This is used
+before and after the equal sign and other symbols in
+@file{nemeth.ctb}. The second table contains character definitions for
+special math symbols, most of which are Unicode characters greater
+than @samp{\x00ff}. The Greek letters are here. So are symbols like
+the integral sign.
+
+Most of the entries in @file{nemeth.ctb} should be familiar from other
+tables. The unfamiliar ones follow the comments @samp{# Semantic
+pairs} and @samp{# pass2 corrections}. The entries in the first group
+simply replace each character preceded by a caret with the character
+itself. Those in the second group adjust the code generated directly
+from the @file{nemeth.sem} file. The pass2 opcode is discussed in the liblouis
+guide (@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's
+and User's Guide}). Here are some comments on a few of the entries in
+@file{nemeth.ctb}.
+
+@example
+pass2 @@1456-1456 @@6-1456
+@end example
+
+Replaces double start-fraction indicators with the
+start-complex-fraction indicator.
+
+@example
+pass2 @@3456-3456 @@6-3456
+@end example
+
+Replaces double end-fraction indicators with the end-complex-fraction
+indicator.
+
+@example
+pass2 @@56[$d1-5]@@5 *
+@end example
+
+Removes the subscript and baseline indicators from numeric subscripts.
+
+@example
+pass2 @@5-9 @@9
+@end example
+
+Removes the baseline or multipurpose indicator before an unbreakable
+space generated by the translation of an equal sign, etc.
+
+@example
+pass2 @@45-3-5 @@3
+@end example
+
+Replaces a superscript apostrophe with a simple prime symbol.
+
+@example
+pass2 @@9[]$d @@3456
+@end example
+
+Puts a number sign before a digit preceded by a blank.
+
+@example
+pass2 @@9-0 @@9
+@end example
+
+Removes a space following an unbreakable space.
+
+We now come to the fourth and last table used for math translation,
+the editing table, @file{edittable.ctb}. As explained at the
+beginning, this table is used to remove inaccuracies where math
+translation butts up against text translation. For example, the Nemeth
+code puts numbers in the lower part of the cell. However, punctuation
+marks are also in the lower part of the cell. So Nemeth puts a
+punctuation indicator, dots @samp{456}, in front of any lower-cell
+punctuation that immediately follows a mathematical expression. If
+this occurs inside MathML it is handled by @file{nemeth.ctb}. However,
+a MathML expression is often followed by a punctuation mark which is
+the first part of text. liblouisxml puts a blank between math and
+text, but this can result in a mathematical expression followed by a
+blank and then, say, a period, dots @samp{256}. @file{edittable.ctb}
+replaces the blank with the punctuation indicator.
+
+When you look at @file{edittable.ctb} you will see that it begins with
+an include of @file{chardefs.cti}. Most of the entries are ordinary,
+but some are interesting. For example,
+
+@example
+always "\s 0
+@end example
+
+replaces the baseline or multipurpose indicator followed by a space
+with just a space.
+
+@node Settings Index, Function Index, Implementing Braille Mathematics Codes, Top
+@unnumbered Settings Index
+
+@printindex tp
+
+@node Function Index, Program Index, Settings Index, Top
+@unnumbered Function Index
+
+@printindex fn
+
+@node Program Index,  , Function Index, Top
+@unnumbered Program Index
+
+@printindex pg
+
+@bye
+
+
+
For a description of the software and to download it go to
http://www.jjb-software.com
