Author: christian.egli@xxxxxxxxxxxxxx
Date: Tue Dec 23 05:25:15 2008
New Revision: 7

Added:
   trunk/doc/liblouisxml-guide.texi
Modified:
   trunk/doc/Makefile.am

Log:
Added the texinfo version of the liblouisxml-guide.

Modified: trunk/doc/Makefile.am
==============================================================================
--- trunk/doc/Makefile.am (original)
+++ trunk/doc/Makefile.am Tue Dec 23 05:25:15 2008
@@ -18,3 +18,9 @@
 	liblouisxml-guide.html \
 	liblouisxml-guide.txt
+info_TEXINFOS = liblouisxml-guide.texi
+
+SUFFIXES = .txt
+
+.texi.txt:
+	$(MAKEINFO) --plaintext $< -o $@

Added: trunk/doc/liblouisxml-guide.texi
==============================================================================
--- (empty file)
+++ trunk/doc/liblouisxml-guide.texi Tue Dec 23 05:25:15 2008
@@ -0,0 +1,2069 @@
+\input texinfo
+@c %**start of header
+@setfilename liblouisxml-guide.info
+@include version.texi
+@settitle Liblouisxml Programmer's and User's Guide
+
+@dircategory Misc
+@direntry
+* Liblouisxml: (liblouisxml). An xml to Braille Translation Library.
+@end direntry
+
+@c Version and Contact Info
+@set MAINTAINERSITE @uref{http://www.jjb-software.com/liblouisxml-guide.html,maintainers webpage}
+@set AUTHOR John J. Boyer
+@set MAINTAINER John J. Boyer
+@set MAINTAINEREMAIL @email{john.boyer@xxxxxxxxxxxxxxxx}
+@set MAINTAINERCONTACT @uref{mailto:john.boyer@xxxxxxxxxxxxxxxx,contact the maintainer}
+@c %**end of header +@finalout + +@c Macro definitions + +@c Opcode. +@macro setting{name, args} +@tindex \name\ +@item \name\ \args\ +@end macro + +@copying +This manual is for liblouisxml (version @value{VERSION}, +@value{UPDATED}), an xml to Braille Translation Library. + +This file may contain code borrowed from the Linux screenreader +@acronym{BRLTTY}, Copyright @copyright{} 1999-2006 by the +@acronym{BRLTTY} Team. + +@noindent +Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc. +@uref{www.viewplus.com} and Copyright @copyright{} 2007,2008 JJB +Software, Inc. @uref{www.jjb-software.com}. + +@quotation +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU Lesser (or library) General Public License +(LGPL) as published by the Free Software Foundation; either version 3, +or (at your option) any later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Lesser (or Library) General Public License LGPL for more details. + +You should have received a copy of the GNU Lesser (or Library) General +Public License (LGPL) along with this program; see the file COPYING. +If not, write to the Free Software Foundation, 51 Franklin Street, +Fifth Floor, Boston, MA 02110-1301, USA. +@end quotation +@end copying + +@titlepage +@title Liblouisxml Programmer's and User's Guide + +@subtitle Release @value{VERSION} +@author by John J. Boyer + +@c The following two commands start the copyright page. +@page +@vskip 0pt plus 1filll +@insertcopying +@end titlepage + +@c Output the table of contents at the beginning. 
+@contents + +@ifnottex +@node Top, Introduction, (dir), (dir) +@top Liblouis Programmer's and User's Guide + +@insertcopying +@end ifnottex + +@menu +* Introduction:: +* Programming with liblouisxml:: +* Transcribing with the xml2brl program:: +* Customization Configuring liblouisxml:: +* Connecting with the xml Document - Semantic-Action Files:: +* Implementing Braille Mathematics Codes:: +* Settings Index:: +* Function Index:: +* Program Index:: + +@detailmenu + --- The Detailed Node Listing --- + +Programming with liblouisxml + +* License:: +* Overview:: +* Files and Paths:: +* lbx_version:: +* lbx_initialize:: +* lbx_translateString:: +* lbx_translateFile:: +* lbx_translateTextFile:: +* lbx_backTranslateFile:: +* lbx_free:: + +Transcribing with the xml2brl program + +* Transcribing Microsoft Word Files with msword2brl:: + +Customization: Configuring liblouisxml + +* outputFormat:: +* translation:: +* xml:: +* style:: + +@end detailmenu +@end menu + +@node Introduction, Programming with liblouisxml, Top, Top +@chapter Introduction + +liblouisxml is a software component which can be incorporated into +software packages to provide the capability of translating any file in +the computer lingua franca xml format into properly transcribed +braille. This includes translation into grade two, if desired, +mathematical codes, etc. It also includes formatting according to a +built-in style sheet which can be modified by the user. The first +program into which liblouisxml has been incorporated is +@command{xml2brl}. This program will translate an xml or text file +into an embosser-ready braille file. It is not necessary to know xml, +because MSWord and other word processors can export files in this +format. If the word processor has been used correctly +@command{xml2brl} will produce an excellent braille file. + +There is a Mac GUI application incorporating liblouisxml called louis. +For a link to it go to @uref{www.jjb-software.com/downloads}. 
A +similar Windows application is in the works. + +Computer programmers who wish to use liblouisxml in their software can +find the information they need in the section Programming with +liblouisxml (@pxref{Programming with liblouisxml}). Those who wish to +change the output generated by liblouisxml should read the section +Configuring liblouisxml (@pxref{Customization Configuring +liblouisxml}). If you encounter a type of xml file with which liblouis +is not familiar you can learn how to tell it how to process that file +by reading Connecting with the xml document: Semantic-Action Files +(@pxref{Connecting with the xml Document - Semantic-Action Files}). +Finally, if you wish to implement a new braille mathematics code read +Implementing Braille Mathematics Codes (@pxref{Implementing Braille +Mathematics Codes}). + +You will also find it advantageous to be acquainted with the companion +library liblouis, which is a braille translator and back-translator +(@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's and +User's Guide}). ++@node Programming with liblouisxml, Transcribing with the xml2brl program, Introduction, Top
+@chapter Programming with liblouisxml + +@menu +* License:: +* Overview:: +* Files and Paths:: +* lbx_version:: +* lbx_initialize:: +* lbx_translateString:: +* lbx_translateFile:: +* lbx_translateTextFile:: +* lbx_backTranslateFile:: +* lbx_free:: +@end menu ++@node License, Overview, Programming with liblouisxml, Programming with liblouisxml
+@section License + +Liblouisxml may contain code borrowed from the Linux screenreader +BRLTTY, Copyright @copyright{} 1999-2006 by the BRLTTY Team. + +@noindent +Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc. +@uref{www.viewplus.com}. + +@noindent +Copyright @copyright{} 2007,2008 JJB Software, Inc. +@uref{www.jjb-software.com}. + +Liblouisxml is free software: you can redistribute it and/or modify it +under the terms of the GNU Lesser General Public License as published +by the Free Software Foundation, either version 3 of the License, or +(at your option) any later version. + +Liblouisxml is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Lesser General Public License for more details. + +You should have received a copy of the GNU Lesser General Public +License along with Liblouis. If not, see +@uref{http://www.gnu.org/licenses/}. + +@node Overview, Files and Paths, License, Programming with liblouisxml +@section Overview + +liblouisxml is an "extensible renderer," designed to translate a wide +variety of xml documents into braille, but with a special emphasis on +technical material. The overall operation of liblouisxml is controlled +by a configuration file. The way in which a particular type of xml +document is to be rendered is specified by a semantic-action file for +that document type. Braille translation is done by the liblouis +braille translation and back-translation library (@pxref{Top, , +Overview, liblouis-guide, Liblouis Programmer's and User's Guide}). +Its operation, in turn is controlled by translation table files. All +these files are plain text and can be created and edited in any text +editor. Configuration settings can also be specified on the command +line of the console-mode transcription program @command{xml2brl}. + +The general operation of liblouisxml is as follows. 
It uses the +libxml2 library to construct a parse tree of the xml document. After +the parse tree is constructed, a function called +@code{examine_document} looks it over and determines whether math +translation tables, etc. are needed. @code{examine_document} also +constructs a prototype semantic-action file, if one does not exist +already. When it is finished, another function, called +@code{transcribe_document}, does the actual braille transcription. It +calls @code{transcribe_math} to handle MathML subtrees, +@code{transcribe_chemistry} for chemical formula subtrees, +@code{transcribe_graphic} for SVG graphics, etc. Entities are +translated to Unicode, if they are not already. Sequences of symbols +indicate superscripts, return to the baseline, subscripts, start and +end of fractions, etc. The Braille translator and back-translator +library liblouis is used to do the braille translation. + +The @code{transcribe_math} function works in conjunction with the +latest version of liblouis and a special math translation table to +transcribe most mathematical expressions into fairly good Nemeth Code. +Much refinement is still necessary. Other braille mathematical codes +can be handled by modifying the translation table. + +The functions which are not needed at the moment, such as +@code{transcribe_chemistry}, are only skeletons. However, I hope that +@code{transcribe_graphics} can be expanded in the near future to use +the graphics capability of the Tiger tactile graphics embossers. + +The latest versions of liblouisxml and liblouis can be downloaded from +@uref{www.jjb-software.com}. Note that liblouisxml will only work with +the latest version of liblouis. + +liblouisxml can be compiled to use either 16-bit or 32-bit Unicode +internally. This is inherited from liblouis, so liblouis must be +compiled first and then liblouisxml. Wherever 16 bits are mentioned in +this document, read 32 if you have compiled the library for 32 bits. 
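+
+As a rough orientation for the function descriptions in the following
+sections, a minimal calling sequence might look like the sketch below.
+It relies only on the prototypes given in this chapter. The header
+names, the configuration file name @file{default.cfg}, the sample
+input and the buffer size are assumptions made for illustration, not
+requirements of the library.
+
+@example
+#include <stdio.h>
+#include <liblouis.h>     /* assumed to define widechar */
+#include <liblouisxml.h>  /* assumed to declare the lbx_ functions */
+
+int main (void)
+@{
+  char input[] = "<p>Hello, world</p>";  /* zero-terminated xml */
+  widechar output[256];
+  int outlen = 256;  /* in: capacity of output; out: length used */
+
+  printf ("%s\n", lbx_version ());
+  /* mode 0 requests full initialization from the configuration */
+  if (lbx_translateString ("default.cfg", input, output, &outlen, 0) != 1)
+    @{
+      fprintf (stderr, "translation was not completed\n");
+      lbx_free ();
+      return 1;
+    @}
+  printf ("%d braille characters produced\n", outlen);
+  lbx_free ();  /* release memory held by liblouisxml and liblouis */
+  return 0;
+@}
+@end example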
+ +@node Files and Paths, lbx_version, Overview, Programming with liblouisxml +@section Files and Paths + +As stated in the previous section, liblouisxml uses three kinds of +files, configuration files, semantic-action files, and liblouis +translation tables. The first two are discussed later in this +documentation. liblouis translation tables are discussed in the +liblouis guide (@pxref{Top, , Overview, liblouis-guide, Liblouis +Programmer's and User's Guide}) which is distributed with liblouis. +These files can be placed on various paths, which are determined at +compile time. One of these paths should be to the @file{lbx_files} +directory provided by liblouisxml, which contains the principal +configuration file (@file{canonical.cfg}) and the semantic-action +files. Another should be to the tables directory in the liblouis +distribution. Note that liblouisxml also generates some files, all of +which are placed on the current directory. These files are new +prototype semantic-action files, additions to old semantic-action +files, temporary files, and log files. The first two can be used to +extend the capability of liblouisxml to process xml documents. The +latter two are useful for debugging. + +Paths are set by changing a few lines of code in the @file{paths.c} +module. If you are preparing liblouisxml for Windows a function which +finds the name of the "Program Files" directory for your locale is +called automatically. You can then modify the line containing the term +@samp{yourSubDir} as needed. + +If you are preparing liblouisxml for a Unix-type system look for the +line that says @samp{Set Unix Paths}. The following three lines +establish a path to the @file{lbx_files} directory in your home +directory. As distributed, this directory contains the semantic-action +files and some configuration files. 
You can choose to copy the tables
+from the liblouis distribution into it as well, or you can modify the
+following three lines to point to the actual location of the tables.
+You can also choose to place both the @file{lbx_files} and the tables
+directory in @file{/etc}.
+
+The function @code{addPath} takes care of adding paths to liblouisxml
+properly. You can specify many more than two paths.
+
+@node lbx_version, lbx_initialize, Files and Paths, Programming with liblouisxml
+@section lbx_version
+
+@findex lbx_version
+@example
+char *lbx_version (void)
+@end example
+
+This function returns a pointer to a character string containing the
+version of liblouisxml, plus other information such as the release
+date and perhaps notable changes.
+
+@node lbx_initialize, lbx_translateString, lbx_version, Programming with liblouisxml
+@section lbx_initialize
+
+@findex lbx_initialize
+@example
+void * lbx_initialize (
+  const char *const configFilelist,
+  const char *const logFileName,
+  const char *const settingsString)
+@end example
+
+This function initializes the libxml2 library, runs
+@file{canonical.cfg} and processes the configuration settings given in
+@code{settingsString} and the configuration files given in
+@code{configFilelist}. @code{configFilelist} is a list of configuration
+file names separated by commas. If its first character is a comma it
+is instead taken to be a string containing configuration settings and
+is processed like the @code{settingsString} string. Such a string must
+conform to the format of a configuration file. Newlines should be
+represented with ASCII 10. If @code{logFileName} is not @code{NULL}, a
+log file is produced in the current directory. If it is @code{NULL}
+any messages are printed on stderr. The function returns a pointer to
+the @code{UserData} structure. This pointer has type @code{void *} and
+must be cast to @code{(UserData *)} in the calling program. To access
+the information in this structure you must include @file{louisxml.h}.
+This function is used by @command{xml2brl}.
+
+@node lbx_translateString, lbx_translateFile, lbx_initialize, Programming with liblouisxml
+@section lbx_translateString + +@findex lbx_translateString +@example +int lbx_translateString ( + const char *const configfilelist, + char * inbuf, + widechar *outbuf, + int *outlen, + unsigned int mode) +@end example + +This function takes a well-formed xml expression in @code{inbuf} and +translates it into a string of 16-bit (or 32-bit if this has been +specified in liblouis) braille characters in @code{outbuf}. The xml +expression must be immediately followed by a zero or null byte. +Leading whitespace is ignored. If it does not then begin with the +characters @samp{<?xml} an xml header is added. If it does not begin +with @samp{<} it is assumed to be a text string and is translated +accordingly. The header is specified by the xmlHeader line in the +configuration file. If no such line is present, a default header +specifying UTF-8 encoding is used. The @code{mode} parameter specifies +whether you want the library to be initialized. If it is 0 everything +is reset, the @file{canonical.cfg} file is processed and the +configuration file and/or string (see previous section) are processed.+If @code{mode} is 1 liblouisxml simply prepares to handle a new document. For
+more on the @code{mode} parameter see the next section. + +Which 16-bit character in @code{outbuf} represents which dot pattern +is indicated in the liblouis translation tables. The +@code{configfilelist} parameter points to a configuration file or +string. Among other things, this file specifies translation tables. It +is these tables which control just how the translation is made, +whether in Grade 2, Grade 1, the Nemeth Code of Braille Mathematics or +something else. + +Note that the @code{*outlen} parameter is a pointer to an integer. +When the function is called, this integer contains the maximum output +length. When it returns, it is set to the actual length used. The +function returns 1 if no errors were encountered and a negative number +if a complete translation could not be done. ++@node lbx_translateFile, lbx_translateTextFile, lbx_translateString, Programming with liblouisxml
+@section lbx_translateFile
+
+@findex lbx_translateFile
+@example
+int lbx_translateFile (
+  char *configfilelist,
+  char *inputFileName,
+  char *outputFileName,
+  unsigned int mode)
+@end example
+
+This function accepts a well-formed xml document in
+@code{inputFileName} and produces a braille translation in
+@code{outputFileName}. As for @code{lbx_translateString}, the
+@code{mode} parameter specifies whether the library is to be
+initialized with new configuration information or simply prepared to
+handle a new document. In addition, the @code{mode} parameter can
+specify that a document is in html, not xhtml. @file{liblouisxml.h}
+contains an enumeration type with the values @code{dontInit} and
+@code{htmlDoc}. These can be combined with the bitwise or (@samp{|})
+operator. The input file is assumed to be encoded in UTF-8, unless
+otherwise specified in the xml header. The encoding of the output file
+may be UTF-8, UTF-16, UTF-32 or Ascii8. This is specified by the
+@code{outputEncoding} line in the configuration file,
+@code{configfilelist}. The function returns 1 if the translation was
+successful.
+
+@node lbx_translateTextFile, lbx_backTranslateFile, lbx_translateFile, Programming with liblouisxml
+@section lbx_translateTextFile
+
+@findex lbx_translateTextFile
+@example
+int lbx_translateTextFile (
+  char *configfilelist,
+  char *inputFileName,
+  char *outputFileName,
+  unsigned int mode)
+@end example
+
+This function accepts a text file in @code{inputFileName} and produces
+a braille translation in @code{outputFileName}. The input file is
+assumed to be encoded in Ascii8. Blank lines indicate the divisions
+between paragraphs. Two blank lines cause a blank line between
+paragraphs (or headers). The output file may be in UTF-8, UTF-16, or
+Ascii8, as specified by the @code{outputEncoding} line in the
+configuration file, @code{configfilelist}. As for
+@code{lbx_translateString}, the @code{mode} parameter specifies
+whether complete initialization is to be done or simply initialization
+for a new document.
+
+@node lbx_backTranslateFile, lbx_free, lbx_translateTextFile, Programming with liblouisxml
+@section lbx_backTranslateFile
+
+@findex lbx_backTranslateFile
+@example
+int lbx_backTranslateFile (
+  char *configfilelist,
+  char *inputFileName,
+  char *outputFileName,
+  unsigned int mode)
+@end example
+
+This function accepts a braille file in @code{inputFileName} and
+produces a back-translation in @code{outputFileName}. The input file
+is assumed to be encoded in Ascii8. The output file is in either plain
+text or html, according to the setting of @code{backFormat} in the
+configuration file. Html files are encoded in UTF-8. In plain text,
+blank lines are inserted between paragraphs. The output file may be in
+UTF-8, UTF-16, or Ascii8, as specified by the @code{outputEncoding}
+line in the configuration file, @code{configfilelist}. The @code{mode}
+parameter specifies whether or not the library is to be initialized
+with new configuration information, as described in the section on
+@code{lbx_translateString} (@pxref{lbx_translateString}).
+
+@node lbx_free, , lbx_backTranslateFile, Programming with liblouisxml
+@section lbx_free
+
+@findex lbx_free
+@example
+void lbx_free (void)
+@end example
+
+This function should be called at the end of the application to free
+all memory allocated by liblouisxml and liblouis. If you wish to
+change configuration files during your application, use a @code{mode}
+parameter of 0 on the function call using the new configuration
+information.
+
+@node Transcribing with the xml2brl program, Customization Configuring liblouisxml, Programming with liblouisxml, Top
+@chapter Transcribing with the xml2brl program
+@pindex xml2brl
+
+At the moment, actual transcription with liblouisxml is done with the
+command-line (or console) program @command{xml2brl}. The line to type
+is:
+
+@example
+xml2brl [OPTIONS] [-f config-file] [infile] [outfile]
+@end example
+
+The brackets indicate that something is optional. You will see that
+nothing is required except the program name itself, @command{xml2brl}.
+The various optional parts control how the program will behave, as
+follows:
+
+@table @option
+
+@item -h
+This option causes @command{xml2brl} to print a help message
+describing usage and exit.
+
+@item -l
+This option will cause @command{xml2brl} and liblouisxml to print
+error messages to @file{xml2brl.log} instead of stderr. The file will
+be in the current directory. This option is particularly useful if
+@command{xml2brl} is called by a GUI script or Web application.
+
+@item -f configfile
+This specifies the configuration file which tells @command{xml2brl}
+how to do the transcription. (It may be a list of file names separated
+by commas.) This file specifies such things as the number of cells per
+line, the number of lines per page, the translation tables to be used,
+how paragraphs and headings are to be formatted, etc. If this part of
+the command line is omitted, @command{xml2brl} assumes that the
+configuration file is named @file{default.cfg} and is in the current
+directory. If the configuration file name contains a pathname
+@command{xml2brl} will consider this as a path on which to look for
+files that it needs (@pxref{Files and Paths}).
+
+@item -Csetting=value
+This option enables you to specify configuration settings on the
+command line instead of changing the configuration file. You can use
+as many @option{-C} options as you wish. Any settings can be specified
+except those having to do with styles. The settings may be in any
+order. 
They override any settings in @file{canonical.cfg} or in the +configuration file used by @command{xml2brl}. + +@item -b +back-translate. The input file must be a braille file, such as +@file{.brf}. The output file is a back-translation of this file. It +may be in either plain-text or xhtml (html), according to the setting +of backFormat in the outputFormat section of the configuration file. +Html files will contain page numbers and emphasis. To get good html, +the liblouis table must have the entry @samp{space \e 1b} so that it +will pass through escape characters. The @file{html.sem} file must +also contain the line @samp{pagenum pagenum}. Text output files simply +have a blank line between paragraphs. Encoding of text files is +controlled by the outputEncoding setting. Html files are always in +UTF-8. + +@item -r +Reformat. The input file must be a braille file, such as @file{.brf}. +The output is a braille file formatted according to the configuration +file. It is advisable to set backFormat to html, since this will +preserve print page numbers and emphasis. This program can be useful +for changing the line length and page length of a braille file, for +example, from 40 to 32 cells. It is also an excellent way to check the +accuracy of liblouis tables. The original page numbers at the tops and +bottoms of pages are discarded, and new ones are generated. + +@item -p +Poorly formatted input translation. Infile is any text file such as may +have been obtained by extracting the text in a pdf file. The input +file may also be an xml or html file which is so poorly formatted that +better braille can be obtained by ignoring the formatting. +@command{xml2brl} tries to guess paragraph breaks. The output is +generally reasonably formatted, that is, with reasonable paragraph +breaks. + +@item -t +The document is an h(t)ml file, not xhtml. This option is useful with +files downloaded from the Web in source form. 
Without it, the program +will first try to parse the file as an xml document, producing lots of +error messages. It will then try the html parser. With this option, it +goes directly to the html parser. See also the formatFor configuration +(@pxref{formatFor setting}) file setting, which enables you to format +the braille output for viewing in a browser. + +@item infile +This is the name of the input file containing the material to be +transcribed. The file may be either an xml file or a text file. The +@option{-b}, @option{-r} and @option{-p} options discussed above +provide for other types of files and processing. Typical xml files are +those provided by @uref{www.bookshare.org} or those derived from a +word processor by saving in xml format. If a text file is used +paragraphs and headings should be separated by blank lines. In such a +file there is no way to distinguish between paragraphs and headings, +so they will all be formatted as paragraphs, as specified by the +configuration file. However, if you want a blank line in the braille +transcription use two consecutive blank lines in the text file. + +@item outfile +This is the name of the output file. It will be transcribed as +specified by the configuration file and the configuration settings. +The following paragraphs provide more information on both the input +and output files. + +@end table + +@command{xml2brl} is set up so that it can be used in a "pipe". To do +this, omit both infile and outfile. Input is then taken from the +standard input unit. + +The first file name encountered (a word not preceded by a minus sign) +is taken to be the input file and the second to be the output file. If +you wish input to be taken from stdin and still want to specify an +output file use two minus signs (@samp{--}) for the input file. 
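+
+The following invocations sketch these conventions. The file names
+@file{house.cfg}, @file{book.xml} and @file{book.brf} are placeholders
+chosen for illustration. The first line names both files, the second
+uses @command{xml2brl} as a filter on standard input and standard
+output, and the third reads standard input but writes to a named
+output file.
+
+@example
+xml2brl -f house.cfg book.xml book.brf
+xml2brl -f house.cfg < book.xml > book.brf
+xml2brl -f house.cfg -- book.brf < book.xml
+@end example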
+ +If only the program name is typed @command{xml2brl} assumes that the +configuration file is @file{default.cfg}, input is from the standard +input unit, and output is to the standard output unit. + +@menu +* Transcribing Microsoft Word Files with msword2brl:: +@end menu ++@node Transcribing Microsoft Word Files with msword2brl, , Transcribing with the xml2brl program, Transcribing with the xml2brl program
+@section Transcribing Microsoft Word Files with msword2brl +@pindex msword2brl + +@example +msword2brl infile outfile +@end example + +Infile must be a Microsoft Word file. The script first calls the +@command{antiword} program, so you must have this installed on your +machine. @command{antiword} is called with @option{-x db}, which +causes the output to be in docbook format. This is piped to +@command{xml2brl}. The output file from @command{xml2brl} contains +much of the formatting, including emphasis, of the word file. ++@node Customization Configuring liblouisxml, Connecting with the xml Document - Semantic-Action Files, Transcribing with the xml2brl program, Top
+@chapter Customization: Configuring liblouisxml
+
+The operation of liblouisxml is controlled by two types of files:
+semantic-action files and configuration files. The former are
+discussed in the section Connecting with the xml Document -
+Semantic-Action Files (@pxref{Connecting with the xml Document -
+Semantic-Action Files}). The latter are discussed in this section. A
+third type of file, braille translation tables, is discussed in the
+liblouis documentation (@pxref{Top, , Overview, liblouis-guide,
+Liblouis Programmer's and User's Guide}). Another section of the
+present document which may be of interest is Implementing Braille
+Mathematics Codes (@pxref{Implementing Braille Mathematics Codes}).
+
+liblouisxml (with liblouis) can be used as the braille transcription
+component in any number of applications with different overall
+purposes and user interfaces. However, as of now the principal
+application is @command{xml2brl}, which is a console application for
+Mac and Linux. (There is also a Mac GUI application called louis.) The
+information below therefore applies to @command{xml2brl} as much as to
+liblouisxml.
+
+Before discussing configuration files in detail it is worth noting
+that the application program has access to the information in the
+configuration files by calling the liblouisxml function
+@code{lbx_initialize}. This function returns a pointer to a data
+structure containing the configuration information.
+
+@command{xml2brl} uses the configuration file @file{default.cfg}
+unless a different one is specified via the @option{-f} command-line
+option. The configuration file name may include a full path. In this
+case, liblouisxml will consider this to be the user path. (This can be
+changed at compile time, @pxref{Files and Paths}.) If just a file name
+(or list) is given, liblouisxml will consider the current directory as
+the user path.
+
+The configuration "file" specified with the @option{-f} option need
+not be a single filename. 
It can be several file names separated by +commas. Only the first filename may have a path component. This path +is taken as the user path, as discussed in the previous paragraph. +This file-list feature is also found in liblouis. It enables you to +combine configuration files on the command line. For example, a file +list may consist of one file specifying the output format used in your +establishment, a comma, and then the name of a stylesheet. + +After the path, if any, has been evaluated, but before reading any of +the files, liblouisxml reads in a file called @file{canonical.cfg}. +This file specifies values for all possible settings. It is needed to +complete the initialization of the program. You may alter the values +in the distribution @file{canonical.cfg}, but you should not delete +any settings. If a configuration file read in later contains a +particular setting name, the value specified simply replaces the one +specified in @file{canonical.cfg}. + +As you will see by looking at @file{canonical.cfg}, it contains four +main sections, outputFormat, translation, xml and styles. In addition, +a configuration file can contain an include entry. This causes the +file named on that line to be read in at the point where the line +occurs. The sections need not follow each other in any particular +order, nor is the order of settings within each section important. In +this document and in the @file{canonical.cfg} file, where section and +setting names consist of more than one word, the first letter of each +word following the initial one is capitalized. This is merely for +readability. The case of the letters in these names is ignored by the +program. Section and setting names may not contain spaces. + +Here, then, is an explanation of each section and setting in the +@file{canonical.cfg} file. When you look at this file you will see +that the section names start at the left margin, while the settings +are indented one tab stop. This is done for readability. 
It has no
+effect on the meaning of the lines. You will also see lines beginning
+with a number sign (@samp{#}), which are comments. Blank lines can
+also be used anywhere in a configuration file. In general, a section
+name is a single word or combination of unspaced words. However, each
+style has a section of its own, so the word @samp{style} is followed
+by the name of the style. Setting lines begin with the name of the
+setting, followed by at least one space or tab, followed by the value
+of the setting. A few settings have two values.
+
+@menu
+* outputFormat::
+* translation::
+* xml::
+* style::
+@end menu
+
+@node outputFormat, translation, Customization Configuring liblouisxml, Customization Configuring liblouisxml
+@section outputFormat
+
+This section specifies the format of the output file (or string, if no
+file name is given).
+
+@table @code
+
+@setting{cellsPerLine, 40}
+The number of cells in a braille line.
+
+@setting{linesPerPage, 25}
+The number of lines on a braille page.
+
+@setting{interpoint, no}
+Whether or not the output will be used to produce interpoint braille.
+This affects the placement of page numbers and may affect other things
+in the future. The only two values recognized are @samp{yes} and
+@samp{no}.
+
+@setting{lineEnd, \\r\\n}
+This specifies the control characters to be placed at the end of each
+output line. These characters vary from one intended use of the output
+to another. Most embossers require the carriage-return and line-feed
+combination specified above. However, a braille display may work best
+with just one or the other. Any valid control characters can be
+specified.
+
+@setting{pageEnd, \\f}
+The control character to be given at the end of a page. Here it is a
+form-feed character, but it can be something else if needed.
+
+@setting{fileEnd, ^z}
+The control character to be placed at the end of the file, here a
+control-z.
+
+@setting{printPages, yes}
+Whether or not to show print page numbers if they are given in the xml
+input. The two valid values are @samp{yes} and @samp{no}.
+
+@setting{braillePages, yes}
+Whether or not to format the output into pages. Here the value is
+@samp{yes}, for use with an embosser. However the user of a braille
+display may wish to specify @samp{no}, so as not to be bothered with
+page numbers and form-feed characters. If @samp{no} is specified the
+lines will still be of the length given in cellsPerLine, but the value
+of linesPerPage will be ignored.
+
+@setting{paragraphs, yes}
+Whether or not to format the output into paragraphs, using appropriate
+styles. If @samp{no} is specified, what would be a paragraph is output
+simply as one long line. 
Applications that wish to do their own +formatting may specify @samp{no}. + +@setting{beginningPageNumber, 1} +This is the number to be placed on the first Braille page if +braillePages is yes. This is useful when producing multiple Braille +volumes. + +@setting{printPageNumberAt, top} +If print page numbers are given in the xml input file they will be +placed at the top of each braille page in the right-hand corner. A +page separator line will also be produced on the braille page where +the print page break actually occurs. You may also specify +@samp{bottom} for this setting. + +@setting{braillePageNumberAt, bottom} +The braille page number will be placed in the bottom right-hand corner +of each page. If interpoint yes has been specified only odd pages will +receive page numbers. If you specify @samp{top} for this setting then +@samp{bottom} must be specified for printPageNumberAt. + +@setting{hyphenate, no} +If @samp{yes} is specified words will be hyphenated at the ends of +lines if a hyphenation table is available. In contracted English +Braille hyphenation is not generally used, but it can save +considerable space. The hyphenation table is specified as part of the +table list in the literaryTextTable setting of the translation +section. + +@setting{outputEncoding, ascii8} +This specifies that the output is to be in the form of 8-bit ASCII +characters. This is generally used if the output is intended directly +for a braille embosser or display. The other values of encoding are +@samp{UTF8}, @samp{UTF16} and @samp{UTF32}. These are useful if the +application will process the output further, such as for generating +displays of braille dots on a screen. + +@setting{inputTextEncoding, ascii8} +This setting is used to specify the encoding of an input text file. +The valid values are @samp{UTF8} and @samp{ascii8}. + +@anchor{formatFor setting} +@setting{formatFor, textDevice} +This setting specifies the type of device the output is intended for. 
+@samp{textDevice} is any device that accepts plain text, including +embossers. You can also specify @samp{browser}. In this case the +output will be formatted for viewing in a browser. If the input file +contains links, they will be preserved and can be used in the normal +way. The text will be translated into braille with the correct line +length. Math and computer material will be translated appropriately. +These files work well in lynx and Internet Explorer, not so well in +elinks and Firefox. + +@setting{backFormat, plain} +This setting specifies the format of back-translated files. +@samp{Plain} specifies plain-text, while @samp{html} specifies xhtml. +The latter is always encoded in UTF-8. Plain-text files can be encoded +in ascii8, UTF-8 or UTF-16. Html is strongly recommended, since it +will preserve print page numbering and emphasis. + +@setting{backLineLength, 70} +This setting specifies the length of lines in back-translated files, +whether in plain-text or html. This is mainly for human readability. +Lines may sometimes be somewhat longer. + +@setting{interline, no} +This setting specifies whether interlining is desired. If it is set to +@samp{yes}, the first line in the output will be a braille +translation, the next line will be its back-translation according to +the interlineBackTable. Back-translation is used instead of simply +presenting the print original because a braille line may contain +additional information, such as leading blanks, print or braille page +numbers, print page separator lines, etc. + +@end table + +@node translation, xml, outputFormat, Customization Configuring liblouisxml +@section translation + +This section specifies the liblouis translation tables to be used for +various purposes. + +@table @code + +@setting{literaryTextTable, en-us-g2.ctb} +The table used for producing literary braille. This may be either +contracted or uncontracted. 
+ +@setting{uncontractedTable, en-us-g1.ctb} +The table used for producing uncontracted or Grade One braille. This +setting appears to be superfluous and may be eliminated in the future. + +@setting{compbrailleTable, en-us-compbrl.ctb} +The table used for producing large amounts of output in computer +braille, such as computer programs. The computer braille table is +usually combined with one of the two tables above. + +@setting{mathtextTable, en-us-mathtext.ctb} +This table specifies how the non-mathematical parts of math books are +to be translated. In many cases it will be the same as +literaryTextTable or uncontractedTable. For books translated with the +Nemeth Code it is different, because this code requires modification +of standard Grade Two. + +@setting{mathexprTable, nemeth.ctb} +This is the table used to translate mathematical expressions. + +@setting{editTable, edittable.ctb} +When the output includes both mathematics and text there may be errors +where one type of translation directly follows another. The editTable +removes these errors. + +@setting{interlineBackTable, en-us-interline.ctb} +This setting specifies the table to be used for back-translation when +interlining is turned on. It must be tailored for this purpose, since +an ordinary forward-translation table may contain entries that do not +handle the additional information in braille lines correctly. + +@end table + +@node xml, style, translation, Customization Configuring liblouisxml +@section xml + +This section provides various information for the processing of xml files. + +@table @code + +@setting{semanticFiles, *\,nemeth.sem} +This setting gives a list of semantic-action files. These files are +read in the sequence given in the list. Here the first member of the +list is an asterisk (@samp{*}). This means that the corresponding file +is to be named by taking the root element of the document and +appending @samp{.sem}. This asterisk member may occur anywhere in the +list. 
+ +@setting{xmlheader, <?xml version='1.0' encoding='UTF8' standalone='yes'?>} +This line gives the xml header to be added to strings produced by +programs like @command{Mathtype} that lack one. + +@setting{entity, nbsp ^1} +This line defines an entity or substitution in an xml file. It is one +of the settings that have two values. The first is the thing to be replaced, +and the second is the replacement. As many entity lines as necessary +can be used. The information they contain is added to the information +provided by xmlheader. In @file{canonical.cfg} this line is commented +out, because specifying it at this point would prevent the user from +specifying his own xmlheader. + +@setting{internetAccess, yes} +The computer has an internet connection and liblouisxml may obtain +information necessary for the processing of this file from the +Internet. If this setting is @samp{no} liblouisxml will not try to use +the internet. The necessary information may, however, be provided on +the local machine in the form of a "dtd" file. + +@setting{newEntries, yes} +liblouisxml may create a new semantic-action file (beginning with +@file{new_}) for a document with an unknown root element or a file +(beginning with @file{appended_}) containing new entries for an +existing semantic-action file. Both kinds of files are placed in the +current directory. If this setting is @samp{no} liblouisxml will not +create a file of new entries and if it encounters a document with an +unknown root element it will issue an error message. Setting +newEntries to @samp{no} may be useful if users should not be bothered +with the minutiae of semantic-action files. + +@end table + +@node style, , xml, Customization Configuring liblouisxml +@section style + +The following sections all deal with styles. Each style has its own +section. Style section names are unlike other section names in that +they consist of the word style, followed by a space, followed by a +style name. 
More styles may be added as the software develops, and +some may be dropped. + +@subsection style document + +This section specifies the style of the whole document. The settings +given in it are applied to all other styles. If a section for another +style is given, the settings in it replace those from the document +style for that section. Because the settings in the document style +apply to all other styles, if a document style section is given it +must precede the sections for all other styles. + +@table @code + +@setting{linesBefore, 0} + +This setting gives the number of blank lines which should be left +before the text to which this style applies. It is set to a non-zero +value for some header styles. + +@setting{linesAfter, 0} + +The number of blank lines which should be left after the text to which +this style applies. + +@setting{leftMargin, 0} + +The number of cells by which the left margin of all lines in the text +should be indented. Used for hanging indents, among other things. + +@setting{firstLineIndent, 0} + +The number of cells by which the first line is to be indented relative +to leftMargin. firstLineIndent may be negative. If the result is less +than 0 it will be set to 0. + +@setting{translate, contracted} + +This setting is currently inactive. It may be used in the future. This +setting tells how text in this style should be translated. Possible +values are @samp{contracted}, @samp{uncontracted}, @samp{compbrl}, +@samp{mathtext} and @samp{mathexpr}. + +@setting{skipNumberLines, no} + +If this setting is @samp{yes} the top and bottom lines on the page +will be skipped if they contain braille or print page numbers. This is +useful in some of the mathematical and graphical styles. + +@setting{format, leftJustified} + +The format setting controls how the text in the style will be +formatted. 
Valid values are @samp{leftJustified}, +@samp{rightJustified}, @samp{centered}, @samp{computerCoded}, +@samp{alignColumnsLeft}, @samp{alignColumnsRight}, @samp{listColumns} +and @samp{listLines}. The first three are self-explanatory. +@samp{computerCoded} is used for computer programs and similar +material. The next three are used for tabular material. +@samp{alignColumnsLeft} causes the left ends of columns to be aligned. +@samp{alignColumnsRight} causes the right ends of columns to be +aligned. @samp{listColumns} causes columns to be placed one after the +other, separated by whatever separation character has been specified +in the semantic-action file, followed by a space. An escape character +(hex 1b) must also be specified to indicate the end of the column. Two +escape characters must be specified to indicate the end of a row. +Indentation of the lines in a row is controlled by the leftMargin and +firstLineIndent settings. @samp{listLines} is similar except that it +lists lines, as in poetry stanzas. The semantic-action file must +specify two escape characters to indicate the end of a line. + +@setting{newPageBefore, no} + +If this setting is @samp{yes}, the text will begin on a new page. This +is useful for certain mathematical and graphical styles. Page numbers +are handled properly. + +@setting{newPageAfter, no} + +If this setting is @samp{yes} any remaining space on the page after +the material covered by this style is handled is left blank, except +for page numbers. + +@setting{rightHandPage, no} + +If this setting is @samp{yes} and interpoint is @samp{yes} the material +covered by this style will start on a right-hand page. This may cause +a left-hand page to be left blank except for page numbers. If +interpoint is @samp{no} this setting is equivalent to newPageBefore. + +@end table + +@subsection style arith + +This style is used for arithmetic examples in elementary math books. +On recognizing this style, the translator formats the material in a +special way. 
This style has no settings different from those of the +document style at the moment. Nevertheless, the line @samp{style +arith} must be included in @file{canonical.cfg} so that it will be set +up properly. + +@subsection style attribution + +This style is used for an attribution following a quotation. + +@table @code + +@setting{format, rightJustified} + +@end table + +@subsection style biblio + +This style is used for bibliographies. Settings will be added later. + +@subsection style caption + +This style is used for picture captions. + +@table @code + +@setting{leftMargin, 4} + +@setting{firstLineIndent, 2} + +Note that the first line is actually indented six cells. + +@end table + +@subsection style code + +This style is used for computer programs. + +@table @code + +@setting{skipNumberLines, yes} + +@setting{linesBefore, 1} + +@setting{linesAfter, 1} + +@setting{format, computerCoded} + +@end table + +@subsection style contents + +This is for entries in a table of contents. + +@subsection style dedication + +This style is for the dedication of a book. + +@table @code + +@setting{newPageBefore, yes} + +@setting{newPageAfter, yes} + +@setting{center, yes} + +@end table + +@subsection style directions + +This is for giving directions for exercises. + +@subsection style dispmath + +This is for showing mathematics that is set off from the text. + +@table @code + +@setting{leftMargin, 2} + +@end table + +@subsection style disptext + +This is for text that is set off from the rest of the text. + +@table @code + +@setting{leftMargin, 2} + +@setting{firstLineIndent, 2} + +@end table + +@subsection style exercise1 + +This is the first level in a set of exercises where there are sublevels. + +@table @code + +@setting{leftMargin, 2} + +@setting{firstLineIndent, -2} + +@end table + +@subsection style exercise2 ++This is for the second level of exercises, such as exercise a following exercise 1.
+ +@table @code + +@setting{leftMargin, 4} + +@setting{firstLineIndent, -2} + +@end table + +@subsection style exercise3 + +This is for the third level of exercises. + +@table @code + +@setting{leftMargin, 6} + +@setting{firstLineIndent, -2} + +@end table + +@subsection style glossary + +This is for a glossary. + +@table @code + +@setting{firstLineIndent, 2} + +@end table + +@subsection style graph + +This style reserves space for a graph or other tactile material. + +@table @code + +@setting{skipNumberLines, yes} + +@end table + +@subsection style graphLabel + +This style reserves space for the label of a graph. + +@subsection style heading1 + +This style is used for main headings, such as chapter titles. + +@table @code + +@setting{linesBefore, 1} + +@setting{center, yes} + +@setting{linesAfter, 1} + +@end table + +@subsection style heading2 + +The first level of subheadings after the main heading. + +@table @code + +@setting{linesBefore, 1} + +@setting{firstLineIndent, 4} + +@end table + +@subsection style heading3 + +The third level of headings. + +@table @code + +@setting{firstLineIndent, 4} + +@end table + +@subsection style heading4 + +The fourth and final level of headings. + +@table @code + +@setting{firstLineIndent, 4} + +@end table + +@subsection style indexx + +This style is used for indexes. The extra @samp{x} is not an error. It +is there to prevent conflict with names elsewhere in the software. + +@subsection style list + +This is for the individual items in a list. + +@table @code + +@setting{firstLineIndent, -2} + +@setting{leftMargin, 2} + +@end table + +@subsection style matrix + +This style causes its contents to be formatted in a way suitable for +the representation of matrices. + +@table @code + +@setting{format, alignColumnsLeft} + +@end table + +@subsection style music + +This style is used for braille music. + +@table @code + +@setting{skipNumberLines, yes} + +@end table + +@subsection style note + +This style is used for footnotes. + +@subsection style para + +Paragraph. 
This is ordinary body text. + +@table @code + +@setting{firstLineIndent, 2} + +@end table + +@subsection style quotation + +This style is used for quotations that are set off from the rest of +the text. + +@table @code + +@setting{linesBefore, 1} + +@setting{linesAfter, 1} + +@end table + +@subsection style section + +This style is used for a section with a section number. + +@table @code + +@setting{firstLineIndent, 4} + +@end table + +@subsection style spatial + +This style is used for mathematical material that is arranged +spatially, such as large fractions. + +@subsection style stanza + +This style is used for stanzas in poetry. + +@table @code + +@setting{linesBefore, 1} + +@setting{linesAfter, 1} + +@setting{format, listLines} + +@end table + +@subsection style style1 + +This and the subsequent numbered styles can be used by the user for +any purpose. + +@subsection style style2 + +@subsection style style3 + +@subsection style style4 + +@subsection style style5 + +@subsection style subsection + +This style is used for subsections with a subsection number. + +@table @code + +@setting{firstLineIndent, 4} + +@end table + +@subsection style table + +This style is used for ordinary tables. + +@subsection style titlepage + +This style is used to begin a title page. + +@table @code + +@setting{newPageAfter, yes} + +@end table + +@subsection style trnote + +This style is used for transcriber's notes which are set off from the +text. + +@subsection style volume + +This style is used to indicate the beginning of a braille volume. ++@node Connecting with the xml Document - Semantic-Action Files, Implementing Braille Mathematics Codes, Customization Configuring liblouisxml, Top
+@chapter Connecting with the xml Document - Semantic-Action Files + +When liblouisxml (or @command{xml2brl}) processes an xml document, it +needs to be told how to use the information in that document to +produce a properly translated and formatted braille document. These +instructions are provided by a semantic-action file, so called because +it explains the meaning, or semantics, of the various specifications +in the xml document. To understand how this works, it is necessary to +have a basic knowledge of the organization of an xml document. + +An xml document is organized like a book, but with much finer detail. +First there is the title of the whole book. Then there are various +sections, such as author, copyright, table of contents, dedication, +acknowledgments, preface, various chapters, bibliography, index, and +so on. Each chapter may be divided into sections, and these in turn +can be divided into subsections, subsubsections, etc. In a book the +parts have names or titles distinguished by capitalization, type +fonts, spacing, and so forth. In an xml document the names of the +parts are enclosed in angle brackets (@samp{<>}). For example, if +liblouisxml encounters @code{<html>} at the beginning of a document, +it knows it is dealing with a document that conforms to the standards +of the extensible hypertext markup language (xhtml) - at least we hope it does. +When you see a book, you know it's a book. The computer can know only +by being told. Something enclosed in angle brackets is called an +"element" (more properly, a "tag") in xml parlance. (There may be more +between the angle brackets than just the name of the element. More on +this later.) The first "element" in a document thus tells liblouisxml +what kind of document it is dealing with. This element is called the +"root element" because the document is visualized as branching out +from it like a tree. 
Some examples of root elements are @code{<html>}, +@code{<math>}, @code{<book>}, @code{<dtbook3>} and +@code{<wordDocument>}. Whenever liblouisxml encounters a root element +that it doesn't know about it creates a new file called a +semantic-action file. The name of this file is formed by stripping the +angle brackets from the root element and adding a period plus the +letters @samp{sem}. If you look in a directory containing +semantic-action files you will see names like @file{html.sem}, +@file{dtbook3.sem}, @file{math.sem}, and so on. + +Sometimes it is advantageous to preempt the creation of a +semantic-action file for a new root element. For example, an article +written according to the docbook specification may have the root +element @code{<article>}. However, the specification itself has the +root element @code{<book>}. In this case you can specify the +@file{book.sem} file in the configuration file by writing, in the xml +section: + +@example +semanticFiles book.sem +@end example + +You will note that this setting uses the plural of "file". This is +because you can actually specify a list of file names separated by +commas. You might want to do this to specify the semantic-action file +for the particular braille mathematical code to be used. For example: + +@example +semanticFiles book.sem,ukmath.sem +@end example + +As you will see in the next section, different braille style +conventions and different braille mathematical codes may require +different semantic-action files. + +liblouisxml records the names of all elements found in the document in +the semantic-action file. The document has a multitude of elements, +which can be thought of as describing the headings of various parts of +the document. One element is used to denote a chapter heading. Another +is used to denote a paragraph, still another to denote text in bold +type, and so on. In other words, the elements take the place of the +capitalization, changes in type font, spacing, etc. in a book. 
+However, the computer still does not know what to do when it +encounters an element. The semantic-action file tells it that. + +Consider @file{html.sem}. A copy is included as part of this +documentation with the name @file{example_sem}. It may differ from the +file that liblouisxml is currently using. You will see that it begins +with some lines about copyrights. Each line begins with a number sign +(@samp{#}). This indicates that it is a "comment", intended for the +human reader, which the computer should ignore. Then there is a blank +line. Finally, there are two other comments explaining that the file +must be edited to get proper output. This is because a human being +must tell the computer what to do with each element. The semantic +files for common types of documents have already been edited, so you +generally don't have to worry about this. But if you encounter a new +type of document or wish to specify special handling for styles or +mathematics you may have to edit the semantic-action file or send it +to the maintainer for editing. In any case the rest of this section is +essential for understanding how liblouisxml handles documents and for +making changes if the way it does so is not correct. + +After another blank line you will see a table consisting of two, and +sometimes three, columns. The first column contains a word which tells +the computer to do something. For example, the first entry in the +table is: @samp{include nemeth.sem}. This tells liblouisxml to include +the information in the @file{nemeth.sem} file when it is deciphering +an html (actually xhtml) document (it may be preferable to use the +semanticFiles setting in the configuration file rather than an +include). + +The second row of the table is: + +@example +no hr +@end example + +@samp{hr} is an element with the angle brackets removed. It means +nothing in itself. However, the first column contains the word +@samp{no}. This tells liblouisxml "no do", that is, do nothing. 
+ +After a few more lines with @samp{no} in the first column, we see one +that says: + +@example +softreturn br +@end example + +This means that when the element @code{<br>} is encountered, +liblouisxml is to do a soft return, that is, start a new line without +starting a new paragraph. + +The next line says: + +@example +heading1 h1 +@end example + +This tells liblouisxml that when it encounters the element @code{<h1>} +it is to format the text which follows as a first-level braille +heading, that is, the text will be centered and preceded and followed +by blank lines. (You can change this by changing the definition of the +heading1 style). + +The next line says: + +@example +italicx em +@end example + +This tells liblouisxml that when it encounters the element @code{<em>} +it is to enclose the text which follows in braille italic indicators. +The @samp{x} at the end of the semantic action name is there to +prevent conflicts with names elsewhere in the software. Just where the +italic indicators will be placed is controlled by the liblouis +translation table in use. + +The next line says: + +@example +skip style +@end example + +This tells liblouisxml to simply skip ahead until it encounters the +element @code{</style>}. Nothing in between will have any effect on +the braille output. Note the slash (@samp{/}) before the @samp{style}. +This means the end of whatever the @code{<style>} element was +referring to. Actually, it was referring to specifications of how +things should be printed. If liblouisxml had not been told to skip +these specifications, the braille output would have contained a lot of +gobbledygook. + +The next line says: + +@example +italicx strong +@end example + +This tells liblouisxml to also use the italic braille indicators for the +text between the @code{<strong>} and @code{</strong>} elements. 
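+
+Collected in one place, the entries discussed so far would look
+roughly like the following excerpt. It is only an illustrative sketch
+assembled from the lines quoted above; the real @file{html.sem} has
+other entries between them and may differ in detail.
+
+@example
+include nemeth.sem
+no hr
+softreturn br
+heading1 h1
+italicx em
+skip style
+italicx strong
+@end example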
+ +After a few more lines with @samp{no} in the first column we come to +the line: + +@example +document html +@end example + +This tells liblouisxml that everything between @code{<html>} and +@code{</html>} is an entire document. @code{<html>} was the root +element of this document, so this is logical. + +After another @samp{no} line we come to: + +@example +para p +@end example + +liblouisxml will consider everything between @code{<p>} and +@code{</p>} to be a normal body text paragraph. + +The next line is: + +@example +heading1 title +@end example + +This causes the title of the document to also be treated as a braille +level 1 heading. + +Next we have the line: + +@example +list li +@end example + +The xhtml @code{<li>} and @code{</li>} pair of elements is used to +enclose an item in a list. liblouisxml will format this with its own +list style. That is, the first line will begin at the left margin and +subsequent lines will be indented two cells. + +Next we have: + +@example +table table +@end example + +You will note that the names of actions and elements are often +identical. This is because they are both mnemonic. In any case, this +line tells liblouisxml to format the table contained in the xhtml +document according to the table formatting rules it has been given for +braille output. + +Next we have the line: + +@example +heading2 h2 +@end example + +This means that the text between @code{<h2>} and @code{</h2>} is to be +formatted according to the liblouisxml style heading2. A blank line +will be left before the heading and the first line will be indented +four spaces. + +After a few more lines we come to: + +@example +no table,cellpadding +@end example + +Note the comma in the second column. This divides the column into two +subcolumns. The first is the table element name. The second is called +an "attribute" in xml. 
It gives further instructions about the +material enclosed between the starting and ending "tags" of the +element (@code{<table>} and @code{</table>}). Full information requires +three subcolumns. The third is called the value and gives the actual +information. The attribute is merely the name of the information. + +Much further down we find: + +@example +no table,border,0 +@end example + +Here the element is table, the attribute is border and the value is 0. +If liblouisxml were to interpret this, it would mean that the table +was to have a border of 0 width. It is not told to do so because +tables in braille do not have borders. + +Now let's look at the file which is included at the beginning of the +@file{html.sem} file. This is @file{nemeth.sem}. As with +@file{html.sem}, a copy is included in the documentation directory +with the name @file{example_nemeth.sem}, but it is not necessarily +the one that liblouisxml is currently using. It illustrates several +more things about how liblouisxml uses semantic-action files. + +The first thing you will notice is that for quite a few lines the +first and second columns are identical. This is because the MathML +element and attribute names are part of a standard, and it was +simplest to use the element names for the semantic actions as well. + +The first line of real interest is: + +@example +math math +@end example + +Every mathematical expression begins with the element @code{<math>} +(which may have attributes and values), and ends with @code{</math>}. +This is therefore the root element of a mathematical expression. +However, mathematical expressions are usually part of a document, so +it is not given the semantic action document. The math semantic action +causes liblouisxml to carry out special interpretation actions. These +will become clearer as we continue to look at the @file{nemeth.sem} +file. You will note that this line has three columns. The meaning of +the third column is discussed below. 
+ +After another uninteresting line we come to two that illustrate +several more facts about semantic-action files: + +@example +mfrac mfrac ^?,/,^# +mfrac mfrac,linethickness,0 ^(,^;%,^) +@end example + +Like the math entry above, the first line has three columns. While the +first two columns must always be present, the third column is +optional. Here, it is also divided into subcolumns by commas. The +element @code{<mfrac>} indicates a fraction. A fraction has two parts, +a numerator and a denominator. In xml, we call these parts children of +@code{<mfrac>}. They may be represented in various ways, which need +not concern us here. What is of real importance is that the third +column tells liblouisxml to put the characters @samp{^?} before the +numerator, @samp{/} between the numerator and denominator, and +@samp{^#} after the denominator. Later on, liblouis will translate +these characters into the proper representation of a fraction in the +Nemeth Code of Braille Mathematics. (For other mathematical codes, +@pxref{Implementing Braille Mathematics Codes}). + +The second line is of even greater interest. The first column is again +@samp{mfrac}, but this line is for the binomial coefficient. The second +column contains three subcolumns, an element name, an attribute name +and an attribute value. The attribute linethickness specifies the +thickness of the line separating the numerator and denominator. Here +it is 0, so there is no line. This is how the binomial coefficient is +represented in print. The third column tells how to represent it in +braille. liblouisxml will supply @samp{^(}, upper number, @samp{^;%}, +lower number, @samp{^)} to liblouis, which will then produce the +proper braille representation for the binomial coefficient. + +Returning to the line for the math element, we see that the third +column begins with a backslash followed by an asterisk. The backslash +is an escape character which gives a special meaning to the character +which follows it. 
Here the asterisk means that what follows is to be +placed at the very end of the mathematical expression, no matter how +complex it is. + +For further discussion of how the third column is used +@pxref{Implementing Braille Mathematics Codes}. The third column is +not limited to mathematics. It can be used to add characters to +anything enclosed by an xml tag. + +Here is a complete list of the semantic actions which liblouisxml +recognizes. Many of them are also the names of styles. These are +listed first, preceded by an asterisk. For a discussion of these, +@pxref{Customization Configuring liblouisxml}. + +@table @code + +@item * arith +@item * attribution +@item * biblio +@item * blanklinebefore +@item * caption +@item * code +@item * contents +@item * dedication +@item * directions +@item * dispmath +@item * disptext +@item * document +@item * exercise1 +@item * exercise2 +@item * exercise3 +@item * glossary +@item * graph +@item * graphlabel +@item * heading1 +@item * heading2 +@item * heading3 +@item * heading4 +@item * indexx +@item * list +@item * matrix +@item * music +@item * note +@item * para +@item * quotation +@item * section +@item * spatial +@item * stanza +@item * style1 +@item * style2 +@item * style3 +@item * style4 +@item * style5 +@item * subsection +@item * table +@item * titlepage +@item * trnote +@item * volume +@item acknowledge +@item allcaps +@item author +@item blankline +@item bodymatter +@item boldx +@item booktitle +@item boxline +@item cdata +@item center +@item chemistry +@item contracted +@item copyright +@item endnotes +@item footer +@item frontmatter +@item graphic +@item italicx +@item jacket +@item line +@item linkto +@item maction +@item maligngroup +@item malignmark +@item math +@item menclose +@item merror +@item mfenced +@item mfrac +@item mglyph +@item mi +@item mlabeledtr +@item mmultiscripts +@item mn +@item mo +@item mover +@item mpadded +@item mphantom +@item mprescripts +@item mroot +@item mrow +@item ms +@item mspace 
+@item msqrt +@item mstyle +@item msub +@item msubsup +@item msup +@item mtd +@item mtext +@item mtr +@item munder +@item munderover +@item newpage +@item no +@item noindent +@item none +@item preface +@item rearmatter +@item rightalign +@item righthandpage +@item runninghead +@item semantics +@item skip +@item softreturn +@item specsym +@item tblbody +@item tblcol +@item tblhead +@item tblrow +@item tnpage +@item transcriber +@item uncontracted + +@end table ++@node Implementing Braille Mathematics Codes, Settings Index, Connecting with the xml Document - Semantic-Action Files, Top
+@chapter Implementing Braille Mathematics Codes + +The Nemeth Code of Braille Mathematical and Science Notation has been +implemented. Other braille mathematics codes can be implemented by +following the same pattern. The Nemeth Code implementation is +discussed as an example below. + +Four tables are used to translate xml documents containing a mixture +of text and mathematics into the Nemeth code. They can be found in the +subdirectory @file{lbx_files} of the liblouisxml directory. First, the +semantic-action file @file{nemeth.sem} is used to interpret the +mathematical portions of the xml document (the text portions are +interpreted by another semantic-action file which will not be +discussed here). After the math and text have been interpreted, two +liblouis tables, @file{nemeth.ctb} and @file{en-mathtext.ctb}, are used +to translate them. Each piece of mathematics or text is translated +separately and the pieces are strung together with blanks between +them. This results in inaccuracies where mathematics meets text. The +fourth table, also a liblouis table, is used to remove these +inaccuracies. It is called @file{edittable.ctb}, and it does things +like removing the multi-purpose indicator before a blank, inserting +the punctuation indicator before a punctuation mark following a math +expression, and removing extra spaces. + +The general format and use of semantic-action files were discussed in +the previous section (@pxref{Connecting with the xml Document - +Semantic-Action Files}). In this section we shall concentrate on the +optional third column, which is used extensively in @file{nemeth.sem}. While +the first two columns can be generated by liblouisxml but must then be +edited by a person, the third column must always be provided by a +human. + +As previously stated, the third column tells liblouisxml what +characters to insert to inform liblouis how to translate the math +expression.
Look at the following line: + +@example +mfrac mfrac ^?,/,^# +@end example + +You will see that the third column contains two commas. This means +that it has three subcolumns. A fraction has a numerator and a +denominator. These are called children of the @code{mfrac} element. The first +subcolumn specifies the characters that liblouisxml should place in +front of the numerator. The second subcolumn gives the characters to +be placed between the numerator and denominator. Finally, the third +subcolumn gives the characters to place after the denominator. You +will see that the first subcolumn contains a caret followed by a +question mark. The dot pattern for the question mark in computer +braille is the same as for the Nemeth start-fraction indicator. The +caret is used so that liblouis can tell this apart from an ordinary question +mark, which has the same dot pattern in computer braille. The +second subcolumn contains a slash but no caret. This is because there +is no danger of confusion where the slash is concerned. The third +subcolumn does contain a caret, and it also contains a number sign, +which corresponds to the Nemeth end-fraction indicator. When +liblouisxml encounters the MathML representation of the fraction +one-half it produces the following string of characters: +@samp{^?1/2^#}. liblouis then removes the carets to get @samp{?1/2#}. + +As another example, consider the entry in @file{nemeth.sem} for a +subscript. + +@example +msub msub ,^;,^" +@end example + +Here the first subcolumn is blank, because nothing is to be placed +before the subscripted symbol. The second subcolumn contains a caret +and a semicolon (in computer braille). This corresponds to the Nemeth +subscript indicator. The third subcolumn contains a caret and a quotation +mark, corresponding to the Nemeth baseline indicator. liblouisxml +translates the MathML expression for x subscript i into +@samp{x^;i^"}. liblouis subsequently produces @samp{x;i"}.
There are +other steps if the subscript is numeric. These are handled by pass2 +opcodes in the liblouis translation table, @file{nemeth.ctb}. + +You will notice that the entries in @file{nemeth.sem} have various +numbers of subcolumns in the third column. In general, the characters +given in the first subcolumn are placed before the first child of the +element given in the second column. The characters in the second +subcolumn are placed before the second child, and so on, until the +characters given in the last subcolumn are placed after the last +child. + +Sometimes an element or tag can have an indeterminate number of +children. This is true of @code{<math>} itself. Yet, it may be +necessary to place some characters after the very last child. Let us +look at the @code{<math>} entry. + +@example +math math \eb,\*\ee +@end example + +First let us discuss escape sequences starting with a backslash. These +are basically the same as in liblouis. The sequence @samp{\e} is +shorthand for the escape character, which would otherwise be +represented by @samp{\x001b}. The beginning of a math expression is +denoted by an escape character followed by the letter @samp{b} and the end by +an escape character followed by the letter @samp{e}. This enables the +editing table to do such things as drop the baseline indicator at the +end of a math expression and insert a number sign at the beginning, if +needed. + +Not found in liblouis is the sequence @samp{\*}. This means to put +what follows after the very last child of the math element, no matter +how many there are. + +As another example consider: + +@example +mtd mtd \*\ec +@end example + +@code{mtd} is the MathML tag for a table cell (one column entry in a row). There may be many +children of this tag. The entry says to put an escape character (hex +1b), plus the letter @samp{c}, after the very last of them.
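The subcolumn rules described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the code liblouisxml itself uses: the function names (@samp{split_subcolumns}, @samp{unescape}, @samp{insert_characters}, @samp{strip_carets}) are hypothetical, only the @samp{\e} and backslash-comma escapes are expanded, and the handling of an element with more children than subcolumns is an assumption (the extra children are simply appended). Caret removal is shown as a plain string replacement, which stands in for what the liblouis table does later.

```python
ESC = "\x1b"  # the escape character, written as \e in the third column


def split_subcolumns(third_column):
    """Split a third-column string on commas not preceded by a backslash."""
    parts, current, i = [], "", 0
    while i < len(third_column):
        if third_column[i] == "\\" and i + 1 < len(third_column):
            current += third_column[i:i + 2]  # keep the escape pair together
            i += 2
        elif third_column[i] == ",":
            parts.append(current)
            current = ""
            i += 1
        else:
            current += third_column[i]
            i += 1
    parts.append(current)
    return parts


def unescape(text):
    """Expand \\e to ESC and \\, to a comma; other sequences are left alone."""
    out, i = "", 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "\\e":
            out += ESC
            i += 2
        elif pair == "\\,":
            out += ","
            i += 2
        else:
            out += text[i]
            i += 1
    return out


def insert_characters(third_column, children):
    """Join the children, adding the characters given by the subcolumns."""
    subcolumns = split_subcolumns(third_column)
    trailer = ""
    if subcolumns[-1].startswith("\\*"):
        # \* means: what follows goes after the very last child
        trailer = unescape(subcolumns.pop()[2:])
    elif len(subcolumns) > 1:
        # assumption: with several subcolumns the last one closes the element
        trailer = unescape(subcolumns.pop())
    pieces = []
    for index, child in enumerate(children):
        if index < len(subcolumns):
            pieces.append(unescape(subcolumns[index]))
        pieces.append(child)
    return "".join(pieces) + trailer


def strip_carets(text):
    """Stand-in for the caret removal performed later by the liblouis table."""
    return text.replace("^", "")


marked = insert_characters("^?,/,^#", ["1", "2"])
print(marked)                # ^?1/2^#
print(strip_carets(marked))  # ?1/2#
```

With the @code{math} entry, @samp{insert_characters("\\eb,\\*\\ee", children)} puts escape-b before the first child and escape-e after the last, however many children are passed.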
+ +As a final example consider: + +@example +mtr mtr ^.^\,^(,\*^.^\,^)\er +@end example + +@code{mtr} is the MathML tag for a row in a table, in this case a +matrix. Each row in a matrix must begin with the dot pattern +@samp{46-6-12356} and end with the dot pattern @samp{46-6-12456}. As +usual a caret is placed before the corresponding characters. Since dot +6 is a comma, it must be escaped. This is done by placing a backslash +before the comma. There are two subcolumns. The first contains the +characters to be placed at the beginning of each row. The second +starts with @samp{\*}, signifying that the characters following it +are to be placed at the end of everything in this row. A subcolumn +starting with @samp{\*} must be the last (or only) subcolumn. + +Here this last subcolumn ends with an escape character and the letter +@samp{r}, signifying the end of a row. + +So much for the semantic-action file. Even though the characters in +the third column were chosen to correspond with Nemeth characters, +they may not have to be changed for other math codes. liblouis can +replace them with anything needed. + +This brings us to a consideration of the two tables used by liblouis +to translate mathematics texts. The first, @file{en-mathtext.ctb}, is +used to translate text appearing outside math expressions. It is +necessary because the Nemeth code requires modifications of Grade 2 +braille. Other math codes may not have this requirement. + +The table actually used to translate mathematics is @file{nemeth.ctb}. +It includes two other tables, @file{chardefs.cti} and +@file{nemethdefs.cti}. The first gives ordinary character definitions +and is included by all the other tables. Note, however, that the +unbreakable space, @samp{\x00a0}, is translated by dot 9. This is used +before and after the equal sign and other symbols in +@file{nemeth.ctb}.
The second table contains character definitions for +special math symbols, most of which are Unicode characters greater +than @samp{\x00ff}. The Greek letters are here. So are symbols like +the integral sign. + +Most of the entries in @file{nemeth.ctb} should be familiar from other +tables. The unfamiliar ones follow the comments @samp{# Semantic +pairs} and @samp{# pass2 corrections}. The first simply replace +characters preceded by a caret with the character itself. The second +make adjustments in the code generated directly from the +@file{nemeth.sem} file. The pass2 opcode is discussed in the liblouis +guide (@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's +and User's Guide}). Here are some comments on a few of the entries in +@file{nemeth.ctb}. + +@example +pass2 @@1456-1456 @@6-1456 +@end example + +Replaces double start-fraction indicators with the start-complex-fraction +indicator. + +@example +pass2 @@3456-3456 @@6-3456 +@end example + +Replaces double end-fraction indicators with the end-complex-fraction +indicator. + +@example +pass2 @@56[$d1-5]@@5 * +@end example + +Removes the subscript and baseline indicators from numeric subscripts. + +@example +pass2 @@5-9 @@9 +@end example + +Removes the baseline or multipurpose indicator before an unbreakable +space generated by the translation of an equal sign, etc. + +@example +pass2 @@45-3-5 @@3 +@end example + +Replaces a superscript apostrophe with a simple prime symbol. + +@example +pass2 @@9[]$d @@3456 +@end example + +Puts a number sign before a digit preceded by a blank. + +@example +pass2 @@9-0 @@9 +@end example + +Removes a space following an unbreakable space. + +We now come to the fourth and last table used for math translation, +the editing table, @file{edittable.ctb}. As explained at the +beginning, this table is used to remove inaccuracies where math +translation butts up against text translation. For example, the Nemeth +code puts numbers in the lower part of the cell.
However, punctuation +marks are also in the lower part of the cell. So Nemeth puts a +punctuation indicator, dots @samp{456}, in front of any lower-cell +punctuation that immediately follows a mathematical expression. If +this occurs inside MathML, it is handled by @file{nemeth.ctb}. However, +a MathML expression is often followed by a punctuation mark which is +the first part of text. liblouisxml puts a blank between math and +text, but this can result in a mathematical expression followed by a +blank and then, say, a period, dots @samp{256}. @file{edittable.ctb} +replaces the blank with the punctuation indicator. + +When you look at @file{edittable.ctb} you will see that it begins with +an include of @file{chardefs.cti}. Most of the entries are ordinary, +but some are interesting. For example, + +@example +always "\s 0 +@end example + +replaces the baseline or multipurpose indicator followed by a space +with just a space. ++@node Settings Index, Function Index, Implementing Braille Mathematics Codes, Top
+@unnumbered Settings Index + +@printindex tp + +@node Function Index, Program Index, Settings Index, Top +@unnumbered Function Index + +@printindex fn + +@node Program Index, , Function Index, Top +@unnumbered Program Index + +@printindex pg + +@bye + + +