Hi John

On Wed, 2008-11-12 at 16:35 -0600, John J. Boyer wrote:
> Thanks for your work. I am looking into making texinfo documentation for
> liblouisxml also.

That's fantastic. I've already started converting the liblouisxml guide to texinfo. I just haven't posted it because it is not quite finished. I attach my current version.

> How did you create the texinfo version of the liblouis
> guide?

I basically just copied and pasted the content of the html version from the browser into a text editor and started doing the markup (I've done that before for other manuals, so I could draw from that experience).

> I'll probably be writing to you offlist for help with texinfo and
> maybe autotools.

Sure, the automake integration should be similar to the one in liblouis; in fact, I just attached the changed Makefile.am. For completeness' sake I also added an entry in the Changelog file (also attached).

> The documentation was originally created in xhtml so that it could be
> translated with formatting by liblouisxml.

Ah OK, now I understand. Is the html or the text produced by texinfo fully accessible?

Thanks
Christian

-- 
Christian Egli
Swiss Library for the Blind and Visually Impaired
Grubenstrasse 12, CH-8045 Zürich, Switzerland
\input texinfo @c %**start of header @setfilename liblouisxml-guide.info @include version.texi @settitle Liblouisxml Programmer's and User's Guide @dircategory Misc @direntry * Liblouisxml: (liblouisxml). An xml to Braille Translation Library. @end direntry @c Version and Contact Info @set MAINTAINERSITE @uref{http://www.jjb-software.com/liblouisxml-guide.html,maintainers webpage} @set AUTHOR John J. Boyer @set MAINTAINER John J. Boyer @set MAINTAINEREMAIL @email{john.boyer@xxxxxxxxxxxxxxxx} @set MAINTAINERCONTACT @uref{mailto:john.boyer@xxxxxxxxxxxxxxxx,contact the maintainer} @c %**end of header @finalout @c Macro definitions @c Opcode. @macro setting{name, args} @tindex \name\ @item \name\ \args\ @end macro @copying This manual is for liblouisxml (version @value{VERSION}, @value{UPDATED}), an xml to Braille Translation Library. This file may contain code borrowed from the Linux screenreader @acronym{BRLTTY}, Copyright @copyright{} 1999-2006 by the @acronym{BRLTTY} Team. Copyright @copyright{} 2004-2007 ViewPlus Technologies, Inc. @uref{www.viewplus.com} and JJB Software, Inc. @uref{www.jjb-software.com}. @quotation This file is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser (or library) General Public License (LGPL) as published by the Free Software Foundation; either version 3, or (at your option) any later version. This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser (or Library) General Public License LGPL for more details. You should have received a copy of the GNU Lesser (or Library) General Public License (LGPL) along with this program; see the file COPYING. If not, write to the Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
@end quotation
@end copying

@titlepage
@title Liblouisxml Programmer's and User's Guide
@subtitle Release @value{VERSION}
@author by John J. Boyer

@c The following two commands start the copyright page.
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@c Output the table of contents at the beginning.
@contents

@ifnottex
@node Top, Introduction, (dir), (dir)
@top Liblouisxml Programmer's and User's Guide

@insertcopying
@end ifnottex

@menu
* Introduction::
* Programming with liblouisxml::
* Transcribing with the xml2brl program::
* Customization Configuring liblouisxml::
* Connecting with the xml Document - Semantic-Action Files::
* Implementing Braille Mathematics Codes::
* Settings Index::
* Function Index::

@detailmenu
 --- The Detailed Node Listing ---

Programming with liblouisxml

* License::
* Overview::
* Files and Paths::
* lbx_version::
* lbx_initialize::
* lbx_translateString::
* lbx_translateFile::
* lbx_translateTextFile::
* lbx_backTranslateFile::
* lbx_free::

Transcribing with the xml2brl program

* Transcribing Microsoft Word Files with msword2brl::

Customization: Configuring liblouisxml

* outputFormat::
* translation::
* xml::
* style::

@end detailmenu
@end menu

@node Introduction, Programming with liblouisxml, Top, Top
@chapter Introduction

liblouisxml is a software component which can be incorporated into software packages to provide the capability of translating any file in the computer lingua franca xml format into properly transcribed braille. This includes translation into grade two, if desired, mathematical codes, etc. It also includes formatting according to a built-in style sheet which can be modified by the user.

The first program into which liblouisxml has been incorporated is @command{xml2brl}. This program will translate an xml or text file into an embosser-ready braille file. It is not necessary to know xml, because MSWord and other word processors can export files in this format.
If the word processor has been used correctly, @command{xml2brl} will produce an excellent braille file.

There is a Mac GUI application incorporating liblouisxml called louis. For a link to it go to @uref{www.jjb-software.com/downloads}. A similar Windows application is in the works.

Computer programmers who wish to use liblouisxml in their software can find the information they need in the section Programming with liblouisxml (@pxref{Programming with liblouisxml}). Those who wish to change the output generated by liblouisxml should read the section Configuring liblouisxml (@pxref{Customization Configuring liblouisxml}). If you encounter a type of xml file with which liblouisxml is not familiar, you can learn how to tell it how to process that file by reading Connecting with the xml document: Semantic-Action Files (@pxref{Connecting with the xml Document - Semantic-Action Files}). Finally, if you wish to implement a new braille mathematics code, read Implementing Braille Mathematics Codes (@pxref{Implementing Braille Mathematics Codes}).

You will also find it advantageous to be acquainted with the companion library liblouis, which is a braille translator and back-translator (@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's and User's Guide}).

@node Programming with liblouisxml, Transcribing with the xml2brl program, Introduction, Top
@chapter Programming with liblouisxml

@menu
* License::
* Overview::
* Files and Paths::
* lbx_version::
* lbx_initialize::
* lbx_translateString::
* lbx_translateFile::
* lbx_translateTextFile::
* lbx_backTranslateFile::
* lbx_free::
@end menu

@node License, Overview, Programming with liblouisxml, Programming with liblouisxml
@section License

liblouisxml xml to Braille Translation Library

This file may contain code borrowed from the Linux screenreader BRLTTY, Copyright (C) 1999-2006 by the BRLTTY Team.

Copyright (C) 2004-2007 ViewPlus Technologies, Inc. www.viewplus.com and JJB Software, Inc.
www.jjb-software.com All rights reserved This file is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. In addition to the permissions and restrictions contained in the GNU General Public License (GPL), the copyright holders grant two explicit permissions and impose one explicit restriction. The permissions are: 1) Using, copying, merging, publishing, distributing, sublicensing, and/or selling copies of this software that are either compiled or loaded as part of and/or linked into other code is not bound by the GPL. 2) Modifying copies of this software as needed in order to facilitate compiling and/or linking with other code is not bound by the GPL. The restriction is: 3. The translation, semantic-action and configuration tables that are read at run-time are considered part of this code and are under the terms of the GPL. Any changes to these tables and any additional tables that are created for use by this code must be made publicly available. All other uses, including modifications not required for compiling or linking and distribution of code which is not linked into a combined executable, are bound by the terms of the GPL. This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; see the file COPYING. If not, write to the Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 
@node Overview, Files and Paths, License, Programming with liblouisxml @section Overview liblouisxml is an "extensible renderer," designed to translate a wide variety of xml documents into braille, but with a special emphasis on technical material. The overall operation of liblouisxml is controlled by a configuration file. The way in which a particular type of xml document is to be rendered is specified by a semantic-action file for that document type. Braille translation is done by the liblouis braille translation and back-translation library (@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's and User's Guide}). Its operation, in turn is controlled by translation table files. All these files are plain text and can be created and edited in any text editor. Configuration settings can also be specified on the command line of the console-mode transcription program @command{xml2brl}. The general operation of liblouisxml is as follows. It uses the libxml2 library to construct a parse tree of the xml document. After the parse tree is constructed, a function called @code{examine_document} looks it over and determines whether math translation tables, etc. are needed. @code{examine_document} also constructs a prototype semantic-action file, if one does not exist already. When it is finished, another function, called @code{transcribe_document}, does the actual braille transcription. It calls @code{transcribe_math} to handle MathML subtrees, @code{transcribe_chemistry} for chemical formula subtrees, @code{transcribe_graphic} for SVG graphics, etc. Entities are translated to Unicode, if they are not already. Sequences of symbols indicate superscripts, return to the baseline, subscripts, start and end of fractions, etc. The Braille translator and back-translator library liblouis is used to do the braille translation. 
The @code{transcribe_math} function works in conjunction with the latest version of liblouis and a special math translation table to transcribe most mathematical expressions into fairly good Nemeth Code. Much refinement is still necessary. Other braille mathematical codes can be handled by modifying the translation table.

The functions which are not needed at the moment, such as @code{transcribe_chemistry}, are only skeletons. However, I hope that @code{transcribe_graphic} can be expanded in the near future to use the graphics capability of the Tiger tactile graphics embossers.

The latest versions of liblouisxml and liblouis can be downloaded from @uref{www.jjb-software.com}. Note that liblouisxml will only work with the latest version of liblouis.

liblouisxml can be compiled to use either 16-bit or 32-bit Unicode internally. This is inherited from liblouis, so liblouis must be compiled first and then liblouisxml. Wherever 16 bits are mentioned in this document, read 32 if you have compiled the library for 32 bits.

@node Files and Paths, lbx_version, Overview, Programming with liblouisxml
@section Files and Paths

As stated in the previous section, liblouisxml uses three kinds of files: configuration files, semantic-action files, and liblouis translation tables. The first two are discussed later in this documentation. liblouis translation tables are discussed in the liblouis guide (@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's and User's Guide}), which is distributed with liblouis.

These files can be placed on various paths, which are determined at compile time. One of these paths should be to the @file{lbx_files} directory provided by liblouisxml, which contains the principal configuration file (@file{canonical.cfg}) and the semantic-action files. Another should be to the tables directory in the liblouis distribution.

Note that liblouisxml also generates some files, all of which are placed in the current directory.
These files are new prototype semantic-action files, additions to old semantic-action files, temporary files, and log files. The first two can be used to extend the capability of liblouisxml to process xml documents. The latter two are useful for debugging.

Paths are set by changing a few lines of code in the @file{paths.c} module. If you are preparing liblouisxml for Windows, a function which finds the name of the "Program Files" directory for your locale is called automatically. You can then modify the line containing the term @samp{yourSubDir} as needed. If you are preparing liblouisxml for a Unix-type system, look for the line that says @samp{Set Unix Paths}. The three lines after it establish a path to the @file{lbx_files} directory in your home directory. As distributed, this directory contains the semantic-action files and some configuration files. You can choose to copy the tables from the liblouis distribution into it as well, or you can modify those three lines to point to the actual location of the tables. You can also choose to place both the @file{lbx_files} and the tables directory in @file{/etc}. The function @code{addPath} takes care of adding paths to liblouisxml properly. You can specify many more than two paths.

@node lbx_version, lbx_initialize, Files and Paths, Programming with liblouisxml
@section lbx_version
@findex lbx_version

@example
char *lbx_version (void)
@end example

This function returns a pointer to a character string containing the version of liblouisxml, plus other information such as the release date and perhaps notable changes.

@node lbx_initialize, lbx_translateString, lbx_version, Programming with liblouisxml
@section lbx_initialize
@findex lbx_initialize

@example
void * lbx_initialize (
    const char *const configFilelist,
    const char const *logFileName,
    const char *const settingsString)
@end example

This function initializes the libxml2 library, runs @file{canonical.cfg} and processes the configuration settings given in @code{settingsString} and the configuration files given in @code{configFilelist}. The latter is a list of configuration file names separated by commas. If its first character is a comma, it is taken to be a string containing configuration settings and is processed like the @code{settingsString} string. Such a string must conform to the format of a configuration file. Newlines should be represented with ASCII 10. If @code{logFileName} is not @code{NULL}, a log file is produced in the current directory. If it is @code{NULL}, any messages are printed to stderr. The function returns a pointer to the @code{UserData} structure. This pointer is of type @code{void *} and must be cast to @code{(UserData *)} in the calling program. To access the information in this structure you must include @file{louisxml.h}. This function is used by @command{xml2brl}.

@node lbx_translateString, lbx_translateFile, lbx_initialize, Programming with liblouisxml
@section lbx_translateString
@findex lbx_translateString

@example
int lbx_translateString (
    const char *const configfilelist,
    char * inbuf,
    widechar *outbuf,
    int *outlen,
    unsigned int mode)
@end example

This function takes a well-formed xml expression in @code{inbuf} and translates it into a string of 16-bit (or 32-bit if this has been specified in liblouis) braille characters in @code{outbuf}. The xml expression must be immediately followed by a zero or null byte. Leading whitespace is ignored. If it does not then begin with the characters @samp{<?xml} an xml header is added.
If it does not begin with @samp{<} it is assumed to be a text string and is translated accordingly. The header is specified by the xmlHeader line in the configuration file. If no such line is present, a default header specifying UTF-8 encoding is used.

The @code{mode} parameter specifies whether you want the library to be initialized. If it is 0, everything is reset, the @file{canonical.cfg} file is processed and the configuration file and/or string (see previous section) are processed. If @code{mode} is 1, liblouisxml simply prepares to handle a new document. For more on the @code{mode} parameter see the next section.

Which 16-bit character in @code{outbuf} represents which dot pattern is indicated in the liblouis translation tables.

The @code{configfilelist} parameter points to a configuration file or string. Among other things, this file specifies translation tables. It is these tables which control just how the translation is made, whether in Grade 2, Grade 1, the Nemeth Code of Braille Mathematics or something else.

Note that the @code{outlen} parameter is a pointer to an integer. When the function is called, this integer contains the maximum output length. When it returns, it is set to the actual length used.

The function returns 1 if no errors were encountered and a negative number if a complete translation could not be done.

@node lbx_translateFile, lbx_translateTextFile, lbx_translateString, Programming with liblouisxml
@section lbx_translateFile
@findex lbx_translateFile

@example
int lbx_translateFile (
    char *configfilelist,
    char *inputFileName,
    char *outputFileName,
    unsigned int mode)
@end example

This function accepts a well-formed xml document in @code{inputFileName} and produces a braille translation in @code{outputFileName}. As for @code{lbx_translateString}, the @code{mode} parameter specifies whether the library is to be initialized with new configuration information or simply prepared to handle a new document.
In addition, the @code{mode} parameter can specify that a document is in html, not xhtml. @file{liblouisxml.h} contains an enumeration type with the values @code{dontInit} and @code{htmlDoc}. These can be combined with an or (@samp{|}) operator. The input file is assumed to be encoded in UTF-8, unless otherwise specified in the xml header. The encoding of the output file may be UTF-8, UTF-16, UTF-32 or Ascii-8. This is specified by the @code{outputEncoding} line in the configuration file, @code{configfilelist}. The function returns 1 if the translation was successful. @node lbx_translateTextFile, lbx_backTranslateFile, lbx_translateFile, Programming with liblouisxml @section lbx_translateTextFile @findex lbx_translateTextFile @example int lbx_translateTextFile ( char *configfilelist, char *inputFileName, char *outputFileName, unsigned int mode) @end example This function accepts a text file in @code{inputFilename} and produces a braille translation in @code{outputFilename}. The input file is assumed to be encoded in Ascii8. Blank lines indicate the divisions between paragraphs. Two blank lines cause a blank line between paragraphs (or headers). The output file may be in UTF-8, UTF-16, or Ascii8, as specified by the @code{outputEncoding} line in the configuration file, @code{configfilelist}. As for @code{lbx_translateString}, the @code{mode} parameter specifies whether complete initialization is to be done or simply initialization for a new document. @node lbx_backTranslateFile, lbx_free, lbx_translateTextFile, Programming with liblouisxml @section lbx_backTranslateFile @findex lbx_backTranslateFile @example int lbx_backTranslateFile ( char *configfilelist, char *inputFileName, char *outputFileName, unsigned int mode) @end example This function accepts a braille file in @code{inputFilename} and produces a back-translation in @code{outputFilename}. The input file is assumed to be encoded in Ascii8. 
The output file is in either plain text or html, according to the setting of @code{backFormat} in the configuration file. Html files are encoded in UTF8. In plain-text, blank lines are inserted between paragraphs. The output file may be in UTF-8, UTF-16, or Ascii8, as specified by the @code{outputEncoding} line in the configuration file, @code{configfilelist}. The mode parameter specifies whether or not the library is to be initialized with new configuration information, as described in the section on @code{lbx_translateString} (@pxref{lbx_translateString}). @node lbx_free, , lbx_backTranslateFile, Programming with liblouisxml @section lbx_free @findex lbx_free @example void lbx_free (void) @end example This function should be called at the end of the application to free all memory allocated by liblouisxml and liblouis. If you wish to change configuration files during your application, use a @code{mode} parameter of 0 on the function call using the new configuration information. @node Transcribing with the xml2brl program, Customization Configuring liblouisxml, Programming with liblouisxml, Top @chapter Transcribing with the xml2brl program At the moment, actual transcription with liblouisxml is done with the command-line (or console) program @command{xml2brl}. The line to type is: @example xml2brl [OPTIONS] [-f config-file] [infile] [outfile] @end example The brackets indicate that something is optional. You will see that nothing is required except the program name itself, @command{xml2brl}. The various optional parts control how the program will behave, as follows: @table @option @item -h This option causes @command{xml2brl} to print a help message describing usage and exit. @item -l This option will cause @command{xml2brl} and liblouisxml to print error messages to @file{xml2brl.log} instead of stderr. The file will be in the current directory. This option is particularly useful if @command{xml2brl} is called by a GUI script or Web application. 
@item -f configfile
This specifies the configuration file which tells @command{xml2brl} how to do the transcription. (It may be a list of file names separated by commas.) This file specifies such things as the number of cells per line, the number of lines per page, the translation tables to be used, how paragraphs and headings are to be formatted, etc. If this part of the command line is omitted, @command{xml2brl} assumes that the configuration file is named @file{default.cfg} and is in the current directory. If the configuration file name contains a pathname, @command{xml2brl} will consider this as a path on which to look for files that it needs (@pxref{Files and Paths}).

@item -Csetting=value
This option enables you to specify configuration settings on the command line instead of changing the configuration file. You can use as many @option{-C} options as you wish. Any settings can be specified except those having to do with styles. The settings may be in any order. They override any settings in @file{canonical.cfg} or in the configuration file used by @command{xml2brl}.

@item -b
Back-translate. The input file must be a braille file, such as @file{.brf}. The output file is a back-translation of this file. It may be in either plain-text or xhtml (html), according to the setting of backFormat in the outputFormat section of the configuration file. Html files will contain page numbers and emphasis. To get good html, the liblouis table must have the entry @samp{space \e 1b} so that it will pass through escape characters. The @file{html.sem} file must also contain the line @samp{pagenum pagenum}. Text output files simply have a blank line between paragraphs. Encoding of text files is controlled by the outputEncoding setting. Html files are always in UTF-8.

@item -r
Reformat. The input file must be a braille file, such as @file{.brf}. The output is a braille file formatted according to the configuration file.
It is advisable to set backFormat to html, since this will preserve print page numbers and emphasis. This option can be useful for changing the line length and page length of a braille file, for example, from 40 to 32 cells. It is also an excellent way to check the accuracy of liblouis tables. The original page numbers at the tops and bottoms of pages are discarded, and new ones are generated.

@item -p
Poorly formatted input translation. Infile is any text file, such as may have been obtained by extracting the text in a pdf file. The input file may also be an xml or html file which is so poorly formatted that better braille can be obtained by ignoring the formatting. @command{xml2brl} tries to guess paragraph breaks. The output is generally reasonably formatted, that is, with reasonable paragraph breaks.

@item -t
The document is an html file, not xhtml. This option is useful with files downloaded from the Web in source form. Without it, the program will first try to parse the file as an xml document, producing lots of error messages. It will then try the html parser. With this option, it goes directly to the html parser. See also the formatFor configuration file setting (@pxref{formatFor setting}), which enables you to format the braille output for viewing in a browser.

@item infile
This is the name of the input file containing the material to be transcribed. The file may be either an xml file or a text file. The @option{-b}, @option{-r} and @option{-p} options discussed above provide for other types of files and processing. Typical xml files are those provided by @uref{www.bookshare.org} or those derived from a word processor by saving in xml format. If a text file is used, paragraphs and headings should be separated by blank lines. In such a file there is no way to distinguish between paragraphs and headings, so they will all be formatted as paragraphs, as specified by the configuration file.
However, if you want a blank line in the braille transcription use two consecutive blank lines in the text file. @item outfile This is the name of the output file. It will be transcribed as specified by the configuration file and the configuration settings. The following paragraphs provide more information on both the input and output files. @end table @command{xml2brl} is set up so that it can be used in a "pipe". To do this, omit both infile and outfile. Input is then taken from the standard input unit. The first file name encountered (a word not preceded by a minus sign) is taken to be the input file and the second to be the output file. If you wish input to be taken from stdin and still want to specify an output file use two minus signs (@samp{--}) for the input file. If only the program name is typed @command{xml2brl} assumes that the configuration file is @file{default.cfg}, input is from the standard input unit, and output is to the standard output unit. @menu * Transcribing Microsoft Word Files with msword2brl:: @end menu @node Transcribing Microsoft Word Files with msword2brl, , Transcribing with the xml2brl program, Transcribing with the xml2brl program @section Transcribing Microsoft Word Files with msword2brl msword2brl: Type: msword2brl infile outfile Infile must be a Microsoft Word file. The script first calls the @command{antiword} program, so you must have this installed on your machine. @command{antiword} is called with @option{-x db}, which causes the output to be in docbook format. This is piped to @command{xml2brl}. The output file from @command{xml2brl} contains much of the formatting, including emphasis, of the word file. @node Customization Configuring liblouisxml, Connecting with the xml Document - Semantic-Action Files, Transcribing with the xml2brl program, Top @chapter Customization: Configuring liblouisxml The operation of liblouisxml is controlled by two types of files: semantic-action files and configuration files. 
The former are discussed in the section Connecting with the xml Document - Semantic-action Files (@pxref{Connecting with the xml Document - Semantic-Action Files}). The latter are discussed in this section. A third type of file, braille translation tables, is discussed in the liblouis documentation (@pxref{Top, , Overview, liblouis-guide, Liblouis Programmer's and User's Guide}). Another section of the present document which may be of interest is Implementing Braille Mathematical Codes (@pxref{Implementing Braille Mathematics Codes}).

liblouisxml (with liblouis) can be used as the braille transcription component in any number of applications with different overall purposes and user interfaces. However, as of now the principal application is @command{xml2brl}, which is a console application for Mac and Linux. (There is also a Mac GUI application called louis.) The information below therefore applies to @command{xml2brl} as much as to liblouisxml.

Before discussing configuration files in detail it is worth noting that the application program has access to the information in the configuration files by calling the liblouisxml function @code{lbx_initialize}. This function returns a pointer to a data structure containing the configuration information.

@command{xml2brl} uses the configuration file @file{default.cfg} unless a different one is specified via the @option{-f} command-line option. The configuration file name may include a full path. In this case, liblouisxml will consider this to be the user path. (This can be changed at compile time; @pxref{Files and Paths}.) If just a file name (or list) is given, liblouisxml will consider the current directory as the user path.

The configuration "file" specified with the @option{-f} option need not be a single filename. It can be several file names separated by commas. Only the first filename may have a path component. This path is taken as the user path, as discussed in the previous paragraph.
This file-list feature is also found in liblouis. It enables you to combine configuration files on the command line. For example, a file list may consist of one file specifying the output format used in your establishment, a comma, and then the name of a stylesheet.

After the path, if any, has been evaluated, but before reading any of the files, liblouisxml reads in a file called @file{canonical.cfg}. This file specifies values for all possible settings. It is needed to complete the initialization of the program. You may alter the values in the distribution @file{canonical.cfg}, but you should not delete any settings. If a configuration file read in later contains a particular setting name, the value specified simply replaces the one specified in @file{canonical.cfg}.

As you will see by looking at @file{canonical.cfg}, it contains four main sections: outputFormat, translation, xml and styles. In addition, a configuration file can contain an include entry. This causes the file named on that line to be read in at the point where the line occurs. The sections need not follow each other in any particular order, nor is the order of settings within each section important.

In this document and in the @file{canonical.cfg} file, where section and setting names consist of more than one word, the first letter of each word following the initial one is capitalized. This is merely for readability. The case of the letters in these names is ignored by the program. Section and setting names may not contain spaces.

Here, then, is an explanation of each section and setting in the @file{canonical.cfg} file. When you look at this file you will see that the section names start at the left margin, while the settings are indented one tab stop. This is done for readability. It has no effect on the meaning of the lines. You will also see lines beginning with a number sign (@samp{#}), which are comments. Blank lines can also be used anywhere in a configuration file.
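
As a rough illustration of the layout described above, a small configuration file overriding a few values might look like the following sketch. The setting names come from the outputFormat section of @file{canonical.cfg}; the particular values (a 32-cell line with interpoint output) are only assumed for the example.

@example
# Hypothetical overrides for a 32-cell interpoint embosser
outputFormat
	cellsPerLine 32
	linesPerPage 25

	interpoint yes
	braillePages yes
@end example

Because a setting read in later simply replaces the one from @file{canonical.cfg}, any setting not listed in such a file keeps its canonical value.
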
In general, a section name is a single word or combination of unspaced words. However, each style has a section of its own, so the word @samp{style} is followed by the name of the style. Setting lines begin with the name of the setting, followed by at least one space or tab, followed by the value of the setting. A few settings have two values. @menu * outputFormat:: * translation:: * xml:: * style:: @end menu @node outputFormat, translation, Customization Configuring liblouisxml, Customization Configuring liblouisxml @section outputFormat This section specifies the format of the output file (or string, if no file name is given). @table @code @setting{cellsPerLine, 40} The number of cells in a braille line. @setting{LinesPerPage, 25} The number of lines on a braille page @setting{interpoint, no} Whether or not the output will be used to produce interpoint braille. This affects the placement of page numbers and may affect other things in the future. The only two values recognized are @samp{yes} and @samp{no}. @setting{lineEnd, \\r\\n} This specifies the control characters to be placed at the end of each output line. These characters vary from one intended use of the output to another. Most embossers require the carriage-return and line-feed combination specified above. However, a braille display may work best with just one or the other. Any valid control characters can be specified. @setting{pageEnd, \\f} The control Character to be given at the end of a page. Here it is a forms-feed character, but it can be something else if deeded. @setting{fileEnd, ^z} The control character to be placed at the end of the file, here a control-z. @setting{printPages, yes} Whether or not to show print page numbers if they are given in the xml input. The two valid values are @samp{yes} and @samp{no}. @setting{braillePages, yes} Whether or not to format the output into pages. Here the value is @samp{yes}, for use with an embosser. 
However, the user of a braille display may wish to specify @samp{no}, so as not to be bothered with page numbers and form-feed characters. If @samp{no} is specified the lines will still be of the length given in cellsPerLine, but the value of linesPerPage will be ignored.

@setting{paragraphs, yes}
Whether or not to format the output into paragraphs, using appropriate styles. If @samp{no} is specified, what would be a paragraph is output simply as one long line. Applications that wish to do their own formatting may specify @samp{no}.

@setting{beginningPageNumber, 1}
This is the number to be placed on the first Braille page if braillePages is @samp{yes}. This is useful when producing multiple Braille volumes.

@setting{printPageNumberAt, top}
If print page numbers are given in the xml input file they will be placed at the top of each braille page in the right-hand corner. A page separator line will also be produced on the braille page where the print page break actually occurs. You may also specify @samp{bottom} for this setting.

@setting{braillePageNumberAt, bottom}
The braille page number will be placed in the bottom right-hand corner of each page. If interpoint @samp{yes} has been specified only odd pages will receive page numbers. If you specify @samp{top} for this setting then @samp{bottom} must be specified for printPageNumberAt.

@setting{hyphenate, no}
If @samp{yes} is specified words will be hyphenated at the ends of lines if a hyphenation table is available. In contracted English Braille hyphenation is not generally used, but it can save considerable space. The hyphenation table is specified as part of the table list in the literaryTextTable setting of the translation section.

@setting{outputEncoding, ascii8}
This specifies that the output is to be in the form of 8-bit ASCII characters. This is generally used if the output is intended directly for a braille embosser or display. The other values of encoding are @samp{UTF8}, @samp{UTF16} and @samp{UTF32}.
These are useful if the application will process the output further, such as for generating displays of braille dots on a screen.

@setting{inputTextEncoding, ascii8}
This setting is used to specify the encoding of an input text file. The valid values are @samp{UTF8} and @samp{ascii8}.

@anchor{formatFor setting}
@setting{formatFor, textDevice}
This setting specifies the type of device the output is intended for. @samp{textDevice} is any device that accepts plain text, including embossers. You can also specify @samp{browser}. In this case the output will be formatted for viewing in a browser. If the input file contains links, they will be preserved and can be used in the normal way. The text will be translated into braille with the correct line length. Math and computer material will be translated appropriately. These files work well in lynx and Internet Explorer, not so well in elinks and Firefox.

@setting{backFormat, plain}
This setting specifies the format of back-translated files. @samp{Plain} specifies plain-text, while @samp{html} specifies xhtml. The latter is always encoded in UTF-8. Plain-text files can be encoded in ascii8, UTF-8 or UTF-16. Html is strongly recommended, since it will preserve print page numbering and emphasis.

@setting{backLineLength, 70}
This setting specifies the length of lines in back-translated files, whether in plain-text or html. This is mainly for human readability. Lines may sometimes be somewhat longer.

@setting{interline, no}
This setting specifies whether interlining is desired. If it is set to @samp{yes}, the first line in the output will be a braille translation, the next line will be its back-translation according to the interlineBackTable. Back-translation is used instead of simply presenting the print original because a braille line may contain additional information, such as leading blanks, print or braille page numbers, print page separator lines, etc.
@end table

@node translation, xml, outputFormat, Customization Configuring liblouisxml
@section translation

This section specifies the liblouis translation tables to be used for various purposes.

@table @code
@setting{literaryTextTable, en-us-g2.ctb}
The table used for producing literary braille. This may be either contracted or uncontracted.

@setting{uncontractedTable, en-us-g1.ctb}
The table used for producing uncontracted or Grade One braille. This setting appears to be superfluous and may be eliminated in the future.

@setting{compbrailleTable, en-us-compbrl.ctb}
The table used for producing large amounts of output in computer braille, such as computer programs. The computer braille table is usually combined with one of the two tables above.

@setting{mathtextTable, en-us-mathtext.ctb}
This table specifies how the non-mathematical parts of math books are to be translated. In many cases it will be the same as literaryTextTable or uncontractedTable. For books translated with the Nemeth Code it is different, because this code requires modification of standard Grade Two.

@setting{mathexprTable, nemeth.ctb}
This is the table used to translate mathematical expressions.

@setting{editTable, edittable.ctb}
When the output includes both mathematics and text there may be errors where one type of translation directly follows another. The editTable removes these errors.

@setting{interlineBackTable, en-us-interline.ctb}
This setting specifies the table to be used for back-translation when interlining is turned on. It must be tailored for this purpose, since an ordinary forward-translation table may contain entries that do not handle the additional information in braille lines correctly.
@end table

@node xml, style, translation, Customization Configuring liblouisxml
@section xml

This section provides various information for the processing of xml files.

@table @code
@setting{semanticFiles, *\,nemeth.sem}
This setting gives a list of semantic-action files.
These files are read in the sequence given in the list. Here the first member of the list is an asterisk (@samp{*}). This means that the corresponding file is to be named by taking the root element of the document and appending @samp{.sem}. This asterisk member may occur anywhere in the list.

@setting{xmlheader, <?xml version='1.0' encoding='UTF8' standalone='yes'?>}
This line gives the xml header to be added to strings produced by programs like @command{Mathtype} that lack one.

@setting{entity, nbsp ^1}
This line defines an entity or substitution in an xml file. It is one of the settings that have two values. The first is the thing to be replaced, and the second is the replacement. As many entity lines as necessary can be used. The information they contain is added to the information provided by xmlHeader. In @file{canonical.cfg} this line is commented out, because specifying it at this point would prevent the user from specifying his own xmlheader.

@setting{internetAccess, yes}
The computer has an internet connection and liblouisxml may obtain information necessary for the processing of this file from the Internet. If this setting is @samp{no} liblouisxml will not try to use the internet. The necessary information may, however, be provided on the local machine in the form of a "dtd" file.

@setting{newEntries, yes}
liblouisxml may create a new semantic-action file (beginning with @file{new_}) for a document with an unknown root element or a file (beginning with @file{appended_}) containing new entries for an existing semantic-action file. Both kinds of files are placed in the current directory. If this setting is @samp{no} liblouisxml will not create a file of new entries and if it encounters a document with an unknown root element it will issue an error message. Setting newEntries to @samp{no} may be useful if users should not be bothered with the minutiae of semantic-action files.
@end table

@node style, , xml, Customization Configuring liblouisxml
@section style

The following sections all deal with styles. Each style has its own section. Style section names are unlike other section names in that they consist of the word style, followed by a space, followed by a style name. More styles may be added as the software develops, and some may be dropped.

@subsection style document

This section specifies the style of the whole document. The settings given in it are applied to all other styles. If a section for another style is given, the settings in it replace those from the document style for that section. Because the settings in the document style apply to all other styles, if a document style section is given it must precede the sections for all other styles.

@table @code
@setting{linesBefore, 0}
This setting gives the number of blank lines which should be left before the text to which this style applies. It is set to a non-zero value for some header styles.

@setting{linesAfter, 0}
The number of blank lines which should be left after the text to which this style applies.

@setting{leftMargin, 0}
The number of cells by which the left margin of all lines in the text should be indented. Used for hanging indents, among other things.

@setting{firstLineIndent, 0}
The number of cells by which the first line is to be indented relative to leftMargin. firstLineIndent may be negative. If the result is less than 0 it will be set to 0.

@setting{translate, contracted}
This setting is currently inactive. It may be used in the future. This setting tells how text in this style should be translated. Possible values are @samp{contracted}, @samp{uncontracted}, @samp{compbrl}, @samp{mathtext} and @samp{mathexpr}.

@setting{skipNumberLines, no}
If this setting is @samp{yes} the top and bottom lines on the page will be skipped if they contain braille or print page numbers. This is useful in some of the mathematical and graphical styles.

@setting{format, leftJustified}
The format setting controls how the text in the style will be formatted. Valid values are @samp{leftJustified}, @samp{rightJustified}, @samp{centered}, @samp{computerCoded}, @samp{alignColumnsLeft}, @samp{alignColumnsRight}, @samp{listColumns} and @samp{listLines}. The first three are self-explanatory. @samp{computerCoded} is used for computer programs and similar material. The next three are used for tabular material. @samp{alignColumnsLeft} causes the left ends of columns to be aligned. @samp{alignColumnsRight} causes the right ends of columns to be aligned. @samp{listColumns} causes columns to be placed one after the other, separated by whatever separation character has been specified in the semantic-action file, followed by a space. An escape character (hex 1b) must also be specified to indicate the end of the column. Two escape characters must be specified to indicate the end of a row. Indentation of the lines in a row is controlled by the leftMargin and firstLineIndent settings. @samp{listLines} is similar except that it lists lines, as in poetry stanzas. The semantic-action file must specify two escape characters to indicate the end of a line.

@setting{newPageBefore, no}
If this setting is @samp{yes}, the text will begin on a new page. This is useful for certain mathematical and graphical styles. Page numbers are handled properly.

@setting{newPageAfter, no}
If this setting is @samp{yes} any remaining space on the page after the material covered by this style is handled is left blank, except for page numbers.

@setting{rightHandPage, no}
If this setting is @samp{yes} and interpoint is @samp{yes} the material covered by this style will start on a right-hand page. This may cause a left-hand page to be left blank except for page numbers. If interpoint is @samp{no} this setting is equivalent to newPageBefore.
@end table

@subsection style arith

This style is used for arithmetic examples in elementary math books.
On recognizing this style, the translator formats the material in a special way. This style has no settings different from those of the document style at the moment. Nevertheless, the line @samp{style arith} must be included in @file{canonical.cfg} so that it will be set up properly.

@subsection style attribution

This style is used for an attribution following a quotation.

@table @code
@setting{format, rightJustified}
@end table

@subsection style biblio

This style is used for bibliographies. Settings will be added later.

@subsection style caption

This style is used for picture captions.

@table @code
@setting{leftMargin, 4}
@setting{firstLineIndent, 2}
Note that the first line is actually indented six cells.
@end table

@subsection style code

This style is used for computer programs.

@table @code
@setting{skipNumberLines, yes}
@setting{linesBefore, 1}
@setting{linesAfter, 1}
@setting{format, computerCoded}
@end table

@subsection style contents

This is for entries in a table of contents.

@subsection style dedication

This style is for the dedication of a book.

@table @code
@setting{newPageBefore, yes}
@setting{newPageAfter, yes}
@setting{center, yes}
@end table

@subsection style directions

This is for giving directions for exercises.

@subsection style dispmath

This is for showing mathematics that is set off from the text.

@table @code
@setting{leftMargin, 2}
@end table

@subsection style disptext

This is for text that is set off from the rest of the text.

@table @code
@setting{leftMargin, 2}
@setting{firstLineIndent, 2}
@end table

@subsection style exercise1

This is the first level in a set of exercises where there are sublevels.

@table @code
@setting{leftMargin, 2}
@setting{firstLineIndent, -2}
@end table

@subsection style exercise2

This is for the second level of exercises, such as exercise a following exercise 1.

@table @code
@setting{leftMargin, 4}
@setting{firstLineIndent, -2}
@end table

@subsection style exercise3

This is for the third level of exercises.

@table @code
@setting{leftMargin, 6}
@setting{firstLineIndent, -2}
@end table

@subsection style glossary

This is for a glossary.

@table @code
@setting{firstLineIndent, 2}
@end table

@subsection style graph

This style reserves space for a graph or other tactile material.

@table @code
@setting{skipNumberLines, yes}
@end table

@subsection style graphLabel

This style reserves space for the label of a graph.

@subsection style heading1

This style is used for main headings, such as chapter titles.

@table @code
@setting{linesBefore, 1}
@setting{center, yes}
@setting{linesAfter, 1}
@end table

@subsection style heading2

The first level of subheadings after the main heading.

@table @code
@setting{linesBefore, 1}
@setting{firstLineIndent, 4}
@end table

@subsection style heading3

The third level of headings.

@table @code
@setting{firstLineIndent, 4}
@end table

@subsection style heading4

The fourth and final level of headings.

@table @code
@setting{firstLineIndent, 4}
@end table

@subsection style indexx

This style is used for indexes. The extra @samp{x} is not an error. It is there to prevent conflict with names elsewhere in the software.

@subsection style list

This is for the individual items in a list.

@table @code
@setting{firstLineIndent, -2}
@setting{leftMargin, 2}
@end table

@subsection style matrix

This style causes its contents to be formatted in a way suitable for the representation of matrices.

@table @code
@setting{format, alignColumnsLeft}
@end table

@subsection style music

This style is used for braille music.

@table @code
@setting{skipNumberLines, yes}
@end table

@subsection style note

This style is used for footnotes.

@subsection style para

Paragraph. This is ordinary body text.

@table @code
@setting{firstLineIndent, 2}
@end table

@subsection style quotation

This style is used for quotations that are set off from the rest of the text.

@table @code
@setting{linesBefore, 1}
@setting{linesAfter, 1}
@end table

@subsection style section

This style is used for a section with a section number.

@table @code
@setting{firstLineIndent, 4}
@end table

@subsection style spatial

This style is used for mathematical material that is arranged spatially, such as large fractions.

@subsection style stanza

This style is used for stanzas in poetry.

@table @code
@setting{linesBefore, 1}
@setting{linesAfter, 1}
@setting{format, listLines}
@end table

@subsection style style1

This and the subsequent numbered styles can be used by the user for any purpose.

@subsection style style2

@subsection style style3

@subsection style style4

@subsection style style5

@subsection style subsection

This style is used for subsections with a subsection number.

@table @code
@setting{firstLineIndent, 4}
@end table

@subsection style table

This style is used for ordinary tables.

@subsection style titlepage

This style is used to begin a title page.

@table @code
@setting{newPageAfter, yes}
@end table

@subsection style trnote

This style is used for transcriber's notes which are set off from the text.

@subsection style volume

This style is used to indicate the beginning of a braille volume.

@node Connecting with the xml Document - Semantic-Action Files, Implementing Braille Mathematics Codes, Customization Configuring liblouisxml, Top
@chapter Connecting with the xml Document - Semantic-Action Files

When liblouisxml (or @command{xml2brl}) processes an xml document, it needs to be told how to use the information in that document to produce a properly translated and formatted braille document. These instructions are provided by a semantic-action file, so called because it explains the meaning, or semantics, of the various specifications in the xml document.

To understand how this works, it is necessary to have a basic knowledge of the organization of an xml document. An xml document is organized like a book, but with much finer detail. First there is the title of the whole book.
Then there are various sections, such as author, copyright, table of contents, dedication, acknowledgments, preface, various chapters, bibliography, index, and so on. Each chapter may be divided into sections, and these in turn can be divided into subsections, subsubsections, etc. In a book the parts have names or titles distinguished by capitalization, type fonts, spacing, and so forth. In an xml document the names of the parts are enclosed in angle brackets (@samp{<>}). For example, if liblouisxml encounters @code{<html>} at the beginning of a document, it knows it is dealing with a document that conforms to the standards of the extensible hypertext markup language (xhtml) - at least we hope it does. When you see a book, you know it's a book. The computer can know only by being told.

Something enclosed in angle brackets is called an "element" (more properly, a "tag") in xml parlance. (There may be more between the angle brackets than just the name of the element. More on this later.) The first "element" in a document thus tells liblouisxml what kind of document it is dealing with. This element is called the "root element" because the document is visualized as branching out from it like a tree. Some examples of root elements are @code{<html>}, @code{<math>}, @code{<book>}, @code{<dtbook3>} and @code{<wordDocument>}.

Whenever liblouisxml encounters a root element that it doesn't know about it creates a new file called a semantic-action file. The name of this file is formed by stripping the angle brackets from the root element and adding a period plus the letters @samp{sem}. If you look in a directory containing semantic-action files you will see names like @file{html.sem}, @file{dtbook3.sem}, @file{math.sem}, and so on.

Sometimes it is advantageous to preempt the creation of a semantic-action file for a new root element. For example, an article written according to the docbook specification may have the root element @code{<article>}.
However, the specification itself has the root element @code{<book>}. In this case you can specify the @file{book.sem} file in the configuration file by writing, in the xml section:

@example
semanticFiles book.sem
@end example

You will note that this setting uses the plural of "file". This is because you can actually specify a list of file names separated by commas. You might want to do this to specify the semantic-action file for the particular braille mathematical code to be used. For example:

@example
semanticFiles book.sem,ukmath.sem
@end example

As you will see in the next section, different braille style conventions and different braille mathematical codes may require different semantic-action files.

liblouisxml records the names of all elements found in the document in the semantic-action file. The document has a multitude of elements, which can be thought of as describing the headings of various parts of the document. One element is used to denote a chapter heading. Another is used to denote a paragraph, still another to denote text in bold type, and so on. In other words, the elements take the place of the capitalization, changes in type font, spacing, etc. in a book. However, the computer still does not know what to do when it encounters an element. The semantic-action file tells it that.

Consider @file{html.sem}. A copy is included as part of this documentation with the name @file{example_sem}. It may differ from the file that liblouisxml is currently using. You will see that it begins with some lines about copyrights. Each line begins with a number sign (@samp{#}). This indicates that it is a "comment," intended for the human reader, and that the computer should ignore it. Then there is a blank line. Finally, there are two other comments explaining that the file must be edited to get proper output. This is because a human being must tell the computer what to do with each element.
The semantic files for common types of documents have already been edited, so you generally don't have to worry about this. But if you encounter a new type of document or wish to specify special handling for styles or mathematics you may have to edit the semantic-action file or send it to the maintainer for editing. In any case the rest of this section is essential for understanding how liblouisxml handles documents and for making changes if the way it does so is not correct.

After another blank line you will see a table consisting of two, and sometimes three, columns. The first column contains a word which tells the computer to do something. For example, the first entry in the table is: @samp{include nemeth.sem}. This tells liblouisxml to include the information in the @file{nemeth.sem} file when it is deciphering an html (actually xhtml) document (it may be preferable to use the semanticFiles setting in the configuration file rather than an include). The second row of the table is:

@example
no hr
@end example

@samp{hr} is an element with the angle brackets removed. It means nothing in itself. However, the first column contains the word @samp{no}. This tells liblouisxml "no do", that is, do nothing. After a few more lines with @samp{no} in the first column, we see one that says:

@example
softreturn br
@end example

This means that when the element @code{<br>} is encountered, liblouisxml is to do a soft return, that is, start a new line without starting a new paragraph. The next line says:

@example
heading1 h1
@end example

This tells liblouisxml that when it encounters the element @code{<h1>} it is to format the text which follows as a first-level braille heading, that is, the text will be centered and preceded and followed by blank lines. (You can change this by changing the definition of the heading1 style).
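
To make the three entries discussed so far more concrete, consider a small xhtml fragment. The fragment is invented for illustration and does not come from any file shipped with liblouisxml.

@example
<h1>Chapter One</h1>
<hr />
First line<br />second line
@end example

Given the entries @samp{no hr}, @samp{softreturn br} and @samp{heading1 h1}, one would expect the text of the @code{<h1>} element to be formatted in the heading1 style, the @code{<hr>} element to be ignored, and a new braille line to begin at the @code{<br>} element without a new paragraph being started.
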
The next line says:

@example
italicx em
@end example

This tells liblouisxml that when it encounters the element @code{<em>} it is to enclose the text which follows in braille italic indicators. The @samp{x} at the end of the semantic action name is there to prevent conflicts with names elsewhere in the software. Just where the italic indicators will be placed is controlled by the liblouis translation table in use. The next line says:

@example
skip style
@end example

This tells liblouisxml to simply skip ahead until it encounters the element @code{</style>}. Nothing in between will have any effect on the braille output. Note the slash (@samp{/}) before the @samp{style}. This means the end of whatever the @code{<style>} element was referring to. Actually, it was referring to specifications of how things should be printed. If liblouisxml had not been told to skip these specifications, the braille output would have contained a lot of gobbledygook. The next line says:

@example
italicx strong
@end example

This tells liblouisxml to also use the italic braille indicators for the text between the @code{<strong>} and @code{</strong>} elements. After a few more lines with @samp{no} in the first column we come to the line:

@example
document html
@end example

This tells liblouisxml that everything between @code{<html>} and @code{</html>} is an entire document. @code{<html>} was the root element of this document, so this is logical. After another @samp{no} line we come to:

@example
para p
@end example

liblouisxml will consider everything between @code{<p>} and @code{</p>} to be a normal body text paragraph. The next line is:

@example
heading1 title
@end example

This causes the title of the document to also be treated as a braille level 1 heading. Next we have the line:

@example
list li
@end example

The xhtml @code{<li>} and @code{</li>} pair of elements is used to enclose an item in a list. liblouisxml will format this with its own list style.
That is, the first line will begin at the left margin and subsequent lines will be indented two cells. Next we have:

@example
table table
@end example

You will note that the names of actions and elements are often identical. This is because they are both mnemonic. In any case, this line tells liblouisxml to format the table contained in the xhtml document according to the table formatting rules it has been given for braille output. Next we have the line:

@example
heading2 h2
@end example

This means that the text between @code{<h2>} and @code{</h2>} is to be formatted according to the liblouisxml style heading2. A blank line will be left before the heading and the first line will be indented four spaces. After a few more lines we come to:

@example
no table,cellpadding
@end example

Note the comma in the second column. This divides the column into two subcolumns. The first is the table element name. The second is called an "attribute" in xml. It gives further instructions about the material enclosed between the starting and ending "tags" of the element (@code{<table>} and @code{</table>}). Full information requires three subcolumns. The third is called the value and gives the actual information. The attribute is merely the name of the information. Much further down we find:

@example
no table,border,0
@end example

Here the element is table, the attribute is border and the value is 0. If liblouisxml were to interpret this, it would mean that the table was to have a border of 0 width. It is not told to do so because tables in braille do not have borders.

Now let's look at the file which is included at the beginning of the @file{html.sem} file. This is @file{nemeth.sem}. As with @file{html.sem}, a copy is included in the documentation directory with the name @file{example_nemeth.sem}, but it is not necessarily the one that liblouisxml is currently using. It illustrates several more things about how liblouisxml uses semantic-action files.
The first thing you will notice is that for quite a few lines the first and second columns are identical. This is because the MathML element and attribute names are part of a standard, and it was simplest to use the element names for the semantic actions as well.

The first line of real interest is:

@example
math math
@end example

Every mathematical expression begins with the element @code{<math>} (which may have attributes and values), and ends with @code{</math>}. This is therefore the root element of a mathematical expression. However, mathematical expressions are usually part of a document, so it is not given the semantic action document. The math semantic action causes liblouisxml to carry out special interpretation actions. These will become clearer as we continue to look at the @file{nemeth.sem} file. You will note that this line has three columns. The meaning of the third column is discussed below.

After another uninteresting line we come to two that illustrate several more facts about semantic-action files:

@example
mfrac mfrac ^?,/,^#
mfrac mfrac,linethickness,0 ^(,^;%,^)
@end example

Like the math entry above, the first line has three columns. While the first two columns must always be present, the third column is optional. Here, it is also divided into subcolumns by commas. The element @code{<mfrac>} indicates a fraction. A fraction has two parts, a numerator and a denominator. In xml, we call these parts children of @code{<mfrac>}. They may be represented in various ways, which need not concern us here. What is of real importance is that the third column tells liblouisxml to put the characters @samp{^?} before the numerator, @samp{/} between the numerator and denominator, and @samp{^#} after the denominator. Later on, liblouis will translate these characters into the proper representation of a fraction in the Nemeth Code of Braille Mathematics. (For other mathematical codes, @pxref{Implementing Braille Mathematics Codes}).
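
As a sketch of the first of these entries at work, take the fraction one-half. The MathML below is a minimal hand-written fragment, not taken from the liblouisxml sources, and it assumes that the content of the two @code{<mn>} children is passed through unchanged.

@example
<mfrac><mn>1</mn><mn>2</mn></mfrac>
@end example

With the entry @samp{mfrac mfrac ^?,/,^#}, the string handed on to liblouis for this fragment would then be:

@example
^?1/2^#
@end example
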
The second line is of even greater interest. The first column is again @samp{mfrac}, but this line is for a binomial coefficient. The second column contains three subcolumns, an element name, an attribute name and an attribute value. The attribute linethickness specifies the thickness of the line separating the numerator and denominator. Here it is 0, so there is no line. This is how the binomial coefficient is represented in print. The third column tells how to represent it in braille. liblouisxml will supply @samp{^(}, upper number, @samp{^;%}, lower number, @samp{^)} to liblouis, which will then produce the proper braille representation for the binomial coefficient.

Returning to the line for the math element, we see that the third column begins with a backslash followed by an asterisk. The backslash is an escape character which gives a special meaning to the character which follows it. Here the asterisk means that what follows is to be placed at the very end of the mathematical expression, no matter how complex it is. For further discussion of how the third column is used, @pxref{Implementing Braille Mathematics Codes}.

The third column is not limited to mathematics. It can be used to add characters to anything enclosed by an xml tag.

Here is a complete list of the semantic actions which liblouisxml recognizes. Many of them are also the names of styles. These are listed first, preceded by an asterisk. For a discussion of these, @pxref{Customization Configuring liblouisxml}.

@table @code
@item * arith
@item * attribution
@item * biblio
@item * blanklinebefore
@item * caption
@item * code
@item * contents
@item * dedication
@item * directions
@item * dispmath
@item * disptext
@item * document
@item * exercise1
@item * exercise2
@item * exercise3
@item * glossary
@item * graph
@item * graphlabel
@item * heading1
@item * heading2
@item * heading3
@item * heading4
@item * indexx
@item * list
@item * matrix
@item * music
@item * note
@item * para
@item * quotation
@item * section
@item * spatial
@item * stanza
@item * style1
@item * style2
@item * style3
@item * style4
@item * style5
@item * subsection
@item * table
@item * titlepage
@item * trnote
@item * volume
@item acknowledge
@item allcaps
@item author
@item blankline
@item bodymatter
@item boldx
@item booktitle
@item boxline
@item cdata
@item center
@item chemistry
@item contracted
@item copyright
@item endnotes
@item footer
@item frontmatter
@item graphic
@item italicx
@item jacket
@item line
@item linkto
@item maction
@item maligngroup
@item malignmark
@item math
@item menclose
@item merror
@item mfenced
@item mfrac
@item mglyph
@item mi
@item mlabeledtr
@item mmultiscripts
@item mn
@item mo
@item mover
@item mpadded
@item mphantom
@item mprescripts
@item mroot
@item mrow
@item ms
@item mspace
@item msqrt
@item mstyle
@item msub
@item msubsup
@item msup
@item mtd
@item mtext
@item mtr
@item munder
@item munderover
@item newpage
@item no
@item noindent
@item none
@item preface
@item rearmatter
@item rightalign
@item righthandpage
@item runninghead
@item semantics
@item skip
@item softreturn
@item specsym
@item tblbody
@item tblcol
@item tblhead
@item tblrow
@item tnpage
@item transcriber
@item uncontracted
@end table

@node Implementing Braille Mathematics Codes, Settings Index, Connecting with the xml Document - Semantic-Action Files, Top
@chapter Implementing Braille Mathematics Codes

The Nemeth Code of Braille Mathematical and Science Notation has been implemented.
Other braille mathematics codes can be implemented by following the
same pattern. The Nemeth Code implementation is discussed as an example
below.

Four tables are used to translate xml documents containing a mixture of
text and mathematics into the Nemeth code. They can be found in the
subdirectory @file{lbx_files} of the liblouisxml directory. First, the
semantic-action file @file{nemeth.sem} is used to interpret the
mathematical portions of the xml document. (The text portions are
interpreted by another semantic-action file which will not be discussed
here.) After the math and text have been interpreted, two liblouis
tables, @file{nemeth.ctb} and @file{en-mathtext.ctb}, are used to
translate them. Each piece of mathematics or text is translated
separately and the pieces are strung together with blanks between them.
This results in inaccuracies where mathematics meets text. The fourth
table, also a liblouis table, is used to remove these inaccuracies. It
is called @file{edittable.ctb}, and it does things like removing the
multi-purpose indicator before a blank, inserting the punctuation
indicator before a punctuation mark following a math expression, and
removing extra spaces.

The general format and use of semantic-action files were discussed in
the previous section (@pxref{Connecting with the xml Document -
Semantic-Action Files}). In this section we shall concentrate on the
optional third column, which is used heavily in @file{nemeth.sem}.
While the first two columns can be generated by liblouisxml but must be
edited by a person, the third column must always be provided by a
human.

As previously stated, the third column tells liblouisxml what
characters to insert to inform liblouis how to translate the math
expression. Look at the following line:

@example
mfrac mfrac ^?,/,^#
@end example

You will see that the third column contains two commas. This means that
it has three subcolumns. A fraction has a numerator and a denominator.
These are called children of the mfrac element. The first subcolumn
specifies the characters that liblouisxml should place in front of the
numerator. The second subcolumn gives the characters to be placed
between the numerator and denominator. Finally, the third subcolumn
gives the characters to place after the denominator. You will see that
the first subcolumn contains a caret followed by a question mark. The
dot pattern for the question mark in computer braille is the same as
for the Nemeth start-fraction indicator. The caret is used so that
liblouis can tell the indicator apart from an ordinary question mark,
which has the same dot pattern in computer braille. The second
subcolumn contains a slash but no caret. This is because there is no
danger of confusion where the slash is concerned. The third subcolumn
does contain a caret, and it also contains a number sign, which
corresponds to the Nemeth end-fraction indicator. When liblouisxml
encounters the MathML representation of the fraction one-half it
produces the following string of characters: @samp{^?1/2^#}. liblouis
then removes the carets to get @samp{?1/2#}.

As another example, consider the entry in @file{nemeth.sem} for a
subscript.

@example
msub msub ,^;,^"
@end example

Here the first subcolumn is blank, because nothing is to be placed
before the subscripted symbol. The second subcolumn contains a caret
and a semicolon (in computer braille). This corresponds to the Nemeth
subscript indicator. The third subcolumn contains a caret and a
quotation mark, corresponding to the Nemeth baseline indicator.
liblouisxml translates the MathML expression for x subscript i into
@samp{x^;i^"}. liblouis subsequently produces @samp{x;i"}. There are
other steps if the subscript is numeric. These are handled by pass2
opcodes in the liblouis translation table, @file{nemeth.ctb}.

You will notice that the entries in @file{nemeth.sem} have various
numbers of subcolumns in the third column.
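
As an aside, the handling of the @code{mfrac} and @code{msub} entries
discussed above can be pictured with a short Python sketch. It is only
an illustration, not code from liblouisxml or liblouis: the function
names @code{wrap_children} and @code{strip_carets} are hypothetical,
the sketch assumes a fixed number of children, it ignores backslash
escapes, and it reduces the caret handling done by the liblouis table
to a plain string replacement.

```python
def wrap_children(subcolumns, children):
    """Interleave subcolumn strings with an element's children.

    subcolumns[0] goes before the first child, subcolumns[1] before the
    second, and so on; the last subcolumn goes after the last child.
    """
    if len(subcolumns) != len(children) + 1:
        raise ValueError("expected one more subcolumn than children")
    pieces = []
    for prefix, child in zip(subcolumns, children):
        pieces.append(prefix)
        pieces.append(child)
    pieces.append(subcolumns[-1])
    return "".join(pieces)


def strip_carets(text):
    """Remove every caret, keeping the character that followed it."""
    return text.replace("^", "")


# Third column of the mfrac entry, split at the commas.
mfrac = "^?,/,^#".split(",")
half = wrap_children(mfrac, ["1", "2"])
print(half)                # ^?1/2^#
print(strip_carets(half))  # ?1/2#

# Third column of the msub entry; its first subcolumn is empty.
msub = ',^;,^"'.split(",")
x_sub_i = wrap_children(msub, ["x", "i"])
print(x_sub_i)             # x^;i^"
```

The empty first subcolumn of the @code{msub} entry simply contributes
nothing in front of the first child.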
In general, the characters given in the first subcolumn are placed
before the first child of the element given in the second column. The
characters in the second subcolumn are placed before the second child,
and so on, until the characters given in the last subcolumn are placed
after the last child.

Sometimes an element or tag can have an indeterminate number of
children. This is true of @code{<math>} itself. Yet, it may be
necessary to place some characters after the very last element. Let us
look at the @code{<math>} entry.

@example
math math \eb,\*\ee
@end example

First let us discuss escape sequences starting with a backslash. These
are basically the same as in liblouis. The sequence @samp{\e} is
shorthand for the escape character, which would otherwise be
represented by @samp{\x001b}. The beginning of a math expression is
denoted by an escape character followed by the letter @samp{b} and the
end by an escape character followed by the letter @samp{e}. This
enables the editing table to do such things as drop the baseline
indicator at the end of a math expression and insert a number sign at
the beginning, if needed. Not found in liblouis is the sequence
@samp{\*}. This means to put what follows after the very last child of
the math element, no matter how many there are.

As another example consider:

@example
mtd mtd \*\ec
@end example

@code{mtd} is the MathML tag for a table cell, that is, one column
entry of a row. There may be many children of this tag. The entry says
to put an escape character (hex 1b), plus the letter @samp{c}, after
the very last of them.

As a final example consider:

@example
mtr mtr ^.^\,^(,\*^.^\,^)\er
@end example

@code{mtr} is the MathML tag for a row in a table, in this case a
matrix. Each row in a matrix must begin with the dot pattern
@samp{46-6-12356} and end with the dot pattern @samp{46-6-23456}. As
usual a caret is placed before the corresponding characters. Since dot
6 is a comma, it must be escaped. This is done by placing a backslash
before the comma.
There are two subcolumns. The first contains the characters to be
placed at the beginning of each row. The second starts with @samp{\*},
signifying that the characters following it are to be placed at the end
of everything in this row. A subcolumn starting with @samp{\*} must be
the last (or only) subcolumn. Here this last subcolumn ends with an
escape character and the letter @samp{r}, signifying the end of a row.

So much for the semantic-action file. Even though the characters in the
third column were chosen to correspond with Nemeth characters, they may
not have to be changed for other math codes. liblouis can replace them
with anything needed.

This brings us to a consideration of the two tables used by liblouis to
translate mathematics texts. The first, @file{en-mathtext.ctb}, is used
to translate text appearing outside math expressions. It is necessary
because the Nemeth code requires modifications of Grade 2 braille.
Other math codes may not have this requirement.

The table actually used to translate mathematics is @file{nemeth.ctb}.
It includes two other tables, @file{chardefs.cti} and
@file{nemethdefs.cti}. The first gives ordinary character definitions
and is included by all the other tables. Note, however, that the
unbreakable space, @samp{\x00a0}, is translated by dot 9. This is used
before and after the equal sign and other symbols in @file{nemeth.ctb}.
The second table contains character definitions for special math
symbols, most of which are Unicode characters greater than
@samp{\x00ff}. The Greek letters are here. So are symbols like the
integral sign.

Most of the entries in @file{nemeth.ctb} should be familiar from other
tables. The unfamiliar ones follow the comments @samp{# Semantic pairs}
and @samp{# pass2 corrections}. The entries under the first comment
simply replace each character preceded by a caret with the character
itself. Those under the second make adjustments to the code generated
directly from the @file{nemeth.sem} file.
The pass2 opcode is discussed in the liblouis guide (@pxref{Top, ,
Overview, liblouis-guide, Liblouis Programmer's and User's Guide}).
Here are some comments on a few of the entries in @file{nemeth.ctb}.

@example
pass2 @@1456-1456 @@6-1456
@end example

Replaces double start-fraction indicators with the
start-complex-fraction indicator.

@example
pass2 @@3456-3456 @@6-3456
@end example

Replaces double end-fraction indicators with the end-complex-fraction
indicator.

@example
pass2 @@56[$d1-5]@@5 *
@end example

Removes the subscript and baseline indicators from numeric subscripts.

@example
pass2 @@5-9 @@9
@end example

Removes the baseline or multipurpose indicator before an unbreakable
space generated by the translation of an equal sign, etc.

@example
pass2 @@45-3-5 @@3
@end example

Replaces a superscript apostrophe with a simple prime symbol.

@example
pass2 @@9[]$d @@3456
@end example

Puts a number sign before a digit preceded by a blank.

@example
pass2 @@9-0 @@9
@end example

Removes a space following an unbreakable space.

We now come to the fourth and last table used for math translation, the
editing table, @file{edittable.ctb}. As explained at the beginning,
this table is used to remove inaccuracies where math translation butts
up against text translation. For example, the Nemeth code puts numbers
in the lower part of the cell. However, punctuation marks are also in
the lower part of the cell. So Nemeth puts a punctuation indicator,
dots @samp{456}, in front of any lower-cell punctuation that
immediately follows a mathematical expression. If this occurs inside
MathML it is handled by @file{nemeth.ctb}. However, a MathML expression
is often followed by a punctuation mark which is the first part of
text. liblouisxml puts a blank between math and text, but this can
result in a mathematical expression followed by a blank and then, say,
a period, dots @samp{256}. @file{edittable.ctb} replaces the blank with
the punctuation indicator.

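
The effect described in the last paragraph can be pictured with a small
Python sketch. It is only an illustration under stated assumptions, not
how @file{edittable.ctb} is written: braille cells are modelled as
strings of dot numbers, a blank is written @samp{0}, and the function
name @code{fix_punctuation} and its @code{math_end} argument are
hypothetical. Only the period, dots @samp{256}, is taken from the text;
other lower-cell punctuation marks would have to be added to the tuple.

```python
BLANK = "0"                   # assumed notation for an empty cell
PUNCT_INDICATOR = "456"       # punctuation indicator named in the text
LOWER_PUNCTUATION = ("256",)  # period only; extend for other marks


def fix_punctuation(cells, math_end):
    """Replace the blank after a math expression with the indicator.

    cells is a list of dot-number strings and math_end is the index of
    the first cell after the math expression.
    """
    result = list(cells)
    has_next = math_end + 1 < len(result)
    if (has_next
            and result[math_end] == BLANK
            and result[math_end + 1] in LOWER_PUNCTUATION):
        result[math_end] = PUNCT_INDICATOR
    return result


# Two cells of math, then a blank, then a period.
cells = ["3456", "2", "0", "256"]
print("-".join(fix_punctuation(cells, 2)))  # 3456-2-456-256
```

A blank followed by anything other than lower-cell punctuation is left
alone, so ordinary word spacing after a math expression is unaffected.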
When you look at @file{edittable.ctb} you will see that it begins with
an include of @file{chardefs.cti}. Most of the entries are ordinary,
but some are interesting. For example,

@example
always "\s 0
@end example

replaces the baseline or multipurpose indicator followed by a space
with just a space.

@node Settings Index, Function Index, Implementing Braille Mathematics Codes, Top
@unnumbered Settings Index

@printindex tp

@node Function Index, , Settings Index, Top
@unnumbered Function Index

@printindex fn

@bye
docdir = $(datadir)/liblouisxml/doc

doc_DATA = \
	copyright-notice \
	example_canonical.cfg \
	example_default.cfg \
	example_html.sem \
	example_math.sem \
	liblouisxml-guide.html \
	liblouisxml-guide.txt

EXTRA_DIST = \
	copyright-notice \
	example_canonical.cfg \
	example_default.cfg \
	example_html.sem \
	example_math.sem \
	liblouisxml-guide.html \
	liblouisxml-guide.txt

info_TEXINFOS = liblouisxml-guide.texi

SUFFIXES = .txt

.texi.txt:
	$(MAKEINFO) --plaintext $< -o $@
2008-11-13  Christian Egli  <christian.egli@xxxxxxxx>

	* doc/liblouisxml-guide.texi: Added a texinfo version of the
	liblouisxml guide.
	* doc/Makefile.am: Integrated texinfo version of liblouisxml
	guide in automake process.

John J. Boyer, john.boyer@xxxxxxxxxxxxxxxx

Release liblouisxml-1.4.2, June 16, 2008

Converted to GNU autotools

A few bugs fixed

Release liblouisxml-1.4.1

linux-Makefile optimized for RedHat

If there are errors in a liblouis table you will now get only a single
error report instead of myriads.

Release liblouisxml-1.4.0, May 12, 2008

doc directory renamed docs. Semantic-action and configuration files in
this directory have been prefixed with example_

mode parameter added to functions in liblouisxml.h

Cdata sections can now be translated as text, computer code or ignored.

A function to find the true name of the "Program Files" directory in
Windows has been written by Yuemei Sun of ViewPlus Technologies. It is
in the paths.c module.

Paths for translation tables, semantic action files and configuration
files can now be assigned more flexibly. See paths.c

xml2brl will now accept configuration settings on the command line, so
it is unnecessary to change a configuration file.

It is now possible to pass a configuration string in memory to the
library, not just the name of a configuration file.

For details on all these changes see docs/liblouisxml-guide.html