[liblouis-liblouisxml] [PATCH 1/1] Added a texinfo version of the liblouis guide.

  • From: Christian Egli <christian.egli@xxxxxxxx>
  • To: liblouis-liblouisxml <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Wed, 12 Nov 2008 17:08:39 +0100

Include it in the automake process and add a ChangeLog entry. Also
make sure the HTML and text versions are built on make dist.
---
 ChangeLog               |    6 +
 doc/Makefile.am         |    6 +
 doc/liblouis-guide.texi | 1653 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1665 insertions(+), 0 deletions(-)
 create mode 100644 doc/liblouis-guide.texi
diff --git a/ChangeLog b/ChangeLog
index 7227b4e..6756b55 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2008-11-12  Christian Egli  <christian.egli@xxxxxxxx>
+
+       * doc/liblouis-guide.texi: Added the guide in texinfo format.
+       * doc/Makefile.am (.texi.txt): Integrate the texinfo guide in the
+       build system.
+
 John J. Boyer john.boyer@xxxxxxxxxxxxxxxx
 
 Release liblouis-1.3.8, June 16, 2008
diff --git a/doc/Makefile.am b/doc/Makefile.am
index 16d2a4b..0c6d731 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -8,3 +8,9 @@ EXTRA_DIST = \
        liblouis-guide.html \
        liblouis-guide.txt
 
+info_TEXINFOS = liblouis-guide.texi
+
+SUFFIXES                = .txt
+
+.texi.txt:
+       $(MAKEINFO) --plaintext $< -o $@
diff --git a/doc/liblouis-guide.texi b/doc/liblouis-guide.texi
new file mode 100644
index 0000000..7366a0f
--- /dev/null
+++ b/doc/liblouis-guide.texi
@@ -0,0 +1,1653 @@
+\input texinfo
+@c %**start of header
+@setfilename liblouis-guide.info
+@include version.texi
+@settitle Liblouis Programmer's and User's Guide
+
+@dircategory Misc
+@direntry
+* Liblouis: (liblouis-guide). A braille translator and back-translator.
+@end direntry
+
+@c Version and Contact Info
+@set MAINTAINERSITE @uref{http://www.jjb-software.com/liblouis-guide.html,maintainer's webpage}
+@set AUTHOR John J. Boyer
+@set MAINTAINER John J. Boyer
+@set MAINTAINEREMAIL @email{john.boyer@xxxxxxxxxxxxxxxx}
+@set MAINTAINERCONTACT @uref{mailto:john.boyer@xxxxxxxxxxxxxxxx,contact the maintainer}
+@c %**end of header
+@finalout
+
+@c Macro definitions
+
+@c Opcode.
+@macro opcode{name, args}
+@findex \name\
+@item \name\ \args\
+@end macro
+
+@macro doubleOpcode{name1, args1, name2, args2}
+@findex \name1\
+@findex \name2\
+@item \name1\ \args1\
+@itemx \name2\ \args2\
+@end macro
+
+@copying
+This manual is for liblouis (version @value{VERSION}, @value{UPDATED}),
+a Braille Translation and Back-Translation Library derived from the
+Linux screenreader @acronym{BRLTTY}. 
+
+Copyright @copyright{} 1999-2008 by the @acronym{BRLTTY} Team.
+
+It is also Copyright @copyright{} 2004-2008 by ViewPlus Technologies,
+Inc. @uref{www.viewplus.com} and JJB Software, Inc.
+@uref{www.jjb-software.com}.
+
+@quotation
+This file is free software; you can redistribute it and/or modify it
+under the terms of the GNU Lesser (or library) General Public License
+(LGPL) as published by the Free Software Foundation; either version 3,
+or (at your option) any later version.
+
+This file is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+Lesser (or Library) General Public License LGPL for more details.
+
+You should have received a copy of the GNU Lesser (or Library) General
+Public License (LGPL) along with this program; see the file COPYING.
+If not, write to the Free Software Foundation, 51 Franklin Street,
+Fifth Floor, Boston, MA 02110-1301, USA.
+@end quotation
+@end copying
+
+@titlepage
+@title Liblouis Programmer's and User's Guide
+
+@subtitle for version @value{VERSION}, @value{UPDATED}
+@author by John J. Boyer
+
+@c The following two commands start the copyright page.
+@page
+@vskip 0pt plus 1filll
+@insertcopying
+@end titlepage
+
+@c Output the table of contents at the beginning.
+@contents
+
+@ifnottex
+@node Top, Introduction, (dir), (dir)
+@top Liblouis Programmer's and User's Guide
+
+@insertcopying
+@end ifnottex
+
+@menu
+* Introduction::                
+* Programming with liblouis::   
+* Test Programs::               
+* How to Write Translation Tables::  
+* Notes on Back-Translation::   
+* Key Index::                   
+
+@detailmenu
+ --- The Detailed Node Listing ---
+
+Programming with liblouis
+
+* Overview::                    
+* lou_version::                 
+* lou_translateString::         
+* lou_translate::               
+* lou_backTranslateString::     
+* lou_backTranslate::           
+* lou_hyphenate::               
+* lou_logFileName::             
+* lou_logPrint::                
+* lou_getTable::                
+* lou_readCharFromFile::        
+* lou_free::                    
+
+Test Programs
+
+* lou_checktable::              
+* lou_allround::                
+* lou_translate -f | -b tablename::  
+
+How to Write Translation Tables
+
+* Hyphenation Tables::          
+* Character-Definition Opcodes::  
+* Braille Indicator Opcodes::   
+* Emphasis Opcodes::            
+* Special Symbol Opcodes::      
+* Special Processing Opcodes::  
+* Translation Opcodes::         
+* Character-Class Opcodes::     
+* Swap Opcodes::                
+* The Context and Multipass Opcodes::  
+* The correct Opcode::          
+* Miscellaneous Opcodes::       
+
+@end detailmenu
+@end menu
+
+@node  Introduction, Programming with liblouis, Top, Top
+@chapter Introduction
+
+Liblouis is an open-source braille translator and back-translator
+derived from the translation routines in the BRLTTY screenreader for
+Linux. It has, however, gone far beyond these routines. It is named in
+honor of Louis Braille. In Linux and Mac OS X it is a shared library,
+and in Windows it is a DLL. For installation instructions see the
+README file. Please report bugs and oddities to the maintainer,
+@email{john.boyer@@jjb-software.com}.
+
+This documentation is derived from Chapter 7 of the BRLTTY manual, but
+it has been extensively rewritten to cover new features.
+
+Please read the copyright and warranty information at the beginning of
+this manual. Note that this information also applies to all source
+code, tables and other files in this distribution of liblouis. It
+applies similarly to the sister library liblouisxml.
+
+This file is maintained by John J. Boyer
+@email{john.boyer@@jjb-software.com}.
+
+Persons who wish to write translation tables but will not be
+programming with liblouis may want to skip ahead to @ref{Test
+Programs} or @ref{How to Write Translation Tables}.
+
+@node Programming with liblouis, Test Programs, Introduction, Top
+@chapter Programming with liblouis
+
+@menu
+* Overview::                    
+* lou_version::                 
+* lou_translateString::         
+* lou_translate::               
+* lou_backTranslateString::     
+* lou_backTranslate::           
+* lou_hyphenate::               
+* lou_logFileName::             
+* lou_logPrint::                
+* lou_getTable::                
+* lou_readCharFromFile::        
+* lou_free::                    
+@end menu
+
+@node Overview, lou_version, Programming with liblouis, Programming with liblouis
+@section Overview
+
+You use the liblouis library by calling eleven functions:
+@code{lou_version}, @code{lou_translateString},
+@code{lou_backTranslateString}, @code{lou_logFileName},
+@code{lou_logPrint}, @code{lou_getTable}, @code{lou_translate},
+@code{lou_backTranslate}, @code{lou_hyphenate},
+@code{lou_readCharFromFile} and @code{lou_free}. These are described
+below. The header file, @file{liblouis.h}, also contains brief
+descriptions. Liblouis is written in straight C. It has just three
+code modules, @file{compileTranslationTable.c},
+@file{lou_translateString.c} and @file{lou_backTranslateString.c}. In
+addition, there are two header files, @file{liblouis.h}, which defines
+the API, and @file{louis.h}, used only internally. The latter includes
+@file{liblouis.h}.
+
+@file{compileTranslationTable.c} keeps track of all translation tables
+which an application has used. It is called by the translation,
+hyphenation and checking functions when they start. If a table has not
+yet been compiled, @file{compileTranslationTable.c} checks it for
+correctness and compiles it into an efficient internal representation.
+The main entry point is @code{lou_getTable}. Since it is the module
+that keeps track of memory usage, it also contains the @code{lou_free}
+function. In addition, it contains the @code{lou_logFileName} and
+@code{lou_logPrint} functions, plus some utility functions which are
+used by the other modules.
+
+By default, liblouis handles all characters internally as 16-bit
+unsigned integers. It can be compiled for 32-bit characters as
+explained below. The meanings of these integers are not hard-coded.
+Rather, they are defined by the character-definition opcodes. However,
+the standard printable characters, from decimal 32 to 126, are
+recognized for the purpose of processing the opcodes. Hence, the
+following definition is included in @file{liblouis.h}. It is correct
+for computers with at least 32-bit processors.
+
+@example
+#define widechar unsigned short int
+@end example
+
+To make liblouis handle 32-bit Unicode, simply remove the word
+@code{short} in the above define. This will cause the translate and
+back-translate functions to expect input in 32-bit form and to deliver
+their output in this form. The input to the compiler (tables) is
+unaffected except that two new escape sequences for 20-bit and 32-bit
+characters are recognized.
+
+Here are the definitions of the eleven liblouis functions and their
+parameters. They are given in terms of 16-bit Unicode. If liblouis has
+been compiled for 32-bit Unicode, simply read 32 instead of 16.
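+
+As an illustration, the following sketch widens an ordinary ASCII
+string into a @code{widechar} buffer of the kind expected by the
+@code{inbuf} parameter. It assumes the default 16-bit build. The
+helper @code{ascii_to_widechar} is hypothetical and is not part of the
+liblouis API.
+
+@example
+/* widechar as defined in liblouis.h for the default 16-bit build.  */
+#define widechar unsigned short int
+
+/* Hypothetical helper, not part of liblouis: copy at most max
+   characters of an ASCII string into dst, one widechar per
+   character.  The return value can be stored in the integer that
+   the inlen parameter points to.  */
+static int
+ascii_to_widechar (const char *src, widechar *dst, int max)
+@{
+  int n = 0;
+  while (n < max && src[n] != '\0')
+    @{
+      /* ASCII codes are also Unicode code points.  */
+      dst[n] = (widechar) (unsigned char) src[n];
+      n++;
+    @}
+  return n;
+@}
+@end example
+
+In a 32-bit build, @code{widechar} becomes @code{unsigned int} and the
+same loop applies unchanged.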
+
+@node lou_version, lou_translateString, Overview, Programming with liblouis
+@section lou_version
+
+@example
+char *lou_version ()
+@end example
+
+This function returns a pointer to a character string containing the
+version of liblouis, plus other information, such as the release date
+and perhaps notable changes.
+
+@node lou_translateString, lou_translate, lou_version, Programming with liblouis
+@section lou_translateString
+
+@example
+int lou_translateString (
+    const char *const trantab, 
+    const widechar *const inbuf, 
+    int *inlen, 
+    widechar *outbuf, 
+    int *outlen, 
+    char *typeform, 
+    char *spacing, 
+    int mode);
+@end example
+
+This function takes a string of 16-bit Unicode characters in inbuf and
+translates it into a string of 16-bit characters in outbuf. Each
+16-bit character produces a particular dot pattern in one braille cell
+when sent to an embosser or braille display or to a screen typefont.
+Which 16-bit character represents which dot pattern is indicated by
+the character-definition and display opcodes in the translation table.
+
+The @code{trantab} parameter points to a list of translation tables
+separated by commas. If only one table is given, no comma should be
+used after it. These tables control how the translation is made,
+whether in Grade 2, Grade 1, or something else. The first table in the
+list must be given as a full pathname, unless the tables are in the
+current directory. The directory part of this pathname (everything up
+to the filename) is extracted, and the first table is compiled. The
+directory is then prepended to the name of the second table, which is
+compiled, and so on. The tables in a list are all compiled into the
+same internal table, and the list as a whole is then regarded as the
+name of this table. As explained in the section
+@ref{How to Write Translation Tables}, each table is a file which may
+be plain text, big-endian Unicode or little-endian Unicode. A table
+(or list of tables) is compiled into an internal representation the
+first time it is used, and liblouis keeps track of which tables have
+been compiled. For this reason, it is essential to call the
+@code{lou_free} function at the end of your application to avoid
+memory leaks. Do @emph{NOT} call @code{lou_free} after each
+translation. This forces liblouis to recompile the translation tables
+every time they are used, which is very inefficient.
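+
+The table-list rule above can be sketched in C as follows. The helper
+@code{resolve_table} is hypothetical and is not part of liblouis. It
+only illustrates how the directory of the first table is reused for
+the names that follow it, and it assumes @samp{/} as the directory
+separator.
+
+@example
+#include <stddef.h>
+#include <string.h>
+
+/* Hypothetical helper, not part of liblouis: write the resolved path
+   of the n-th table (counting from 0) of a comma-separated list into
+   out.  The directory part of the first entry, i.e. everything up to
+   and including its last '/', is prefixed to every later entry.
+   Returns 0 on success, or -1 if n is out of range or out is too
+   small.  */
+static int
+resolve_table (const char *list, int n, char *out, size_t outsize)
+@{
+  const char *comma = strchr (list, ',');
+  size_t first_len = comma ? (size_t) (comma - list) : strlen (list);
+  size_t dir_len = 0, i, len, prefix;
+  const char *start = list, *end;
+  int k;
+
+  /* Find the directory part of the first entry.  */
+  for (i = 0; i < first_len; i++)
+    if (list[i] == '/')
+      dir_len = i + 1;
+
+  /* Advance start to the beginning of the n-th entry.  */
+  for (k = 0; k < n; k++)
+    @{
+      start = strchr (start, ',');
+      if (start == NULL)
+        return -1;
+      start++;
+    @}
+
+  end = strchr (start, ',');
+  len = end ? (size_t) (end - start) : strlen (start);
+  prefix = (n == 0) ? 0 : dir_len;
+  if (prefix + len + 1 > outsize)
+    return -1;
+  memcpy (out, list, prefix);
+  memcpy (out + prefix, start, len);
+  out[prefix + len] = '\0';
+  return 0;
+@}
+@end example
+
+With an example list such as @samp{/tables/chardefs.cti,en-us-g2.ctb},
+entry 1 resolves to @file{/tables/en-us-g2.ctb}.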
+
+Note that both the @code{*inlen} and @code{*outlen} parameters are
+pointers to integers. When the function is called, these integers
+contain the maximum input and output lengths, respectively. When it
+returns, they are set to the actual lengths used.
+
+The typeform parameter is used to indicate italic type, boldface type,
+computer braille, etc. It is a string of characters with the same
+length as the input buffer @code{inbuf}. However, it is
+used to pass back character-by-character results, so enough space must
+be provided to match the @code{*outlen} parameter. Each character
+indicates the typeform of the corresponding character in the input
+buffer. The values are as follows: 0 plain-text; 1 italic; 2 bold; 4
+underline; 8 computer braille. These values can be added for multiple
+emphasis. If this parameter is @code{NULL}, no checking for typeforms
+is done. In addition, if this parameter is not @code{NULL}, it is set
+on return to have an 8 at every position corresponding to a character
+in outbuf which was defined to have a dot representation containing
+dot 7, dot 8 or both, and to 0 otherwise.
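+
+The sketch below shows one way to fill in a @code{typeform} array
+before a call. The enumerator names are invented for this example and
+are not taken from @file{liblouis.h}. Only their values come from the
+list above. Because the values are distinct powers of 2, combining
+them with bitwise OR gives the same result as adding them.
+
+@example
+/* Typeform values from the list above; the names are illustrative.  */
+enum
+@{
+  TF_PLAIN = 0,
+  TF_ITALIC = 1,
+  TF_BOLD = 2,
+  TF_UNDERLINE = 4,
+  TF_COMPBRL = 8
+@};
+
+/* Hypothetical helper: give input positions start .. start+len-1
+   the emphasis in addition to any emphasis they already have.  */
+static void
+mark_emphasis (char *typeform, int start, int len, int emphasis)
+@{
+  int i;
+  for (i = start; i < start + len; i++)
+    typeform[i] |= (char) emphasis;
+@}
+@end example
+
+Since @code{typeform} is also used to pass results back, it should be
+allocated with at least @code{*outlen} elements and set to zero
+(plain text) before the emphasized ranges are marked.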
+
+The spacing parameter is used to indicate differences in spacing
+between the input string and the translated output string. It is also
+of the same length as the input string @code{inbuf}. If this
+parameter is @code{NULL}, no spacing information is computed.
+
+The mode parameter specifies how the translation should be done. The
+valid values of mode are listed in @file{liblouis.h}. They are all
+powers of 2, so that a combined mode can be specified by adding up
+different values.
+
+The function returns 1 if no errors were encountered and 0 if a
+complete translation could not be done.
+
+@node lou_translate, lou_backTranslateString, lou_translateString, Programming with liblouis
+@section lou_translate
+
+@example
+int lou_translate (
+    const char *const trantab, 
+    const widechar * const inbuf, 
+    int *inlen, 
+    widechar *outbuf, 
+    int *outlen, 
+    char *typeform, 
+    char *spacing, 
+    int *outputPos, 
+    int *inputPos, 
+    int *cursorPos, 
+    int mode);
+@end example
+
+This function adds the parameters @code{outputPos}, @code{inputPos}
+and @code{cursorPos} to facilitate use in screenreader programs. The
+@code{outputPos} parameter must point to an array of integers with at
+least @code{*outlen} elements. On return, this array will contain the
+position in @code{inbuf} corresponding to each output position.
+Similarly, @code{inputPos} must point to an array of integers of at
+least @code{*inlen} elements. On return, this array will contain the
+position in @code{outbuf} corresponding to each position in
+@code{inbuf}. @code{cursorPos} must point to an integer containing the
+position of the cursor in the input. On return, it will contain the
+cursor position in the output. Any parameter after @code{outlen} may
+be @code{NULL}. In this case, the actions corresponding to it will not
+be carried out. The @code{mode} parameter, however, must be present
+and must be an integer, not a pointer to an integer. If the
+@code{compbrlAtCursor} bit is set in the @code{mode} parameter, the
+space-bounded sequence of characters containing the cursor is
+translated in computer braille.
+
+@node lou_backTranslateString, lou_backTranslate, lou_translate, Programming with liblouis
+@section lou_backTranslateString
+
+@example
+int lou_backTranslateString (
+    const char *const trantab, 
+    const widechar *const inbuf, 
+    int *inlen, 
+    widechar *outbuf, 
+    int *outlen, 
+    char *typeform, 
+    char *spacing, 
+    int mode);
+@end example
+
+This is exactly the opposite of @code{lou_translateString}.
+@code{inbuf} is a string of 16-bit Unicode characters representing
+braille. @code{outbuf} will contain a string of 16-bit Unicode
+characters. @code{typeform} will indicate any emphasis found in the
+input string, while @code{spacing} will indicate any differences in
+spacing between the input and output strings. The @code{typeform} and
+@code{spacing} parameters may be @code{NULL} if this information is
+not needed. @code{mode} again specifies how the back-translation
+should be done.
+
+@node lou_backTranslate, lou_hyphenate, lou_backTranslateString, Programming with liblouis
+@section lou_backTranslate
+
+@example
+int lou_backTranslate (
+    const char *const trantab, 
+    const widechar *const inbufx, 
+    int *inlen, 
+    widechar * outbuf, 
+    int *outlen, 
+    char *typeform, 
+    char *spacing, 
+    int *outputPos, 
+    int *inputPos, 
+    int *cursorPos, 
+    int mode);
+@end example
+
+This function is exactly the inverse of @code{lou_translate}.
+
+@node lou_hyphenate, lou_logFileName, lou_backTranslate, Programming with liblouis
+@section lou_hyphenate
+
+@example
+int lou_hyphenate (
+    const char *const trantab, 
+    const widechar * const inbuf, 
+    int inlen, 
+    char *hyphens, 
+    int mode);
+@end example
+
+This function looks at the characters in @code{inbuf} and, if it finds
+a sequence of letters, attempts to hyphenate it as a word. Leading and
+trailing punctuation marks are ignored. The table named by the
+@code{trantab} parameter must contain a hyphenation table. If it does
+not, the function does nothing. @code{inlen} is the length of the
+character string in @code{inbuf}. @code{hyphens} is an array of
+characters and must be of size @code{inlen}. If hyphenation is
+successful it will have a 1 at the beginning of each syllable and a 0
+elsewhere. If the @code{mode} parameter is 0, @code{inbuf} is assumed
+to contain untranslated characters. Any nonzero value means that
+@code{inbuf} contains a translation. In this case, it is
+back-translated, hyphenation is performed, and it is retranslated so
+that the hyphens can be placed correctly. The @code{lou_translate} and
+@code{lou_backTranslate} functions are used in this process.
+@code{lou_hyphenate} returns 1 if hyphenation was successful and 0
+otherwise. In the latter case, the contents of the @code{hyphens}
+parameter are undefined. This function was provided for use in
+liblouisxml.
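+
+To illustrate the format of the @code{hyphens} array, the following
+hypothetical helper prints a word with a hyphen at each syllable
+boundary. For readability it uses @code{char} rather than
+@code{widechar}. The name @code{show_syllables} is not part of
+liblouis.
+
+@example
+#include <stddef.h>
+
+/* Hypothetical helper: copy the len characters of word into out,
+   inserting '-' before every position i > 0 at which hyphens[i]
+   is 1, i.e. where a new syllable begins.  Returns the length of
+   the result, or -1 if out is too small.  */
+static int
+show_syllables (const char *word, const char *hyphens, int len,
+                char *out, size_t outsize)
+@{
+  size_t k = 0;
+  int i;
+
+  if (outsize == 0)
+    return -1;
+  for (i = 0; i < len; i++)
+    @{
+      if (i > 0 && hyphens[i] == 1)
+        @{
+          if (k + 1 >= outsize)
+            return -1;
+          out[k++] = '-';
+        @}
+      if (k + 1 >= outsize)
+        return -1;
+      out[k++] = word[i];
+    @}
+  out[k] = '\0';
+  return (int) k;
+@}
+@end example
+
+For example, if @code{hyphens} holds a 1 at positions 0, 4 and 9 of
+@samp{transcription}, the result is @samp{tran-scrip-tion}.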
+
+@node lou_logFileName, lou_logPrint, lou_hyphenate, Programming with liblouis
+@section lou_logFileName
+
+@example
+void lou_logFileName (char *fileName);
+@end example
+
+This function is used when it is not convenient either to let messages
+be printed on stderr or to use redirection, as when liblouis is used
+in a GUI application or in liblouisxml. Any error messages generated
+will be printed to the file given in this call. The entire pathname of
+the file must be given.
+
+@node lou_logPrint, lou_getTable, lou_logFileName, Programming with liblouis
+@section lou_logPrint
+
+@example
+void lou_logPrint (char *format, ...);
+@end example
+
+This function is called like @code{printf}. It can be used by other
+libraries to print messages to the file specified by the call to
+@code{lou_logFileName}. In particular, it is used by the companion
+library liblouisxml.
+
+@node lou_getTable, lou_readCharFromFile, lou_logPrint, Programming with liblouis
+@section lou_getTable
+
+@example
+void *lou_getTable (char *tablelist);
+@end example
+
+@code{tablelist} is a list of names of table files separated by
+commas, as explained previously. If no errors are found, this function
+returns a pointer to the compiled table. If errors are found, messages
+are printed to the log file, which is stderr unless a different
+filename has been given using the @code{lou_logFileName} function.
+Errors result in a @code{NULL} pointer being returned.
+
+@node lou_readCharFromFile, lou_free, lou_getTable, Programming with liblouis
+@section lou_readCharFromFile
+
+@example
+int lou_readCharFromFile (const char *fileName, int *mode);
+@end example
+
+This function is provided for situations where it is necessary to read
+a file which may contain little-endian or big-endian 16-bit Unicode
+characters or 8-bit ASCII characters. The return value is a little-endian
+character, encoded as an integer. The @code{fileName} parameter is the
+name of the file to be read. The @code{mode} parameter is a pointer to
+an integer which must be set to 1 on the first call. After that, the
+function takes care of it. On end-of-file the function returns
+@code{EOF}.
+
+@node lou_free,  , lou_readCharFromFile, Programming with liblouis
+@section lou_free
+
+@example
+void lou_free ();
+@end example
+
+This function should be called at the end of the application to free
+all memory allocated by liblouis. Failure to do so will result in
+memory leaks. Do @emph{NOT} call @code{lou_free} after each
+translation. This will force liblouis to compile the translation
+tables every time they are used, resulting in great inefficiency.
+
+@node Test Programs, How to Write Translation Tables, Programming with liblouis, Top
+@chapter Test Programs
+
+Three test programs are provided as part of the liblouis package. They
+are intended for testing liblouis and for debugging tables. None of
+them is suitable for braille transcription. An application that can be
+used for transcription is xml2brl, which is part of the liblouisxml
+package. The source code of the test programs can be studied to learn
+how to use the liblouis library, and the programs can be used to perform the
+following functions.
+
+@menu
+* lou_checktable::              
+* lou_allround::                
+* lou_translate -f | -b tablename::  
+@end menu
+
+@node lou_checktable, lou_allround, Test Programs, Test Programs
+@section lou_checktable
+
+To use this program type @kbd{lou_checktable} followed by a space and
+the name of a table. If the table contains errors, appropriate
+messages will be displayed. If there are no errors the message
+@samp{no errors found.} will be shown.
+
+@node lou_allround, lou_translate -f | -b tablename, lou_checktable, Test Programs
+@section lou_allround
+
+This program tests every capability of the liblouis library. It is
+completely interactive. To start it, type @kbd{lou_allround} and press
+@key{RET}. You will see a few lines telling you how to use the
+program. Pressing one of the letters in parentheses and then @key{RET}
+will take you to a
+message asking for more information or for the answer to a yes/no
+question. Typing the letter @samp{r} and then @key{RET} will take you
+to a screen where you can enter a line to be processed by the library
+and then view the results.
+
+@node lou_translate -f | -b tablename,  , lou_allround, Test Programs
+@section lou_translate -f | -b tablename
+
+This program translates whatever is on the standard input and prints
+the result on the standard output. It is intended for large-scale
+testing of the accuracy of translation and back-translation. The first
+argument must be @option{-f} for forward translation or @option{-b} for
+backward translation. To use it to translate or back-translate a file,
+use a line like
+
+@kbd{<liblouis-guide.txt ./lou_translate -f en-us-g2.ctb >testtrans}
+
+@node How to Write Translation Tables, Notes on Back-Translation, Test Programs, Top
+@chapter How to Write Translation Tables
+
+Several translation (contraction) tables have already been made up.
+They are included in this distribution and should be studied as part
+of the documentation. The most helpful are listed in the following
+table:
+
+@table @file
+@item chardefs.cti 
+Character definitions for U.S. tables
+@item compress.ctb
+Remove excessive white-space
+@item en-us-g1.ctb
+Uncontracted American English
+@item en-us-g2.ctb
+Contracted or Grade 2 American English
+@item fr-integral.ctb
+Uncontracted Unified French
+@item fr-abrege.ctb
+Contracted Unified French
+@item french.dis
+Display entries mapping French characters to braille cells
+@item text.nab.dis
+Associations of North American characters with braille cells
+
+@end table
+
+The names used for files containing translation tables are completely
+arbitrary. They are not interpreted in any way by the translator.
+Contraction tables may be 8-bit ASCII files, 16-bit big-endian Unicode
+files or 16-bit little-endian Unicode files. Blank lines are ignored.
+Any leading and trailing white-space (any number of blanks and/or
+tabs) is ignored. Lines which begin with a number sign or hatch mark
+(@samp{#}) are ignored, i.e. they are comments. If the number sign is
+not the first non-blank character in the line, it is treated as an
+ordinary character. Lines which are not blank or comments define table
+entries. The general format of a table entry is:
+
+@example
+opcode operands comments
+@end example
+
+Table entries may not be split between lines. The opcode is a mnemonic
+that specifies what the entry does. The operands may be character
+sequences, braille dot patterns or occasionally something else. They
+are described for each opcode. With some exceptions, opcodes expect a
+certain number of operands. Any text on the line after the last
+operand is ignored, and may be a comment. A few opcodes accept a
+variable number of operands. In this case a number sign begins a
+comment unless it is preceded by a backslash (@samp{\}). For a list of
+opcodes, with a link to each one, see @ref{Key Index}.
+
+Here are some examples of table entries.
+
+@example
+# This is a comment.
+always world 456-2456 A word and the dot pattern of its contraction
+@end example
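+
+The line format described above is simple enough to sketch. The
+hypothetical helper @code{read_opcode} below (not part of liblouis)
+skips leading blanks and tabs, ignores blank lines and comment lines,
+and copies the opcode of a table entry, which is everything up to the
+next blank or tab.
+
+@example
+#include <stddef.h>
+
+/* Hypothetical helper: return 0 if line is blank or a comment,
+   otherwise return 1 and copy the opcode into opcode, truncated to
+   size - 1 characters.  Leading blanks and tabs are skipped, and a
+   trailing newline is tolerated.  */
+static int
+read_opcode (const char *line, char *opcode, size_t size)
+@{
+  size_t n = 0;
+
+  while (*line == ' ' || *line == '\t')
+    line++;
+  if (*line == '\0' || *line == '\n' || *line == '#')
+    return 0;
+  while (*line != '\0' && *line != ' ' && *line != '\t'
+         && *line != '\n')
+    @{
+      if (n + 1 < size)
+        opcode[n++] = *line;
+      line++;
+    @}
+  if (size > 0)
+    opcode[n] = '\0';
+  return 1;
+@}
+@end example
+
+Applied to the two example lines, it returns 0 for the comment and
+copies @samp{always} from the table entry.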
+
+Most opcodes have both a "characters" operand and a "dots" operand,
+though some have only one and a few have other types.
+
+The characters operand consists of any combination of characters and
+escape sequences preceded and followed by whitespace. Escape
+sequences are used to represent difficult characters. They begin with
+a backslash (@samp{\}). They are:
+
+@table @kbd
+@item \\ 
+backslash
+@item \f
+form feed
+@item \n
+new line
+@item \r
+carriage return
+@item \s
+blank (space)
+@item \t
+horizontal tab
+@item \v
+vertical tab
+@item \e
+"escape" character (hex 1b, dec 27)
+@item \xhhhh
+4-digit hexadecimal value of a character
+
+@end table
+
+If liblouis has been compiled for 32-bit Unicode the following are
+also recognized.
+
+@table @kbd
+@item \xhhhhh
+5-digit (20 bit) character
+@item \xhhhhhhhh
+Full 32-bit value.
+
+@end table
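+
+The following sketch shows how these escape sequences can be decoded.
+It handles only the 16-bit forms listed in the first table. The
+function @code{decode_escape} is hypothetical and is not the routine
+liblouis itself uses when compiling tables.
+
+@example
+/* Hypothetical helper: decode the escape sequence at s, which must
+   begin with a backslash, into *ch.  Returns the number of characters
+   consumed, or 0 if s does not begin with a recognized 16-bit
+   escape sequence.  */
+static int
+decode_escape (const char *s, unsigned int *ch)
+@{
+  unsigned int value = 0;
+  int i;
+
+  if (s[0] != '\\')
+    return 0;
+  switch (s[1])
+    @{
+    case '\\': *ch = '\\'; return 2;
+    case 'f': *ch = 0x0c; return 2;   /* form feed */
+    case 'n': *ch = 0x0a; return 2;   /* new line */
+    case 'r': *ch = 0x0d; return 2;   /* carriage return */
+    case 's': *ch = 0x20; return 2;   /* blank (space) */
+    case 't': *ch = 0x09; return 2;   /* horizontal tab */
+    case 'v': *ch = 0x0b; return 2;   /* vertical tab */
+    case 'e': *ch = 0x1b; return 2;   /* escape */
+    case 'x':            /* this sketch requires exactly 4 hex digits */
+      for (i = 2; i < 6; i++)
+        @{
+          char c = s[i];
+          if (c >= '0' && c <= '9')
+            value = value * 16 + (unsigned int) (c - '0');
+          else if (c >= 'a' && c <= 'f')
+            value = value * 16 + (unsigned int) (c - 'a' + 10);
+          else if (c >= 'A' && c <= 'F')
+            value = value * 16 + (unsigned int) (c - 'A' + 10);
+          else
+            return 0;
+        @}
+      *ch = value;
+      return 6;
+    default:
+      return 0;
+    @}
+@}
+@end example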
+
+The dots operand is a braille dot pattern. The real braille dots, 1
+through 8, must be specified with their standard numbers. Liblouis
+recognizes "virtual dots," which are used for special purposes, such
+as distinguishing accent marks. There are seven virtual dots. They are
+specified by the number 9 and the letters a through f. For a
+multi-cell dot pattern, the cell specifications must be separated from
+one another by a dash (@samp{-}). For example, the contraction for the
+English word lord (the letter l preceded by dot 5) would be specified
+as 5-123. A space may be specified with the special dot number 0.
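+
+To make the dot notation concrete, the following sketch parses a dots
+operand such as @samp{5-123} into one bit mask per cell. The bit
+assignment (dots 1 to 8 as bits 0 to 7, virtual dot 9 as bit 8 and
+virtual dots a to f as bits 9 to 14) is an assumption made for this
+example and need not match liblouis's internal representation. The
+function @code{parse_dots} is hypothetical.
+
+@example
+/* Bit for one dot character under this sketch's assumed encoding,
+   or -1 if c is not a dot number.  */
+static int
+dot_bit (char c)
+@{
+  if (c >= '1' && c <= '9')
+    return 1 << (c - '1');
+  if (c >= 'a' && c <= 'f')
+    return 1 << (9 + (c - 'a'));
+  return -1;
+@}
+
+/* Parse a dots operand into at most maxcells masks.  Cells are
+   separated by '-', and the dot number 0 stands for a blank cell.
+   Returns the number of cells, or -1 if the operand is malformed.  */
+static int
+parse_dots (const char *s, unsigned short *cells, int maxcells)
+@{
+  int n = 0;
+  while (*s != '\0')
+    @{
+      unsigned short mask = 0;
+      if (n == maxcells || *s == '-')
+        return -1;            /* too many cells, or an empty cell */
+      if (*s == '0')
+        s++;                  /* blank cell: no dots */
+      else
+        while (*s != '\0' && *s != '-')
+          @{
+            int bit = dot_bit (*s++);
+            if (bit < 0)
+              return -1;
+            mask |= (unsigned short) bit;
+          @}
+      cells[n++] = mask;
+      if (*s == '-')
+        @{
+          s++;
+          if (*s == '\0')
+            return -1;        /* trailing dash */
+        @}
+      else if (*s != '\0')
+        return -1;            /* e.g. "01" */
+    @}
+  return n;
+@}
+@end example
+
+With this encoding, the contraction for @samp{lord}, @samp{5-123},
+yields two cells with the masks 0x10 and 0x07.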
+
+An opcode which is helpful in writing translation tables is
+@code{include}. Its format is:
+
+@example
+include filename
+@end example
+
+It reads the file indicated by @var{filename} and incorporates or
+includes its entries into the table. Included files can include other
+files, which can include other files, etc. For an example, see which
+files are included by the entry @code{include en-us-g1.ctb} in the
+table @file{en-us-g2.ctb}. If the included file is not in the same
+directory as the main table, use a full pathname for @var{filename}.
+
+The order of the various types of opcodes or table entries is
+important. Character-definition opcodes should come first. However, if
+the optional @code{display} opcode is used, it should precede the
+character-definition opcodes. Braille-indicator opcodes should come
+next. Translation opcodes should follow. The @code{context} opcode is a
+translation opcode, even though it is considered along with the
+multipass opcodes. These latter should follow the translation opcodes.
+The @code{correct} opcode can be used anywhere after the
+character-definition opcodes, but it is probably a
+good idea to group all @code{correct} opcodes together. The
+@code{include} opcode can be used anywhere, but the order of entries
+in the combined table must conform to the order given above. Within
+each type of opcode, the order of entries is generally unimportant.
+Thus the translation entries can be grouped alphabetically or in any
+other order that is convenient.
+
+@menu
+* Hyphenation Tables::          
+* Character-Definition Opcodes::  
+* Braille Indicator Opcodes::   
+* Emphasis Opcodes::            
+* Special Symbol Opcodes::      
+* Special Processing Opcodes::  
+* Translation Opcodes::         
+* Character-Class Opcodes::     
+* Swap Opcodes::                
+* The Context and Multipass Opcodes::  
+* The correct Opcode::          
+* Miscellaneous Opcodes::       
+@end menu
+
+@node Hyphenation Tables, Character-Definition Opcodes, How to Write Translation Tables, How to Write Translation Tables
+@section Hyphenation Tables
+
+Hyphenation tables are necessary to make opcodes such as
+@ref{nocross opcode} function properly. There are no opcodes for
+hyphenation table entries because these tables have a special format.
+Therefore, they cannot be specified as part of an ordinary table.
+Rather, they must be included using the @ref{include-opcode}.
+Hyphenation tables must follow character definitions. For an example
+of a hyphenation table, see @file{hyph_en_US.dic}.
+
+@node Character-Definition Opcodes, Braille Indicator Opcodes, Hyphenation Tables, How to Write Translation Tables
+@section Character-Definition Opcodes
+
+These opcodes are needed to define attributes such as digit,
+punctuation, letter, etc. for all characters and their dot patterns.
+Liblouis has no built-in character definitions, but such definitions
+are essential to the operation of the @code{context} opcode, the @code{correct}
+opcode, the multipass opcodes and the back-translator. If the dot
+pattern is a single cell, it is used to define the mapping between dot
+patterns and characters, unless a display opcode for that
+character-dot-pattern pair has been used previously. If only a
+single-cell dot pattern has been given for a character, that dot
+pattern is defined with the character's own attributes. If more than
+one cell is given and some of them have not previously been defined as
+single cells, the undefined cells are entered into the dots table with
+the undefined attribute. This is done for backward compatibility with
+old tables, but it may cause problems with the above opcodes or
+back-translation. For this reason, every single-cell dot pattern
+should be defined before it is used in a multi-cell character
+representation. The best way to do this is to use the 8-dot computer
+braille representation for the particular braille code. If a character
+or dot pattern used in any rule, except those with the @code{display},
+@code{repeated} or @code{replace} opcodes, is not defined by one of the
+character-definition opcodes, liblouis will give an error message and
+refuse to continue until the problem is fixed. If the translator or
+back-translator encounters an undefined character in its input, it
+produces a succinct error indication in its output, and the character
+is treated as a space.
+
+@table @code
+@opcode{space, character dots}
+Defines a character as a space and also defines the dot pattern as
+such. For example:
+
+@example
+space \s 0 \s is the escape sequence for blank; 0 means no dots.
+@end example
+
+@opcode{punctuation, character dots}
+Associates a punctuation mark in the particular language with a
+braille representation and defines the character and dot pattern as
+punctuation. For example:
+
+@example
+punctuation . 46 dot pattern for period in NAB computer braille
+@end example
+
+@opcode{digit, character dots}
+Associates a digit with a dot pattern and defines the character as a
+digit. For example:
+
+@example
+digit 0 356 NAB computer braille
+@end example
+
+@opcode{uplow, characters dots@{,dots@}}
+The characters operand must be a pair of letters, of which the first
+is uppercase and the second lowercase. The first dots suboperand
+indicates the dot pattern for the upper-case letter. It may have more
+than one cell. The second dots suboperand must be separated from the
+first by a comma and is optional, as indicated by the braces.
+If present, it indicates the dot pattern for the lower-case letter. It
+may also have more than one cell. If the second dots suboperand is not
+present the first is used for the lower-case letter as well as the
+upper-case letter. This opcode is needed because not all languages
+follow a consistent pattern in assigning Unicode codes to upper and
+lower case letters. It should be used even for languages that do. The
+distinction is important in the forward translator. For example:
+
+@example
+uplow Aa 1
+@end example
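+
+The optional second dots suboperand can be illustrated with a
+hypothetical entry for an 8-dot table in which capital letters carry
+dot 7. Here the uppercase A is dots 17 and the lowercase a is dot 1:
+
+@example
+uplow Aa 17,1
+@end example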
+
+@opcode{letter, character dots}
+Associates a letter in the language with a braille representation and
+defines the character as a letter. This is intended for letters which
+are neither uppercase nor lowercase.
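+
+For instance, a Hebrew table might define the caseless letter alef,
+entered in the @samp{\x} hexadecimal notation as U+05D0. This is a
+sketch: the dot pattern is the usual Hebrew braille alef, but the
+entry is not copied from any distributed table.
+
+@example
+letter \x05d0 1
+@end example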
+
+@opcode{lowercase, character dots}
+Associates a character with a dot pattern and defines the character as
+a lowercase letter. Both the character and the dot pattern have the
+attributes lowercase and letter.
+
+@opcode{uppercase, character dots}
+Associates a character with a dot pattern and defines the character as
+an uppercase letter. Both the character and the dot pattern have the
+attributes uppercase and letter. Lowercase and uppercase should be
+used when a letter has only one case. Otherwise use "uplow".
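+
+As an illustrative sketch of a letter with only one case, a German
+table might define the sharp s (U+00DF), which traditionally has no
+uppercase form, as a lowercase letter. The dot pattern is the usual
+German braille one, but the entry is hypothetical:
+
+@example
+lowercase \x00df 2346
+@end example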
+
+@opcode{litdigit, digit dots}
+Associates a digit with the dot pattern which should be used to
+represent it in literary texts. For example:
+
+@example
+litdigit 0 245
+litdigit 1 1
+@end example
+
+@opcode{sign, character dots}
+Associates a character with a dot pattern and defines both as a sign.
+This opcode should be used for things like at sign, percent, dollar
+sign, etc. Do not use it to define ordinary punctuation such as period
+and comma. For example:
+
+@example
+sign % 4-25-1234 literary percent sign
+@end example
+
+@opcode{math, character dots}
+Associates a character and a dot pattern and defines them as a
+mathematical symbol. It should be used for less than, greater than,
+equals, plus, etc. For example:
+
+@example
+math + 346 plus
+@end example
+
+@end table
+
+@node Braille Indicator Opcodes, Emphasis Opcodes, Character-Definition Opcodes, How to Write Translation Tables
+@section Braille Indicator Opcodes
+
+Braille indicators are dot patterns which are inserted into the
+braille text to indicate such things as capitalization, italic type,
+computer braille, etc. The opcodes which define them are followed only
+by a dot pattern, which may be one or more cells.
+
+@table @code
+@opcode{capsign, dots}
+The dot pattern which indicates capitalization of a single letter. In
+English, this is dot 6. For example:
+
+@example
+capsign 6
+@end example
+
+@opcode{begcaps, dots}
+The dot pattern which begins a block of capital letters. For example:
+
+@example
+begcaps 6-6
+@end example
+
+@opcode{endcaps, dots}
+The dot pattern which ends a block of capital letters within a word.
+For example:
+
+@example
+endcaps 6-3
+@end example
+
+@opcode{letsign, dots}
+This indicator is needed in Grade 2 to show that a single letter is
+not a contraction. It is also used when an abbreviation happens to be
+a sequence of letters that is the same as a contraction. For example:
+
+@example
+letsign 56
+@end example
+
+@opcode{noletsign, letters}
+The letters in the operand will not be preceded by a letter sign.
+More than one noletsign opcode can be used. This is equivalent to a
+single entry containing all the letters. In addition, if a single
+letter, such as "a" in English, is defined as a word or largesign, it
+will be treated as though it had also been specified in a noletsign
+entry.
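+
+For example, a hypothetical table for a code in which a standing-alone
+a or i never takes a letter sign might contain:
+
+@example
+noletsign ai
+@end example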
+
+@opcode{noletsignbefore, characters}
+If any of the characters precedes a single letter without a space, a
+letter sign is not used. By default the characters apostrophe and
+period have this property. Use of a noletsignbefore entry cancels the
+defaults. If more than one noletsignbefore entry is used, the
+characters in all entries are combined.
+
+@opcode{noletsignafter, characters}
+If any of the characters follows a single letter without a space, a
+letter sign is not used. By default the characters apostrophe and
+period have this property. Use of a noletsignafter entry cancels the
+defaults. If more than one noletsignafter entry is used, the characters
+in all entries are combined.
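+
+Because such an entry cancels the defaults, a table that wants to keep
+apostrophe and period and also add another character has to list all
+of them. The following hypothetical entries add the hyphen:
+
+@example
+noletsignbefore '.-
+noletsignafter '.-
+@end example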
+
+@opcode{numsign, dots}
+The translator inserts this indicator before numbers made up of digits
+defined with the litdigit opcode to show that they are a number and
+not letters or some other symbols. For example:
+
+@example
+numsign 3456
+@end example
+
+@end table
+
+@node Emphasis Opcodes, Special Symbol Opcodes, Braille Indicator Opcodes, How to Write Translation Tables
+@section Emphasis Opcodes
+
+These also define braille indicators, but they require more
+explanation. There are four sets, for italic, bold, underline and
+computer braille. In each of the first three sets there are seven
+opcodes: for use before the first word of a phrase, before the last
+word, after the last word, before the first letter (or character) if
+emphasis starts in the middle of a word, after the last letter (or
+character) if emphasis ends in the middle of a word, before a single
+letter (or character), and to specify the length of a phrase to which
+the first-word and last-word-before indicators apply. This rather
+elaborate set of emphasis opcodes was devised to try to meet all
+contingencies, and it is unlikely that a translation table will
+contain all of them. The translator checks for their presence. If they
+are present, it first looks to see if the single-letter indicator
+should be used, then at the word (or phrase) indicators, and finally
+at the multi-letter indicators.
+
+The translator will apply up to two emphasis indicators to each phrase
+or string of characters, depending on what the typeform parameter in
+its calling sequence indicates (@pxref{Programming with liblouis}).
+
+For computer braille there are only two braille indicators, for the
+beginning and end of a sequence of characters to be rendered in
+computer braille. Such a sequence may also have other emphasis. The
+computer braille indicators are applied not only when computer braille
+is indicated in the typeform parameter, but also when a sequence of
+characters is determined to be computer braille because it contains a
+subsequence defined by the compbrl or literal opcodes.
+
+Here are the various emphasis opcodes.
+
+@table @code
+
+@opcode{firstwordital, dots}
+This is the braille indicator to be placed before the first word of an
+italicized phrase that is longer than the value given in
+lenitalphrase. For example:
+
+@example
+firstwordital 46-46 English indicator
+@end example
+
+@doubleOpcode{lastworditalbefore, dots, italsign, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the last word of an italicized phrase. In addition, if
+firstwordital is not used, this braille indicator is doubled and
+placed before the first word. Do not use lastworditalbefore and
+lastworditalafter in the same table. For example:
+
+@example
+lastworditalbefore 46
+@end example
+
+@opcode{lastworditalafter, dots}
+This is the braille indicator to be placed after the last word of an
+italicized phrase. Do not use lastworditalbefore and lastworditalafter
+in the same table. @xref{lenitalphrase-opcode,, the lenitalphrase opcode}.
+
+@doubleOpcode{firstletterital, dots, begital, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the first letter (or character) if italicization begins
+in the middle of a word.
+
+@doubleOpcode{lastletterital, dots, endital, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed after the last letter (or character) when italicization ends in
+the middle of a word.
+
+@opcode{singleletterital, dots}
+This braille indicator is used if only a single letter (or character)
+is italicized.
+
+@anchor{lenitalphrase-opcode}
+@opcode{lenitalphrase, number}
+If lastworditalbefore is used, an italicized phrase is checked to see
+how many words it contains. If this number is less than or equal to
+the number given in the lenitalphrase opcode, the lastworditalbefore
+sign is placed in front of each word. If it is greater, the
+firstwordital indicator is placed before the first word and the
+lastworditalbefore indicator is placed before the last word. Note that
+if the firstwordital opcode is not used, its indicator is made up by
+doubling the dot pattern given in the lastworditalbefore entry. For
+example:
+
+@example
+lenitalphrase 4
+@end example
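+
+To see how these entries interact, consider a hypothetical table that
+contains the following and has no firstwordital entry. Following the
+definitions of these opcodes, an italicized phrase of up to four words
+would get 46 before each word. A phrase of five or more words would
+get the doubled indicator 46-46 before its first word and 46 before
+its last word.
+
+@example
+lastworditalbefore 46
+lenitalphrase 4
+@end example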
+
+@opcode{firstwordbold, dots}
+This is the braille indicator to be placed before the first word of a
+bold phrase. For example:
+
+@example
+firstwordbold 456-456
+@end example
+
+@doubleOpcode{lastwordboldbefore, dots, boldsign, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the last word of a bold phrase. In addition, if
+firstwordbold is not used, this braille indicator is doubled and
+placed before the first word. Do not use lastwordboldbefore and
+lastwordboldafter in the same table. For example:
+
+@example
+lastwordboldbefore 456
+@end example
+
+@opcode{lastwordboldafter, dots}
+This is the braille indicator to be placed after the last word of a
+bold phrase. Do not use lastwordboldbefore and lastwordboldafter in
+the same table.
+
+@doubleOpcode{firstletterbold, dots, begbold, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the first letter (or character) if bold emphasis begins
+in the middle of a word.
+
+@doubleOpcode{lastletterbold, dots, endbold, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed after the last letter (or character) when bold emphasis ends in
+the middle of a word.
+
+@opcode{singleletterbold, dots}
+This braille indicator is used if only a single letter (or character)
+is in boldface.
+
+@opcode{lenboldphrase, number}
+If lastwordboldbefore is used, a bold phrase is checked to see how many
+words it contains. If this number is less than or equal to the number
+given in the lenboldphrase opcode, the lastwordboldbefore sign is
+placed in front of each word. If it is greater, the firstwordbold
+indicator is placed before the first word and the lastwordboldbefore
+indicator is placed before the last word. Note that if the
+firstwordbold opcode is not used, its indicator is made up by doubling
+the dot pattern given in the lastwordboldbefore entry.
+
+@opcode{firstwordunder, dots}
+This is the braille indicator to be placed before the first word of an
+underlined phrase.
+
+@doubleOpcode{lastwordunderbefore, dots, undersign, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the last word of an underlined phrase. In addition, if
+firstwordunder is not used, this braille indicator is doubled and
+placed before the first word.
+
+@opcode{lastwordunderafter, dots}
+This is the braille indicator to be placed after the last word of an
+underlined phrase.
+
+@doubleOpcode{firstletterunder, dots, begunder, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the first letter (or character) if underline emphasis
+begins in the middle of a word.
+
+@doubleOpcode{lastletterunder, dots, endunder, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed after the last letter (or character) when underline emphasis
+ends in the middle of a word.
+
+@opcode{singleletterunder, dots}
+This braille indicator is used if only a single letter (or character)
+is underlined.
+
+@opcode{lenunderphrase, number}
+If lastwordunderbefore is used, an underlined phrase is checked to see
+how many words it contains. If this number is less than or equal to
+the number given in the lenunderphrase opcode, the lastwordunderbefore
+sign is placed in front of each word. If it is greater, the
+firstwordunder indicator is placed before the first word and the
+lastwordunderbefore indicator is placed before the last word. Note that
+if the firstwordunder opcode is not used, its indicator is made up by
+doubling the dot pattern given in the lastwordunderbefore entry.
+
+@opcode{begcomp, dots}
+This braille indicator is placed before a sequence of characters
+translated in computer braille, whether this sequence is indicated in
+the typeform parameter (@pxref{Programming with liblouis}) or inferred
+because it contains a subsequence specified by the
+@ref{compbrl-opcode,,compbrl opcode}.
+
+@opcode{endcomp, dots}
+This braille indicator is placed after a sequence of characters
+translated in computer braille, whether this sequence is indicated in
+the typeform parameter (@pxref{Programming with liblouis}) or inferred
+because it contains a subsequence specified by the
+@ref{compbrl-opcode,,compbrl opcode}.
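+
+The following is a sketch. It assumes an English table that uses the
+opening indicator (dots 456-346) and the termination indicator (dots
+456-156) of the Computer Braille Code.
+
+@example
+begcomp 456-346
+endcomp 456-156
+@end example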
+
+@end table
+
+@node Special Symbol Opcodes, Special Processing Opcodes, Emphasis Opcodes, How to Write Translation Tables
+@section Special Symbol Opcodes
+
+These opcodes define certain symbols, such as the decimal point, which
+require special treatment.
+
+@table @code
+@opcode{decpoint, character dots}
+This opcode defines the decimal point. The character operand must have
+only one character. For example, in @file{en-us-g1.ctb} we have: "decpoint .
+46".
+
+@opcode{hyphen, character dots}
+This opcode defines the hyphen, that is, the character used in
+compound words such as have-nots. The back-translator uses it to
+determine the end of individual words.
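+
+For example, the hyphen is dots 36 in English braille, so an English
+table might contain:
+
+@example
+hyphen - 36
+@end example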
+
+@end table
+
+@node Special Processing Opcodes, Translation Opcodes, Special Symbol Opcodes, How to Write Translation Tables
+@section Special Processing Opcodes
+
+These opcodes cause special processing to be carried out.
+
+@table @code
+@opcode{capsnocont,}
+This opcode has no operands. If it is specified, words or parts of
+words in all caps are not contracted. This is needed for languages
+such as Norwegian.
+
+@end table
+
+@node Translation Opcodes, Character-Class Opcodes, Special Processing Opcodes, How to Write Translation Tables
+@section Translation Opcodes
+
+These opcodes define the braille representations for character
+sequences. Each of them defines an entry within the contraction table.
+These entries may be defined in any order except, as noted below, when
+they define alternate representations for the same character sequence.
+
+Each of these opcodes specifies a condition under which the
+translation is legal, and each also has a characters operand and a
+dots operand. The text being translated is processed strictly from
+left to right, character by character, with the most eligible entry
+for each position being used. If there is more than one eligible entry
+for a given position in the text, then the one with the longest
+character string is used. If there is more than one eligible entry for
+the same character string, then the one defined first is tested for
+legality first. (This is the only case in which the order of the
+entries makes a difference.)
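+
+The longest-match rule can be illustrated with two hypothetical
+entries that use the English contractions for "th" and "the". When the
+input reaches the word "the", both entries are eligible at the first
+position. The second is used because its character string is longer.
+In "this", only the first entry matches. Because the two strings
+differ, the order of the entries does not matter here.
+
+@example
+always th 1456
+always the 2346
+@end example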
+
+The characters operand is a sequence or string of characters preceded
+and followed by whitespace. Each character can be entered in the
+normal way, or it can be defined as a four-digit hexadecimal number
+preceded by "\x".
+
+The dots operand defines the braille representation for the characters
+operand. It may also be specified as an equals sign (@samp{=}). This
+means that the default representation for each character
+(@pxref{Character-Definition Opcodes}) within the sequence is to be
+used.
+
+In what follows the word "word" means a sequence of one or more
+consecutive letters between spaces and/or punctuation marks.
+
+@table @code
+
+@anchor{compbrl-opcode}
+@doubleOpcode{compbrl, characters, literal, characters}
+These two opcodes are synonyms. If the characters are found within a
+block of text surrounded by whitespace, the entire block is translated
+according to the default braille representations defined by the
+@ref{Character-Definition Opcodes} if 8-dot computer braille is
+enabled, or according to the dot patterns given in the comp6 opcode
+if 6-dot computer braille is enabled. For example:
+
+@example
+compbrl www translate URLs in computer braille
+@end example
+
+@opcode{comp6, character dots}
+This opcode specifies the translation of characters in 6-dot computer
+braille. It is necessary because the translation of a single character
+may require more than one cell. The first operand must be a character
+with a decimal representation from 0 to 255 inclusive. The second
+operand may specify as many cells as necessary. The opcode is somewhat
+of a misnomer, since any dots, not just dots 1 through 6, can be
+specified. This even includes virtual dots.
+
+@opcode{nocont, characters}
+Like compbrl, except that the string is uncontracted. The prepunc and
+postpunc rules are still applied, however. This is useful for specifying
+that foreign words should not be contracted in an entire document.
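+
+For example, the following hypothetical entry would keep the French
+word "bonjour" uncontracted wherever it occurs:
+
+@example
+nocont bonjour
+@end example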
+
+@opcode{replace, characters @{characters@}}
+Replace the first set of characters, no matter where they appear, with
+the second. Note that the second operand is @emph{NOT} a dot pattern.
+It is also optional. If it is omitted the character(s) in the first
+operand will be discarded. This is useful for ignoring characters. It
+is possible that the "ignored" characters may still affect the
+translation indirectly. Therefore, it is preferable to use @ref{The
+correct Opcode,, the correct opcode}.
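+
+Two hypothetical entries follow. The first discards the soft hyphen
+(U+00AD). Because it has no second operand, no trailing comment should
+be added to it, since such text would presumably be taken as the
+replacement. The second replaces the right single quotation mark
+(U+2019) with a plain apostrophe.
+
+@example
+replace \x00ad
+replace \x2019 '
+@end example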
+
+@anchor{always opcode}
+@opcode{always, characters dots}
+Replace the characters with the dot pattern no matter where they
+appear. Do @emph{NOT} use an entry such as "always a 1". Use the uplow,
+letter, etc. character definition opcodes instead. For example:
+
+@example
+always world 456-2456 unconditional translation
+@end example
+
+@opcode{repeated, characters dots}
+Replace the characters with the dot pattern no matter where they
+appear. Ignore any consecutive repetitions of the same character
+sequence. This is useful for shortening long strings of spaces or
+hyphens or periods. For example:
+
+@example
+repeated --- 36-36-36 shorten separator lines made with hyphens
+@end example
+
+@opcode{largesign, characters dots}
+Replace the characters with the dot pattern no matter where they
+appear. In addition, if two words defined as large signs follow each
+other, remove the space between them. For example, in en-us-g2.ctb the
+words "and" and "the" are both defined as large signs. Thus, in the
+phrase "the cat and the dog" the space would be deleted between "and"
+and "the", with the result "the cat andthe dog". Of course, "and" and
+"the" would be properly contracted. The term "largesign" is a bit of
+braille jargon that pleases braille experts.
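+
+Entries along the following lines define the two words as large
+signs, using the standard English braille contractions for "and"
+(dots 12346) and "the" (dots 2346). They are a sketch rather than a
+verbatim copy of @file{en-us-g2.ctb}.
+
+@example
+largesign and 12346
+largesign the 2346
+@end example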
+
+@opcode{word, characters dots}
+Replace the characters with the dot pattern if they are a word, that
+is, are surrounded by whitespace and/or punctuation.
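+
+For example, in English braille the letter c standing alone means
+"can", so a table might contain the following illustrative entry. It
+does not affect words such as "candle":
+
+@example
+word can 14
+@end example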
+
+@opcode{syllable, characters dots}
+As its name indicates, this opcode defines a "syllable" which must be
+represented by exactly the dot patterns given. Contractions may not
+cross the boundaries of this "syllable" either from left or right. The
+character string defined by this opcode need not be a lexical
+syllable, though it usually will be. For example:
+
+@example
+syllable horse = sawhorse, horseradish
+@end example
+
+@anchor{nocross opcode}
+@opcode{nocross, characters dots}
+Replace the characters with the dot pattern if the characters are all
+in one syllable (do not cross a syllable boundary). For this opcode to
+work, a hyphenation table must be included. If this is not done,
+@code{nocross} behaves like the @ref{always opcode}. For example, if
+the English Grade 2 table is being used and the appropriate
+hyphenation table has been included "nocross sh 146" will cause the sh
+in "monkshood" not to be contracted.
+
+@opcode{joinword, characters dots}
+Replace the characters with the dot pattern if they are a word which
+is followed by whitespace and a letter. In addition remove the
+whitespace. For example, @file{en-us-g2.ctb} has "joinword to 235".
+This means that if the word "to" is followed by another word the
+contraction is to be used and the space is to be omitted. If these
+conditions are not met, the word is translated according to any other
+opcodes that may apply to it.
+
+@opcode{lowword, characters dots}
+Replace the characters with the dot pattern if they are a word
+preceded and followed by whitespace. No punctuation either before or
+after the word is allowed. The term "lowword" derives from the fact
+that in English these contractions are written in the lower part of
+the cell. For example:
+
+@example
+lowword were 2356
+@end example
+
+@opcode{contraction, characters}
+If you look at @file{en-us-g2.ctb} you will see that some words are
+actually contracted into some of their own letters. A famous example
+among braille transcribers is "also", which is contracted as "al". But
+this is also the name of a person. To take another example,
+"altogether" is contracted as "alt", but this is the abbreviation for
+the alternate key on a computer keyboard. Similarly "could" is
+contracted into "cd", but this is the abbreviation for compact disk.
+To prevent confusion in such cases, the letter sign (see the
+letsign opcode) is placed before such letter combinations when
+they actually are abbreviations, not contractions. The contraction
+opcode tells the translator to do this.
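+
+For example, the case of "cd" mentioned above could be handled with
+an entry like the following. This is a sketch, not copied from
+@file{en-us-g2.ctb}.
+
+@example
+contraction cd
+@end example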
+
+@opcode{sufword, characters dots}
+Replace the characters with the dot pattern if they are either a word
+or at the beginning of a word.
+
+@opcode{prfword, characters dots}
+Replace the characters with the dot pattern if they are either a word
+or at the end of a word.
+
+@opcode{begword, characters dots}
+Replace the characters with the dot pattern if they are at the
+beginning of a word.
+
+@opcode{begmidword, characters dots}
+Replace the characters with the dot pattern if they are either at the
+beginning or in the middle of a word.
+
+@opcode{midword, characters dots}
+Replace the characters with the dot pattern if they are in the middle
+of a word.
+
+@opcode{midendword, characters dots}
+Replace the characters with the dot pattern if they are either in the
+middle or at the end of a word.
+
+@opcode{endword, characters dots}
+Replace the characters with the dot pattern if they are at the end of
+a word.
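+
+As a sketch based on English Grade 2, a table might contain the
+entries below. In that code, "com" is contracted to dots 36 only at
+the beginning of a word, and "ea" is contracted to dot 2 only in the
+middle of a word.
+
+@example
+begword com 36
+midword ea 2
+@end example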
+
+@opcode{partword, characters dots}
+Replace the characters with the dot pattern if the characters are
+anywhere in a word, that is, if they are preceded or followed by a
+letter.
+
+@opcode{prepunc, characters dots}
+Replace the characters with the dot pattern if they are part of
+punctuation at the beginning of a word.
+
+@opcode{postpunc, characters dots}
+Replace the characters with the dot pattern if they are part of
+punctuation at the end of a word.
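+
+For example, English braille uses different signs for opening and
+closing double quotation marks (dots 236 and 356). The same input
+character can therefore be given two representations depending on its
+position, as in these illustrative entries:
+
+@example
+prepunc " 236
+postpunc " 356
+@end example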
+
+@opcode{begnum, characters dots}
+Replace the characters with the dot pattern if they are at the
+beginning of a number, that is, before all its digits. For example, in
+@file{en-us-g1.ctb} we have "begnum # 4".
+
+@opcode{midnum, characters dots}
+Replace the characters with the dot pattern if they are in the middle
+of a number. For example, @file{en-us-g1.ctb} has "midnum . 46". This
+is because the decimal point has a different dot pattern than the
+period.
+
+@opcode{endnum, characters dots}
+Replace the characters with the dot pattern if they are at the end of
+a number. For example, @file{en-us-g1.ctb} has "endnum th 1456". This handles
+things like 4th. A letter sign is @emph{NOT} inserted.
+
+@opcode{joinnum, characters dots}
+Replace the characters with the dot pattern. In addition, if
+whitespace and a number follow, omit the whitespace.
+
+@end table
+
+@node Character-Class Opcodes, Swap Opcodes, Translation Opcodes, How to Write Translation Tables
+@section Character-Class Opcodes
+
+These opcodes define and use character classes. A character class
+associates a set of characters with a name. The name then refers to
+any character within the class. A character may belong to more than
+one class.
+
+The basic character classes correspond to the character definition
+opcodes, with the exception of uplow, which defines characters
+belonging to the two classes uppercase and lowercase. These classes
+are:
+
+@table @code
+@item space
+White-space characters such as blank and tab
+@item digit
+Numeric characters
+@item letter
+Both uppercase and lowercase alphabetic characters
+@item lowercase
+Lowercase alphabetic characters
+@item uppercase
+Uppercase alphabetic characters
+@item punctuation
+Punctuation marks
+@item sign
+Signs such as percent
+@item math
+Mathematical symbols
+@item litdigit
+Literary digits
+@item undefined
+Not properly defined
+
+@end table
+
+The opcodes which define and use character classes are shown below.
+For examples see @file{fr-abrege.ctb}.
+
+@table @code
+
+@opcode{class, name characters}
+Define a new character class. The characters operand must be specified
+as a string. A character class may not be used until it has been
+defined.
+
+@opcode{after, class opcode ...}
+The specified opcode is further constrained in that the matched
+character sequence must be immediately preceded by a character
+belonging to the specified class. If this opcode is used more than
+once on the same line then the union of the characters in all the
+classes is used.
+
+@opcode{before, class opcode ...}
+The specified opcode is further constrained in that the matched
+character sequence must be immediately followed by a character
+belonging to the specified class. If this opcode is used more than
+once on the same line then the union of the characters in all the
+classes is used.
+
+@end table
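+
+The following hypothetical entries follow the operand order shown
+above. They define a class named vowel and then restrict an endword
+rule so that it applies only when the matched characters immediately
+follow a vowel. Both the class name and the rule are illustrative only.
+
+@example
+class vowel aeiou
+after vowel endword ing 346
+@end example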
+
+@node Swap Opcodes, The Context and Multipass Opcodes, Character-Class Opcodes, How to Write Translation Tables
+@section Swap Opcodes
+
+The swap opcodes are needed to tell the context, correct and multipass
+opcodes which dot patterns to swap for which characters. There are
+two, swapcd and swapdd. The first swaps dot patterns for characters.
+The second swaps dot patterns for dot patterns. The first is used in
+the context opcode and the second is used in the multipass opcodes.
+Dot patterns are separated by commas and may contain more than one
+cell.
+
+@table @code
+
+@findex swapcd
+@item swapcd name characters dots,dots,dots,...
+See above paragraph for explanation. For example:
+
+@example
+swapcd dropped 0123456789 356,2,23,...
+@end example
+
+@findex swapdd
+@item swapdd name dots,dots,dots,... dotpattern1,dotpattern2,dotpattern3,...
+The @code{swapdd} opcode defines substitutions for the multipass
+opcodes. In the second operand the dot patterns must be single cells,
+but in the third operand multi-cell dot patterns are allowed. This is
+because multi-cell patterns in the second operand would lead to
+ambiguities.
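+
+For example, a hypothetical swap set named lower could map each of the
+single cells 1, 12 and 14 to the same shape moved down one row:
+
+@example
+swapdd lower 1,12,14 2,23,25
+@end example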
+
+@end table
+
+@node The Context and Multipass Opcodes, The correct Opcode, Swap Opcodes, How to Write Translation Tables
+@section The Context and Multipass Opcodes
+
+@table @code
+@anchor{context-opcode}
+@findex context
+@findex pass2
+@findex pass3
+@findex pass4
+@item context test action
+@itemx pass2 test action
+@itemx pass3 test action
+@itemx pass4 test action
+The context and multipass opcodes (pass2, pass3 and pass4) provide
+translation capabilities beyond those of the basic translation opcodes
+discussed previously. The multipass opcodes cause additional passes to
+be made over the string to be translated. The number after the word
+"pass" indicates in which pass the entry is to be applied. If no
+multipass opcodes are given, only the first translation pass is made.
+The context opcode is basically a multipass opcode for the first pass.
+It differs slightly from the multipass opcodes per se. The format of
+all these opcodes is:
+
+@example
+opcode test action
+@end example
+
+The test and action operands have suboperands. Each suboperand begins
+with a non-alphanumeric character and ends when another non-alphanumeric
+character is encountered. The suboperands and their initial characters
+are as follows.
+
+@table @kbd
+@item " (double quote) 
+A string of characters. This string must be terminated by another
+double quote. It may contain any characters. If a double quote is
+needed within the string it must be preceded by a backslash (@samp{\}).
+If a space is needed it must be represented by the escape sequence \s.
+This suboperand is valid only in the test part of the context opcode.
+
+@item @@ (at sign)
+A sequence of dot patterns. Cells are separated by hyphens as usual.
+This suboperand is not valid in the test part of the context opcode.
+
+@item $ (dollar sign) 
+A string of attributes, such as d for digit, l for letter, etc. More
+than one attribute can be given. If you wish to check characters with
+any attribute, use the letter a. Input characters are checked to see
+if they have at least one of the attributes. The attribute string can
+be followed by numbers specifying how many characters are to be
+checked. If no numbers are given, 1 is assumed. If two numbers
+separated by a hyphen are given, the input is checked to make sure
+that at least the first number of characters with the attributes are
+present, but no more than the second number. If only one number is
+present, then exactly that many characters must have the attributes. A
+period instead of the numbers indicates an indefinite number of
+characters. This suboperand is valid in all test parts but not in
+action parts.
+
+@item ! (exclamation point) 
+Reverses the logical meaning of the suboperand which follows. For
+example, !$d is true only if the character is @emph{NOT} a digit. This
+suboperand is valid in test parts only.
+
+@item % (percent sign)
+The name of a class defined by the class opcode or the name of a swap
+set defined by the swap opcodes. Names may contain only letters and
+digits. The letters may be upper or lower-case. The case matters.
+Class names may be used in test parts only. Swap names are valid
+everywhere.
+
+@item _ (underscore) 
+Move backward. If a number follows, move backward that number of
+characters. The program never moves backward beyond the beginning of
+the input string. This suboperand is valid only in test parts.
+
+@item [ (left bracket) 
+Start replacement here. This suboperand must always be paired with a
+right bracket and is valid only in test parts.
+
+@item ] (right bracket)
+End replacement here. This suboperand must always be paired with a
+left bracket and is valid only in test parts.
+
+@item # (number sign or crosshatch)
+Test or set a variable. Variables are referred to by numbers 1 to 50,
+for example, #1, #2, #25. Variables may be set by one context or
+multipass opcode and tested by another. Thus, an operation that occurs
+at one place in a translation can tell an operation that occurs later
+about itself. This feature will be used in math translation, and it
+may also help to alleviate the need for new opcodes. This suboperand
+is valid everywhere.
+
+Variables are set in the action part. To set a variable use an
+expression like #1=1, #2=5, etc. Variables are also incremented and
+decremented in the action part with expressions like #1+, #3-, etc.
+These operators increment or decrement the variable by 1.
+
+Variables are tested in the test part with expressions like #1=2,
+#3<4, #5>6, etc.
+
+@item * (asterisk) 
+Copy the characters or dot patterns in the input within the
+replacement brackets into the output and discard anything else that
+may match. This feature is used, for example, for handling numeric
+subscripts in Nemeth. This suboperand is valid only in action parts.
+
+@item ? (question mark)
+Valid only in the action part. The characters to be replaced are
+simply ignored. That is, they are replaced with nothing.
+
+@end table
+
+@end table
+
+@node The correct Opcode, Miscellaneous Opcodes, The Context and Multipass Opcodes, How to Write Translation Tables
+@section The correct Opcode
+
+Because some input (such as that from an OCR program) may contain
+systematic errors, it is sometimes advantageous to use a
+pre-translation pass to remove them. The errors and their corrections
+are specified by the correct opcode. If a table contains no correct
+opcodes, the pre-translation pass is not used. The format of the
+correct opcode is very similar to that of the @ref{context-opcode}.
+The only difference is that the action part may contain strings but
+not dot patterns. Some examples of correct opcode entries are:
+
+@example
+correct "\\" ? Eliminate backslashes
+correct "cornf" "comf" fix a common "scano"
+correct "cornm" "comm"
+correct "cornp" "comp"
+correct "*" ? Get rid of stray asterisks
+correct "|" ? ditto for vertical bars
+correct "\s?" "?" drop space before question mark
+@end example
+
+@node Miscellaneous Opcodes,  , The correct Opcode, How to Write Translation Tables
+@section Miscellaneous Opcodes
+
+@table @code
+@anchor{include-opcode}
+@opcode{include, filename}
+Read the file indicated by filename and incorporate its entries into
+the table. Included files can include other files, which can in turn
+include other files, and so on. For an example, see the files included
+by the entry @code{include en-us-g1.ctb} in the table
+@file{en-us-g2.ctb}. If the included file is not in the same directory
+as the main table, use a full pathname for filename.
+
+@opcode{locale, characters}
+Not implemented, but recognized and ignored for backward
+compatibility.
+
+@opcode{display, character dots}
+Associates dot patterns with the characters that will be sent to a
+braille embosser, display, or screen font. The character must be in
+the range 0-255, and the dots must specify a single cell. Here are
+some examples:
+
+@example
+display a 1 When the character a is sent to the embosser or display, it produces dot 1.
+display L 123 When the character L is sent to the display or embosser, it produces dots 1-2-3.
+@end example
+
+The display opcode is optional. It is used when the embosser or
+display has a different mapping of characters to dot patterns than
+that given in @ref{Character-Definition Opcodes}. If used, display
+entries must precede character-definition entries.
+
+@opcode{multind, dots opcode opcode ...}
+The multind opcode tells the back-translator that a sequence of
+braille cells represents more than one braille indicator. For example,
+@file{en-us-g1.ctb} contains the entry @code{multind 56-6 letsign capsign}.
+The back-translator can generally handle single braille indicators,
+but it cannot apply them when they immediately follow each other. It
+recognizes the letter sign if it is followed by a letter and takes
+appropriate action. It also recognizes the capital sign if it is
+followed by a letter. But when a letter sign is followed by a capital
+sign, it fails to recognize the letter sign unless the sequence has
+been defined with multind. A multind entry may not contain a comment,
+because liblouis would attempt to interpret the comment as an opcode.
+
+@end table
+
+@node Notes on Back-Translation, Key Index, How to Write Translation Tables, Top
+@chapter Notes on Back-Translation
+
+Back-translation is carried out by the function
+@code{lou_backTranslateString}. Its calling sequence is described in
+@ref{Programming with liblouis}. Tables containing no context,
+multipass, or correct opcodes can be used for both forward and backward
+translation. If these opcodes are needed, different tables will be
+required. @code{lou_backTranslateString} first performs pass4, if
+present, then pass3, then pass2, then the back-translation proper, and
+finally the corrections. This is exactly the reverse of the order used
+in forward translation.
+
+@node Key Index,  , Notes on Back-Translation, Top
+@unnumbered Opcode Index
+
+@printindex fn
+
+@bye
+
+
+
