include it in the automake process and add a changelog entry. Also make sure html and txt versions are built on make dist. --- ChangeLog | 6 + doc/Makefile.am | 6 + doc/liblouis-guide.texi | 1653 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 1665 insertions(+), 0 deletions(-) create mode 100644 doc/liblouis-guide.texi
diff --git a/ChangeLog b/ChangeLog index 7227b4e..6756b55 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,9 @@ +2008-11-12 Christian Egli <christian.egli@xxxxxxxx> + + * doc/liblouis-guide.texi: Added the guide in texinfo + * doc/Makefile.am (.texi.txt): Integrate the texinfo guide in the + build system. + John J. Boyer john.boyer@xxxxxxxxxxxxxxxx Release liblouis-1.3.8, June 16, 2008 diff --git a/doc/Makefile.am b/doc/Makefile.am index 16d2a4b..0c6d731 100644 --- a/doc/Makefile.am +++ b/doc/Makefile.am @@ -8,3 +8,9 @@ EXTRA_DIST = \ liblouis-guide.html \ liblouis-guide.txt +info_TEXINFOS = liblouis-guide.texi + +SUFFIXES = .txt + +.texi.txt: + $(MAKEINFO) --plaintext $< -o $@ diff --git a/doc/liblouis-guide.texi b/doc/liblouis-guide.texi new file mode 100644 index 0000000..7366a0f --- /dev/null +++ b/doc/liblouis-guide.texi @@ -0,0 +1,1653 @@ +\input texinfo +@c %**start of header +@setfilename liblouis-guide.info +@include version.texi +@settitle Liblouis Programmer's and User's Guide + +@dircategory Misc +@direntry +* Liblouis: (liblouis). A braille translator and back-translator +@end direntry + +@c Version and Contact Info +@set MAINTAINERSITE @uref{http://www.jjb-software.com/liblouis-guide.html,maintainers webpage} +@set AUTHOR John J. Boyer +@set MAINTAINER John J. Boyer +@set MAINTAINEREMAIL @email{john.boyer@xxxxxxxxxxxxxxxx} +@set MAINTAINERCONTACT @uref{mailto:john.boyer@xxxxxxxxxxxxxxxx,contact the maintainer} +@c %**end of header +@finalout + +@c Macro definitions + +@c Opcode. +@macro opcode{name, args} +@findex \name\ +@item \name\ \args\ +@end macro + +@macro doubleOpcode{name1, args1, name2, args2} +@findex \name1\ +@findex \name2\ +@item \name1\ \args1\ +@itemx \name2\ \args2\ +@end macro + +@copying +This manual is for liblouis (version @value{VERSION}, @value{UPDATED}), +a Braille Translation and Back-Translation Library derived from the +Linux screenreader @acronym{BRLTTY}. + +Copyright @copyright{} 1999-2008 by the @acronym{BRLTTY} Team. 
+ +It is also Copyright @copyright{} 2004-2008 by ViewPlus Technologies, +Inc. @uref{www.viewplus.com} and JJB Software, Inc. +@uref{www.jjb-software.com}. + +@quotation +This file is free software; you can redistribute it and/or modify it +under the terms of the GNU Lesser (or library) General Public License +(LGPL) as published by the Free Software Foundation; either version 3, +or (at your option) any later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Lesser (or Library) General Public License LGPL for more details. + +You should have received a copy of the GNU Lesser (or Library) General +Public License (LGPL) along with this program; see the file COPYING. +If not, write to the Free Software Foundation, 51 Franklin Street, +Fifth Floor, Boston, MA 02110-1301, USA. +@end quotation +@end copying + +@titlepage +@title Liblouis Programmer's and User's Guide + +@subtitle for version @value{VERSION}, @value{UPDATED} +@author by John J. Boyer + +@c The following two commands start the copyright page. +@page +@vskip 0pt plus 1filll +@insertcopying +@end titlepage + +@c Output the table of contents at the beginning. 
+@contents + +@ifnottex +@node Top, Introduction, (dir), (dir) +@top Liblouis Programmer's and User's Guide + +@insertcopying +@end ifnottex + +@menu +* Introduction:: +* Programming with liblouis:: +* Test Programs:: +* How to Write Translation Tables:: +* Notes on Back-Translation:: +* Key Index:: + +@detailmenu + --- The Detailed Node Listing --- + +Programming with liblouis + +* Overview:: +* lou_version:: +* lou_translateString:: +* lou_translate:: +* lou_backTranslateString:: +* lou_backTranslate:: +* lou_hyphenate:: +* lou_logFileName:: +* lou_logPrint:: +* lou_getTable:: +* lou_readCharFromFile:: +* lou_free:: + +Test Programs + +* lou_checktable:: +* lou_allround:: +* lou_translate -f | -b tablename:: + +How to Write Translation Tables + +* Hyphenation Tables:: +* Character-Definition Opcodes:: +* Braille Indicator Opcodes:: +* Emphasis Opcodes:: +* Special Symbol Opcodes:: +* Special Processing Opcodes:: +* Translation Opcodes:: +* Character-Class Opcodes:: +* Swap Opcodes:: +* The Context and Multipass Opcodes:: +* The correct Opcode:: +* Miscellaneous Opcodes:: + +@end detailmenu +@end menu + +@node Introduction, Programming with liblouis, Top, Top +@chapter Introduction + +Liblouis is an open-source braille translator and back-translator +derived from the translation routines in the BRLTTY screenreader for +Linux. It has, however, gone far beyond these routines. It is named in +honor of Louis Braille. In Linux and Mac OSX it is a shared library, +and in Windows it is a DLL. For installation instructions see the +README file. Please report bugs and oddities to the maintainer, +@email{john.boyer@@jjb-software.com} + +This documentation is derived from Chapter 7 of the BRLTTY manual, but +it has been extensively rewritten to cover new features. + +Please read the following copyright and warranty information. Note +that this information also applies to all source code, tables and +other files in this distribution of liblouis. 
It applies similarly to +the sister library liblouisxml. + +This file is maintained by John J. Boyer +@email{john.boyer@@jjb-software.com}. + +Persons who wish to write translation tables but will not be +programming with liblouis may want to skip ahead to @ref{Test +Programs} or @ref{How to Write Translation Tables}. + +@node Programming with liblouis, Test Programs, Introduction, Top +@chapter Programming with liblouis + +@menu +* Overview:: +* lou_version:: +* lou_translateString:: +* lou_translate:: +* lou_backTranslateString:: +* lou_backTranslate:: +* lou_hyphenate:: +* lou_logFileName:: +* lou_logPrint:: +* lou_getTable:: +* lou_readCharFromFile:: +* lou_free:: +@end menu + +@node Overview, lou_version, Programming with liblouis, Programming with liblouis +@section Overview + +You use the liblouis library by calling eleven functions, +@code{lou_version}, @code{lou_translateString}, @code{lou_backTranslateString}, +@code{lou_logFileName}, @code{lou_logPrint}, @code{lou_getTable}, +@code{lou_translate}, @code{lou_backTranslate}, @code{lou_hyphenate}, +@code{lou_readCharFromFile} and @code{lou_free}. These are described +below. The header file, @file{liblouis.h}, also contains brief +descriptions. Liblouis is written in straight C. It has just three +code modules, @file{compileTranslationTable.c}, +@file{lou_translateString.c} and @file{lou_backTranslateString.c}. In +addition, there are two header files, @file{liblouis.h}, which defines +the API, and @file{louis.h}, used only internally. The latter includes +@file{liblouis.h}. + +@file{compileTranslationTable.c} keeps track of all translation tables +which an application has used. It is called by the translation, +hyphenation and checking functions when they start. If a table has not +yet been compiled @file{compileTranslationTable.c} checks it for +correctness and compiles it into an efficient internal representation. +The main entry point is @code{lou_getTable}. 
Since it is the module +that keeps track of memory usage, it also contains the @code{lou_free} +function. In addition, it contains the @code{lou_logFileName} and +@code{lou_logPrint} functions, plus some utility functions which are +used by the other modules. + +By default, liblouis handles all characters internally as 16-bit +unsigned integers. It can be compiled for 32-bit characters as +explained below. The meanings of these integers are not hard-coded; +rather, they are defined by the character-definition opcodes. However, +the standard printable characters, from decimal 32 to 126, are +recognized for the purpose of processing the opcodes. Hence, the +following definition is included in @file{liblouis.h}. It is correct +for computers with at least 32-bit processors. + +@example +#define widechar unsigned short int +@end example + +To make liblouis handle 32-bit Unicode simply remove the word +@code{short} in the above define. This will cause the translate and +back-translate functions to expect input in 32-bit form and to deliver +their output in this form. The input to the compiler (tables) is +unaffected except that two new escape sequences for 20-bit and 32-bit +characters are recognized. + +Here are the definitions of the eleven liblouis functions and their +parameters. They are given in terms of 16-bit Unicode. If liblouis has +been compiled for 32-bit Unicode simply read 32 instead of 16. + +@node lou_version, lou_translateString, Overview, Programming with liblouis +@section lou_version + +@example +char *lou_version () +@end example + +This function returns a pointer to a character string containing the +version of liblouis, plus other information, such as the release date +and perhaps notable changes. 
+ +@node lou_translateString, lou_translate, lou_version, Programming with liblouis +@section lou_translateString + +@example +int lou_translateString ( + const char *const trantab, + const widechar *const inbuf, + int *inlen, + widechar *outbuf, + int *outlen, + char *typeform, + char *spacing, + int mode); +@end example + +This function takes a string of 16-bit Unicode characters in inbuf and +translates it into a string of 16-bit characters in outbuf. Each +16-bit character produces a particular dot pattern in one braille cell +when sent to an embosser or braille display or to a screen typefont. +Which 16-bit character represents which dot pattern is indicated by +the character-definition and display opcodes in the translation table. + +The trantab parameter points to a list of translation tables separated +by commas. If only one table is given, no comma should be used after +it. It is these tables which control just how the translation is made, +whether in Grade 2, Grade 1, or something else. The first table in the +list must be a full pathname, unless the tables are in the current +directory. The pathname is extracted up to the filename. The first +table is then compiled. The pathname is then added to the name of the +second table, which is compiled, and so on. The tables in a list are +all compiled into the same internal table. The list is then regarded +as the name of this table. As explained in the section @ref{How to +Write Translation Tables}, each table is a file which may be plain +text, big-endian Unicode or little-endian Unicode. A table (or list of +tables) is compiled into an internal representation the first time it +is used. Liblouis keeps track of which tables have been compiled. For +this reason, it is essential to call the lou_free function at the end +of your application to avoid memory leaks. Do @emph{NOT} call +@code{lou_free} after each translation. 
This will force liblouis to +compile the translation tables each time they are used, leading to +great inefficiency. + +Note that both the @code{*inlen} and @code{*outlen} parameters are +pointers to integers. When the function is called, these integers +contain the maximum input and output lengths, respectively. When it +returns, they are set to the actual lengths used. + +The typeform parameter is used to indicate italic type, boldface type, +computer braille, etc. It is a string of characters with the same +length as the input buffer pointed to by @code{*inbuf}. However, it is +used to pass back character-by-character results, so enough space must +be provided to match the @code{*outlen} parameter. Each character +indicates the typeform of the corresponding character in the input +buffer. The values are as follows: 0 plain-text; 1 italic; 2 bold; 4 +underline; 8 computer braille. These values can be added for multiple +emphasis. If this parameter is @code{NULL}, no checking for typeforms +is done. In addition, if this parameter is not @code{NULL}, it is set +on return to have an 8 at every position corresponding to a character +in outbuf which was defined to have a dot representation containing +dot 7, dot 8 or both, and to 0 otherwise. + +The spacing parameter is used to indicate differences in spacing +between the input string and the translated output string. It is also +of the same length as the string pointed to by @code{*inbuf}. If this +parameter is @code{NULL}, no spacing information is computed. + +The mode parameter specifies how the translation should be done. The +valid values of mode are listed in @file{liblouis.h}. They are all +powers of 2, so that a combined mode can be specified by adding up +different values. + +The function returns 1 if no errors were encountered and 0 if a +complete translation could not be done. 
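The typeform values described above can be prepared with a few lines of C before calling the translator. The following is only a sketch: the enumerator names and both helper functions are hypothetical and are not part of liblouis; only the numeric values (0, 1, 2, 4, 8) come from the text. The buffer is sized to the larger of the input and output lengths because the guide says the same array is used to pass results back.

```c
#include <assert.h>
#include <stdlib.h>

/* Typeform values as listed in the guide. The names are hypothetical;
   only the numbers are taken from the text. */
enum {
  TF_PLAIN = 0,
  TF_ITALIC = 1,
  TF_BOLD = 2,
  TF_UNDERLINE = 4,
  TF_COMPBRL = 8
};

/* Allocate a zero-filled (plain-text) typeform buffer large enough for
   both the input and the results passed back for the output.
   Returns NULL if the allocation fails. The caller frees the buffer. */
char *typeform_alloc(int inlen, int outlen)
{
  int size = inlen > outlen ? inlen : outlen;
  assert(inlen >= 0 && outlen >= 0);
  return calloc(size > 0 ? (size_t)size : 1, 1);
}

/* Add the given emphasis bits to the characters in [start, end).
   Bitwise OR gives the same result as adding distinct values and does
   not double-count if a range is marked twice. */
void typeform_mark(char *typeform, int start, int end, int bits)
{
  int i;
  assert(typeform != NULL && start >= 0 && start <= end);
  for (i = start; i < end; i++)
    typeform[i] |= (char)bits;
}
```

Marking a range as italic and an overlapping range as bold leaves the overlap with the value 3, which is the "added" form the text describes for multiple emphasis.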
+ +@node lou_translate, lou_backTranslateString, lou_translateString, Programming with liblouis +@section lou_translate + +@example +int lou_translate ( + const char *const trantab, + const widechar * const inbuf, + int *inlen, widechar * outbuf, + int *outlen, + char *typeform, + char *spacing, + int *outputPos, + int *inputPos, + int *cursorPos, + int mode); +@end example + +This function adds the parameters @code{outputPos}, @code{inputPos} and +@code{cursorPos}, to facilitate use in screenreader programs. The +@code{outputPos} parameter must point to an array of integers with at least +outlen elements. On return, this array will contain the position in +inbuf corresponding to each output position. Similarly, @code{inputPos} +must point to an array of integers of at least inlen elements. On +return, this array will contain the position in outbuf corresponding +to each position in @code{inbuf}. @code{cursorPos} must point to an integer +containing the position of the cursor in the input. On return, it will +contain the cursor position in the output. Any parameter after outlen +may be @code{NULL}. In this case, the actions corresponding to it will not be +carried out. The mode parameter, however, must be present and must be +an integer, not a pointer to an integer. If the @code{compbrlAtCursor} bit +is set in the mode parameter the space-bounded characters containing +the cursor will be translated in computer braille. + +@node lou_backTranslateString, lou_backTranslate, lou_translate, Programming with liblouis +@section lou_backTranslateString + +@example +int lou_backTranslateString ( + const char *const trantab, + const widechar *const inbuf, + int *inlen, + widechar *outbuf, + int *outlen, + char *typeform, + char *spacing, + int mode); +@end example + +This is exactly the opposite of @code{lou_translateString}. +@code{inbuf} is a string of 16-bit Unicode characters representing +braille. @code{outbuf} will contain a string of 16-bit Unicode +characters. 
@code{typeform} will indicate any emphasis found in the +input string, while @code{spacing} will indicate any differences in +spacing between the input and output strings. The @code{typeform} and +@code{spacing} parameters may be @code{NULL} if this information is +not needed. @code{mode} again specifies how the back-translation +should be done. + +@node lou_backTranslate, lou_hyphenate, lou_backTranslateString, Programming with liblouis +@section lou_backTranslate + +@example +int lou_backTranslate ( + const char *const trantab, + const widechar *const inbuf, + int *inlen, + widechar * outbuf, + int *outlen, + char *typeform, + char *spacing, + int *outputPos, + int *inputPos, + int *cursorPos, + int mode); +@end example + +This function is exactly the inverse of @code{lou_translate}. + +@node lou_hyphenate, lou_logFileName, lou_backTranslate, Programming with liblouis +@section lou_hyphenate + +@example +int lou_hyphenate ( + const char *const trantab, + const widechar * const inbuf, + int inlen, + char *hyphens, + int mode); +@end example + +This function looks at the characters in @code{inbuf} and if it finds +a sequence of letters attempts to hyphenate it as a word. Leading and +trailing punctuation marks are ignored. The table named by the +@code{trantab} parameter must contain a hyphenation table. If it does +not, the function does nothing. @code{inlen} is the length of the +character string in @code{inbuf}. @code{hyphens} is an array of +characters and must be of size @code{inlen}. If hyphenation is +successful it will have a 1 at the beginning of each syllable and a 0 +elsewhere. If the @code{mode} parameter is 0 @code{inbuf} is assumed +to contain untranslated characters. Any nonzero value means that +@code{inbuf} contains a translation. In this case, it is +back-translated, hyphenation is performed, and it is retranslated so +that the hyphens can be placed correctly. 
The @code{lou_translate} and +@code{lou_backTranslate} functions are used in this process. +@code{lou_hyphenate} returns 1 if hyphenation was successful and 0 +otherwise. In the latter case, the contents of the @code{hyphens} +parameter are undefined. This function was provided for use in +liblouisxml. + +@node lou_logFileName, lou_logPrint, lou_hyphenate, Programming with liblouis +@section lou_logFileName + +@example +void lou_logFileName (char *fileName); +@end example + +This function is used when it is not convenient either to let messages +be printed on stderr or to use redirection, as when liblouis is used +in a GUI application or in liblouisxml. Any error messages generated +will be printed to the file given in this call. The entire pathname of +the file must be given. + +@node lou_logPrint, lou_getTable, lou_logFileName, Programming with liblouis +@section lou_logPrint + +@example +void lou_logPrint (char *format, ...); +@end example + +This function is called like @code{printf}. It can be used by other +libraries to print messages to the file specified by the call to +@code{lou_logFileName}. In particular, it is used by the companion +library liblouisxml. + +@node lou_getTable, lou_readCharFromFile, lou_logPrint, Programming with liblouis +@section lou_getTable + +@example +void *lou_getTable (char *tablelist); +@end example + +@code{tablelist} is a list of names of table files separated by +commas, as explained previously. If no errors are found this function +returns a pointer to the compiled table. If errors are found messages +are printed to the log file, which is stderr unless a different +filename has been given using the @code{lou_logFileName} function. +Errors result in a @code{NULL} pointer being returned. 
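The way a comma-separated table list is resolved, as described under lou_translateString (the directory of the first table is prepended to the names of the later ones), can be illustrated with a small stand-alone helper. This is a sketch under stated assumptions, not the code liblouis itself uses: the function name is hypothetical, and it assumes "/" as the only directory separator.

```c
#include <assert.h>
#include <string.h>

/* Write the resolved name of the index-th table of a comma-separated
   list into out. The first entry is copied as is; later entries get the
   directory part of the first entry prepended. Returns 1 on success and
   0 if index is past the end of the list or the result does not fit.
   Hypothetical helper; assumes '/' is the directory separator. */
int resolve_table_name(const char *tablelist, int index,
                       char *out, size_t outsize)
{
  assert(tablelist != NULL && out != NULL && index >= 0);

  /* Length of the first entry and of its directory prefix
     (everything up to and including the last '/'). */
  const char *comma = strchr(tablelist, ',');
  size_t first_len = comma ? (size_t)(comma - tablelist) : strlen(tablelist);
  size_t dir_len = 0;
  for (size_t i = 0; i < first_len; i++)
    if (tablelist[i] == '/')
      dir_len = i + 1;

  /* Walk to the requested entry. */
  const char *start = tablelist;
  for (int n = 0; n < index; n++) {
    start = strchr(start, ',');
    if (start == NULL)
      return 0;
    start++;
  }
  const char *end = strchr(start, ',');
  size_t name_len = end ? (size_t)(end - start) : strlen(start);

  /* The first entry already carries its own path. */
  size_t prefix_len = (index == 0) ? 0 : dir_len;
  if (prefix_len + name_len + 1 > outsize)
    return 0;
  memcpy(out, tablelist, prefix_len);
  memcpy(out + prefix_len, start, name_len);
  out[prefix_len + name_len] = '\0';
  return 1;
}
```

With a list whose first entry has no directory part, the prefix is empty and later names are returned unchanged, which matches the case where all tables are in the current directory.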
+ +@node lou_readCharFromFile, lou_free, lou_getTable, Programming with liblouis +@section lou_readCharFromFile + +@example +int lou_readCharFromFile (const char *fileName, int *mode); +@end example + +This function is provided for situations where it is necessary to read +a file which may contain little-endian or big-endian 16-bit Unicode +characters or ASCII8 characters. The return value is a little-endian +character, encoded as an integer. The @code{fileName} parameter is the +name of the file to be read. The @code{mode} parameter is a pointer to +an integer which must be set to 1 on the first call. After that, the +function takes care of it. On end-of-file the function returns +@code{EOF}. + +@node lou_free, , lou_readCharFromFile, Programming with liblouis +@section lou_free + +@example +void lou_free (); +@end example + +This function should be called at the end of the application to free +all memory allocated by liblouis. Failure to do so will result in +memory leaks. Do @emph{NOT} call @code{lou_free} after each +translation. This will force liblouis to compile the translation +tables every time they are used, resulting in great inefficiency. + +@node Test Programs, How to Write Translation Tables, Programming with liblouis, Top +@chapter Test Programs + +Three test programs are provided as part of the liblouis package. They +are intended for testing liblouis and for debugging tables. None of +them is suitable for braille transcription. An application that can be +used for transcription is xml2brl, which is part of the liblouisxml +package. The source code of the test programs can be studied to learn +how to use the liblouis library and they can be used to perform the +following functions. 
+ +@menu +* lou_checktable:: +* lou_allround:: +* lou_translate -f | -b tablename:: +@end menu + +@node lou_checktable, lou_allround, Test Programs, Test Programs +@section lou_checktable + +To use this program type @kbd{lou_checktable} followed by a space and +the name of a table. If the table contains errors, appropriate +messages will be displayed. If there are no errors the message +@samp{no errors found.} will be shown. + +@node lou_allround, lou_translate -f | -b tablename, lou_checktable, Test Programs +@section lou_allround + +This program tests every capability of the liblouis library. It is +completely interactive. To start it, type @kbd{lou_allround}, enter. +You will see a few lines telling you how to use the program. Pressing +one of the letters in parentheses and then enter will take you to a +message asking for more information or for the answer to a yes/no +question. Typing the letter @samp{r} and then @key{RET} will take you +to a screen where you can enter a line to be processed by the library +and then view the results. + +@node lou_translate -f | -b tablename, , lou_allround, Test Programs +@section lou_translate -f | -b tablename + +This program translates whatever is on the standard input unit and +prints it on the standard output unit. It is intended for large-scale +testing of the accuracy of translation and back-translation. The first +argument must be @option{-f} for forward translation or @option{-b} for +backward translation. To use it to translate or back-translate a file +use a line like + +@kbd{<liblouis-guide.txt ./lou_translate -f en-us-g2.ctb >testtrans} + +@node How to Write Translation Tables, Notes on Back-Translation, Test Programs, Top +@chapter How to Write Translation Tables + +Several translation (contraction) tables have already been made up. +They are included in this distribution and should be studied as part +of the documentation. 
The most helpful are listed in the following +table: + +@table @file +@item chardefs.cti +Character definitions for U.S. tables +@item compress.ctb +Remove excessive white-space +@item en-us-g1.ctb +Uncontracted American English +@item en-us-g2.ctb +Contracted or Grade 2 American English +@item fr-integral.ctb +Uncontracted Unified French +@item fr-abrege.ctb +Contracted Unified French +@item french.dis +Display entries mapping French characters to braille cells +@item text.nab.dis +North American character-to-cell associations + +@end table + +The names used for files containing translation tables are completely +arbitrary. They are not interpreted in any way by the translator. +Contraction tables may be 8-bit ASCII files, 16-bit big-endian Unicode +files or 16-bit little-endian Unicode files. Blank lines are ignored. +Any leading and trailing white-space (any number of blanks and/or +tabs) is ignored. Lines which begin with a number sign or hatch mark +(@samp{#}) are ignored, i.e. they are comments. If the number sign is +not the first non-blank character in the line, it is treated as an +ordinary character. Lines which are not blank or comments define table +entries. The general format of a table entry is: + +@example +opcode operands comments +@end example + +Table entries may not be split between lines. The opcode is a mnemonic +that specifies what the entry does. The operands may be character +sequences, braille dot patterns or occasionally something else. They +are described for each opcode. With some exceptions, opcodes expect a +certain number of operands. Any text on the line after the last +operand is ignored, and may be a comment. A few opcodes accept a +variable number of operands. In this case a number sign begins a +comment unless it is preceded by a backslash (@samp{\}). For a list of +opcodes, see the index of opcodes. + +Here are some examples of table entries. + +@example +# This is a comment. 
+always world 456-2456 A word and the dot pattern of its contraction +@end example + +Most opcodes have both a "characters" operand and a "dots" operand, +though some have only one and a few have other types. + +The characters operand consists of any combination of characters and +escape sequences preceded and followed by whitespace. Escape +sequences are used to represent difficult characters. They begin with +a backslash (@samp{\}). They are: + +@table @kbd +@item \\ +backslash +@item \f +form feed +@item \n +new line +@item \r +carriage return +@item \s +blank (space) +@item \t +horizontal tab +@item \v +vertical tab +@item \e +"escape" character (hex 1b, dec 27) +@item \xhhhh +4-digit hexadecimal value of a character + +@end table + +If liblouis has been compiled for 32-bit Unicode the following are +also recognized. + +@table @kbd +@item \xhhhhh +5-digit (20 bit) character +@item \xhhhhhhhh +Full 32-bit value. + +@end table + +The dots operand is a braille dot pattern. The real braille dots, 1 +through 8, must be specified with their standard numbers. liblouis +recognizes "virtual dots," which are used for special purposes, such +as distinguishing accent marks. There are seven virtual dots. They are +specified by the number 9 and the letters a through f. For a +multi-cell dot pattern, the cell specifications must be separated from +one another by a dash (@samp{-}). For example, the contraction for the +English word lord (the letter l preceded by dot 5) would be specified +as 5-123. A space may be specified with the special dot number 0. + +An opcode which is helpful in writing translation tables is +@code{include}. Its format is: + +@example +include filename +@end example + +It reads the file indicated by filename and incorporates or includes +its entries into the table. Included files can include other files, +which can include other files, etc. 
For an example, see what files are +included by the entry include @file{en-us-g1.ctb} in the table +@file{en-us-g2.ctb}. If the included file is not in the same directory +as the main table, use a full pathname for filename. + +The order of the various types of opcodes or table entries is +important. Character-definition opcodes should come first. However, if +the optional @code{display} opcode is used (see the @code{display} +opcode) it should precede character-definition opcodes. +Braille-indicator opcodes should come next. Translation opcodes should +follow. The @code{context} opcode is a translation opcode, even though +it is considered along with the multipass opcodes. These latter should +follow the translation opcodes. The @code{correct} opcode can be used +anywhere after the character-definition opcodes, but it is probably a +good idea to group all @code{correct} opcodes together. The +@code{include} opcode can be used anywhere, but the order of entries +in the combined table must conform to the order given above. Within +each type of opcode, the order of entries is generally unimportant. +Thus the translation entries can be grouped alphabetically or in any +other order that is convenient. + +@menu +* Hyphenation Tables:: +* Character-Definition Opcodes:: +* Braille Indicator Opcodes:: +* Emphasis Opcodes:: +* Special Symbol Opcodes:: +* Special Processing Opcodes:: +* Translation Opcodes:: +* Character-Class Opcodes:: +* Swap Opcodes:: +* The Context and Multipass Opcodes:: +* The correct Opcode:: +* Miscellaneous Opcodes:: +@end menu + +@node Hyphenation Tables, Character-Definition Opcodes, How to Write Translation Tables, How to Write Translation Tables +@section Hyphenation Tables + +Hyphenation tables are necessary to make opcodes such as +@ref{nocross opcode} function properly. There are no opcodes for +hyphenation table entries because these tables have a special format. +Therefore, they cannot be specified as part of an ordinary table. 
+Rather, they must be included using the @ref{include-opcode}. +Hyphenation tables must follow character definitions. For an example +of a hyphenation table, see @file{hyph_en_US.dic}. + +@node Character-Definition Opcodes, Braille Indicator Opcodes, Hyphenation Tables, How to Write Translation Tables +@section Character-Definition Opcodes + +These opcodes are needed to define attributes such as digit, +punctuation, letter, etc. for all characters and their dot patterns. +liblouis has no built-in character definitions, but such definitions +are essential to the operation of the context opcode, the correct +opcode, the multipass opcodes and the back-translator. If the dot +pattern is a single cell, it is used to define the mapping between dot +patterns and characters, unless a display opcode for that +character-dot-pattern pair has been used previously. If only a +single-cell dot pattern has been given for a character, that dot +pattern is defined with the character's own attributes. If more than +one cell is given and some of them have not previously been defined as +single cells, the undefined cells are entered into the dots table with +the undefined attribute. This is done for backward compatibility with +old tables, but it may cause problems with the above opcodes or +back-translation. For this reason, every single-cell dot pattern +should be defined before it is used in a multi-cell character +representation. The best way to do this is to use the 8-dot computer +braille representation for the particular braille code. If a character +or dot pattern used in any rule, except those with the display, +repeated or replace opcodes, is not defined by one of the +character-definition opcodes, liblouis will give an error message and +refuse to continue until the problem is fixed. If the translator or +back-translator encounters an undefined character in its input it +produces a succinct error indication in its output, and the character +is treated as a space. 
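Every character-definition opcode takes a dots operand in the syntax described earlier: dot numbers 1 through 8, virtual dots 9 and a through f, cells separated by a dash, and 0 for a blank cell. The following sketch parses such an operand into one bit mask per cell. The function name and the bit assignment (dot 1 in bit 0 up to virtual dot f in bit 14) are assumptions made for illustration; the internal representation used by liblouis may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Parse a dots operand such as "5-123" into one bit mask per cell.
   Bit layout (an assumption of this sketch): dots 1-8 -> bits 0-7,
   virtual dot 9 -> bit 8, virtual dots a-f -> bits 9-14.
   Returns the number of cells, or -1 on a syntax error or if more than
   maxcells cells are given. */
int parse_dots(const char *spec, unsigned cells[], int maxcells)
{
  int count = 0;
  unsigned cell = 0;
  int have_dot = 0; /* set once the current cell has at least one symbol */
  const char *p;

  assert(spec != NULL && cells != NULL && maxcells > 0);
  for (p = spec;; p++) {
    char c = *p;
    if (c == '-' || c == '\0') {
      /* End of a cell: it must not be empty. */
      if (!have_dot || count >= maxcells)
        return -1;
      cells[count++] = cell;
      cell = 0;
      have_dot = 0;
      if (c == '\0')
        return count;
    } else if (c == '0') {
      have_dot = 1; /* blank cell: no bits set */
    } else if (c >= '1' && c <= '9') {
      cell |= 1u << (c - '1');
      have_dot = 1;
    } else if (c >= 'a' && c <= 'f') {
      cell |= 1u << (9 + (c - 'a'));
      have_dot = 1;
    } else {
      return -1; /* not a dot number, virtual dot or dash */
    }
  }
}
```

For the operand 5-123 from the text (the contraction for "lord") this yields two cells, the first with only dot 5 set and the second with dots 1, 2 and 3 set.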
+ +@table @code +@opcode{space, character dots} +Defines a character as a space and also defines the dot pattern as +such. For example: + +@example +space \s 0 \s is the escape sequence for blank; 0 means no dots. +@end example + +@opcode{punctuation, character dots} +Associates a punctuation mark in the particular language with a +braille representation and defines the character and dot pattern as +punctuation. For example: + +@example +punctuation . 46 dot pattern for period in NAB computer braille +@end example + +@opcode{digit, character dots} +Associates a digit with a dot pattern and defines the character as a +digit. For example: + +@example +digit 0 356 NAB computer braille +@end example + +@opcode{uplow, characters dots@{,dots@}} +The characters operand must be a pair of letters, of which the first +is uppercase and the second lowercase. The first dots suboperand +indicates the dot pattern for the upper-case letter. It may have more +than one cell. The second dots suboperand must be separated from the +first by a comma and is optional, as indicated by the braces. +If present, it indicates the dot pattern for the lower-case letter. It +may also have more than one cell. If the second dots suboperand is not +present the first is used for the lower-case letter as well as the +upper-case letter. This opcode is needed because not all languages +follow a consistent pattern in assigning Unicode codes to upper and +lower case letters. It should be used even for languages that do. The +distinction is important in the forward translator. For example: + +@example +uplow Aa 1 +@end example + +@opcode{letter, character dots} +Associates a letter in the language with a braille representation and +defines the character as a letter. This is intended for letters which +are neither uppercase nor lowercase. + +@opcode{lowercase, character dots} +Associates a character with a dot pattern and defines the character as +a lowercase letter. 
Both the character and the dot pattern have the
+attributes lowercase and letter.
+
+@opcode{uppercase, character dots}
+Associates a character with a dot pattern and defines the character as
+an uppercase letter. Both the character and the dot pattern have the
+attributes uppercase and letter. Lowercase and uppercase should be
+used when a letter has only one case. Otherwise use "uplow".
+
+@opcode{litdigit, digit dots}
+Associates a digit with the dot pattern which should be used to
+represent it in literary texts. For example:
+
+@example
+litdigit 0 245
+litdigit 1 1
+@end example
+
+@opcode{sign, character dots}
+Associates a character with a dot pattern and defines both as a sign.
+This opcode should be used for symbols such as the at sign, percent
+sign, dollar sign, etc. Do not use it to define ordinary punctuation
+such as period and comma. For example:
+
+@example
+sign % 4-25-1234 literary percent sign
+@end example
+
+@opcode{math, character dots}
+Associates a character and a dot pattern and defines them as a
+mathematical symbol. It should be used for less than, greater than,
+equals, plus, etc. For example:
+
+@example
+math + 346 plus
+@end example
+
+@end table
+
+@node Braille Indicator Opcodes, Emphasis Opcodes, Character-Definition Opcodes, How to Write Translation Tables
+@section Braille Indicator Opcodes
+
+Braille indicators are dot patterns which are inserted into the
+braille text to indicate such things as capitalization, italic type,
+computer braille, etc. The opcodes which define them are followed only
+by a dot pattern, which may be one or more cells.
+
+@table @code
+@opcode{capsign, dots}
+The dot pattern which indicates capitalization of a single letter. In
+English, this is dot 6. For example:
+
+@example
+capsign 6
+@end example
+
+@opcode{begcaps, dots}
+The dot pattern which begins a block of capital letters.
For example:
+
+@example
+begcaps 6-6
+@end example
+
+@opcode{endcaps, dots}
+The dot pattern which ends a block of capital letters within a word.
+For example:
+
+@example
+endcaps 6-3
+@end example
+
+@opcode{letsign, dots}
+This indicator is needed in Grade 2 to show that a single letter is
+not a contraction. It is also used when an abbreviation happens to be
+a sequence of letters that is the same as a contraction. For example:
+
+@example
+letsign 56
+@end example
+
+@opcode{noletsign, letters}
+The letters in the operand will not be preceded by a letter sign.
+More than one noletsign opcode can be used. This is equivalent to a
+single entry containing all the letters. In addition, if a single
+letter, such as "a" in English, is defined as a word or largesign, it
+will be treated as though it had also been specified in a noletsign
+entry.
+
+@opcode{noletsignbefore, characters}
+If any of the characters precedes a single letter without a space, a
+letter sign is not used. By default the characters apostrophe and
+period have this property. Use of a noletsignbefore entry cancels the
+defaults. If more than one noletsignbefore entry is used, the
+characters in all entries are combined.
+
+@opcode{noletsignafter, characters}
+If any of the characters follows a single letter without a space, a
+letter sign is not used. By default the characters apostrophe and
+period have this property. Use of a noletsignafter entry cancels the
+defaults. If more than one noletsignafter entry is used, the
+characters in all entries are combined.
+
+@opcode{numsign, dots}
+The translator inserts this indicator before numbers made up of digits
+defined with the litdigit opcode to show that they are a number and
+not letters or some other symbols.
For example:
+
+@example
+numsign 3456
+@end example
+
+@end table
+
+@node Emphasis Opcodes, Special Symbol Opcodes, Braille Indicator Opcodes, How to Write Translation Tables
+@section Emphasis Opcodes
+
+These also define braille indicators, but they require more
+explanation. There are four sets, for italic, bold, underline and
+computer braille. In each of the first three sets there are seven
+opcodes, for use before the first word of a phrase, for use before the
+last word, for use after the last word, for use before the first
+letter (or character) if emphasis starts in the middle of a word, for
+use after the last letter (or character) if emphasis ends in the
+middle of a word, before a single letter (or character), and to
+specify the length of a phrase to which the first-word and
+last-word-before indicators apply. This rather elaborate set of
+emphasis opcodes was devised to try to meet all contingencies. It is
+unlikely that a translation table will contain all of them. The
+translator checks for their presence. If they are present, it first
+looks to see if the single-letter indicator should be used. Then it
+looks at the word (or phrase) indicators and finally at the
+multi-letter indicators.
+
+The translator will apply up to two emphasis indicators to each phrase
+or string of characters, depending on what the typeform parameter in
+its calling sequence indicates (@pxref{Programming with liblouis}).
+
+For computer braille there are only two braille indicators, for the
+beginning and end of a sequence of characters to be rendered in
+computer braille. Such a sequence may also have other emphasis. The
+computer braille indicators are applied not only when computer braille
+is indicated in the typeform parameter, but also when a sequence of
+characters is determined to be computer braille because it contains a
+subsequence defined by the compbrl or literal opcodes.
+
+Here are the various emphasis opcodes.
+
+@table @code
+
+@opcode{firstwordital, dots}
+This is the braille indicator to be placed before the first word of an
+italicized phrase that is longer than the value given in
+lenitalphrase. For example:
+
+@example
+firstwordital 46-46 English indicator
+@end example
+
+@doubleOpcode{lastworditalbefore, dots, italsign, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the last word of an italicized phrase. In addition, if
+firstwordital is not used, this braille indicator is doubled and
+placed before the first word. Do not use lastworditalbefore and
+lastworditalafter in the same table. For example:
+
+@example
+lastworditalbefore 4-6
+@end example
+
+@opcode{lastworditalafter, dots}
+This is the braille indicator to be placed after the last word of an
+italicized phrase. Do not use lastworditalbefore and lastworditalafter
+in the same table. @xref{lenitalphrase-opcode,, the lenitalphrase opcode}.
+
+@doubleOpcode{firstletterital, dots, begital, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the first letter (or character) if italicization begins
+in the middle of a word.
+
+@doubleOpcode{lastletterital, dots, endital, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed after the last letter (or character) when italicization ends in
+the middle of a word.
+
+@opcode{singleletterital, dots}
+This braille indicator is used if only a single letter (or character)
+is italicized.
+
+@anchor{lenitalphrase-opcode}
+@opcode{lenitalphrase, number}
+If lastworditalbefore is used, an italicized phrase is checked to see
+how many words it contains. If this number is less than or equal to
+the number given in the lenitalphrase opcode, the lastworditalbefore
+sign is placed in front of each word. If it is greater, the
+firstwordital indicator is placed before the first word and the
+lastworditalbefore indicator is placed before the last word.
Note that
+if the firstwordital opcode is not used its indicator is made up by
+doubling the dot pattern given in the lastworditalbefore entry. For
+example:
+
+@example
+lenitalphrase 4
+@end example
+
+@opcode{firstwordbold, dots}
+This is the braille indicator to be placed before the first word of a
+bold phrase. For example:
+
+@example
+firstwordbold 456-456
+@end example
+
+@doubleOpcode{lastwordboldbefore, dots, boldsign, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the last word of a bold phrase. In addition, if
+firstwordbold is not used, this braille indicator is doubled and
+placed before the first word. Do not use lastwordboldbefore and
+lastwordboldafter in the same table. For example:
+
+@example
+lastwordboldbefore 456
+@end example
+
+@opcode{lastwordboldafter, dots}
+This is the braille indicator to be placed after the last word of a
+bold phrase. Do not use lastwordboldbefore and lastwordboldafter in
+the same table.
+
+@doubleOpcode{firstletterbold, dots, begbold, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the first letter (or character) if bold emphasis begins
+in the middle of a word.
+
+@doubleOpcode{lastletterbold, dots, endbold, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed after the last letter (or character) when bold emphasis ends in
+the middle of a word.
+
+@opcode{singleletterbold, dots}
+This braille indicator is used if only a single letter (or character)
+is in boldface.
+
+@opcode{lenboldphrase, number}
+If lastwordboldbefore is used, a bold phrase is checked to see how many
+words it contains. If this number is less than or equal to the number
+given in the lenboldphrase opcode, the lastwordboldbefore sign is
+placed in front of each word. If it is greater, the firstwordbold
+indicator is placed before the first word and the lastwordboldbefore
+indicator is placed before the last word.
Note that if the
+firstwordbold opcode is not used its indicator is made up by doubling
+the dot pattern given in the lastwordboldbefore entry.
+
+@opcode{firstwordunder, dots}
+This is the braille indicator to be placed before the first word of an
+underlined phrase.
+
+@doubleOpcode{lastwordunderbefore, dots, undersign, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the last word of an underlined phrase. In addition, if
+firstwordunder is not used, this braille indicator is doubled and
+placed before the first word.
+
+@opcode{lastwordunderafter, dots}
+This is the braille indicator to be placed after the last word of an
+underlined phrase.
+
+@doubleOpcode{firstletterunder, dots, begunder, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed before the first letter (or character) if underline emphasis
+begins in the middle of a word.
+
+@doubleOpcode{lastletterunder, dots, endunder, dots}
+These two opcodes are synonyms. This is the braille indicator to be
+placed after the last letter (or character) when underline emphasis
+ends in the middle of a word.
+
+@opcode{singleletterunder, dots}
+This braille indicator is used if only a single letter (or character)
+is underlined.
+
+@opcode{lenunderphrase, number}
+If lastwordunderbefore is used, an underlined phrase is checked to see
+how many words it contains. If this number is less than or equal to
+the number given in the lenunderphrase opcode, the lastwordunderbefore
+sign is placed in front of each word. If it is greater, the
+firstwordunder indicator is placed before the first word and the
+lastwordunderbefore indicator is placed before the last word. Note that
+if the firstwordunder opcode is not used its indicator is made up by
+doubling the dot pattern given in the lastwordunderbefore entry.
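+
+As a hypothetical illustration (the dot pattern below is a
+placeholder chosen for this sketch, not taken from a real braille
+code), a table might configure underline emphasis like this:
+
+@example
+lastwordunderbefore 456    placeholder dot pattern
+lenunderphrase 3
+@end example
+
+With these two entries, an underlined phrase of up to three words
+would get the lastwordunderbefore indicator in front of each word.
+Because firstwordunder is not given, the indicator used before the
+first word of a longer phrase would be the doubled pattern, that is,
+456-456.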
+
+@opcode{begcomp, dots}
+This braille indicator is placed before a sequence of characters
+translated in computer braille, whether this sequence is indicated in
+the typeform parameter (@pxref{Programming with liblouis}) or inferred
+because it contains a subsequence specified by the
+@ref{compbrl-opcode,,compbrl opcode}.
+
+@opcode{endcomp, dots}
+This braille indicator is placed after a sequence of characters
+translated in computer braille, whether this sequence is indicated in
+the typeform parameter (@pxref{Programming with liblouis}) or inferred
+because it contains a subsequence specified by the
+@ref{compbrl-opcode,,compbrl opcode}.
+
+@end table
+
+@node Special Symbol Opcodes, Special Processing Opcodes, Emphasis Opcodes, How to Write Translation Tables
+@section Special Symbol Opcodes
+
+These opcodes define certain symbols, such as the decimal point, which
+require special treatment.
+
+@table @code
+@opcode{decpoint, character dots}
+This opcode defines the decimal point. The character operand must have
+only one character. For example, in @file{en-us-g1.ctb} we have:
+"decpoint . 46".
+
+@opcode{hyphen, character dots}
+This opcode defines the hyphen, that is, the character used in
+compound words such as have-nots. The back-translator uses it to
+determine the end of individual words.
+
+@end table
+
+@node Special Processing Opcodes, Translation Opcodes, Special Symbol Opcodes, How to Write Translation Tables
+@section Special Processing Opcodes
+
+These opcodes cause special processing to be carried out.
+
+@table @code
+@opcode{capsnocont,}
+This opcode has no operands. If it is specified, words or parts of
+words in all caps are not contracted. This is needed for languages
+such as Norwegian.
+
+@end table
+
+@node Translation Opcodes, Character-Class Opcodes, Special Processing Opcodes, How to Write Translation Tables
+@section Translation Opcodes
+
+These opcodes define the braille representations for character
+sequences.
Each of them defines an entry within the contraction table.
+These entries may be defined in any order except, as noted below, when
+they define alternate representations for the same character sequence.
+
+Each of these opcodes specifies a condition under which the
+translation is legal, and each also has a characters operand and a
+dots operand. The text being translated is processed strictly from
+left to right, character by character, with the most eligible entry
+for each position being used. If there is more than one eligible entry
+for a given position in the text, then the one with the longest
+character string is used. If there is more than one eligible entry for
+the same character string, then the one defined first is tested for
+legality first. (This is the only case in which the order of the
+entries makes a difference.)
+
+The characters operand is a sequence or string of characters preceded
+and followed by whitespace. Each character can be entered in the
+normal way, or it can be defined as a four-digit hexadecimal number
+preceded by "\x".
+
+The dots operand defines the braille representation for the characters
+operand. It may also be specified as an equals sign (@samp{=}). This
+means that the default representation for each character
+(@pxref{Character-Definition Opcodes}) within the sequence is to be
+used.
+
+In what follows the word "word" means a sequence of one or more
+consecutive letters between spaces and/or punctuation marks.
+
+@table @code
+
+@anchor{compbrl-opcode}
+@doubleOpcode{compbrl, characters, literal, characters}
+These two opcodes are synonyms. If the characters are found within a
+block of text surrounded by whitespace the entire block is translated
+according to the default braille representations defined by the
+@ref{Character-Definition Opcodes} if 8-dot computer braille is
+enabled or according to the dot patterns given in the comp6 opcode
+if 6-dot computer braille is enabled.
For example, "compbrl www" causes URLs to be translated in computer
+braille.
+
+@opcode{comp6, character dots}
+This opcode specifies the translation of characters in 6-dot computer
+braille. It is necessary because the translation of a single character
+may require more than one cell. The first operand must be a character
+with a decimal representation from 0 to 255 inclusive. The second
+operand may specify as many cells as necessary. The opcode is somewhat
+of a misnomer, since any dots, not just dots 1 through 6, can be
+specified. This even includes virtual dots.
+
+@opcode{nocont, characters}
+Like compbrl, except that the string is uncontracted. prepunc and
+postpunc rules are applied, however. This is useful for specifying
+that foreign words should not be contracted in an entire document.
+
+@opcode{replace, characters @{characters@}}
+Replace the first set of characters, no matter where they appear, with
+the second. Note that the second operand is @emph{NOT} a dot pattern.
+It is also optional. If it is omitted the character(s) in the first
+operand will be discarded. This is useful for ignoring characters. It
+is possible that the "ignored" characters may still affect the
+translation indirectly. Therefore, it is preferable to use @ref{The
+correct Opcode,, the correct opcode}.
+
+@anchor{always opcode}
+@opcode{always, characters dots}
+Replace the characters with the dot pattern no matter where they
+appear. Do @emph{NOT} use an entry such as "always a 1". Use the uplow,
+letter, etc. character definition opcodes instead. For example:
+
+@example
+always world 456-2456 unconditional translation
+@end example
+
+@opcode{repeated, characters dots}
+Replace the characters with the dot pattern no matter where they
+appear. Ignore any consecutive repetitions of the same character
+sequence. This is useful for shortening long strings of spaces or
+hyphens or periods.
For example:
+
+@example
+repeated --- 36-36-36 shorten separator lines made with hyphens
+@end example
+
+@opcode{largesign, characters dots}
+Replace the characters with the dot pattern no matter where they
+appear. In addition, if two words defined as large signs follow each
+other, remove the space between them. For example, in
+@file{en-us-g2.ctb} the words "and" and "the" are both defined as
+large signs. Thus, in the phrase "the cat and the dog" the space would
+be deleted between "and" and "the", with the result "the cat andthe
+dog". Of course, "and" and "the" would be properly contracted. The
+term "largesign" is a bit of braille jargon that pleases braille
+experts.
+
+@opcode{word, characters dots}
+Replace the characters with the dot pattern if they are a word, that
+is, are surrounded by whitespace and/or punctuation.
+
+@opcode{syllable, characters dots}
+As its name indicates, this opcode defines a "syllable" which must be
+represented by exactly the dot patterns given. Contractions may not
+cross the boundaries of this "syllable" either from left or right. The
+character string defined by this opcode need not be a lexical
+syllable, though it usually will be. For example:
+
+@example
+syllable horse = sawhorse, horseradish
+@end example
+
+@anchor{nocross opcode}
+@opcode{nocross, characters dots}
+Replace the characters with the dot pattern if the characters are all
+in one syllable (do not cross a syllable boundary). For this opcode to
+work, a hyphenation table must be included. If this is not done,
+@code{nocross} behaves like the @ref{always opcode}. For example, if
+the English Grade 2 table is being used and the appropriate
+hyphenation table has been included, "nocross sh 146" will cause the sh
+in "monkshood" not to be contracted.
+
+@opcode{joinword, characters dots}
+Replace the characters with the dot pattern if they are a word which
+is followed by whitespace and a letter. In addition remove the
+whitespace.
For example, @file{en-us-g2.ctb} has "joinword to 235".
+This means that if the word "to" is followed by another word the
+contraction is to be used and the space is to be omitted. If these
+conditions are not met, the word is translated according to any other
+opcodes that may apply to it.
+
+@opcode{lowword, characters dots}
+Replace the characters with the dot pattern if they are a word
+preceded and followed by whitespace. No punctuation either before or
+after the word is allowed. The term "lowword" derives from the fact
+that in English these contractions are written in the lower part of
+the cell. For example:
+
+@example
+lowword were 2356
+@end example
+
+@opcode{contraction, characters}
+If you look at @file{en-us-g2.ctb} you will see that some words are
+actually contracted into some of their own letters. A famous example
+among braille transcribers is "also", which is contracted as "al". But
+this is also the name of a person. To take another example,
+"altogether" is contracted as "alt", but this is the abbreviation for
+the alternate key on a computer keyboard. Similarly "could" is
+contracted into "cd", but this is the abbreviation for compact disk.
+To prevent confusion in such cases, the letter sign (see the
+letsign opcode) is placed before such letter combinations when
+they actually are abbreviations, not contractions. The contraction
+opcode tells the translator to do this.
+
+@opcode{sufword, characters dots}
+Replace the characters with the dot pattern if they are either a word
+or at the beginning of a word.
+
+@opcode{prfword, characters dots}
+Replace the characters with the dot pattern if they are either a word
+or at the end of a word.
+
+@opcode{begword, characters dots}
+Replace the characters with the dot pattern if they are at the
+beginning of a word.
+
+@opcode{begmidword, characters dots}
+Replace the characters with the dot pattern if they are either at the
+beginning or in the middle of a word.
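+
+As an illustration of a position-restricted entry such as begword,
+here are two hypothetical lines based on the standard English Grade 2
+contractions for "con" and "dis". They are written for this sketch
+and are not copied from any table file:
+
+@example
+begword con 25     contract "con" only at the beginning of a word
+begword dis 256    likewise for "dis"
+@end example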
+
+@opcode{midword, characters dots}
+Replace the characters with the dot pattern if they are in the middle
+of a word.
+
+@opcode{midendword, characters dots}
+Replace the characters with the dot pattern if they are either in the
+middle or at the end of a word.
+
+@opcode{endword, characters dots}
+Replace the characters with the dot pattern if they are at the end of
+a word.
+
+@opcode{partword, characters dots}
+Replace the characters with the dot pattern if the characters are
+anywhere in a word, that is, if they are preceded or followed by a
+letter.
+
+@opcode{prepunc, characters dots}
+Replace the characters with the dot pattern if they are part of
+punctuation at the beginning of a word.
+
+@opcode{postpunc, characters dots}
+Replace the characters with the dot pattern if they are part of
+punctuation at the end of a word.
+
+@opcode{begnum, characters dots}
+Replace the characters with the dot pattern if they are at the
+beginning of a number, that is, before all its digits. For example, in
+@file{en-us-g1.ctb} we have "begnum # 4".
+
+@opcode{midnum, characters dots}
+Replace the characters with the dot pattern if they are in the middle
+of a number. For example, @file{en-us-g1.ctb} has "midnum . 46". This
+is because the decimal point has a different dot pattern than the
+period.
+
+@opcode{endnum, characters dots}
+Replace the characters with the dot pattern if they are at the end of
+a number. For example, @file{en-us-g1.ctb} has "endnum th 1456". This
+handles things like 4th. A letter sign is @emph{NOT} inserted.
+
+@opcode{joinnum, characters dots}
+Replace the characters with the dot pattern. In addition, if
+whitespace and a number follow, omit the whitespace.
+
+@end table
+
+@node Character-Class Opcodes, Swap Opcodes, Translation Opcodes, How to Write Translation Tables
+@section Character-Class Opcodes
+
+These opcodes define and use character classes. A character class
+associates a set of characters with a name.
The name then refers to
+any character within the class. A character may belong to more than
+one class.
+
+The basic character classes correspond to the character definition
+opcodes, with the exception of uplow, which defines characters
+belonging to the two classes uppercase and lowercase. These classes
+are:
+
+@table @code
+@item space
+White-space characters such as blank and tab
+@item digit
+Numeric characters
+@item letter
+Both uppercase and lowercase alphabetic characters
+@item lowercase
+Lowercase alphabetic characters
+@item uppercase
+Uppercase alphabetic characters
+@item punctuation
+Punctuation marks
+@item sign
+Signs such as percent
+@item math
+Mathematical symbols
+@item litdigit
+Literary digits
+@item undefined
+Not properly defined
+
+@end table
+
+The opcodes which define and use character classes are shown below.
+For examples see @file{fr-abrege.ctb}.
+
+@table @code
+
+@opcode{class, name characters}
+Define a new character class. The characters operand must be specified
+as a string. A character class may not be used until it has been
+defined.
+
+@opcode{after, class opcode ...}
+The specified opcode is further constrained in that the matched
+character sequence must be immediately preceded by a character
+belonging to the specified class. If this opcode is used more than
+once on the same line then the union of the characters in all the
+classes is used.
+
+@opcode{before, class opcode ...}
+The specified opcode is further constrained in that the matched
+character sequence must be immediately followed by a character
+belonging to the specified class. If this opcode is used more than
+once on the same line then the union of the characters in all the
+classes is used.
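+
+As a hypothetical sketch of how these opcodes combine (the class name
+and the rule are invented for illustration and are not taken from
+@file{fr-abrege.ctb}), a class is defined first and then used to
+constrain an ordinary translation entry:
+
+@example
+class vowel aeiou
+after vowel always st 34    contract "st" only right after a vowel
+@end example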
+
+@end table
+
+@node Swap Opcodes, The Context and Multipass Opcodes, Character-Class Opcodes, How to Write Translation Tables
+@section Swap Opcodes
+
+The swap opcodes are needed to tell the context, correct and multipass
+opcodes which dot patterns to swap for which characters. There are
+two, swapcd and swapdd. The first swaps dot patterns for characters.
+The second swaps dot patterns for dot patterns. The first is used in
+the context opcode and the second is used in the multipass opcodes.
+Dot patterns are separated by commas and may contain more than one
+cell.
+
+@table @code
+
+@findex swapcd
+@item swapcd name characters dots,dots,dots,...
+See above paragraph for explanation. For example:
+
+@example
+swapcd dropped 0123456789 356,2,23,...
+@end example
+
+@findex swapdd
+@item swapdd name dots,dots,dots... dotpattern1,dotpattern2,dotpattern3,...
+The @code{swapdd} opcode defines substitutions for the multipass
+opcodes. In the second operand the dot patterns must be single cells,
+but in the third operand multi-cell dot patterns are allowed. This is
+because multi-cell patterns in the second operand would lead to
+ambiguities.
+
+@end table
+
+@node The Context and Multipass Opcodes, The correct Opcode, Swap Opcodes, How to Write Translation Tables
+@section The Context and Multipass Opcodes
+
+@table @code
+@anchor{context-opcode}
+@findex context
+@findex pass2
+@findex pass3
+@findex pass4
+@item context test action
+@itemx pass2 test action
+@itemx pass3 test action
+@itemx pass4 test action
+The context and multipass opcodes (pass2, pass3 and pass4) provide
+translation capabilities beyond those of the basic translation opcodes
+discussed previously. The multipass opcodes cause additional passes to
+be made over the string to be translated. The number after the word
+"pass" indicates in which pass the entry is to be applied. If no
+multipass opcodes are given, only the first translation pass is made.
+The context opcode is basically a multipass opcode for the first pass.
+It differs slightly from the multipass opcodes per se. The format of
+all these opcodes is:
+
+@example
+opcode test action
+@end example
+
+The test and action operands have suboperands. Each suboperand begins
+with a non-alphanumeric character and ends when another non-alphanumeric
+character is encountered. The suboperands and their initial characters
+are as follows.
+
+@table @kbd
+@item " (double quote)
+A string of characters. This string must be terminated by another
+double quote. It may contain any characters. If a double quote is
+needed within the string it must be preceded by a backslash (@samp{\}).
+If a space is needed it must be represented by the escape sequence \s.
+This suboperand is valid only in the test part of the context opcode.
+
+@item @@ (at sign)
+A sequence of dot patterns. Cells are separated by hyphens as usual.
+This suboperand is not valid in the test part of the context opcode.
+
+@item $ (dollar sign)
+A string of attributes, such as d for digit, l for letter, etc. More
+than one attribute can be given. If you wish to check characters with
+any attribute, use the letter a. Input characters are checked to see
+if they have at least one of the attributes. The attribute string can
+be followed by numbers specifying how many characters are to be
+checked. If no numbers are given, 1 is assumed. If two numbers
+separated by a hyphen are given, the input is checked to make sure
+that at least the first number of characters with the attributes are
+present, but no more than the second number. If only one number is
+present, then exactly that many characters must have the attributes. A
+period instead of the numbers indicates an indefinite number of
+characters. This suboperand is valid in all test parts but not in
+action parts.
+
+@item ! (exclamation point)
+Reverses the logical meaning of the suboperand which follows.
For
+example, !$d is true only if the character is @emph{NOT} a digit. This
+suboperand is valid in test parts only.
+
+@item % (percent sign)
+The name of a class defined by the class opcode or the name of a swap
+set defined by the swap opcodes. Names may contain only letters and
+digits. The letters may be upper or lower-case. The case matters.
+Class names may be used in test parts only. Swap names are valid
+everywhere.
+
+@item _ (underscore)
+Move backward. If a number follows, move backward that number of
+characters. The program never moves backward beyond the beginning of
+the input string. This suboperand is valid only in test parts.
+
+@item [ (left bracket)
+Start replacement here. This suboperand must always be paired with a
+right bracket and is valid only in test parts.
+
+@item ] (right bracket)
+End replacement here. This suboperand must always be paired with a
+left bracket and is valid only in test parts.
+
+@item # (number sign or crosshatch)
+Test or set a variable. Variables are referred to by numbers 1 to 50,
+for example, #1, #2, #25. Variables may be set by one context or
+multipass opcode and tested by another. Thus, an operation that occurs
+at one place in a translation can tell an operation that occurs later
+about itself. This feature will be used in math translation, and it
+may also help to alleviate the need for new opcodes. This suboperand
+is valid everywhere.
+
+Variables are set in the action part. To set a variable use an
+expression like #1=1, #2=5, etc. Variables are also incremented and
+decremented in the action part with expressions like #1+, #3-, etc.
+These operators increment or decrement the variable by 1.
+
+Variables are tested in the test part with expressions like #1=2,
+#3<4, #5>6, etc.
+
+@item * (asterisk)
+Copy the characters or dot patterns in the input within the
+replacement brackets into the output and discard anything else that
+may match.
This feature is used, for example, for handling numeric
+subscripts in Nemeth. This suboperand is valid only in action parts.
+
+@item ? (question mark)
+Valid only in the action part. The characters to be replaced are
+simply ignored. That is, they are replaced with nothing.
+
+@end table
+
+@end table
+
+@node The correct Opcode, Miscellaneous Opcodes, The Context and Multipass Opcodes, How to Write Translation Tables
+@section The correct Opcode
+
+Because some input (such as that from an OCR program) may contain
+systematic errors, it is sometimes advantageous to use a
+pre-translation pass to remove them. The errors and their corrections
+are specified by the correct opcode. If there are no correct opcodes
+in a table, the pre-translation pass is not used. The format of the
+correct opcode is very similar to that of the @ref{context-opcode}.
+The only difference is that in the action part strings may be used and
+dot patterns may not be used. Some examples of correct opcode entries
+are:
+
+@example
+correct "\\" ? Eliminate backslashes
+correct "cornf" "comf" fix a common "scano"
+correct "cornm" "comm"
+correct "cornp" "comp"
+correct "*" ? Get rid of stray asterisks
+correct "|" ? ditto for vertical bars
+correct "\s?" "?" drop space before question mark
+@end example
+
+@node Miscellaneous Opcodes, , The correct Opcode, How to Write Translation Tables
+@section Miscellaneous Opcodes
+
+@table @code
+@anchor{include-opcode}
+@opcode{include, filename}
+Read the file indicated by filename and incorporate or include its
+entries into the table. Included files can include other files, which
+can include other files, etc. For an example, see what files are
+included by the entry include @file{en-us-g1.ctb} in the table
+@file{en-us-g2.ctb}. If the included file is not in the same directory
+as the main table, use a full pathname for filename.
+
+@opcode{locale, characters}
+Not implemented, but recognized and ignored for backward
+compatibility.
+ +@opcode{display, character dots} +Associates dot patterns with the characters which will be sent to a +braille embosser, display or screen font. The character must be in the +range 0-255 and the dots must specify a single cell. Here are some +examples: + +@example +display a 1 When the character a is sent to the embosser or display, +# it will produce a dot 1. +@end example + +@example +display L 123 When the character L is sent to the display or embosser, +# it produces dots 1-2-3. +@end example + +The display opcode is optional. It is used when the embosser or +display has a different mapping of characters to dot patterns than +that given in @ref{Character-Definition Opcodes}. If used, display +entries must precede character-definition entries. + +@opcode{multind, dots opcode opcode ...} +The multind opcode tells the back-translator that a sequence of +braille cells represents more than one braille indicator. For example, +in @file{en-us-g1.ctb} we have "multind 56-6 letsign capsign". The +back-translator can generally handle single braille indicators, but it +cannot apply them when they immediately follow each other. It +recognizes the letter sign if it is followed by a letter and takes +appropriate action. It also recognizes the capital sign if it is +followed by a letter. But when a letter sign is followed by a +capital sign, it fails to recognize the letter sign unless the sequence +has been defined with multind. A multind entry may not contain a +comment because liblouis would attempt to interpret it as an opcode. + +@end table + +@node Notes on Back-Translation, Key Index, How to Write Translation Tables, Top +@chapter Notes on Back-Translation + +Back-translation is carried out by the function +@code{lou_backTranslateString}. Its calling sequence is described in +@ref{Programming with liblouis}. Tables containing no context, +multipass or correct opcodes can be used for both forward and backward +translation. 
If these opcodes are needed, different tables will be +required. @code{lou_backTranslateString} first performs pass4, if +present, then pass3, then pass2, then the back-translation, then +corrections. Note that this is exactly the inverse of forward +translation. + +@node Key Index, , Notes on Back-Translation, Top +@unnumbered Opcode Index + +@printindex fn + +@bye + + +