John: I will try that tonight and let you know how things go. I will also
deal with opening the brf files.

F.

On Tue, Aug 7, 2012 at 4:41 PM, John J. Boyer <john.boyer@xxxxxxxxxxxxxxxxx> wrote:
> Hi Francois,
>
> liblouisutdml treats blank lines in a plain text file as paragraph
> separators, so if the file has those we will hopefully get sensible
> paragraphs.
>
> John
>
> On Tue, Aug 07, 2012 at 02:26:22PM -0400, François Ouellette wrote:
>> John: I tried a few approaches when importing non-XML files with Tika,
>> one that produces text only, that we just display on the daisy view,
>> and one that produces an XML file (in fact it is an XHTML file). When
>> walking the XHTML with XOM we get what is displayed today on the daisy
>> view. I tried opening the XHTML with liblouisutdml using a sem file
>> but the results were not very good. The problem is that with Tika we
>> usually get meta, heading and title elements, but only one <p> element
>> that holds all the extracted text. I have not tried processing the
>> Tika text file with liblouisutdml as you indicated earlier. This may
>> be the better option, as we would also get a UTD structure and an
>> initial translation.
>>
>> F.
>>
>> On Tue, Aug 7, 2012 at 1:47 PM, John J. Boyer
>> <john.boyer@xxxxxxxxxxxxxxxxx> wrote:
>> > UTF-8 should be translated only for display purposes. liblouisutdml
>> > requires the original UTF-8.
>> >
>> > When I said that text files should be processed by calling
>> > translateTextFile with formatFor utd I was thinking of plain text, not
>> > text derived from imported files such as pdf. It would probably be more
>> > consistent to let tika handle even plain text, converting it to xml.
>> >
>> > John
>> >
>> > On Tue, Aug 07, 2012 at 11:45:15AM -0400, François Ouellette wrote:
>> >> When importing non-XML documents with foreign or special characters
>> >> they may contain Unicode expressions such as \u00e9 since they were
>> >> not processed by liblouisutdml.
>> >> I have a routine to find the
>> >> corresponding codepoints and display the corresponding character. I
>> >> haven't done much testing yet but I guess that when saving as UTD
>> >> these should be processed correctly.
>> >>
>> >> François.
>> >>
>> >> On Tue, Aug 7, 2012 at 10:04 AM, John J. Boyer <john@xxxxxxxxxxxxxx>
>> >> wrote:
>> >> > Hi Francois,
>> >> >
>> >> > What context are you considering when you ask about UTF-8? If these
>> >> > codes occur in xml documents they are automatically handled by
>> >> > liblouisutdml on translation. What does Java do when you attempt to
>> >> > display them?
>> >> >
>> >> > Thanks,
>> >> > John
>> >> >
>> >> > On Tue, Aug 07, 2012 at 07:59:06AM -0400, François Ouellette wrote:
>> >> >> John: thanks for the clarifications. We are halfway through the
>> >> >> brf files; I will add a method to read and backtranslate them.
>> >> >>
>> >> >> What about UTF-8? Is BB supposed to recognize the \u sequences and
>> >> >> change them to the corresponding characters?
>> >> >>
>> >> >> Thanks.
>> >> >> François.
>> >> >>
>> >> >> On Mon, Aug 6, 2012 at 8:53 PM, John J. Boyer
>> >> >> <john.boyer@xxxxxxxxxxxxxxxxx> wrote:
>> >> >> > My vision is that BrailleBlaster will be able to display and edit any
>> >> >> > flavor of xml, just as liblouisutdml can translate any flavor.
>> >> >> > Liblouisutdml accomplishes this by using a sort of pattern-matching
>> >> >> > virtual machine. The semantic-action files are the "programs" for
>> >> >> > this VM. If I had it to do over again I would format them somewhat
>> >> >> > differently. Each line would contain first the pattern, then the
>> >> >> > "instruction", then parameters, separated by white space. Optionally,
>> >> >> > an equals sign could be inserted between the patterns and the
>> >> >> > instructions, so Java could accept them as properties files.
>> >> >> >
>> >> >> > Most of the patterns are literal such as "p", "span,class,italic",
>> >> >> > and so on.
>> >> >> > Patterns can also be XPath expressions.
>> >> >> >
>> >> >> > The instruction is either the name of an action to be applied to the
>> >> >> > pattern, a style or a macro.
>> >> >> >
>> >> >> > The parameters are bits of text to be inserted between the texts
>> >> >> > contained in the subtree of the patterns. For an example, see
>> >> >> > nemeth.sem.
>> >> >> >
>> >> >> > For BrailleBlaster, the patterns would be similar, actions would
>> >> >> > also be similar in many cases, except that those having to do with
>> >> >> > Braille would be dropped, and others, having to do with displaying
>> >> >> > on a screen, would be added.
>> >> >> >
>> >> >> > This describes the display virtual machine. The edit virtual machine
>> >> >> > would be more complex, since there are two types of editing, changing
>> >> >> > the text in a text node and adding or deleting nodes. The former is
>> >> >> > quite straightforward. The latter will generally require selecting
>> >> >> > the name of a style. The definition of style will have to include the
>> >> >> > name of the element and any relevant attribute names and values.
>> >> >> >
>> >> >> > On other clarifications: The best way to handle text files is to use
>> >> >> > the translateTextFile method with the configuration setting
>> >> >> > formatFor utd. This will result in an output file with text
>> >> >> > paragraphs (separated by blank lines) enclosed in <p> tags and the
>> >> >> > Braille translation enclosed in <brl> tags, as normal. This can then
>> >> >> > be handled by BrailleBlaster like any other utd file.
>> >> >> >
>> >> >> > BrailleBlaster is also supposed to handle brf files natively. When
>> >> >> > these are recognized they should be displayed in the Braille view.
>> >> >> > The method to use is backTranslateFile; formatFor utd should also be
>> >> >> > specified.
>> >> >> > Again the resulting output file can be handled like any other utd
>> >> >> > file.
>> >> >> >
>> >> >> > John
>> >> >> >
>> >> >> > --
>> >> >> > John J. Boyer; President, Chief Software Developer
>> >> >> > Abilitiessoft, Inc.
>> >> >> > http://www.abilitiessoft.com
>> >> >> > Madison, Wisconsin USA
>> >> >> > Developing software for people with disabilities
>> >> >
>> >> > --
>> >> > John J. Boyer, Executive Director
>> >> > GodTouches Digital Ministry, Inc.
>> >> > http://www.godtouches.org
>> >> > Madison, Wisconsin, USA
>> >> > Peace, Love, Service
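
For reference, the handling of \u sequences discussed in the thread (turning text such as \u00e9 into the corresponding character for display) could be sketched in Java roughly as follows. This is only an illustration and not the routine François mentions: the class name UnicodeEscapes and the method decode are made up for the example, and it assumes an escape is always a backslash, the letter u and exactly four hex digits. Per John's note, the decoded text would be for display only; liblouisutdml would still be given the original UTF-8.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapes {
    // Regex for a literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Returns a copy of text with each escape replaced by the character it names.
    // Text without escapes is returned unchanged.
    public static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the four hex digits as a UTF-16 code unit.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and backslash from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below holds the six characters backslash, u, 0, 0, e, 7
        // in the middle of the name, as they might appear in imported text.
        System.out.println(decode("Fran\\u00e7ois"));
    }
}
```

Characters outside the Basic Multilingual Plane would arrive as two consecutive escapes (a surrogate pair); since each is decoded to one UTF-16 code unit and appended in order, the pair ends up adjacent in the resulting string.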