[brailleblaster] Re: More Clarifications

John: I will try that tonight and let you know how things go. I will also
deal with opening the brf files.

F.

On Tue, Aug 7, 2012 at 4:41 PM, John J. Boyer
<john.boyer@xxxxxxxxxxxxxxxxx> wrote:
> Hi Francois,
>
> liblouisutdml treats blank lines in a plain text file as paragraph
> separators, so if the file has those we will hopefully get sensible
> paragraphs.
>
> John
>
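The blank-line rule John describes can be illustrated with a small Java sketch. This is not liblouisutdml's code, only an approximation of the segmentation a plain text file would go through; the class and method names are hypothetical, and joining the lines of a paragraph with a single space is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits plain text into paragraphs at blank lines.
final class PlainTextParagraphs {

    static List<String> split(String text) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Accept Windows, old Mac and Unix line endings.
        for (String line : text.split("\r\n|\r|\n", -1)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // A blank (or whitespace-only) line closes the current paragraph.
                flush(current, paragraphs);
            } else {
                // Assumption: lines within a paragraph are joined with one space.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(trimmed);
            }
        }
        flush(current, paragraphs); // the last paragraph may lack a trailing blank line
        return paragraphs;
    }

    private static void flush(StringBuilder current, List<String> paragraphs) {
        if (current.length() > 0) {
            paragraphs.add(current.toString());
            current.setLength(0);
        }
    }
}
```

If the text extracted by Tika keeps its blank lines, a check like this on the BrailleBlaster side would show in advance how many paragraphs to expect.
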
> On Tue, Aug 07, 2012 at 02:26:22PM -0400, François Ouellette wrote:
>> John: I tried a few approaches when importing non-XML files with Tika,
>> one that produces text only, that we just display on the daisy view,
>> and one that produces an XML file (in fact it is an XHTML file). When
>> walking the XHTML with XOM we get what is displayed today on the daisy
>> view. I tried opening the XHTML with liblouisutdml using a sem file
>> but the results were not very good. The problem is that with Tika we
>> usually get meta, heading and title elements, but only one <p> element
>> that holds all the extracted text. I have not tried processing the
>> Tika text file with liblouisutdml as you indicated earlier. This may
>> be the better option, as we would also get a UTD structure and an
>> initial translation.
>>
>> F.
>>
>> On Tue, Aug 7, 2012 at 1:47 PM, John J. Boyer
>> <john.boyer@xxxxxxxxxxxxxxxxx> wrote:
>> > UTF-8 should be translated only for display purposes. liblouisutdml
>> > requires the original UTF-8.
>> >
>> > When I said that text files should be processed by calling
>> > translateTextFile with formatFor utd I was thinking of plain text, not
>> > text derived from imported files such as pdf. It would probably be more
>> > consistent to let tika handle even plain text, converting it to xml.
>> >
>> > John
>> >
>> > On Tue, Aug 07, 2012 at 11:45:15AM -0400, François Ouellette wrote:
>> >> When importing non-XML documents with foreign or special characters
>> >> they may contain Unicode expressions such as \u00e9 since they were
>> >> not processed by liblouisutdml. I have a routine to find the
>> >> corresponding codepoints and display the corresponding character. I
>> >> haven't done much testing yet but I guess that when saving as UTD
>> >> these should be processed correctly.
>> >>
>> >> François.
>> >>
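A routine of the kind François mentions might look like the following Java sketch. It is not the BrailleBlaster code; the class and method names are hypothetical, and it assumes the escapes always take the form of a backslash, a lowercase u, and exactly four hexadecimal digits. As John notes elsewhere in this thread, the decoded text is for display only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns textual Unicode escapes into real characters.
final class UnicodeEscapes {

    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String text) {
        Matcher matcher = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Each escape encodes one UTF-16 code unit; two consecutive
            // surrogate escapes therefore combine into one supplementary character.
            char unit = (char) Integer.parseInt(matcher.group(1), 16);
            // quoteReplacement guards against decoded '$' or backslash characters.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(unit)));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Text without escapes passes through unchanged, so the method can be applied to every string before it is shown in a view.
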
>> >> On Tue, Aug 7, 2012 at 10:04 AM, John J. Boyer <john@xxxxxxxxxxxxxx> wrote:
>> >> > Hi Francois,
>> >> >
>> >> > What context are you considering when you ask about UTF-8? If these
>> >> > codes occur in xml documents they are automatically handled by
>> >> > liblouisutdml on translation. What does Java do when you attempt to
>> >> > display them?
>> >> >
>> >> > Thanks,
>> >> > John
>> >> >
>> >> > On Tue, Aug 07, 2012 at 07:59:06AM -0400, François Ouellette wrote:
>> >> >> John: thanks for the clarifications. We are half-way through for the
>> >> >> brf files, I will add a method to read and backtranslate them.
>> >> >>
>> >> >> What about UTF-8? Is BB supposed to recognize the \u sequences and
>> >> >> change them to the corresponding characters?
>> >> >>
>> >> >> Thanks.
>> >> >> François.
>> >> >>
>> >> >> On Mon, Aug 6, 2012 at 8:53 PM, John J. Boyer
>> >> >> <john.boyer@xxxxxxxxxxxxxxxxx> wrote:
>> >> >> > My vision is that BrailleBlaster will be able to display and edit any
>> >> >> > flavor of xml, just as liblouisutdml can translate any flavor.
>> >> >> > Liblouisutdml accomplishes this by using a sort of pattern-matching
>> >> >> > virtual machine. The semantic-action files are the "programs" for this
>> >> >> > VM. If I had it to do over again I would format them somewhat
>> >> >> > differently. Each line would contain first the pattern, then the
>> >> >> > "instruction", then parameters, separated by white space. Optionally,
>> >> >> > an equals sign could be inserted between the patterns and the
>> >> >> > instructions, so Java could accept them as properties files.
>> >> >> >
>> >> >> > Most of the patterns are literal, such as "p", "span,class,italic", and
>> >> >> > so on. Patterns can also be XPath expressions.
>> >> >> >
>> >> >> > The instruction is either the name of an action to be applied to the
>> >> >> > pattern, a style or a macro.
>> >> >> >
>> >> >> > The parameters are bits of text to be inserted between the texts
>> >> >> > contained in the subtree of the patterns. For an example, see nemeth.sem.
>> >> >> >
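The line format John says he would prefer (pattern, then instruction, then parameters, with an optional equals sign after the pattern) can be sketched as a small Java parser. This is an illustration only, not an existing liblouisutdml or BrailleBlaster API. The class name is hypothetical, and the sketch assumes that a pattern contains no white space and that the equals sign, when present, stands alone as the second token; XPath patterns containing spaces would need a richer syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of one line: pattern [=] instruction [parameters...]
final class SemanticActionLine {
    final String pattern;
    final String instruction;
    final List<String> parameters;

    private SemanticActionLine(String pattern, String instruction, List<String> parameters) {
        this.pattern = pattern;
        this.instruction = instruction;
        this.parameters = Collections.unmodifiableList(parameters);
    }

    static SemanticActionLine parse(String line) {
        List<String> tokens = new ArrayList<>(Arrays.asList(line.trim().split("\\s+")));
        // Drop the optional stand-alone "=" between pattern and instruction.
        if (tokens.size() > 1 && tokens.get(1).equals("=")) {
            tokens.remove(1);
        }
        if (tokens.size() < 2 || tokens.get(0).isEmpty()) {
            throw new IllegalArgumentException("Expected: pattern [=] instruction [parameters]");
        }
        return new SemanticActionLine(
                tokens.get(0),
                tokens.get(1),
                new ArrayList<>(tokens.subList(2, tokens.size())));
    }
}
```

For instance, `SemanticActionLine.parse("span,class,italic = italicx")` yields the pattern `span,class,italic`, the instruction `italicx`, and no parameters; the strings here are sample data only.
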
>> >> >> > For BrailleBlaster, the patterns would be similar, and the actions would
>> >> >> > also be similar in many cases, except that those having to do with
>> >> >> > Braille would be dropped, and others, having to do with displaying on a
>> >> >> > screen, would be added.
>> >> >> >
>> >> >> > This describes the display virtual machine. The edit virtual machine
>> >> >> > would be more complex, since there are two types of editing: changing
>> >> >> > the text in a text node and adding or deleting nodes. The former is
>> >> >> > quite straightforward. The latter will generally require selecting the
>> >> >> > name of a style. The definition of a style will have to include the name
>> >> >> > of the element and any relevant attribute names and values.
>> >> >> >
>> >> >> > On other clarifications: The best way to handle text files is to use the
>> >> >> > translateTextFile method with the configuration setting formatFor utd.
>> >> >> > This will result in an output file with text paragraphs (separated by
>> >> >> > blank lines) enclosed in <p> tags and the Braille translation enclosed
>> >> >> > in <brl> tags, as normal. This can then be handled by BrailleBlaster
>> >> >> > like any other utd file.
>> >> >> >
>> >> >> > BrailleBlaster is also supposed to handle brf files natively. When these
>> >> >> > are recognized they should be displayed in the Braille view. The method
>> >> >> > to use is backTranslateFile; formatFor utd should also be specified.
>> >> >> > Again, the resulting output file can be handled like any other utd file.
>> >> >> >
>> >> >> > John
>> >> >> >
>> >> >> > --
>> >> >> > John J. Boyer; President, Chief Software Developer
>> >> >> > Abilitiessoft, Inc.
>> >> >> > http://www.abilitiessoft.com
>> >> >> > Madison, Wisconsin USA
>> >> >> > Developing software for people with disabilities
>> >> >> >
>> >> >> >
>> >> >
>> >> > --
>> >> > John J. Boyer, Executive Director
>> >> > GodTouches Digital Ministry, Inc.
>> >> > http://www.godtouches.org
>> >> > Madison, Wisconsin, USA
>> >> > Peace, Love, Service
>> >> >
>> >> >