[brailleblaster] Re: Unit test for index values

  • From: Keith Creasy <kcreasy@xxxxxxx>
  • To: "brailleblaster@xxxxxxxxxxxxx" <brailleblaster@xxxxxxxxxxxxx>
  • Date: Fri, 17 May 2013 16:30:08 +0000

Right, John. Unfortunately files are seldom exactly what they should be. That's 
been the case since we began automated braille translation and the problem 
persists.

I don't know the answer to your question about an unbreakable space. Maybe. A 
pre-processor could identify identical inline elements with space between them 
and combine them. Certainly elements that are designated as emphasis or 
"italics" could be handled this way. Maybe it's also safe to assume that a 
newline followed by spaces and then an opening element is extraneous. I also 
can't think of any reason that a block element (para) should start with white 
space; in fact, putting it there probably messes up the presentation in a web 
browser or XML editor. The same applies to headings.
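The pre-processing steps suggested above might look something like the following minimal Python sketch, which uses the standard xml.etree.ElementTree module. It is not LibLouisUTDML or DAISY Pipeline code; the function names and the BLOCK_TAGS and INLINE_TAGS sets are hypothetical choices made for illustration. It merges identical, text-only inline siblings (strong, em) that are separated only by white space, then trims leading and trailing white space inside paragraphs and headings.

```python
import xml.etree.ElementTree as ET

# Assumptions for this sketch (not taken from the DTBook DTD): which local tag
# names count as block elements and which count as mergeable inline elements.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}
INLINE_TAGS = {"strong", "em"}


def _local(tag):
    """Strip a '{namespace}' prefix so DTBook-namespaced tags match too."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def _blank(s):
    """True for None or a string made only of white space."""
    return s is None or s.strip() == ""


def merge_adjacent_inline(parent):
    """Merge identical, text-only inline siblings separated only by white space."""
    i = 0
    while i < len(parent) - 1:
        a, b = parent[i], parent[i + 1]
        if (_local(a.tag) in INLINE_TAGS and a.tag == b.tag and a.attrib == b.attrib
                and len(a) == 0 and len(b) == 0 and _blank(a.tail)):
            # Any white space between the two becomes one space inside the merged element.
            sep = " " if a.tail else ""
            a.text = (a.text or "") + sep + (b.text or "")
            a.tail = b.tail
            parent.remove(b)  # stay on index i so runs of three or more also merge
        else:
            i += 1


def trim_block(block):
    """Drop leading and trailing white space inside a block element."""
    if block.text:
        block.text = block.text.lstrip()
    if len(block):
        last = block[-1]
        if last.tail:
            last.tail = last.tail.rstrip()
    elif block.text:
        block.text = block.text.rstrip()


def preprocess(root):
    # Snapshot the element list first, because merging removes siblings.
    for el in list(root.iter()):
        merge_adjacent_inline(el)
    for el in root.iter():
        if _local(el.tag) in BLOCK_TAGS:
            trim_block(el)
    return root


src = """<p>
        <strong>Page 8</strong>
        <strong>Sample Presentations</strong> </p>"""
print(ET.tostring(preprocess(ET.fromstring(src)), encoding="unicode"))
# <p><strong>Page 8 Sample Presentations</strong></p>
```

Because the merge keeps one space inside the combined element, both the indented and the single-space versions of the "Page 8" example in this thread end up as one strong element, the shape John describes as correct for translation.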


Maybe these are enhancements to LibLouisUTDML. I'm not sure.


-----Original Message-----
From: brailleblaster-bounce@xxxxxxxxxxxxx 
[mailto:brailleblaster-bounce@xxxxxxxxxxxxx] On Behalf Of John J. Boyer
Sent: Friday, May 17, 2013 12:06 PM
To: brailleblaster@xxxxxxxxxxxxx
Subject: [brailleblaster] Re: Unit test for index values

In the sample text everything in the paragraph should have been inside a single 
<strong> element for proper Braille translation.

If a space between elements "should" be there it should be represented by an 
unbreakable space. Can a preprocessor do this?
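One way a pre-processor could do this is sketched below in Python, under a hypothetical heuristic borrowed from the newline rule in Keith's reply: white space between two inline siblings that includes a newline is treated as editor indentation and dropped, while a gap of plain spaces is kept as U+00A0 NO-BREAK SPACE. The function name and tag set are illustrative assumptions, not part of file2brl.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"
# Assumption: which local tag names count as inline for this purpose.
INLINE_TAGS = {"strong", "em"}


def _local(tag):
    """Strip a '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def mark_significant_spaces(root):
    """Decide, for white space between two inline siblings, whether to keep it.

    Heuristic (an assumption, not a DTBook rule): a gap that contains a newline
    is indentation and is dropped; a gap of plain spaces is treated as a real
    word space and replaced by one no-break space, so later white-space cleanup
    cannot discard it.
    """
    for parent in root.iter():
        kids = list(parent)
        for a, b in zip(kids, kids[1:]):
            if _local(a.tag) not in INLINE_TAGS or _local(b.tag) not in INLINE_TAGS:
                continue
            gap = a.tail
            if not gap or gap.strip():
                continue  # nothing between them, or real text between them
            a.tail = "" if "\n" in gap else NBSP
    return root


doc = ET.fromstring("<p><strong>Page 8</strong> <strong>Sample Presentations</strong></p>")
mark_significant_spaces(doc)
print(repr(doc[0].tail))  # '\xa0'
```

In some formatters a no-break space also suppresses a line break at that point, so whether the braille translator should treat it like an ordinary space is worth checking.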

The file2brl method and program puts a newline between adjacent > and < for 
easier reading in case one wants to look at the file2brl.temp file for 
debugging purposes. These newlines are later dropped, because they create text 
nodes with all white space.
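The two steps described here can be approximated as follows. This is a Python sketch, not the actual file2brl code, and both function names are hypothetical. It also illustrates the risk raised elsewhere in this thread: if every whitespace-only text node is dropped, a single space that really separates two elements disappears along with the debugging newlines.

```python
import xml.etree.ElementTree as ET


def add_debug_newlines(xml_text):
    """Insert a newline between every directly adjacent '>' and '<' for readability.

    A plain string replace is enough for a sketch; it would also touch '><'
    inside comments or attribute values.
    """
    return xml_text.replace("><", ">\n<")


def drop_whitespace_only_text(root):
    """Discard every text or tail string that consists only of white space."""
    for el in root.iter():
        if el.text is not None and el.text.strip() == "":
            el.text = None
        if el.tail is not None and el.tail.strip() == "":
            el.tail = None
    return root


doc = "<p><strong>Page 8</strong> <strong>Sample Presentations</strong></p>"
root = drop_whitespace_only_text(ET.fromstring(add_debug_newlines(doc)))
print("".join(root.itertext()))  # Page 8Sample Presentations
```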

John

On Fri, May 17, 2013 at 01:29:28PM +0000, Keith Creasy wrote:
> Yes, I know it is supposed to, but something is wrong. I note a lot of white 
> space in certain documents that shouldn't be there to begin with. Here is an 
> example:
> 
> <p>
>         <strong>Page 8</strong>
>         <strong>Sample Presentations</strong> </p>
> 
> 
> The problem here is that according to the DTBook DTD a paragraph can contain 
> character data (mixed content), so the extra white space, put there obviously 
> for visual formatting in a text editor, becomes part of the document content.
> 
> It should be:
> 
> <p><strong>Page 8 Sample Presentations</strong></p>
> 
> You also can't arbitrarily throw out spaces. For example, here a space 
> between elements is valid and relevant:
> 
> <p><strong>Page 8</strong> <strong>Sample Presentations</strong></p>
> 
> This would typically be done by a word processor like Word, which marked up 
> the text correctly but left the space alone between the elements, since a 
> space can't really have a "strong" style.
> 
> 
> There are a lot of variations and in order to make the conversion to braille 
> smooth we may want to pre-process documents to provide some consistency. The 
> DAISY Pipeline does have a DTBook fixer. I may try it to see how much it 
> helps.
>
> Keith Creasy
> Software Developer
> American Printing House for the Blind
> KCreasy@xxxxxxx
> Phone: 502.895.2405
> Skype: keith537
> 
> 
> -----Original Message-----
> From: brailleblaster-bounce@xxxxxxxxxxxxx 
> [mailto:brailleblaster-bounce@xxxxxxxxxxxxx] On Behalf Of John J. 
> Boyer
> Sent: Friday, May 17, 2013 9:10 AM
> To: brailleblaster@xxxxxxxxxxxxx
> Subject: [brailleblaster] Re: Unit test for index values
> 
> Preferences.cfg uses the table list compress.cti,en-us-g2.ctb. The 
> compress.cti table gets rid of extraneous white space, but it might affect index values.
> 
> John
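To see why collapsing white space can disturb index values, here is a minimal Python sketch that compresses runs of white space while recording, for each output character, the input position it came from. It only models the idea; it is not how compress.cti or liblouis actually implement it, and compress_whitespace is a hypothetical name.

```python
def compress_whitespace(text):
    """Collapse each run of white space to a single space.

    Returns the compressed string plus, for every output character, the index
    of the input character it came from (the first character of a collapsed run).
    """
    out = []
    src_index = []
    in_space = False
    for i, ch in enumerate(text):
        if ch.isspace():
            if in_space:
                continue  # drop the rest of the run; no output, no index entry
            in_space = True
            out.append(" ")
        else:
            in_space = False
            out.append(ch)
        src_index.append(i)
    return "".join(out), src_index


text, idx = compress_whitespace("Page 8\n        Sample")
print(text)     # Page 8 Sample
print(idx[7])   # 15: the 'S' of "Sample" came from input position 15
```

Once the eight indentation spaces are dropped, output position 7 no longer equals input position 7, so index values computed on the compressed text have to be mapped back through a table like src_index to point at the original document.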
> 
> On Fri, May 17, 2013 at 12:10:40PM +0000, Keith Creasy wrote:
> > Hi John.
> > 
> > I wasn't suggesting you do it. I was hoping to get someone else to take an 
> > interest.
> > 
> > I understand how it works but they are still coming out wrong. It seems to 
> > be mostly related to white space. I've just about decided that we'll have 
> > to pre-process the XML to get rid of extraneous white space before we 
> > translate the text. This seems especially true of DTBook documents produced 
> > in Word or the DAISY Pipeline rtf2dtbook conversion.
> > 
> > 
> > 
> > -----Original Message-----
> > From: brailleblaster-bounce@xxxxxxxxxxxxx
> > [mailto:brailleblaster-bounce@xxxxxxxxxxxxx] On Behalf Of John J. 
> > Boyer
> > Sent: Friday, May 17, 2013 7:27 AM
> > To: brailleblaster@xxxxxxxxxxxxx
> > Subject: [brailleblaster] Re: Unit test for index values
> > 
> > Unit testing would be a good idea, but I'm not familiar with it, and I want 
> > to concentrate on the code of liblouis and liblouisutdml.
> > The index values are provided by liblouis. If you have a paragraph with 
> > only one block of text, they are unaltered by liblouisutdml.
> > 
> > John
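Why a single block matters can be illustrated with a hypothetical model: if each text block in a paragraph were translated separately, with index values relative to that block, combining the blocks would require shifting each block's values by the length of the text before it. The Python sketch below is an assumption about that general technique, not a description of liblouisutdml internals; combine_block_indices is a made-up name.

```python
def combine_block_indices(blocks):
    """Turn per-block index lists into paragraph-level index values.

    blocks: list of (block_text, index_list) pairs. Each index_list maps output
    cells to positions *within its own block* (a hypothetical model).
    The result maps every output cell to a position in the concatenated text.
    """
    combined = []
    offset = 0
    for text, indices in blocks:
        combined.extend(i + offset for i in indices)
        offset += len(text)  # the next block starts after this block's text
    return combined


# With a single block the offset is 0, so the values pass through unchanged.
print(combine_block_indices([("abc", [0, 2])]))                  # [0, 2]
print(combine_block_indices([("abc", [0, 2]), ("de", [0, 1])]))  # [0, 2, 3, 4]
```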
> > 
> > On Thu, May 16, 2013 at 03:26:17PM +0000, Keith Creasy wrote:
> > >    Everyone,
> > >
> > >    It would be really great if someone could try to put together a unit test
> > >    for the index attribute values: basically just a test document like the
> > >    one John B. sent out. The test could run the file through LibLouisUTDML,
> > >    compare the output with known correct values, and print a report on any
> > >    errors. This is still the most critical aspect of this whole project, and
> > >    we've made some great progress. I'd like to get as close as we can to
> > >    tying up the remaining loose ends.
> > >
> > >    I wonder if it might even be possible to compare the values produced by
> > >    LibLouisUTDML with the values LibLouis produces when the same text is
> > >    processed without the extra XML markup.
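A minimal sketch of the test harness described above, in Python. It assumes, hypothetically, that the translated output carries a space-separated index attribute on brl elements; read_index_values, compare_index_values, and the placeholder OUTPUT and EXPECTED data are illustrative, not part of LibLouisUTDML.

```python
import unittest
import xml.etree.ElementTree as ET


def read_index_values(xml_text):
    """Return the integer index list of every <brl> element, in document order.

    Assumes (hypothetically) a space-separated 'index' attribute on each <brl>.
    """
    root = ET.fromstring(xml_text)
    return [[int(v) for v in brl.get("index", "").split()]
            for brl in root.iter("brl")]


def compare_index_values(actual, expected):
    """Return a readable list of mismatches (empty when everything agrees)."""
    report = []
    if len(actual) != len(expected):
        report.append(f"expected {len(expected)} <brl> elements, found {len(actual)}")
    for n, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            report.append(f"<brl> #{n}: expected {want}, got {got}")
    return report


class IndexValueTest(unittest.TestCase):
    # Placeholder data: in a real test OUTPUT would come from running the sample
    # document through LibLouisUTDML, and EXPECTED from hand-checked values.
    OUTPUT = '<doc><p>ab<brl index="0 1"/></p><p>cd<brl index="0 1"/></p></doc>'
    EXPECTED = [[0, 1], [0, 1]]

    def test_index_values(self):
        report = compare_index_values(read_index_values(self.OUTPUT), self.EXPECTED)
        self.assertEqual(report, [], "\n".join(report))
```

Saved as a module, it can be run with python -m unittest. For the comparison with LibLouis suggested above, EXPECTED could instead be filled from plain LibLouis output for the same text, assuming those index values can be obtained in the same list form.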
> > >
> > >    All of us who are involved in this are pretty covered up, so if one or
> > >    two others could jump in, it would help a lot.
> > >
> > >    Thanks!
> > >
> > >    Keith
> > >
> > 
> > --
> > John J. Boyer; President, Chief Software Developer Abilitiessoft, Inc.
> > http://www.abilitiessoft.com
> > Madison, Wisconsin USA
> > Developing software for people with disabilities
> > 
> > 
> 
> 
> 



