It looks to me as if the coordinate information in the <newline> tag in
UTDML is not being used. The x coordinate specifies indenting in terms of
a resolution of 20 dpi.

John B

On Fri, Jul 20, 2012 at 12:17:57AM -0400, Vic Beckley wrote:
> John G,
>
> On my brief experimenting with *.docx, *.rtf, and *.pdf, it seems the
> structure is being preserved. The paragraphs don't show up in the imported
> text that I can tell. I don't know if they are showing up visually or not.
> The text in the XML view is not noticeably indented where it should be. If
> you say no to the translation prompt for UTD, then the paragraphs are there
> in the formatted Braille. If you say yes to UTD then the formatting is not
> shown. It was this way before when I was working with UTD files. I think it
> is just that this feature needs work on how they are displayed. These are
> just my thoughts based on limited testing.
>
>
> Best regards from Ohio, U.S.A.,
>
> Vic
> E-mail: vic.beckley3@xxxxxxxxx
>
>
> -----Original Message-----
> From: brailleblaster-bounce@xxxxxxxxxxxxx
> [mailto:brailleblaster-bounce@xxxxxxxxxxxxx] On Behalf Of John Gardner
> Sent: Thursday, July 19, 2012 11:32 PM
> To: brailleblaster@xxxxxxxxxxxxx
> Subject: [brailleblaster] Re: Compiled
>
> Hello all, we should distinguish between formatting and structure. We need
> to capture the structure - which includes paragraphs, headings, tables, etc.
> But we don't need to capture the formatting of those things - whether the
> paragraph is indented or double spaced, whether the heading is bold or
> centered. Or... From the on-going conversation, it isn't clear to me that we
> are even capturing structure. I hope I am misunderstanding.
>
> Thanks.
> John G
>
>
> -----Original Message-----
> From: brailleblaster-bounce@xxxxxxxxxxxxx
> [mailto:brailleblaster-bounce@xxxxxxxxxxxxx] On Behalf Of François Ouellette
> Sent: Thursday, July 19, 2012 3:25 PM
> To: brailleblaster@xxxxxxxxxxxxx
> Subject: [brailleblaster] Re: Compiled
>
> I don't know what was there at the beginning, but it looks like it has
> improved over time if we read the notes from the consecutive releases.
> Again, it is not a content formatter, it is a content extractor! But we can
> get XML or XHTML from a file through SAX classes and decide on the resulting
> format. I am following up on this. Currently we only get unformatted text and
> it is a starting point.
>
> François
>
> On Thu, Jul 19, 2012 at 5:40 PM, Michael Whapples <mwhapples@xxxxxxx> wrote:
> > Hello,
> > I remember John Gardner mentioning Tika near the beginning of the
> > Brailleblaster project, but at that time we concluded the formatting
> > from it was not really good enough. Has it improved?
> >
> > Michael Whapples
> > On 19/07/2012 22:17, François Ouellette wrote:
> >>
> >> John: Exactly! Transformation should be a breeze with the sem
> >> statements. I will sure follow up.
> >>
> >> François.
> >>
> >> On Thu, Jul 19, 2012 at 3:37 PM, John J. Boyer
> >> <john.boyer@xxxxxxxxxxxxxxxxx> wrote:
> >>>
> >>> Hi Francois,
> >>>
> >>> It is very desirable to get xml output from tika. liblouisutdml may
> >>> already have a .sem file to handle it. If not, one can be created
> >>> easily.
> >>>
> >>> John
> >>>
> >>> On Thu, Jul 19, 2012 at 03:03:41PM -0400, François Ouellette wrote:
> >>>>
> >>>> (follow-up on previous email)
> >>>> Vic: it seems like we can produce formatted XML or HTML from the
> >>>> extraction, in which case we could retrieve the main formatting
> >>>> elements and replicate them in BB. Let me check on this.
> >>>>
> >>>> François.
> >>>>
> >>>> On Thu, Jul 19, 2012 at 12:26 PM, Vic Beckley
> >>>> <vic.beckley3@xxxxxxxxx>
> >>>> wrote:
> >>>>>
> >>>>> John and François,
> >>>>>
> >>>>> I got it to compile. I opened a Word 2010 document with it. It
> >>>>> seemed the format of the text was missing. I don't think the
> >>>>> paragraphs were still intact.
> >>>>>
> >>>>> I will do more testing later. I am a little under the weather
> >>>>> today and I think I am going to go rest now. More later. Looks
> >>>>> good so far.
> >>>>>
> >>>>>
> >>>>> Best regards from Ohio, U.S.A.,
> >>>>>
> >>>>> Vic
> >>>>> E-mail: vic.beckley3@xxxxxxxxx
> >>>>>
> >>> --
> >>> John J. Boyer; President, Chief Software Developer Abilitiessoft,
> >>> Inc.
> >>> http://www.abilitiessoft.com
> >>> Madison, Wisconsin USA
> >>> Developing software for people with disabilities
> >>>
> >
>

--
John J. Boyer; President, Chief Software Developer
Abilitiessoft, Inc.
http://www.abilitiessoft.com
Madison, Wisconsin USA
Developing software for people with disabilities
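
P.S. To make the 20 dpi point at the top of this message concrete, here is a minimal sketch of turning the x coordinate of a <newline> tag into an indent in inches. It is an illustration only, not code from BrailleBlaster or liblouisutdml. The attribute name "xy", the "x,y" value format, the sample markup, and the function name newline_indents are all assumptions made for the example; only the 20 dpi resolution comes from the message itself.

```python
import xml.etree.ElementTree as ET

# Resolution stated in the message: x is counted in dots at 20 dots per inch.
DOTS_PER_INCH = 20


def newline_indents(utdml_text, attr="xy"):
    """Return the indent, in inches, of every <newline> element.

    Assumption (hypothetical): each <newline> stores its position as
    "x,y" in the attribute named by `attr`. Adjust to the real markup.
    """
    root = ET.fromstring(utdml_text)
    indents = []
    for elem in root.iter("newline"):
        # A missing attribute is treated as position 0,0 (no indent).
        x_text, _y_text = elem.get(attr, "0,0").split(",")
        indents.append(int(x_text) / DOTS_PER_INCH)
    return indents


# Made-up sample: the second line starts 10 dots in, i.e. half an inch.
sample = '<doc><newline xy="0,0"/>First<newline xy="10,5"/>Second</doc>'
print(newline_indents(sample))  # [0.0, 0.5]
```

If the display code read the x value this way, a paragraph's first line could be indented by that many inches (or the equivalent number of cells) instead of being shown flush left.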