[liblouis-liblouisxml] [brailleblaster] Re: Some Thoughts on BrailleBlaster

  • From: "John J. Boyer" <johnjboyer@xxxxxxxxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Tue, 7 Sep 2010 14:55:14 -0500

Laura,

If you want the job of improving tika I think nobody else will claim it. 
You should probably join the tika developers' list. They probably have
their own methods for handling patches. You can also use the issue
tracker to post bugs. Nothing in xml2brl is likely to be helpful, since
tika takes a different approach.
It would also be desirable to add some additional file types to tika's 
repertoire, such as Duxbury and Megadots. I don't know where we would 
get the file specifications, however.

John B.

On Tue, Sep 07, 2010 at 10:08:28AM -0500, qubit wrote:
> I think modifying what's there may be preferable to starting from scratch. 
> If we look outside of tika, I don't know if John's xml2brl code is 
> appropriate to borrow from -- John, please comment.  Does it take a "tag 
> soup" approach to parsing the html in a document?
> Is it easily maintainable and flexible to add new features as they come?
> It is true, tika 0.7 munges the placement of newlines, but I think this 
> could be corrected.  I need to experiment to see if other features need 
> fixing/implementing.
> I say me, because no one else is claiming ownership of this part of the 
> code.  Presumably John is busy with the latest fixes to the C libraries, and 
> Yuemei will be doing UI stuff.
> I'm just poking around.  If others are poking in the same place, please 
> post.
> Happy hacking.
> --le
> 
> ----- Original Message ----- 
> From: "Michael Whapples" <mwhapples@xxxxxxx>
> To: <brailleblaster@xxxxxxxxxxxxx>
> Sent: Tuesday, September 07, 2010 8:07 AM
> Subject: [brailleblaster] Re: Some Thoughts on BrailleBlaster
> 
> 
> Hello,
> If you are suggesting not using tika, then these are the alternatives I 
> believe exist:
> * Find some other library to help with importing different formats. 
> Previously it looked like we had no single solution and so would be using 
> bits and pieces to do the task and so not have a common import API. Any 
> further thoughts in that direction?
> * Write our own import code. Do you think such an option would actually be 
> less work than improving the existing parsers in tika? If the existing parsers in 
> tika really are in such a state they could never be improved to what we need 
> then would reimplementation of some of the tika parsers be more work than 
> writing our own custom import code?
> 
> OK, I think my direction is known now: unless there's a compelling 
> alternative to tika, we should possibly improve tika. The Apache Tika 
> developers may actually be pleased to receive patches for these issues.
> 
> Michael Whapples
> 
> On 6 Sep 2010, at 10:43, John J. Boyer wrote:
> 
> > It has been proposed that we use tika for file conversion, hunspell for
> > spellchecking and itex2MML for TeX to MathML translation.
> >
> > I have been experimenting with tika, and I don't think it is adequate
> > for our needs. On text files, it does not recognize blank lines as
> > paragraph separators. On rtf files, it does not seem to recognize
> > paragraphs. Results with simple doc files are good. I don't know how it
> > would perform with more complex files.
> >
> > I think it is a good idea to avoid any software that is licensed solely
> > under the GPL. Hunspell is licensed under the LGPL. itex2MML is licensed
> > under the IPL, GPL and LGPL. I think that is acceptable.
> >
> > John B.
> >
> > -- 
> > John J. Boyer; President, Chief Software Developer
> > Abilitiessoft, Inc.
> > http://www.abilitiessoft.com
> > Madison, Wisconsin USA
> > Developing software for people with disabilities
> >
> >
> 
> 
> 
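For reference, the plain-text behaviour asked for in the quoted message of
6 Sep (blank lines treated as paragraph separators) can be sketched as
follows. This is only an illustration in Python, not tika code and not
anything from xml2brl; the function name split_paragraphs is made up for
the example, and joining wrapped lines with a single space is an assumption
about what the importer should do.

```python
import re


def split_paragraphs(text):
    """Split plain text into paragraphs at runs of one or more blank lines.

    Hypothetical helper for illustration only; it is not part of tika.
    """
    # Normalize Windows and old Mac line endings to "\n".
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")

    # A separator is a newline followed by one or more lines that
    # contain nothing but spaces or tabs.
    blocks = re.split(r"\n(?:[ \t]*\n)+", normalized.strip())

    paragraphs = []
    for block in blocks:
        # Assumption: hard-wrapped lines inside a paragraph are joined
        # with a single space.
        lines = [line.strip() for line in block.split("\n")]
        joined = " ".join(line for line in lines if line)
        if joined:
            paragraphs.append(joined)
    return paragraphs


if __name__ == "__main__":
    sample = "First line\nsecond line\n\n\nNext paragraph\r\n \r\nLast"
    for paragraph in split_paragraphs(sample):
        print(paragraph)
```

Whether something like this belongs in a tika parser or in a post-processing
step on tika's output is a separate question.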

-- 
My websites:
GodTouches Digital Ministry, Inc. http://www.godtouches.org
Abilitiessoft, Inc. http://www.abilitiessoft.com
Location: Madison, WI, USA


----- End forwarded message -----

For a description of the software and to download it go to
http://www.jjb-software.com