[bksvol-discuss] Re: Validating and Paragraph Marks

  • From: Stephen Baum <steve@xxxxxxxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Wed, 08 Jun 2005 14:15:23 -0400

It's perhaps worth noting that K1000 will attempt to identify "real" paragraph endings in a text file that is formatted such that each line is a single-line paragraph. It's not perfect at doing so (end-of-paragraph decisions are actually quite hard to make without understanding the text that is being read), but it's better than nothing. So, you can open a text file that is formatted in that manner, and then save the opened file as, say, RTF, which is a more intelligent format in terms of differentiating between lines and paragraphs. Or you can save it again as text, after setting the maximum line length in the general settings dialog to some very large number. That will cause each paragraph, or at least K1000's notion of a paragraph, to be written on a single line, which is probably what you want if you must use text.

Stephen

At 08:06 AM 6/8/2005, you wrote:

Monica, check whether real paragraphs are perhaps denoted by two paragraph marks in sequence. If that is the case, do a mass replacement of paragraph mark pairs with a unique word. Then do a mass deletion of all remaining paragraph marks. Finally, replace the unique word with paragraph mark pairs again.
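For anyone who would rather script those three steps than run them by hand in a word processor, here is a minimal Python sketch. It assumes the file is plain text in which a "paragraph mark" is a line break and a real paragraph break is two or more line breaks in a row. The function name `unwrap_paragraphs` and the placeholder string are made up for illustration. One deliberate difference from the steps as described: the leftover single line breaks are replaced with a space rather than deleted outright, so that the last word of one line does not fuse with the first word of the next.

```python
import re


def unwrap_paragraphs(text, placeholder="\x00PARA\x00"):
    """Join hard-wrapped lines while keeping blank-line paragraph breaks.

    Hypothetical helper: the name and the placeholder are illustrative only.
    """
    # Treat Windows and old Mac line endings the same as "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # The "unique word" must not already occur in the document.
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")

    # Step 1: swap each run of two or more line breaks for the placeholder.
    text = re.sub(r"\n{2,}", lambda match: placeholder, text)

    # Step 2: remove the remaining single line breaks.
    # Assumption: a space is substituted so adjacent words stay separate.
    text = text.replace("\n", " ")

    # Step 3: turn the placeholder back into a pair of line breaks.
    return text.replace(placeholder, "\n\n")


sample = "First line\nof paragraph one.\n\nSecond paragraph\ncontinues here."
print(unwrap_paragraphs(sample))
```

If the real paragraph breaks in a file are not marked by doubled line breaks, this will not help, and marking the genuine breaks first (by hand, or with a tool that guesses them) is still needed.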

Guido


Guido Dante Corona IBM Accessibility Center, Austin Tx. Research Division, Phone: 512. 838. 9735. Email: guidoc@xxxxxxxxxxx Web: http://www.ibm.com/able



"Monica Ballard" <MBallard1@xxxxxxxxxxx>
Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx

06/08/2005 05:58 AM
Please respond to
bksvol-discuss

To
<bksvol-discuss@xxxxxxxxxxxxx>
cc
Subject
[bksvol-discuss] Validating and Paragraph Marks




OCR puts those paragraph symbols in?

I've seen lots of files like that. Sometimes a line by line match is important so it must be common for many OCRs. To get around the tediousness of not being able to do a global replace on them, sometimes it's faster to go through the document and paste a unique word at every genuine paragraph break, then do a global replace to get rid of all paragraph marks. Finally, I'll go back and replace my unique word with paragraph marks.

Monica


