I hope the following answers your questions.

>> Words and characters are determined by characters defined as letters. If the
>> emphasis markings do not start at the beginning of a word it is shifted to
>> the beginning of the first word. If the emphasis markings do not end at a
>> word end, it is shifted to the end of the last word.
>>
>> Characters marked as capital are merged with other characters marked as
>> capitals if the characters between them are defined as spaces.

>I don't quite understand. Does this mean that A B C is treated as a single
>uppercase word?

A B C is treated as a passage: ,,,a ;b ;c,' .

The previous approach did not work well with the UEB standard, which is why it was replaced. The current behavior is as follows (and is subject to change). It seems to match the majority of the examples in the UEB standard best.

1. All words are marked using *word and *wordstop, checking whether singleletter* is to be used instead. Also checked is whether a whole word is completely covered, e.g. all caps or all underlined.

2. All runs of consecutive whole words (words completely covered) whose length is greater than or equal to len*phrase are converted to passages using firstletter* and lastletter* (or firstword* and lastwordbefore* or lastwordafter*, see next).

3. (Capitalization only) All words that are not in a passage are checked for word resets, such as hyphens and apostrophes.

Words are determined as symbols-sequences (Rules of Unified English Braille 2013, page 8):

    symbols-sequence: an unbroken string of braille signs, whether alphabetic or non-alphabetic, preceded and followed by space (also referred to as symbols-word)

If any of the opcodes are not defined then the corresponding stage is skipped, and the resulting translation is undefined.

>I'm curious, are you keeping firstletter* and lastletter* solely to preserve
>backwards-compatibility, or do they still have a real function in the new
>design? In the former case, I vote for dropping them for the benefit of
>simplicity.
>We don't have to worry about backwards-compatibility too much. It's
>easy enough to write a conversion script to update existing tables to the new
>syntax.

They are not necessary for UEB, so yes, they would be kept only for backwards-compatibility. In the most recent update they are still there but are not implemented. It would not be a problem to implement them, but I was going to ask this list whether or not they should be. If someone does want that behavior then they would have to be implemented, as I don't think the remaining opcodes can replicate it.

>My first reaction was that while this adds opcodes and therefore complexity, it
>still isn't obvious to me whether it actually covers more cases than before or
>not. (I'm only talking about emphasis now, for capitals it is obvious!) Perhaps
>I should have a look at some concrete UEB examples before questioning, but
>anyway.

I felt that it is better to have a bunch of opcodes that each do one thing than to have opcodes that do several things depending on when, how, and where they are used. I originally had some opcodes used for several things, but I decided that giving each opcode just one function would make the design more flexible and easier to document and implement. Also, the emphases all use the same set of opcodes, so one only has to understand that one set.

I wrote a tool which allows me to test examples directly from the UEB standard. For capitalization it is 80% correct for the examples from chapter 8 of Rules of Unified English Braille (not including examples dealing with large text elements). 10% of those require what is described next. The majority of the remaining failures seem to have to do with how LibLouis handles letter indicators. I have attached the most recent list.

>One thing that would be useful is to be able to define characters that "break"
>an uppercase passage, and characters that don't.
>For example in Dutch, the
>characters that are not breaking (apart from letters), are minus, plus,
>ampersand, full stop, and apostrophe. How does this work in UEB and how are you
>handling that?

I added a passage_break bit and a word_reset bit to the typeform array so users can specify these things manually. The examples below are from the UEB standard. The middle line, where present, is the emphasis; a mono-spaced font works best for viewing.

The passage_break bit signifies that a new passage starts on this character and any other passages must stop before it. Examples (@ indicates passage_break):

He worked for the ABC. A BBC journalist reported ...
00000000000000000000000@0000000000000000000000000000
,he "w$ = ! ,,abc4 ,a ,,bbc j|rnali/ report$ 444

STOP RUNNING NOW! It's dangerous.
000000000000000000@00000000000000
,,,/op runn+ n{6,' ,x's dang}|s4

INITIALS OF WRITER/initials of secretary
000000000000000000@000000000000000000000
,,,9itials ( writ},'_/9itials ( secret>y

The word_reset bit specifies that a word indicator stops at that point in the word and will need to be repeated if the capitals continue. I was originally going to add a word_reset automatically to the hyphen opcode, and create apostrophe(') and initial(.) opcodes for this purpose, but simply applying the word reset on any non-alphabetic character worked just as well. It would not be a problem to add an opcode so that these characters could be designated in the table files. Examples (P indicates word reset):

McGRAW-HILL
,mc,,graw-,,hill

UPPERCASE-lowercase
,,upp}case-l{}case

MERRY-GO-ROUND
,,m}ry-,,g-,,r.d

WELCOME TO McDONALD'S
,,welcome ,,to ,mc,,donald',s

www.BLASTSoundMachine.com
000000000P000000000000000
www4,,bla/,s.d,ma*9e4com

ATandT
0P0000
,a,t&,t

I have added the five transcriber-defined typeform indicators. Their behavior is the same as the rest of the emphases, and each of the five follows the same design as the other emphases (? = 1, 2, 3, 4, 5):

singlelettertrans?
trans?word
trans?wordstop
lentrans?phrase
firstwordtrans?
lastwordaftertrans?
firstlettertrans?
lastlettertrans?

Please let me know if these changes cause any problems with any other languages, as I have so far focused only on UEB.

MRG
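
A rough sketch of stages 1 and 2 from the message above, written in Python rather than in LibLouis itself. Nothing here is LibLouis code or API: the names `mark_capitals`, `is_whole_caps`, `LEN_PHRASE` and the `breaks` parameter are hypothetical, the phrase length of 3 is an assumed stand-in for len*phrase, and the sketch ignores singleletter*, word resets, and any passage_break that falls in the middle of a word. It only shows how consecutive fully capitalized symbols-sequences might be grouped into passages, and how a passage_break on the first character of a word cuts a run short.

```python
LEN_PHRASE = 3  # assumed stand-in for the table's len*phrase value


def is_whole_caps(word):
    """True if the word has at least one letter and no lowercase letters."""
    letters = [c for c in word if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def mark_capitals(text, breaks=frozenset(), len_phrase=LEN_PHRASE):
    """Group space-delimited words into 'passage', 'word' or 'plain' entries.

    `breaks` holds character offsets carrying a (hypothetical) passage_break
    flag; only offsets that coincide with the start of a word are honoured.
    """
    groups, run = [], []

    def flush():
        # A run of whole-caps words becomes one passage if it is long enough,
        # otherwise each word keeps its own word indicator.
        if len(run) >= len_phrase:
            groups.append(("passage", list(run)))
        else:
            groups.extend(("word", [w]) for w in run)
        run.clear()

    offset = 0
    for word in text.split(" "):
        if word == "":          # extra space: does not end a run
            offset += 1
            continue
        if offset in breaks:    # a passage must stop before this character
            flush()
        if is_whole_caps(word):
            run.append(word)
        else:
            flush()
            groups.append(("plain", [word]))
        offset += len(word) + 1
    flush()
    return groups


print(mark_capitals("A B C"))  # [('passage', ['A', 'B', 'C'])]

sentence = "He worked for the ABC. A BBC journalist reported ..."
print(mark_capitals(sentence))               # ABC. A BBC merge into a passage
print(mark_capitals(sentence, breaks={23}))  # break on "A": three single words
```

Without the break, the naive grouping merges "ABC. A BBC" into one passage, which is not what the UEB example shows; flagging offset 23 (the "A") reproduces the three separate word indicators from the first passage_break example.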
# 8.3.1

in: 20B
ueb: #bj,b
lou: #bj;,b

#~FAIL en-ueb-g1.ctb: noletsignafter . en-ueb-g2.ctb: noletsignafter .
in: C. O. Linkletter
ueb: ;,c4 ,o4 ,l9klett}
lou: ,c4 ,o4 ,l9klett}

in: B-E-L-I-E-V-E
ueb: ;;,b-,e-,l-,i-,e-,v-,e
lou: ;,b-;,e-;,l-,i-;,e-;,v-,e

# 8.3.2

# 8.3.3

in: Voyage À Nice
ueb: ,voyage ,~*a ,nice
lou: ,voyage ;,~*a ,nice

# 8.4.2

in: (R)AC
ueb: "<,r">,,ac
lou: "<;,r">,,ac

in: B&B
ueb: ,b`&,b
lou: ;,b`&,b

in: AT&T
ueb: ,,at`&,t
lou: ,,at`&;,t

#~FAIL FOR SALE: 1975 FIREBIRD works?
#~~emp
#~000000@000
in: SWIFT & CO.
ueb: ,,swift `& ,,co4
lou: ,,,swift `& co4,'

# 8.5.3

#~FAIL en-ueb-g1.ctb: midnum / 456-34
in: BUY FAHRENHEIT 9/11 ON E-BAY
emp: 0000111111111111111000000000
ueb: ,,,buy .1fahr5heit .1#i_/#aa on ;e-bay,'
lou: ,,,buy .1fahr5heit .1#i_/aa on ;e-bay,'

# 8.5.4

in: "... at 11:00 AM" MARKHAM ECONOMIST AND SUN
ueb: 8444 at #aa3#jj ,,am0,-,,,m>kham economi/ & sun,'
lou: "8444 at #aa3jj ,,,am0",-m>kham economi/ & sun,'

in: & (See Attachment A). A CSP (Carriage Service Provider) has obligations to &
ueb: 444 "<,see ,atta*;t ,a">4 ,a ,,csp "<,c>riage ,s}vice ,provid}"> has obliga;ns to 444
lou: ' "<,see ,atta*;t ,,,a">4 a csp,' "<,c>riage ,s}vice ,provid}"> has obliga;ns to '

# 8.6.2

in: XXIInd
ueb: ,,xxii,'nd
lou: ,,xxi9,'d

in: B-U-S
ueb: ;;,b-,u-,s
lou: ;,b-;,u-;,s

in: [£]
ueb: .<,.s.>
lou: .<;,.s.>

in: Voyage À Nice
emp: 1111111111111
ueb: .7,voyage ,~*a ,nice.'
lou: .7,voyage ;,~*a ,nice.'

in: CD
ueb: ;,,cd
lou: ,,cd

in: AC SMITH
ueb: ;,,ac ,,smi?
lou: ,,ac ,,smi?

in: V-NECK SWEATERS FOR SALE!
ueb: ;,,,v-neck sw1t}s = sale6,'
lou: ,,;,v-neck sw1t}s = sale6,'

in: CD CDs
ueb: ;,,cd ,,cd,'s
lou: ,,cd ,,cd,'s
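
For anyone who wants to tally a list like the one above, here is a minimal Python sketch. It is not the test tool mentioned in the message; `parse_results`, `mismatches_by_section` and `SAMPLE` are hypothetical names, and the parser assumes the list is stored with one `in:`, `emp:`, `ueb:` or `lou:` field per line, with section headers of the form `# 8.3.1` and every other line starting with `#` treated as a comment.

```python
import re
from collections import Counter

SECTION = re.compile(r"#\s*(\d+(?:\.\d+)+)\s*$")   # e.g. "# 8.3.1"
FIELD = re.compile(r"(in|emp|ueb|lou): (.*)$")     # keyed record lines


def parse_results(text):
    """Return a list of dicts, one per 'in:' record, tagged with its section."""
    records, current, section = [], None, None
    for raw in text.splitlines():
        line = raw.rstrip()
        header = SECTION.match(line)
        if header:
            section = header.group(1)
            continue
        if line.startswith("#"):      # "#~FAIL ..." and other comments
            continue
        field = FIELD.match(line)
        if not field:
            continue
        key, value = field.groups()
        if key == "in":               # every "in:" line opens a new record
            current = {"section": section}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def mismatches_by_section(records):
    """Count records whose expected (ueb) and actual (lou) braille differ."""
    return Counter(r["section"] for r in records if r.get("ueb") != r.get("lou"))


# Three records copied from the list above.
SAMPLE = """\
# 8.3.1
in: 20B
ueb: #bj,b
lou: #bj;,b

# 8.6.2
in: B-U-S
ueb: ;;,b-,u-,s
lou: ;,b-;,u-;,s

in: CD
ueb: ;,,cd
lou: ,,cd
"""

for section, count in sorted(mismatches_by_section(parse_results(SAMPLE)).items()):
    print(section, count)
```

Grouping by section number makes it easier to see which rules of chapter 8 account for most of the remaining differences, for example the letter-indicator cases.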