[liblouis-liblouisxml] Re: UTDML and where brl nodes can appear

  • From: Michael Whapples <mwhapples@xxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Fri, 18 Oct 2013 20:37:05 +0100

It is being done for ViewPlus. However, if it leads to improvements in the semantic action files, I do not know whether those will be given back to the community. In short, it is not my decision, but I would hope the semantic action files could be given back.


However, it looks like we might need to do some cleaning up of the Word XML. There are cases where one can delete characters and retype them (i.e. in Word itself the document looks the same as when it started), but when one looks at the actual XML, Word has broken the text into multiple w:t elements, which could lead to liblouisutdml not translating properly.
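To make the kind of cleanup meant here concrete, the following is only a minimal sketch in Python, not the ViewPlus tool and not part of liblouisutdml. It assumes the Word 2003 XML namespace, and it assumes that two neighbouring w:r runs are safe to merge when they hold nothing but w:t text and carry identical w:rPr formatting. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML (WordprocessingML 2003).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"
W = "{%s}" % W_NS


def _signature(elem):
    """Structural fingerprint of an element, ignoring surrounding whitespace."""
    if elem is None:
        return None
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(_signature(child) for child in elem),
    )


def _is_plain_text_run(run):
    """True if the run holds only an optional w:rPr and at least one w:t."""
    only_known = all(child.tag in (W + "rPr", W + "t") for child in run)
    return only_known and run.find(W + "t") is not None


def merge_adjacent_runs(paragraph):
    """Merge neighbouring text-only runs that share the same formatting.

    Assumption for this sketch: equal w:rPr content means the split was
    only an editing artefact, so joining the text loses nothing.
    """
    previous = None
    for run in list(paragraph):
        if run.tag != W + "r" or not _is_plain_text_run(run):
            previous = None  # anything else breaks the chain of mergeable runs
            continue
        same_format = previous is not None and (
            _signature(previous.find(W + "rPr")) == _signature(run.find(W + "rPr"))
        )
        if same_format:
            # Append to the last w:t of the earlier run to keep text order.
            target = previous.findall(W + "t")[-1]
            extra = "".join(t.text or "" for t in run.findall(W + "t"))
            target.text = (target.text or "") + extra
            paragraph.remove(run)
        else:
            previous = run
```

One would call merge_adjacent_runs on every w:p element before passing the file on for translation. A real tool would have to consider more than this (proofing marks, bookmarks and so on sitting between runs), which is part of why it might need some work.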

Such a cleanup tool might need a fair amount of work, so I would be less hopeful of that being given back to the community; again, however, it is not my decision.

Hope that at least gives you an idea of the state of the XML Word can give you.

We are not even sure whether the ViewPlus project which might use Word XML will continue with that format, as Word is so bad at producing the XML in a useful form.

Michael Whapples
On 18/10/2013 18:38, Vic Beckley wrote:
Michael,

Is this work you are doing with Word files for ViewPlus or for BB? I am
extremely interested in more direct access to Word files, even if that means
saving them in XML format. At least that would bypass the DAISY plug-in. If
you can't talk about it, I know how that goes. Thanks for any info.


Best regards from Ohio,

Vic

-----Original Message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Michael
Whapples
Sent: Friday, October 18, 2013 12:38 PM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: UTDML and where brl nodes can appear

So from what you are saying are the files wordDocument.sem and
w_wordDocument.sem no longer needed? If so why have they not been
deleted from the project?

I must stress though, this is not docx, it is Word XML, as you get if you
select the format Word XML 2003 in the save dialog of Word (it is a
single XML file, not an archive containing multiple files). Will docx.sem
work on this?

Will I need to create a config file to ensure docx.sem is used?

Michael Whapples
On 18/10/2013 17:25, John J. Boyer wrote:
Separate semantic-action files are not needed for documents that use
namespaces. Semantic-action files use only the local names. Given the
possibility of name conflicts, maybe this should be changed.

Both of the old MSWord semantic-action files are wrong. Look at docx.sem
in the lbu_files directory of liblouisutdml. You can correct or delete
the old files.

John

On Fri, Oct 18, 2013 at 04:37:35PM +0100, Michael Whapples wrote:
If that is so then the Word XML semantic action files (I believe both
wordDocument.sem and w_wordDocument.sem) are wrong, as they assign the
document action to the wordDocument and w:wordDocument elements.

PS. Why do we need separate semantic action files for when a namespace
is used or not? Also relying on the XML to alias the namespace as w:
seems to be wrong as potentially one can alias it as they like: eg.
xmlns:w="..." or xmlns:word="...", both would be correct XML and mean
the same.
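A small illustration of the point about aliases, using Python's ElementTree rather than anything from liblouisutdml: a namespace-aware parser resolves both prefixes to the same qualified name, while the local name alone cannot tell a namespaced wordDocument from one with no namespace. The helper names here are only for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 XML (WordprocessingML 2003).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def qualified_root(xml_text):
    """Return the root tag as ElementTree reports it: {namespace}localname."""
    return ET.fromstring(xml_text).tag


def local_name(tag):
    """Drop the {namespace} part, leaving only the local name."""
    return tag.rsplit("}", 1)[-1]


# Same namespace, two different aliases, plus a document with no namespace.
doc_w = '<w:wordDocument xmlns:w="%s"><w:body/></w:wordDocument>' % W_NS
doc_word = '<word:wordDocument xmlns:word="%s"><word:body/></word:wordDocument>' % W_NS
doc_plain = "<wordDocument><body/></wordDocument>"

for text in (doc_w, doc_word, doc_plain):
    tag = qualified_root(text)
    print(tag, "->", local_name(tag))
```

So matching on the literal string w:wordDocument depends on the alias the file happens to use, whereas matching on namespace plus local name would not.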

Michael Whapples
On 18/10/2013 02:16, John J. Boyer wrote:
In dtbook.sem and nimas.sem the action document is given to <book>
because it contains the actual document. <dtbook> contains the head
node as well. So for the xml part of docx <body> must be given the
action document. The root element <document> gets no action. That is the
way things are set up.

John

On Thu, Oct 17, 2013 at 09:09:52PM +0100, Michael Whapples wrote:
No, the document node should be w:wordDocument.

Looks like w:body is given the action no.

In Word XML w:body is like body in HTML, where the body of the document
will be found, so why should brl nodes appear outside that?

Also I notice two .sem files for Word documents; it looks like one is for
when there is an XML namespace in the Word XML and the other is for when
there is not. There seem to be differences between these (eg.
w:wordDocument has action no whereas wordDocument has action document).
Michael Whapples
On 17/10/2013 18:09, John J. Boyer wrote:
What semantic action is assigned to the w:body node? It should be
document. Look at the docx.sem file in lbu_files.

John

On Thu, Oct 17, 2013 at 05:46:50PM +0100, Michael Whapples wrote:
Where can the brl nodes appear in a UTDML document?

We have a Word XML file which has been marked with UTDML and the last
brl node for the last page number appears outside the document body
(w:body) element. This looks strange and means that one has to get the
parser to look outside the body of the document to get all Braille,
which is certainly sub-optimal as one will need to filter out much
unwanted stuff.

Is LibLouisUTDML working correctly in placing this brl node outside the
body element?
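For anyone wanting to check their own output for this, here is a rough sketch in Python. It assumes only that the Braille sits in elements whose local name is brl and that the document body has the local name body. It matches on local names because the namespace (if any) of the brl elements is not stated here. The function names are made up for the example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any {namespace} part, leaving only the local name."""
    return tag.rsplit("}", 1)[-1]


def brl_outside(root, container_name="body"):
    """Return the brl elements that are not descendants of the container.

    Assumption for this sketch: elements are identified by local name only.
    """
    inside = set()
    for elem in root.iter():
        if local_name(elem.tag) == container_name:
            # Remember every brl element found below this container.
            inside.update(
                id(child) for child in elem.iter() if local_name(child.tag) == "brl"
            )
    return [
        elem
        for elem in root.iter()
        if local_name(elem.tag) == "brl" and id(elem) not in inside
    ]
```

An empty list from brl_outside(ET.parse(path).getroot()), where path is the translated file, would mean all the Braille is within the body. In our file the brl node for the last page number would be reported.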

Michael Whapples
For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com
