[liblouis-liblouisxml] Re: Python package for easy installation of liblouis - announcing Transcribo, a Braille type-setting system - feedback and help wanted

  • From: Michael Whapples <mwhapples@xxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Sun, 19 Jul 2009 16:43:33 +0100

OK, I will be less mysterious about what my thought was. It would require some alterations to the bindings to work, and I am unsure whether people would agree to such a change.


Essentially, what I was thinking of is that the setup.py script would either use a UCS4 DLL of liblouis on Windows or compile a UCS4 version on other platforms. Then, rather than passing in the actual unicode object, the Python bindings would create a string with the unicode encoded as UTF-32 (e.g. unicode.encode(input, "UTF-32")).

The above seemed such a good plan until just now, when I went to check the UTF-32 encoding in Python: my version doesn't have UTF-32. It has UTF-7, UTF-8 and UTF-16, but not UTF-32.
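
A minimal sketch of how the bindings or setup.py could check whether a given interpreter ships a codec before relying on it. The helper name has_codec is hypothetical and not part of liblouis or its bindings; codecs.lookup is the standard library call and raises LookupError for an unknown codec name.

```python
import codecs


def has_codec(name):
    """Return True if this interpreter knows the named codec (hypothetical helper)."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # codecs.lookup raises LookupError when no codec of that name is registered
        return False


# Report which of the candidate encodings are available on this interpreter.
for candidate in ("utf-8", "utf-16-le", "utf-32-le"):
    print(candidate, has_codec(candidate))
```

If the UTF-32 codec is missing, the bindings would have to fall back to some other route, such as the build-matching approach.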

OK, so it seems we are back to needing to work out which unicode size Python uses and then compiling or copying DLLs as needed.
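
One way to work out the unicode size might be sys.maxunicode, which is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) build. The sketch below is only an illustration: the function name unicode_build and the returned labels are my own assumptions, not anything the bindings define. Note also that from Python 3.3 onwards sys.maxunicode is always 0x10FFFF, so the check only distinguishes builds on older interpreters.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Return 'UCS2' for a narrow Python build and 'UCS4' for a wide one.

    The parameter exists only so the logic can be exercised with both values;
    by default it reads the running interpreter's sys.maxunicode.
    """
    return "UCS2" if maxunicode <= 0xFFFF else "UCS4"


# A setup.py could use the result to pick which liblouis build to compile or copy.
print("This interpreter needs a %s liblouis" % unicode_build())
```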

Michael Whapples

On 19/07/09 16:22, Michael Whapples wrote:
I have had a thought about how we could overcome this. I only thought of it after writing the below, but I am leaving that text in as it probably contains useful stuff. I will go away now and see if my thought works.

I guess, as you say, most people on Windows will just use the official Python binaries, so only a DLL for those could be shipped. If they have compiled Python with a different unicode size, then it is a fair assumption that they have a C compiler or know how to set one up. Also, as you point out, the DLL may not be the bulkiest part of liblouis. Build processes on Windows are certainly not my speciality, so if anyone has a view as to which would be better (i.e. ship all possibly required DLLs, or compile for unusual cases), then please advise.

As for not finding issues with pyhyphen, it could be as you suggested, or it could be like brltty, which I think encodes the unicode into UTF-8 for communication between the bindings and the C code and so is not affected by Python's unicode size. Ideally this is how liblouis should work (provided I have understood brltty correctly).

It is also probably worth pointing out the difference between UCS2 and UTF-16. UCS2 is a fixed-length representation of unicode, but it can only represent characters that fit in 16 bits, whereas UTF-16 is a variable-length encoding: 16 bits for the lower characters and 32 bits (a pair of 16-bit units) for those which need more. If I could guarantee how liblouis would respond to characters only representable in 32 bits, then I might suggest we avoid all this trouble of pairing the correct liblouis DLL with the Python unicode size by using UTF-16 for UCS2 builds of liblouis and UTF-32 for UCS4 builds.

Please also note that when you specify the encoding UTF-16 in Python, it adds byte order bytes (a BOM) at the beginning. liblouis doesn't use these, so we would need an encoding which specifies the byte order, e.g. utf-16le or utf-16be.
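
To illustrate the byte order and variable-length points, here is a small sketch using only Python's standard codecs. The sample character U+1D11E (MUSICAL SYMBOL G CLEF) is just my choice of a code point outside the 16-bit range; nothing here says how liblouis itself would handle it.

```python
import codecs

# 'a' fits in 16 bits; U+1D11E does not and needs two 16-bit units in UTF-16.
text = u"a\U0001D11E"

with_bom = text.encode("utf-16")     # native byte order, BOM prepended
le = text.encode("utf-16-le")        # explicit little-endian, no BOM
be = text.encode("utf-16-be")        # explicit big-endian, no BOM
wide = text.encode("utf-32-le")      # fixed 4 bytes per character, no BOM

# The plain "utf-16" codec starts with a byte order mark that liblouis would not expect.
starts_with_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

print(starts_with_bom)    # whether the BOM is present
print(len(le), len(wide)) # byte lengths of the two explicit-order encodings
```

With the explicit-order codecs, the first character takes 2 bytes and the second takes 4 in UTF-16, while UTF-32 uses 4 bytes for each.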

Michael Whapples

For a description of the software and to download it go to
http://www.jjb-software.com
