[liblouis-liblouisxml] Re: Python package for easy installation of liblouis - announcing Transcribo, a Braille type-setting system - feedback and help wanted

  • From: Michael Whapples <mwhapples@xxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Sun, 19 Jul 2009 16:22:58 +0100

I have had a thought about how we could overcome this. It only occurred to me after writing the text below, but I am leaving that in place as it probably contains useful material. I will go away now and see whether my idea works.


I guess, as you say, most Windows users will just use the official Python binaries, so only a DLL matching those could be shipped. If someone has compiled Python with a different Unicode size, it is probably fair to assume they have a C compiler or know how to set one up. Also, as you point out, the DLL may not be the bulkiest part of liblouis. Build processes on Windows are certainly not my speciality, so if anyone has a view on which would be better (i.e. ship all possibly required DLLs, or compile for unusual cases), please advise.

As for not finding issues with PyHyphen, it could be as you suggested, or it may be like brltty, which I think encodes the Unicode text as UTF-8 for communication between the bindings and the C code and so is not affected by the Unicode size of Python. Ideally this is how liblouis should work (provided I have understood brltty correctly).

It is also probably worth pointing out the difference between UCS2 and UTF-16. UCS2 is a fixed-length representation of Unicode, but it can only represent characters that fit in 16 bits, whereas UTF-16 is a variable-length encoding: one 16-bit unit for the lower characters and a pair of 16-bit units (32 bits in total) for the ones that do not fit in 16 bits. If I could guarantee how liblouis responds when it encounters characters only representable in 32 bits, then I might suggest we avoid all this trouble of pairing the correct liblouis DLL with the Python Unicode size by using UTF-16 for UCS2 builds of liblouis and UTF-32 for UCS4 builds of liblouis.

Please also note that when you specify the encoding UTF-16 in Python, it adds a byte order mark at the beginning. liblouis doesn't use this, so we would need to use an encoding which specifies the byte order, e.g. utf-16le or utf-16be.
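
To make the byte order point concrete, here is a small sketch, assuming a Python 3 interpreter. It only uses the standard codecs; nothing in it is liblouis-specific, and the sample strings are arbitrary.

```python
import codecs

text = "abc"

# The generic "utf-16" codec prepends a byte order mark (BOM) in the
# platform's native byte order.
with_bom = text.encode("utf-16")

# The explicit-endian codecs emit the code units only, with no BOM.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

print(with_bom[:2] == codecs.BOM_UTF16)  # True: the BOM comes first
print(len(with_bom) - len(little))       # 2: the BOM occupies two bytes
print(little)                            # b'a\x00b\x00c\x00'
print(big)                               # b'\x00a\x00b\x00c'

# A character outside the 16-bit range becomes a surrogate pair in
# UTF-16, i.e. two 16-bit units rather than one.
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
print(len(clef.encode("utf-16-le")) // 2)  # 2 code units
print(len(clef.encode("utf-32-le")) // 4)  # 1 code unit
```

The last two lines show why a 16-bit build still needs a defined behaviour for surrogate pairs if UTF-16 were passed to it.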

Michael Whapples
On 19/07/09 13:14, Leo wrote:
All that makes a lot of sense. Similar issues could occur with regard to
32/64 bit Windows versions. So might we end up with 4 packaged DLLs?

Luckily, DLLs are independent of the Python version. In PyHyphen, I have
now included five win32 binaries for Python 2.4 to 3.1. And admittedly I did
not pay attention to potential UCS problems. So far, no one has complained
though, which makes me optimistic that most people use the standard Python
distribution on Windows rather than compiling their own with a different
UCS setting. Frankly: why should one do so? So your point might luckily not
be very significant in practice. Anyway, the setup script should check that
the UCS sizes match rather than blindly installing the wrong DLL.
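
A minimal sketch of such a check, assuming four prebuilt DLLs are shipped (two Unicode sizes times 32/64 bit). The file naming scheme `liblouis-ucsN-BITS.dll` is purely hypothetical and not something liblouis provides. Note also that on Python 3.3 and later sys.maxunicode is always 1114111, so the Unicode part of the check is only meaningful on older interpreters.

```python
import struct
import sys


def unicode_bytes(maxunicode=sys.maxunicode):
    """Return the size in bytes of one Python Unicode code unit."""
    return 2 if maxunicode <= 0xFFFF else 4


def pointer_bits():
    """Return 32 or 64 depending on the running interpreter."""
    return struct.calcsize("P") * 8


def bundled_dll_name(maxunicode=sys.maxunicode, bits=None):
    """Build a hypothetical DLL file name such as 'liblouis-ucs2-32.dll'."""
    if bits is None:
        bits = pointer_bits()
    if bits not in (32, 64):
        raise ValueError("unsupported pointer size: %r" % (bits,))
    return "liblouis-ucs%d-%d.dll" % (unicode_bytes(maxunicode), bits)


print(bundled_dll_name(0xFFFF, 32))    # liblouis-ucs2-32.dll
print(bundled_dll_name(0x10FFFF, 64))  # liblouis-ucs4-64.dll
```

A setup script could refuse to install when the file selected this way is not among the bundled ones, instead of silently copying a mismatched DLL.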

Another issue is the tables. They amount to 3 MByte in liblouis' source
distribution. In PyHyphen I have just included one dictionary plus a module
to download additional dictionaries as needed. An alternative might be to
have PyLouis ship with no tables at all and have the setup script download
some tables, say, from googlecode. This would clearly require pre-packaged
tables to be there.
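
The download-on-demand idea could look roughly like the sketch below. The base URL is a placeholder (the thread names no real download location), and `ensure_table` and its `fetch` parameter are names invented for this illustration; the fetch function is passed in so the network access can be swapped out or mocked.

```python
import os

# Hypothetical location; the thread does not name a real download URL.
TABLE_BASE_URL = "http://example.invalid/liblouis-tables/"


def ensure_table(name, table_dir, fetch):
    """Return the local path of a table, fetching it first if missing.

    fetch(url) must return the table contents as bytes.
    """
    path = os.path.join(table_dir, name)
    if not os.path.exists(path):
        os.makedirs(table_dir, exist_ok=True)
        data = fetch(TABLE_BASE_URL + name)
        with open(path, "wb") as handle:
            handle.write(data)
    return path
```

In real use, fetch could be a thin wrapper around urllib.request.urlopen(url).read(); a table that is already present locally is never downloaded again.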


-----Original Message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Michael
Whapples
Sent: Sunday, 19 July 2009 13:19
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: AW: Re: Python package for easy
installation of liblouis - announcing Transcribo, a Braille type-setting
system - feedback and help wanted


I've just remembered something which could be a big catch: this UCS2 or
UCS4 thing. It makes me think it would be better to use ./configure and
make where possible. I believe ./configure and make work with mingw, but
I don't know whether they work with MSVC (although I believe mingw can
produce output compatible with MSVC). However, I imagine most Windows
users do not have a C compiler, so maybe a binary DLL should be provided,
although then we get back to UCS2 and UCS4 again.

Maybe I should briefly say what the UCS2 and UCS4 problem is. Basically,
Python can be compiled for 16-bit or 32-bit Unicode and so can liblouis
(16-bit Unicode is UCS2 and 32-bit Unicode is UCS4). If we have a
16-bit Unicode version of Python, then for the bindings to work we need
a UCS2 build of liblouis, and if we have a 32-bit Unicode version of
Python, then we need a version of liblouis compiled for UCS4. We cannot
have a mixture (i.e. 16-bit Python with UCS4 liblouis will not work and
neither will 32-bit Python with UCS2 liblouis). In one of those cases
where Python and liblouis use different Unicode sizes, at best the output
from liblouis will be nonsense, and I think in the worst case it can lead
to a crash of Python with no way for Python apps to recover (I am not
quite sure whether it is a segmentation fault, but it is something just as
serious).
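
The "nonsense" case can be illustrated without liblouis at all. The following sketch, assuming Python 3, simply reinterprets a buffer of 16-bit units as 32-bit units with the struct module; it is an analogy for the mismatch, not a reproduction of what the bindings do internally.

```python
import struct

text = "ab"

# What a 16-bit (UCS2) caller would hand over: two 16-bit code units.
buffer_16 = text.encode("utf-16-le")
units_16 = struct.unpack("<2H", buffer_16)
print([hex(u) for u in units_16])  # ['0x61', '0x62']

# What a 32-bit (UCS4) reader makes of the very same four bytes: a single
# unit that is not a valid Unicode code point at all.
(unit_32,) = struct.unpack("<I", buffer_16)
print(hex(unit_32))        # 0x620061
print(unit_32 > 0x10FFFF)  # True: beyond the Unicode range
```

A reader expecting two 32-bit units would additionally want eight bytes from a four-byte buffer, which is one plausible route to the crash described above.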

So my thought is that if such a setup.py script is to be written, then we
do the following:
Provide a DLL for the binary version of Python on Windows (i.e. for
whichever Unicode size is used in the official Python builds). We would
detect this by checking the platform and checking sys.maxunicode (which
is 65535 for 16-bit Unicode and 1114111 for 32-bit Unicode). We could
provide a second DLL for the other Unicode size, but this obviously starts
increasing the package size, or we could just try to compile.
If the platform is not Windows, I believe the compile process is the
./configure and make procedure, so we could just do this. We can pass
the configure script the correct option for the Unicode size of the
Python being used (this again can be got by checking sys.maxunicode).
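
The decision logic described above can be sketched as follows. The function name `build_plan` and the "use-bundled-dll" marker are invented for illustration, and the configure option `--enable-ucs4` is an assumption that should be checked against `./configure --help` before relying on it.

```python
import sys


def build_plan(platform=sys.platform, maxunicode=sys.maxunicode):
    """Return the list of steps a setup script might run."""
    wide = maxunicode > 0xFFFF
    if platform.startswith("win"):
        # Official Windows binaries: use a prebuilt DLL instead of compiling.
        return [["use-bundled-dll", "ucs4" if wide else "ucs2"]]
    configure = ["./configure"]
    if wide:
        # Assumed option name; verify it with `./configure --help`.
        configure.append("--enable-ucs4")
    return [configure, ["make"]]


print(build_plan("win32", 0xFFFF))    # [['use-bundled-dll', 'ucs2']]
print(build_plan("linux", 0x10FFFF))  # [['./configure', '--enable-ucs4'], ['make']]
```

On non-Windows platforms each returned step could be handed to subprocess.check_call from within setup.py; keeping the decision in a pure function makes it easy to test without a compiler.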

Also I would say the above should be considered a source package, I
don't think it would be possible to create a binary package (due to the
UCS2 and UCS4 problem).

The only other thing I will say is that on Linux, where there are advanced
package management tools (such as apt on Debian), easy_install is very
basic and would not be considered a preferred choice. Linux users will
therefore probably get liblouis via their distribution's package system,
where all the UCS2 and UCS4 issues are dealt with. Also, on Debian,
liblouis and its bindings are packaged separately to give users choice.

Michael Whapples

On 19/07/09 01:07, Leo wrote:
I haven't tried, and I hope I won't need to. If you knew how poor my
knowledge on C compilers is... but I think what you write is a very good
starting point.

Here are some further thoughts to increase confusion:

The whole thing has to be portable. So if the configure script runs on all
platforms with all compilers (e.g. mingw and MSVC on Windows), there is
probably no objection to your distutils-free approach, which is easier to
maintain, as you rightly point out. A no-brainer would probably be to call
'make' on Unix-like OSes and use the ready-made DLL on win32.
Perfectionists would probably use setuptools, as it abstracts from all the
platform and compiler specifics. Whether liblouis' configure script does
that job, I don't know. I would assume that smooth compiling with mingw and
MSVC should be the perfectionist's bottom line. But others on this list are
much better placed to judge this.

Leo

-----Original Message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Michael
Whapples
Sent: Sunday, 19 July 2009 01:28
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Python package for easy installation
of liblouis - announcing Transcribo, a Braille type-setting system -
feedback and help wanted


Further to my thoughts yesterday, I have now managed to do what you are
asking (well, not quite yet for liblouis, but for a very, and I do mean
very, basic example of a shared library). There is another solution which
I could guarantee to work for liblouis but which may not sit well with a
Python developer: you could always have the setup script execute the
configure script and the makefile.

Anyway back to getting setuptools to actually perform the compile.

So taking the C file mylib.c:

/* mylib.c: a minimal shared library exposing a single function. */
#include "mylib.h"

int addNums(int val1, int val2) {
    return val1 + val2;
}

and the header file mylib.h:

int addNums(int val1, int val2);

and then creating the setup.py script:

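from setuptools import setup
from setuptools.extension import Library

# Library (rather than Extension) builds a plain shared library,
# not an importable Python extension module.
setup(
    name="mylib",
    version="1.0",
    description="An example of compiling a library",
    ext_modules=[Library("mylib", ["mylib.c"], include_dirs=["."])],
)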

Now run:

python setup.py build

You should find a shared object file in the build directory (look at the
output from the setup script to get the exact file name). I checked that
the shared object file worked as a proper shared object file by using
ctypes in Python to load it and call the addNums function.
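
The check described above might look something like the following sketch. The helper names `find_built_library` and `add_nums` are invented for illustration, and the exact file name produced by the build is not assumed: the code simply searches the build directory for a shared library whose name contains "mylib".

```python
import ctypes
import glob
import os


def find_built_library(build_dir, stem="mylib"):
    """Return the first file under build_dir whose name contains stem
    and looks like a shared library, or None if there is none."""
    suffixes = (".so", ".dll", ".dylib", ".pyd")
    pattern = os.path.join(build_dir, "**", "*" + stem + "*")
    for path in sorted(glob.glob(pattern, recursive=True)):
        if path.endswith(suffixes):
            return path
    return None


def add_nums(lib_path, a, b):
    """Load the library with ctypes and call its addNums function."""
    lib = ctypes.CDLL(lib_path)
    lib.addNums.argtypes = [ctypes.c_int, ctypes.c_int]
    lib.addNums.restype = ctypes.c_int
    return lib.addNums(a, b)
```

After running `python setup.py build`, calling `add_nums(find_built_library("build"), 2, 3)` should return 5 if the library was built and exports addNums.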

I don't think this makes use of the build_clib step I mentioned
yesterday. I think setuptools.extension.Library is a replacement for an
Extension object which deals with stand-alone C libraries, and so
compilation happens as part of the build_ext step. Whether this has any
effect on needing to be careful about the order in which
extensions/libraries are listed, I don't know.

Also I am unsure whether defining all the compilation stuff like this in
setup.py is a good idea, IE. we would have two versions of the build
system, one using make the other using setuptools, and so both would
need maintaining and could get out of sync.

Does the above help at all?

Michael Whapples
On 17/07/09 22:36, Leo wrote:

Hello,

I am new to this list, so let me briefly explain who I am, why I've joined
and what I want.

1. I use Braille in different languages and contexts, mainly English
and German, simple text and music, both on refreshable displays and on
paper. None of the software that transforms something into ready-to-emboss
plain text appeals to me, as it is either closed-source, costly,
inflexible, inaccurate, complicated or a combination of these. Admittedly
I haven't tried out liblouisxml. But here even the name is complicated,
and I anticipate difficulties compiling it on Windows.

2. I like Python for its almost ideal combination of clear syntax,
conciseness, user-friendliness and speed. My first project has been
PyHyphen, a Python wrapper around a C library for multilingual hyphenation
that is used, e.g., in OpenOffice.org (see
http://pypi.python.org/pypi/PyHyphen/).

3. At some point I took a look at reStructuredText (rST), a light-weight,
extensible markup language that is predominantly used to write software
documentation, e.g. for the entire Python distribution (see
http://en.wikipedia.org/wiki/ReStructuredText). reStructuredText is very
easy to learn, powerful and clear to read. I am convinced it could serve as
an excellent input format for high-end Braille layout. Its features include
sections, bullet and enumerated lists, definition lists, tables, references
such as auto-numbered footnotes, tables of contents and bibliographical
information, to name but a few. What's more, rST can be extended through
custom directives and so-called interpreted text roles. Hence, it seemed
possible to use rST to mark up text such that the output back-end would use
different Braille translators as required, e.g. for text including math,
music etc.

4. The reference implementation to process rST sources is Docutils
(http://docutils.sourceforge.net/). It can generate HTML, LaTeX, Beamer,
pdf, OpenOffice and other output formats from rST sources. Why not
ready-to-emboss plain text?

5. So a few months ago I started Transcribo.
Homepage: http://transcribo.berlios.de
Daily snapshots from the Mercurial repository:
http://hg.berlios.de/repos/transcribo/archive/tip.tar.bz2

Transcribo is currently a plain text back-end for Docutils. However, its
three-tier architecture makes it open to other input formats such as LaTeX,
odf, RTF, xml, plain text or whatever. The core of Transcribo is a rendering
package that generates a tree structure of frames. A frame can be thought of
as a freely placeable, rectangular area on paper. The frames API is flexible
enough to represent all kinds of lists, tables, multiple columns, centered
headings, and much more. Each frame may contain objects carrying content of
any type. Each content object may be given dedicated translator instances,
wrappers with or without hyphenation etc. In particular, Transcribo supports
liblouis as a translator for content to populate frames. Finally, the
frame-tree representation of the input file is assembled to form a plain
text file.

The bridge between Docutils and the frame renderer (in Docutils terminology
this is called a writer) supports a subset of reStructuredText. Current
features include headings, paragraphs, hyperlinks, emphasized text style,
multi-level bullet lists and enumerations. Adding new features is often a
matter of a few lines of code. Transcribo's renderer is configured through
Python dictionaries. Future versions may prefer other formats such as JSON
or xml. The Docutils writer is mainly configured using the Docutils
configuration system, i.e. a config file and command line options. But this
is still somewhat rudimentary. However, a command line option to choose the
default translator is already implemented.

6. While Transcribo works with various translators, liblouis is currently
the most important one as it supports so many languages and math.

7. Transcribo might benefit from some refinement, testing and bug-fixing
before the first public release. Also, I'd like to make sure that users can
easily install liblouis. When I tried to install it, I had some problems:
- finding the dll which is not on googlecode. John kindly pointed me to the
  page.
- copying the dll manually into the Windows/system32 directory
- downloading the liblouis sources
- installing the Python bindings
- copying some tables to a reasonable place

8. I'd like to see liblouis on the Python package index (pypi) so it can be
installed automatically using setuptools. To this end, the dll needs to be
bundled with the Python bindings, some tables and the C sources. On Unix
systems, the sources would need to be compiled; on Win32, the dll needs to
be installed, preferably in the package directory rather than the
windows/system32 dir, as users do not always have admin privileges. It would
be just great if the Python gurus on this list could make an effort.
Clearly, I would help write the setup script, although I don't know offhand
how to tell distutils to compile a shared library that is not a C extension
module.
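
Loading a DLL from the package directory rather than from windows/system32 could be sketched as below. The candidate file names and both helper functions are assumptions made for this example; they are not part of the existing liblouis Python bindings.

```python
import ctypes
import os


def bundled_library_path(package_file, names=("liblouis.dll", "liblouis.so")):
    """Return the path of the first candidate library found next to
    package_file, or None if the package does not bundle one."""
    package_dir = os.path.dirname(os.path.abspath(package_file))
    for name in names:
        candidate = os.path.join(package_dir, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def load_bundled_library(package_file):
    """Load the bundled library with ctypes, raising if it is missing."""
    path = bundled_library_path(package_file)
    if path is None:
        raise OSError("no bundled liblouis library next to %s" % package_file)
    return ctypes.CDLL(path)
```

Inside a package, the bindings module would call `load_bundled_library(__file__)`, so no admin privileges are needed at install time because nothing is written outside the package directory.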

Also, I would welcome any feedback and/or help on Transcribo. There is a
mailing list (see the homepage). It is not yet in use though. So feel free
to join.

Warm regards

Leo



For a description of the software and to download it go to
http://www.jjb-software.com


