I have had a thought about how we could overcome this. It only occurred to me after writing the text below, but I am leaving that text in place as it probably contains useful material. I will go away now and see whether my idea works.
I guess, as you say, most Windows users will just use the official Python binaries, so only a DLL for that build could be shipped. Anyone who has compiled Python with a different Unicode size can fairly be assumed to have a C compiler or to know how to set one up. Also, as you point out, the DLL may not be the bulkiest part of liblouis. Build processes on Windows are certainly not my speciality, so if anyone has a view on which would be better (i.e. ship all possibly required DLLs, or compile for unusual cases), please advise.
As for not finding issues with PyHyphen, it could be as you suggested, or it could be like brltty, which I think encodes the Unicode as UTF-8 for communication between the bindings and the C code and so is not affected by the Unicode size of Python. Ideally this is how liblouis should work (provided I have understood brltty correctly).

It is also probably worth pointing out the difference between UCS2 and UTF-16. UCS2 is a fixed-length representation of Unicode, but it can only represent characters that fit in 16 bits, whereas UTF-16 is a variable-length encoding which uses 16 bits for the lower characters and 32 bits (a pair of 16-bit units) for the characters that do not fit in 16 bits. If I could guarantee how liblouis would respond on encountering characters only representable in 32 bits, then I might suggest we avoid all this trouble of pairing the correct liblouis DLL with the Python Unicode size by using UTF-16 for UCS2 builds of liblouis and UTF-32 for UCS4 builds of liblouis. Please also note that in Python the encoding named UTF-16 adds a byte order mark at the beginning. liblouis doesn't use this, so we would need an encoding which specifies the byte order, e.g. utf-16le or utf-16be.
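To illustrate the byte order mark point, here is a minimal sketch. The helper name native_codec is made up for this example and is not part of liblouis or its bindings, and the sketch assumes liblouis expects characters in the machine's native byte order. It only shows how Python's codecs behave and says nothing about how liblouis treats characters outside the 16-bit range.

```python
import codecs
import sys


def native_codec(char_size):
    """Return a BOM-free codec name for 2-byte or 4-byte characters.

    Hypothetical helper: assumes the C library wants native byte order.
    """
    suffix = "le" if sys.byteorder == "little" else "be"
    if char_size == 2:
        return "utf-16-" + suffix
    if char_size == 4:
        return "utf-32-" + suffix
    raise ValueError("char_size must be 2 or 4")


text = u"abc"

# Plain "utf-16" prepends a 2-byte byte order mark.
with_bom = text.encode("utf-16")
assert with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# The explicit byte-order codecs emit only the characters themselves.
without_bom = text.encode(native_codec(2))
print(len(with_bom), len(without_bom))

# A character above U+FFFF takes two 16-bit units in UTF-16
# but a single 32-bit unit in UTF-32.
clef = u"\U0001D11E"
print(len(clef.encode(native_codec(2))), len(clef.encode(native_codec(4))))
```

The bindings would then pass the BOM-free bytes to the C code, choosing the 2-byte or 4-byte codec according to how liblouis itself was built rather than how Python was built.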
Michael Whapples

On 19/07/09 13:14, Leo wrote:
All that makes a lot of sense. Similar issues could occur with regard to 32/64-bit Windows versions. So might we end up with four packaged DLLs? Luckily, DLLs are independent of the Python version. In PyHyphen, I have now included five win32 binaries for Python 2.4 to 3.1. And admittedly I did not pay attention to potential UCS problems. So far no one has complained, though, which makes me optimistic that most people use the standard Python distribution on Windows rather than compiling their own with a different UCS. Frankly, why would anyone do so? So your point might luckily not be very significant in practice. Anyway, the setup script should check that the UCS sizes match rather than blindly installing the wrong DLL.

Another issue is the tables. They amount to 3 MByte in liblouis' source distribution. In PyHyphen I have just included one dictionary plus a module to download additional dictionaries as needed. An alternative might be to have PyLouis ship with no tables at all and have the setup script download some tables, say, from googlecode. This would clearly require pre-packaged tables to be available there.

-----Original Message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx [mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Michael Whapples
Sent: Sunday, 19 July 2009 13:19
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: AW: Re: Python package for easy installation of liblouis - announcing Transcribo, a Braille type-setting system - feedback and help wanted

I've just remembered something which could be a big catch: this UCS2 or UCS4 thing. This makes me think it would be better to use ./configure and make where possible. I believe ./configure and make work with mingw, but I don't know about MSVC (although I believe mingw can produce output compatible with MSVC).
However, I imagine most users on Windows do not have a C compiler, so maybe a binary DLL should be provided, although then we get back to UCS2 and UCS4 again.

Maybe I should briefly say what the UCS2 and UCS4 problem is. Basically, Python can be compiled for 16-bit or 32-bit Unicode and so can liblouis (16-bit Unicode is UCS2 and 32-bit Unicode is UCS4). With a 16-bit Unicode Python, the bindings need a UCS2 build of liblouis, and with a 32-bit Unicode Python they need a liblouis compiled for UCS4. We cannot have a mixture (i.e. 16-bit Python with UCS4 liblouis will not work and neither will 32-bit Python with UCS2 liblouis). If Python and liblouis use different Unicode sizes, then at best the output from liblouis will be nonsense, and I think in the worst case it can crash Python with no way for Python apps to recover (I am not quite sure whether it is a segmentation fault, but it is something just as serious).

So my thought is that if such a setup.py script is to be written, we do the following:

- Provide a DLL for the binary version of Python on Windows (i.e. whichever Unicode size is used in the official Python builds). We would detect this by checking the platform and checking sys.maxunicode (which is 65535 for 16-bit Unicode and 1114111 for 32-bit Unicode). We could provide a second DLL for the other Unicode size, but this obviously starts increasing the package size, or we could just try to compile.
- If the platform is not Windows, I believe the compile process is the ./configure and make procedure, so we could just do this. We can pass the configure script the correct option for the Unicode size of the Python being used (again obtained by checking sys.maxunicode).

Also, I would say the above should be considered a source package. I don't think it would be possible to create a binary package (due to the UCS2 and UCS4 problem).
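The decision logic described above could be sketched roughly as follows. This is only an outline under stated assumptions, not a working setup.py. The function names and the DLL file names are invented for illustration. The --enable-ucs4 switch is what I believe liblouis' configure script accepts and should be checked against ./configure --help. The default of shipping only a 16-bit DLL is likewise an assumption about the official Windows builds. Note also that from Python 3.3 onwards sys.maxunicode is always 1114111, so this check only distinguishes builds on older interpreters.

```python
import sys


def unicode_width(maxunicode=sys.maxunicode):
    """Return 2 for a 16-bit (UCS2) Python build, 4 for a 32-bit (UCS4) one."""
    return 2 if maxunicode <= 0xFFFF else 4


def plan_install(platform=sys.platform, width=None, shipped_dll_widths=(2,)):
    """Decide what a setup script should do on this platform.

    shipped_dll_widths lists the Unicode sizes for which a pre-built DLL
    is bundled (assumption: only the 16-bit build by default).
    """
    if width is None:
        width = unicode_width()
    if platform.startswith("win"):
        if width in shipped_dll_widths:
            # Hypothetical file name for the bundled DLL.
            return ("copy_dll", "liblouis-ucs%d.dll" % width)
        # No matching DLL bundled: fall back to compiling from source.
        return ("compile", width)
    # Elsewhere: run configure with the option matching this Python.
    args = ["./configure"]
    if width == 4:
        args.append("--enable-ucs4")  # assumed option name, verify first
    return ("configure_make", args)


if __name__ == "__main__":
    print(plan_install())
```

A real script would go on to run the chosen command (for instance with subprocess) and refuse to install when neither a matching DLL nor a compiler is available, rather than install a mismatched library.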
The only other thing I will say is that, certainly on Linux, where there are advanced package management tools (such as apt on Debian), easy_install is very basic and would not be considered a preferred choice. Linux users will therefore probably get liblouis via their distribution's package system, where all the UCS2 and UCS4 issues are dealt with. Also, on Debian, liblouis and its bindings are packaged separately to give users a choice.

Michael Whapples

On 19/07/09 01:07, Leo wrote:

I haven't tried, and I hope I won't need to. If you knew how poor my knowledge of C compilers is... But I think what you write is a very good starting point. Here are some further thoughts to increase confusion:

The whole thing has to be portable. So if the configure script runs on all platforms with all compilers (e.g. mingw and MSVC on Windows), there is probably nothing to object to in your distutils-free approach, which is easier to maintain, as you rightly point out. A no-brainer would probably call 'make' on Unix-like OSs and use the ready-made DLL on win32. Perfectionists would probably use setuptools as it abstracts from all the platform and compiler specifics. Whether liblouis' configure script does that job, I don't know. I would assume that smooth compiling with mingw and MSVC should be the perfectionist's bottom line. But others on this list are much better placed to judge this.

Leo

-----Original Message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx [mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On behalf of Michael Whapples
Sent: Sunday, 19 July 2009 01:28
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: Python package for easy installation of liblouis - announcing Transcribo, a Braille type-setting system - feedback and help wanted

Further to my thoughts yesterday, I have now managed to do what you are asking (well, not quite yet for liblouis, but for a very, and I do mean very, basic example of a shared library).
There is another solution which I could guarantee to work for liblouis but which may not sit well with a Python developer: you could always have the setup script execute the configure script and the makefile.

Anyway, back to getting setuptools to actually perform the compile. So taking the C file mylib.c:

    #include <mylib.h>

    int addNums(int val1, int val2)
    {
        return val1 + val2;
    }

and the header file mylib.h:

    int addNums(int val1, int val2);

and then creating the setup.py script:

    from setuptools import setup
    from setuptools.extension import Library

    setup(name="mylib",
          version="1.0",
          description="An example of compiling a library",
          ext_modules=[Library("mylib", ["mylib.c"], include_dirs=["."])]
          )

Now run:

    python setup.py build

You should find a shared object file in the build directory (look at the output from the setup script to get the exact file name). I checked that the shared object file worked as a proper shared object by using ctypes in Python to load it and call the addNums function.

I don't think this makes use of the build_clib step I mentioned yesterday. I think setuptools.extension.Library is a replacement for an Extension object which deals with stand-alone C libraries, so compilation happens as part of the build_ext step. Whether this has any effect on needing to be careful about the order in which extensions and libraries are listed, I don't know. Also, I am unsure whether defining all the compilation details like this in setup.py is a good idea, i.e. we would have two versions of the build system, one using make and the other using setuptools, so both would need maintaining and could get out of sync.

Does the above help at all?

Michael Whapples

On 17/07/09 22:36, Leo wrote:

Hello,

I am new to this list. So let me briefly explain who I am, why I've joined and what I want.

1. I am using Braille in different languages and contexts, mainly English and German, simple text and music, both on refreshable displays and paper. None of the software transforming something into ready-to-emboss plain text appeals to me as it is either closed-source, costly, inflexible, inaccurate, complicated or a combination of these. Admittedly I haven't tried out liblouisxml. But here, already the name is complicated and I anticipate difficulties compiling it on Windows.

2. I like Python for its almost ideal combination of clear syntax, conciseness, user-friendliness and speed. My first project has been PyHyphen, a Python wrapper around a C library for multilingual hyphenation that is used, e.g., in OpenOffice.org (see http://pypi.python.org/pypi/PyHyphen/).

3. At some point I took a look at reStructuredText (rST), a light-weight, extensible markup language that is predominantly used to write software documentation, e.g. for the entire Python distribution (see http://en.wikipedia.org/wiki/ReStructuredText). reStructuredText is very easy to learn, powerful and clear to read. I am convinced it could serve as an excellent input format for high-end Braille layout. Its features include sections, bullet and enumerated lists, definition lists, tables, references such as auto-numbered footnotes, tables of contents and bibliographical information, to name but a few. What's more, rST can be extended through custom directives and so-called interpreted text roles. Hence, it seemed possible to use rST to mark up text such that the output back-end would use different Braille translators as required, e.g. for text including math, music etc.

4. The reference implementation to process rST sources is Docutils (http://docutils.sourceforge.net/). It can generate HTML, LaTeX, Beamer, pdf, OpenOffice and other output formats from rST sources. Why not ready-to-emboss plain text?

5. So a few months ago I started Transcribo.
(Homepage: http://transcribo.berlios.de Download daily snapshots from the Mercurial repository at: http://hg.berlios.de/repos/transcribo/archive/tip.tar.bz2)

Transcribo is currently a plain text back-end for Docutils. However, its three-tier architecture makes it open to other input formats such as LaTeX, odf, RTF, xml, plain text or whatever. The core of Transcribo is a rendering package that generates a tree structure of frames. A frame can be thought of as a freely placeable, rectangular area on paper. The frames API is flexible enough to represent all kinds of lists, tables, multiple columns, centered headings, and much more. Each frame may contain objects carrying content of any type. Each content object may be given dedicated translator instances, wrappers with or without hyphenation etc. In particular, Transcribo supports liblouis as a translator for content to populate frames. Finally, the frame-tree representation of the input file is assembled to form a plain text file.

The bridge between Docutils and the frame renderer (in Docutils terminology this is called a writer) supports a subset of reStructuredText. Current features include headings, paragraphs, hyperlinks, emphasized text style, multi-level bullet lists and enumerations. Adding new features is often a matter of a few lines of code. Transcribo's renderer is configured through Python dictionaries. Future versions may prefer other formats such as JSON or xml. The Docutils writer is mainly configured using the Docutils configuration system, i.e. a config file and command line options. But this is still somewhat rudimentary. However, a command line option to choose the default translator is already implemented.

6. While Transcribo works with various translators, liblouis is currently the most important one as it supports so many languages and math.

7. Transcribo might benefit from some refinement, testing and bug-fixing before the first public release. Also, I'd like to make sure that users can easily install liblouis.
When I tried to install it, I had some problems:

- finding the dll, which is not on googlecode. John kindly pointed me to the page.
- copying the dll manually into the Windows/system32 directory
- downloading the liblouis sources
- installing the Python bindings
- copying some tables to a reasonable place

8. I'd like to see liblouis on the Python package index (pypi) so it can be installed automatically using setuptools. To this end, the dll needs to be bundled with the Python bindings, some tables and the C sources. On Unix systems, the sources would need to be compiled. On Win32, the dll needs to be installed, preferably in the package directory rather than the windows/system32 dir, as users do not always have admin privileges. It would be just great if the Python gurus on this list could make an effort. Clearly, I would help write the setup script, although I don't know offhand how to tell distutils to compile a shared library that is not a C extension module.

Also, I would welcome any feedback and/or help on Transcribo. There is a mailing list (see the homepage). It is not yet in use, though. So feel free to join.

Warm regards
Leo
For a description of the software and to download it go to http://www.jjb-software.com