Hello,

This looks very puzzling, and at the moment I don't know what is going on. I could take a look at this at some point.
Another alternative, which has been on my mind for a little while: the Python bindings aren't complete, and I have been considering completing them. I now have a view on which I prefer out of ctypes and Cython. I would probably prefer Cython, and I could give it precisely the same API as the current bindings. However, if people would prefer me to complete the ctypes ones, I could do that instead.
OK, I will leave it there for now as I need to get on, but I will get back to this in the evening.
Michael Whapples

On 02/03/2010 11:09 AM, Christian Egli wrote:
Hi all

I've been plagued with segmentation faults in the liblouis Python interface the last couple of days. Essentially I'm invoking the libxslt extension that I wrote (see python/examples/liblouisxslt.py in svn). If the xml that I pass in contains certain words in emphasis the Python interpreter crashes with a segmentation fault.

I instrumented the Python interface a bit to get some more debugging information (see attached file). When I invoke this it seems to work:

    $ python louis_instrumented.py
    1.8.0
    {'tran_tables': ['de-ch-g2.ctb'], 'outbuf':<ctypes.c_wchar_Array_18 object at 0xb78064f4>, 'inbuf': u'USA Today', 'outlen': c_long(18), 'typeform': '\x01\x01\x01\x01\x01\x01\x01\x01\x01', 'tablesString': 'de-ch-g2.ctb', 'mode': 0, 'x': 'de-ch-g2.ctb', 'inlen': c_long(9)}
    _>USA __TODA'Y
    result: _>USA __TODA'Y

However if I invoke it with a python interpreter that has debugging symbols enabled (configured with --pydebug) I get the following stack trace:

    $ python-dbg louis_instrumented.py
    1.8.0
    {'tran_tables': ['de-ch-g2.ctb'], 'outbuf':<ctypes.c_wchar_Array_18 object at 0x9abf2d4>, 'inbuf': u'USA Today', 'outlen': c_long(18), 'typeform': '\x01\x01\x01\x01\x01\x01\x01\x01\x01', 'tablesString': 'de-ch-g2.ctb', 'mode': 0, 'x': 'de-ch-g2.ctb', 'inlen': c_long(9)}
    __>USA _TODA'Y
    Debug memory block at address p=0x9ace5a8:
        41 bytes originally requested
        The 4 pad bytes at p-4 are FORBIDDENBYTE, as expected.
        The 4 pad bytes at tail=0x9ace5d1 are not all FORBIDDENBYTE (0xfb):
            at tail+0: 0x30 *** OUCH
            at tail+1: 0xfb
            at tail+2: 0xfb
            at tail+3: 0xfb
        The block was made by call #26767 to debug malloc/realloc.
        Data at p: 00 00 00 00 00 00 00 00 ... 30 30 30 30 30 30 30 30
    Fatal Python error: bad trailing pad byte
    Aborted

There are two things to notice here:

1) When invoking with the Python interpreter that has debugging symbols the translation is different (__>USA _TODA'Y instead of _>USA __TODA'Y). I have no idea why this could be.
2) The crash seems to happen in the return statement of translateString, where the characters __>USA _TODA'Y are packed up and returned to the caller, because the print statement at the caller (in the main block) is not executed. Is there something in the ctypes definition that we need to specify to indicate that a Unicode string is returned from translateString? Or is this maybe a problem with UCS2 versus UCS4 when building the two versions of the Python interpreter? I'm honestly a bit stumped.
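
The UCS2/UCS4 question and the overwritten pad byte could both be checked from Python before calling into the library. The following is only a rough sketch, not the actual liblouis binding: the helper names describe_char_widths and make_buffers, and the expansion factor of 2 (taken from the 9-character input and 18-element outbuf in the dump), are hypothetical. It assumes, without confirmation from this thread, that the C function may write into every buffer it is handed, including typeform. The ctypes documentation does warn against passing Python strings to functions that expect pointers to mutable memory, so the sketch allocates explicit mutable buffers instead.

```python
import ctypes
import sys


def describe_char_widths():
    """Report the character widths relevant when exchanging wide strings with C.

    sizeof_c_wchar is the platform's wchar_t size as seen by ctypes;
    max_code_point is 0xFFFF on a narrow (UCS2) Python 2 build and
    0x10FFFF on a wide (UCS4) build or any Python 3.3+.
    """
    return {
        "sizeof_c_wchar": ctypes.sizeof(ctypes.c_wchar),
        "max_code_point": sys.maxunicode,
    }


def make_buffers(text, typeform_byte=b"\x01", expansion=2):
    """Allocate mutable buffers sized for the output, not just the input.

    Hypothetical helper: 'expansion' is an assumed upper bound on how much
    longer the output may be than the input.
    """
    inlen = len(text)
    outlen = inlen * expansion
    # Mutable array of wchar_t with room for outlen characters.
    outbuf = ctypes.create_unicode_buffer(outlen)
    # Mutable byte buffer: the first inlen bytes carry the typeform values,
    # the remainder is zero-filled spare room in case the C side writes
    # outlen bytes back (an assumption, see the text above).
    typeform = ctypes.create_string_buffer(typeform_byte * inlen, outlen)
    return outbuf, typeform


if __name__ == "__main__":
    print(describe_char_widths())
    outbuf, typeform = make_buffers(u"USA Today")
    print(len(outbuf), ctypes.sizeof(typeform))
```

If the widths reported by the two interpreters differ, or if the crash disappears once typeform is a mutable buffer as long as outbuf, that would narrow down which of the two suspicions applies.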
For a description of the software and to download it go to http://www.jjb-software.com