[liblouis-liblouisxml] Re: ctypes and bad trailing pad byte

  • From: Michael Whapples <mwhapples@xxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Wed, 03 Feb 2010 11:59:45 +0000

Hello,
This looks very puzzling and at the moment I don't know what is going on. I could give this a look at some point.

Another alternative, which has been on my mind for a while: the Python bindings aren't complete, and I was considering completing them. I now have a view on which I prefer out of ctypes and Cython. I would probably prefer Cython, although I could give it precisely the same API as the current bindings. However, if people would prefer me to complete the ctypes ones, I could.

OK, I will leave it there for now as I need to get on but I will get back to this in the evening.

Michael Whapples

On 02/03/2010 11:09 AM, Christian Egli wrote:
Hi all

I've been plagued with segmentation faults in the liblouis Python
interface the last couple of days. Essentially I'm invoking the libxslt
extension that I wrote (see python/examples/liblouisxslt.py in svn). If
the xml that I pass in contains certain words in emphasis the Python
interpreter crashes with a segmentation fault. I instrumented the Python
interface a bit to get some more debugging information (see attached
file). When I invoke this it seems to work:

$ python louis_instrumented.py
1.8.0
{'tran_tables': ['de-ch-g2.ctb'], 'outbuf':<ctypes.c_wchar_Array_18 object at 
0xb78064f4>, 'inbuf': u'USA Today', 'outlen': c_long(18), 'typeform': 
'\x01\x01\x01\x01\x01\x01\x01\x01\x01', 'tablesString': 'de-ch-g2.ctb', 'mode': 0, 
'x': 'de-ch-g2.ctb', 'inlen': c_long(9)}
_>USA __TODA'Y
result: _>USA __TODA'Y

However if I invoke it with a python interpreter that has debugging
symbols enabled (configured with --pydebug) I get the following stack
trace:

$ python-dbg louis_instrumented.py
1.8.0
{'tran_tables': ['de-ch-g2.ctb'], 'outbuf':<ctypes.c_wchar_Array_18 object at 
0x9abf2d4>, 'inbuf': u'USA Today', 'outlen': c_long(18), 'typeform': 
'\x01\x01\x01\x01\x01\x01\x01\x01\x01', 'tablesString': 'de-ch-g2.ctb', 'mode': 0, 
'x': 'de-ch-g2.ctb', 'inlen': c_long(9)}
__>USA _TODA'Y
Debug memory block at address p=0x9ace5a8:
     41 bytes originally requested
     The 4 pad bytes at p-4 are FORBIDDENBYTE, as expected.
     The 4 pad bytes at tail=0x9ace5d1 are not all FORBIDDENBYTE (0xfb):
         at tail+0: 0x30 *** OUCH
         at tail+1: 0xfb
         at tail+2: 0xfb
         at tail+3: 0xfb
     The block was made by call #26767 to debug malloc/realloc.
     Data at p: 00 00 00 00 00 00 00 00 ... 30 30 30 30 30 30 30 30
Fatal Python error: bad trailing pad byte
Aborted

There are two things to notice here:

1) When invoking with the Python interpreter that has debugging symbols
the translation is different (__>USA _TODA'Y instead of _>USA __TODA'Y).
I have no idea why this could be.

2) The crash seems to happen in the return statement of translateString
where the characters __>USA _TODA'Y are packed up and returned to the
caller, because the print statement at the caller (in the main block) is
not executed. Is there something in the ctypes definition that we need
to specify that a Unicode string is returned from translateString? Or is
this maybe a problem with UCS2 and UCS4 when building the two versions
of the Python interpreter?
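
One way to test the UCS2/UCS4 suspicion is to compare the character widths
the interpreter and ctypes use with the width the C library writes. The
following is only a minimal sketch, not a diagnosis: the helper names are
hypothetical, and lib_char_bytes (the number of bytes the library writes per
output character) is an assumption the caller has to supply, for example
from how the library was configured.

```python
import ctypes
import sys


def interpreter_char_widths():
    """Report the character widths this interpreter and ctypes use."""
    return {
        # Bytes per element of a ctypes wchar buffer (the platform wchar_t).
        "c_wchar_bytes": ctypes.sizeof(ctypes.c_wchar),
        # 0xFFFF on a narrow (UCS2) build, 0x10FFFF on a wide (UCS4) build.
        "maxunicode": sys.maxunicode,
    }


def outbuf_is_large_enough(outlen, lib_char_bytes):
    """Return True if a wchar buffer of outlen elements can hold outlen
    characters of lib_char_bytes bytes each.

    lib_char_bytes is an assumption supplied by the caller, not something
    this sketch reads from the library.
    """
    outbuf = ctypes.create_unicode_buffer(outlen)
    return ctypes.sizeof(outbuf) >= outlen * lib_char_bytes
```

Running both helpers under python and python-dbg, with the outlen of 18 from
the output above, would show whether the two interpreters disagree about
buffer sizes. If outbuf_is_large_enough returns False, the library could
write past the end of outbuf.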

I'm honestly a bit stumped.


