[liblouis-liblouisxml] Re: ctypes and bad trailing pad byte

  • From: Michael Whapples <mwhapples@xxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Wed, 03 Feb 2010 22:56:33 +0000

Hello,
I've had a look, and I'm coming to the conclusion that it might be a Python issue. Do the segmentation faults happen with a Python build without debug symbols, or only with the debug build?

Using the file you attached, I see the same results as you. The problem seems to occur at the point where the translateString function returns (I tried a minimal version of this in an interactive Python session, and that's when the debug messages appear).

Here are some ideas about what might have been going on, all of which I have ruled out:

* Could outbuf.value be returning its full length (18), so that some characters are returned that shouldn't be (the actual length of the Unicode string is 14)? I confirmed the right number of characters is there before the return.
* Could it be some problem with references to an object? I confirmed it isn't by creating a separate unicode object from outbuf.value and returning the new object.
* To rule that out even more firmly, I converted outbuf.value to an encoded string, which is certainly a different object since it is a different type (str). The problem remains.
* Could there be some odd timing issue (I'm not fully sure, but might things be happening too fast)? I added a time.sleep(10) call after the ctypes call to liblouis and still get the problem. It shouldn't be a too-slow issue either, as all the references are still valid, and printing outbuf.value after the time.sleep call in translateString works fine.
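A small sketch of the first check, with no liblouis involved: it fills an 18-slot ctypes wide-character buffer (standing in for outbuf) with the 14-character result from the log, then compares .value with slicing by a reported length. The outlen value here is assumed to be what the library reports back; in this sketch it is simply set by hand.

```python
import ctypes

# Stand-in for outbuf: 18 wide-character slots, like c_wchar_Array_18 in
# the log. create_unicode_buffer zero-fills it.
outbuf = ctypes.create_unicode_buffer(18)
written = u"_>USA __TODA'Y"   # the 14-character result from the log
outbuf.value = written        # copies the text plus a terminating NUL
outlen = len(written)         # assumed: the count the library reports back

# .value stops at the first NUL, so the unused tail is never returned.
print(len(outbuf.value))      # 14
# Slicing by the reported count does not rely on a terminating NUL.
print(outbuf[:outlen])        # _>USA __TODA'Y
```

Slicing by the reported count is the more defensive choice if the C side ever fills the buffer without writing a terminating NUL.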

All this leads me to narrow it down to the return line in translateString and the assignment on the line that calls it. Both are very standard core Python, and so simple that I can't see how they could have been done wrong in the file you attached.
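One plausible reading of why the failure shows up exactly at the return: CPython's debug allocator checks a block's pad bytes when the block is freed or reallocated, and the locals of translateString are released when its frame is torn down on return, before the caller's assignment runs. An overrun could therefore happen earlier, during the ctypes call, and only be detected at that point. This sketch (Tracked and translate_like are hypothetical names; the ordering relies on CPython's reference counting) shows a local being freed between the return and the assignment:

```python
events = []


class Tracked(object):
    """Hypothetical helper that records when CPython deallocates it."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        events.append("freed " + self.name)


def translate_like():
    # Hypothetical stand-in for translateString: 'scratch' plays the role
    # of a buffer or typeform object created inside the call.
    scratch = Tracked("scratch")
    result = u"_>USA __TODA'Y"
    events.append("returning")
    return result  # the frame, and with it 'scratch', is released here


value = translate_like()
events.append("assigned")
print(events)  # ['returning', 'freed scratch', 'assigned']
```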

I have also tried this with python2.5 and python2.6 on Debian. Both give the same result, so the problem doesn't seem to be specific to one version of Python.

Michael Whapples
On 02/03/2010 11:09 AM, Christian Egli wrote:
Hi all

I've been plagued with segmentation faults in the liblouis Python
interface the last couple of days. Essentially I'm invoking the libxslt
extension that I wrote (see python/examples/liblouisxslt.py in svn). If
the xml that I pass in contains certain words in emphasis the Python
interpreter crashes with a segmentation fault. I instrumented the Python
interface a bit to get some more debugging information (see attached
file). When I invoke this it seems to work:

$ python louis_instrumented.py
1.8.0
{'tran_tables': ['de-ch-g2.ctb'], 'outbuf':<ctypes.c_wchar_Array_18 object at 
0xb78064f4>, 'inbuf': u'USA Today', 'outlen': c_long(18), 'typeform': 
'\x01\x01\x01\x01\x01\x01\x01\x01\x01', 'tablesString': 'de-ch-g2.ctb', 'mode': 0, 
'x': 'de-ch-g2.ctb', 'inlen': c_long(9)}
_>USA __TODA'Y
result: _>USA __TODA'Y
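For reference, here is a sketch of buffer setup that reproduces the sizes in this log: inlen is 9 for u'USA Today', and outlen, and with it outbuf, is twice that. make_buffers is a hypothetical helper, and the factor of 2 is inferred from the numbers above rather than taken from the bindings. It also passes typeform as a mutable buffer as long as outbuf. Whether liblouis writes into typeform is an assumption not confirmed in this thread, but if it does, the immutable 9-character str shown in the log would leave it no room.

```python
import ctypes


def make_buffers(inbuf, typeform=None, multiplier=2):
    """Hypothetical helper mirroring the log: outlen = multiplier * inlen."""
    inlen = ctypes.c_int(len(inbuf))
    outlen = ctypes.c_int(inlen.value * multiplier)
    outbuf = ctypes.create_unicode_buffer(outlen.value)
    typeformbuf = None
    if typeform is not None:
        # Assumption: the library may write typeform data for the output,
        # so give it a mutable buffer as long as outbuf (extra bytes are 0).
        typeformbuf = ctypes.create_string_buffer(typeform, outlen.value)
    return inlen, outlen, outbuf, typeformbuf


inlen, outlen, outbuf, tf = make_buffers(u"USA Today", b"\x01" * 9)
print((inlen.value, outlen.value, len(outbuf), len(tf)))  # (9, 18, 18, 18)
```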

However if I invoke it with a python interpreter that has debugging
symbols enabled (configured with --pydebug) I get the following stack
trace:

$ python-dbg louis_instrumented.py
1.8.0
{'tran_tables': ['de-ch-g2.ctb'], 'outbuf':<ctypes.c_wchar_Array_18 object at 
0x9abf2d4>, 'inbuf': u'USA Today', 'outlen': c_long(18), 'typeform': 
'\x01\x01\x01\x01\x01\x01\x01\x01\x01', 'tablesString': 'de-ch-g2.ctb', 'mode': 0, 
'x': 'de-ch-g2.ctb', 'inlen': c_long(9)}
__>USA _TODA'Y
Debug memory block at address p=0x9ace5a8:
     41 bytes originally requested
     The 4 pad bytes at p-4 are FORBIDDENBYTE, as expected.
     The 4 pad bytes at tail=0x9ace5d1 are not all FORBIDDENBYTE (0xfb):
         at tail+0: 0x30 *** OUCH
         at tail+1: 0xfb
         at tail+2: 0xfb
         at tail+3: 0xfb
     The block was made by call #26767 to debug malloc/realloc.
     Data at p: 00 00 00 00 00 00 00 00 ... 30 30 30 30 30 30 30 30
Fatal Python error: bad trailing pad byte
Aborted
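To read the trace: in a debug build, each block gets FORBIDDENBYTE (0xfb) pad bytes on both sides of the requested area, so finding 0x30 at tail+0 means something wrote exactly one byte past the 41 requested bytes. Here is a simplified model of that check. debug_alloc and check_tail are hypothetical names, and the real layout also stores the requested size before the leading pad and a serial number (the "call #26767" above) after the trailing pad.

```python
FORBIDDENBYTE = 0xFB  # pad value reported in the trace
PAD = 4               # pad width shown in the trace, on each side


def debug_alloc(nbytes):
    """Simplified model: leading pad, nbytes of data, trailing pad."""
    return bytearray([FORBIDDENBYTE] * PAD + [0] * nbytes + [FORBIDDENBYTE] * PAD)


def check_tail(block, nbytes):
    """Return the tail offsets whose pad byte was overwritten."""
    tail = block[PAD + nbytes:]
    return [i for i, b in enumerate(tail) if b != FORBIDDENBYTE]


block = debug_alloc(41)
# A writer that believes the block is one byte longer than it really is:
for i in range(42):
    block[PAD + i] = 0x30
print(check_tail(block, 41))  # [0], i.e. "at tail+0: 0x30 *** OUCH"
```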

There are two things to notice here:

1) When invoking with the Python interpreter that has debugging symbols
the translation is different (__>USA _TODA'Y instead of _>USA __TODA'Y).
I have no idea why this could be.

2) The crash seems to happen in the return statement of translateString,
where the characters __>USA _TODA'Y are packed up and returned to the
caller: the print statement at the caller (in the main block) is never
executed. Is there something we need to specify in the ctypes definition
so that a Unicode string is returned from translateString? Or is this
perhaps a problem with UCS2 vs. UCS4 in how the two versions of the
Python interpreter were built?
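Whether the two interpreters differ in Unicode width can be checked directly by running this sketch under both python and python-dbg. ctypes.c_wchar has the size of the platform's C wchar_t. On Python 2, sys.maxunicode is 0xFFFF for a narrow (UCS2) build and 0x10FFFF for a wide (UCS4) build; from Python 3.3 on it is always 0x10FFFF. Whether liblouis's own wide-character type matches wchar_t is a separate question that this sketch does not answer.

```python
import ctypes
import sys


def unicode_widths():
    """Report ctypes' wide-char size and the interpreter's Unicode build."""
    wchar_size = ctypes.sizeof(ctypes.c_wchar)  # same size as C's wchar_t
    build = "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"
    return wchar_size, build


print(unicode_widths())
```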

I'm honestly a bit stumped.


For a description of the software and to download it go to
http://www.jjb-software.com
