[liblouis-liblouisxml] Re: Python bindings and output buffer size for lou_translate*

From: Michael Whapples <mwhapples@xxxxxxx>
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Date: Tue, 27 Jul 2010 11:32:36 +0100

Hello,

Firstly, I have seen the later messages in this thread and agree that it would not be natural for Python programs to have to specify the buffer sizes. Anyway, getting a Python program to specify the buffer sizes doesn't really solve the problem; it only moves it to all Python programs using the Python bindings.

So this leaves the task of working out what the ratio should be. I don't like the suggestion of trying one ratio and, if translation fails, retrying with a larger ratio until it succeeds. Is there a situation where translation may fail for another reason, and how would such a system of setting the ratio catch that? OK, I guess we could set an upper limit on the ratio, so that the bindings decide the translation is failing for another reason if the limit is reached. I would agree this seems to add complexity to the code and slow things down, so it is best avoided if possible.
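To make the capped-retry idea concrete, here is a minimal sketch. It is not the bindings' code: translate_with_retry, fake_translate, and the (output, consumed) return shape are all hypothetical names and conventions chosen for illustration, and fake_translate is a toy stand-in that expands every character to 8 output characters rather than calling liblouis.

```python
def translate_with_retry(translate, text, start_ratio=2, max_ratio=16):
    """Retry `translate` with a doubling buffer ratio, up to `max_ratio`.

    `translate(text, outlen)` is a hypothetical callable, assumed to return
    (output, consumed), where `consumed` is the number of input characters
    that fitted into an output buffer of `outlen` characters.
    """
    ratio = start_ratio
    while ratio <= max_ratio:
        output, consumed = translate(text, len(text) * ratio)
        if consumed == len(text):
            return output, ratio
        ratio *= 2  # buffer was too small: retranslate everything
    # Hitting the cap is treated as "failing for another reason".
    raise RuntimeError(
        "translation still incomplete at ratio limit %d" % max_ratio
    )


def fake_translate(text, outlen):
    """Toy stand-in (not liblouis): every character becomes '\\xnnnn'."""
    pieces = []
    used = 0
    consumed = 0
    for ch in text:
        piece = "'\\x%04x'" % ord(ch)  # 8 output characters per input one
        if used + len(piece) > outlen:
            break
        pieces.append(piece)
        used += len(piece)
        consumed += 1
    return "".join(pieces), consumed
```

With this toy translator, a two-character input only completes once the ratio reaches 8, after two wasted full translations, which illustrates the cost concern raised above.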

Setting the ratio to 8 times seems a bit drastic, and it would need to be higher if using 32-bit Unicode; most of the time I doubt you would be going anywhere near that sort of ratio. I get the feeling the answer for what ratio is needed actually depends on what sort of translation is being done (i.e. you are much more likely to need 8 times if only translating a character or two, but you are probably going to be fine with 2 or 4 when doing longer strings of text). So maybe the answer is to have the ratio at a level which should be fine for over 90% of uses, but make the ratio value configurable so that the few who need something different can set it appropriately (e.g. an application doing lots of small translations may have the line

louis.bufferRatio = 8

). My assumption in this is that a long translation is unlikely to have all its characters not known in the table, but a short one is more likely to, as one character is a higher percentage of the translation.
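As a rough sketch of how such a configurable ratio could sit inside the bindings: the names below (settings, output_buffer_length) are hypothetical, bufferRatio is only the setting proposed in this message and not an existing attribute, and the default of 4 is a placeholder assumption rather than a measured value.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a module-level setting such as the proposed
# louis.bufferRatio. The default of 4 is an assumed placeholder.
settings = SimpleNamespace(bufferRatio=4)


def output_buffer_length(inlen):
    """Size the output buffer from the currently configured ratio."""
    return inlen * settings.bufferRatio
```

An application doing many short translations would then raise settings.bufferRatio to 8 once at start-up, and every later call would pick up the larger buffer without passing sizes around.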


Michael Whapples
On 27/07/10 03:48, James Teh wrote:

Hi all,
For lou_translate* in the Python bindings, we've made an assumption that outlen should be 2 * inlen. However, this assumption is very wrong if there are characters in the input which aren't defined in the specified tables. In the case of undefined characters, the output is "'\xnnnn'" for 16 bit unicode characters, which means that 1 input char becomes 8 chars in the output. Assuming that no one does anything ridiculous in tables, this means that an outlen which is 8 * inlen should cover the worst case scenario. I'd like to change the Python bindings to do this and suggest that perhaps the documentation should be updated with a similar guideline.
Note that this does not cover 32 bit unicode characters. I guess it's possible that the bindings might be used on a 32 bit system. In this case, the worst case scenario will be outlen = 12 * inlen.
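The 8 and 12 figures follow from counting the characters in the escaped form: two quotes, a backslash, an "x", and 4 or 8 hex digits. A small sketch of that arithmetic, using hypothetical helper names (undefined_char_repr, worst_case_outlen) that only mimic the format quoted above and do not call liblouis:

```python
def undefined_char_repr(code_point, hex_digits):
    """Mimic the quoted escape form: quote, backslash, x, hex digits, quote."""
    return "'\\x" + format(code_point, "0%dx" % hex_digits) + "'"


def worst_case_outlen(inlen, hex_digits):
    """Output length if every input character is undefined in the tables."""
    # 4 fixed characters (two quotes, backslash, x) plus the hex digits.
    return inlen * (4 + hex_digits)


# 16 bit characters use 4 hex digits, 32 bit characters use 8.
sixteen_bit = undefined_char_repr(0x2603, 4)
thirty_two_bit = undefined_char_repr(0x1F600, 8)
```

Here len(sixteen_bit) is 8 and len(thirty_two_bit) is 12, matching the per-character multipliers given above.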
An alternative is to keep checking whether translation wasn't completed (i.e. inlen is less than its original value) and then increase outlen if so, probably multiplying outlen by 2 each time. However, although this is probably rare, it increases code complexity and is quite expensive, since you have to keep re-translating the string in its entirety until it completes.
What do people think?

Jamie


