[liblouis-liblouisxml] Re: Python bindings and output buffer size for lou_translate*

  • From: Michael Whapples <mwhapples@xxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Wed, 04 Aug 2010 02:59:14 +0100

Hello,
While memory isn't a huge concern, I just like being as efficient as possible. Also, could memory be a concern for those using liblouis on embedded devices? (Although there may be a question of why they would be using Python in such a case.)

My personal view would be to go for something like 12 times, but allow the ratio to be configured as desired, since it adds very little complexity. (In the Python case, it is the difference between having the ratio as a public module-level variable or a private one; module level seems most sensible to me, as the ratio may be needed in more than one function.) An example of what I mean is attached. This patch only changes the translate function, but it would be easy enough to do the same for the other functions.

I don't like setting the output buffer to a large value with no relation to the input buffer: what happens if someone chucks a huge amount of input at liblouis? How do you decide what is really big enough?
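To make the concern concrete, here is a rough sketch comparing the two sizing strategies. The values 1024 and 12 are the ones floated in this thread; the function names are made up for illustration and are not part of the bindings.

```python
# Hypothetical helpers, only to compare the two sizing strategies.
STATIC_OUTLEN = 1024  # fixed size, as in option (3) of the quoted message
BUFFER_RATIO = 12     # multiplier, as in option (1)


def static_outlen(inlen):
    """Output length when the buffer size ignores the input length."""
    return STATIC_OUTLEN


def ratio_outlen(inlen, ratio=BUFFER_RATIO):
    """Output length when the buffer scales with the input length."""
    return inlen * ratio


for inlen in (10, 100, 5000):
    print(inlen, static_outlen(inlen), ratio_outlen(inlen))
```

For 5000 input characters the fixed buffer stays at 1024 cells, which is smaller than the input itself, while the ratio gives 60000.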

As for jlouis, it currently sets the ratio to 2 times, as that was what everyone else seemed to do, but I am probably going to go with a large default ratio that can be configured if desired.

Michael Whapples

On 04/08/10 02:21, James Teh wrote:
The original reason that this arose is now somewhat mitigated by the new undefined opcode, but nevertheless, I think it should probably be resolved. So:

On 27/07/2010 8:32 PM, Michael Whapples wrote:
Firstly I have seen later messages in this thread and agree that it would not be natural for python programs to have to specify the buffer sizes.

Michael, how does jlouis handle this inlen/outlen situation? John, how about your JNI bindings? As I understand it, it's not really natural to specify maximum buffer sizes in either language.

So, who is using these bindings? There are four options:
1. Raise the hard-coded output buffer multiplier from 2 to, say, 12.
2. Allow the user to change the output buffer multiplier.
3. Use a hard-coded static output buffer size of, say, 1024. According to John, this is how liblouisxml does it, though I don't know what size it uses.
4. Allow the static output buffer size to be changed by the user.
If (2) or (4), should this be via a module variable or a keyword argument to each function?
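The two mechanisms in that question can be combined, as in the rough sketch below: a module variable supplies the default and a keyword argument overrides it for a single call. The names outputBufferRatio and allocate_output are hypothetical and do not exist in the bindings; only create_unicode_buffer is a real ctypes call.

```python
from ctypes import create_unicode_buffer

# Module variable: one default shared by every function in the module.
# Hypothetical name, chosen for this sketch only.
outputBufferRatio = 12


def allocate_output(inbuf, ratio=None):
    """Return (outlen, outbuf) for the given input string.

    The ratio keyword argument, if given, overrides the module-level
    default for this call only.
    """
    if ratio is None:
        ratio = outputBufferRatio
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    outlen = len(inbuf) * ratio
    return outlen, create_unicode_buffer(outlen)
```

With only the module variable, a caller changes behaviour for the whole process; with the keyword argument, a caller can pick a ratio per call without touching shared state.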

To be honest, I'm leaning towards either (1) or (3). Memory just isn't so much of an issue these days, especially for something which will be freed once complete.

Jamie


Index: python/louis/__init__.py.in
===================================================================
--- python/louis/__init__.py.in (revision 370)
+++ python/louis/__init__.py.in (working copy)
@@ -41,6 +41,9 @@
 except NameError:
     # Unix/Cygwin
     _loader = cdll
+
+bufferRatio = 12
+
 liblouis = _loader["###LIBLOUIS_SONAME###"]
 
 atexit.register(liblouis.lou_free)
@@ -102,7 +105,7 @@
     tablesString = ",".join([str(x) for x in tran_tables])
     inbuf = unicode(inbuf)
     inlen = c_int(len(inbuf))
-    outlen = c_int(inlen.value*2)
+    outlen = c_int(inlen.value*bufferRatio)
     outbuf = create_unicode_buffer(outlen.value)
     typeformbuf = None
     if typeform:
