[liblouis-liblouisxml] Re: Help! C programmers

  • From: Michael Whapples <mwhapples@xxxxxxx>
  • To: liblouis-liblouisxml@xxxxxxxxxxxxx
  • Date: Tue, 06 Aug 2013 12:01:03 +0100

I understand widechar to be a type that allows Unicode characters to be used. I believe the definition of widechar you cite applies when it is using the UCS-2 encoding. (Note: I have never got an actual answer as to whether this is UCS-2 or UTF-16; the latter can represent characters above U+FFFF because it is not a fixed-length encoding.)
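To illustrate the difference, here is a minimal sketch of how UTF-16 represents a character above U+FFFF as two 16-bit units (a surrogate pair), which UCS-2 cannot do. The function name encode_utf16 is made up for this example and is not part of liblouis.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of liblouis: encode one Unicode code point
   as UTF-16. Returns the number of 16-bit units written (1 or 2), or 0 if
   the value cannot be encoded. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
  if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not characters */
  if (cp > 0x10FFFF) return 0;                /* beyond the Unicode range */
  if (cp < 0x10000) {                         /* fits in one unit, same as UCS-2 */
    out[0] = (uint16_t)cp;
    return 1;
  }
  cp -= 0x10000;                              /* 20 bits remain */
  out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate: top 10 bits */
  out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate: bottom 10 bits */
  return 2;
}
```

So with a 2-byte widechar, a single character above U+FFFF would occupy two array elements if the data is really UTF-16.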


Just take that definition as saying how many bytes make up each widechar character; the underlying type it is defined as says nothing about what a widechar means. When I have to deal with it (normally from other languages such as Python or Java), I treat the buffer as bytes, remembering that the size of the byte array is inlen or outlen (whichever buffer I am dealing with) multiplied by sizeof(widechar), and then use the correct encoding to create a proper Unicode object from the byte array (UTF-16 for sizeof(widechar) == 2 and UTF-32 for sizeof(widechar) == 4).

The official liblouis Python bindings just assume that liblouis and Python use the same size for Unicode characters and cast straight between the two types. I feel that is unsafe, as the two can easily be compiled with different Unicode sizes, so I think it is worth checking the size and explicitly doing the encoding according to how the types are defined.
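In C itself, this means a widechar string is simply an array of integer code units, and a space is the value 0x0020. As a rough sketch of the truncation you describe, assuming the 16-bit definition you quoted (the typedef below only mirrors it, and truncate_at_space is a made-up name, not a liblouis function):

```c
#include <stddef.h>

/* Assumption: mirrors the 16-bit definition quoted from liblouis.h.
   A build using 32-bit characters would need a wider type here. */
typedef unsigned short int widechar;

/* Hypothetical helper, not a liblouis function: return how many units of
   s to keep so that the result is at most maxlen units long and, where
   possible, ends just before a space. */
static size_t truncate_at_space(const widechar *s, size_t len, size_t maxlen)
{
  size_t i;
  if (len <= maxlen) return len;   /* already fits, keep everything */
  for (i = maxlen; i > 0; i--)     /* s[maxlen] is valid because len > maxlen */
    if (s[i] == 0x0020) return i;  /* cut just before this space */
  return maxlen;                   /* no usable space: hard truncate */
}
```

The number of bytes in the kept part is then the returned length multiplied by sizeof(widechar). Note that if the buffer is really UTF-16, the hard-truncate case could split a surrogate pair, which this sketch does not guard against.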

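Possibly, if one were using C++, one should use wchar_t for defining widechar (if one felt that just using wchar_t directly as the type is not OK). However, liblouis is C and not C++, so I believe the built-in wchar_t type is not available (C only provides it as a typedef in <stddef.h>, and its size varies between platforms).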

Michael Whapples

On 06/08/2013 11:39, Paul wood wrote:
Hi guys,
I'm trying to truncate the running header at a space so we don't get half a word or, as in one test case, just a capital sign! Anyway, I've written a routine to truncate at the last space before the truncate length, but when I look at the code, the 'string' definition refers to widechar, and the definition of widechar at line 33 of liblouis.h says:
#define widechar unsigned short int

How can a string be an int!

Here is my test routine in case it helps.
Thanks
Paul

#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] = "This is t h ";
  char * pch;
  char str1[40];
  int lenstr1;
  int length=10;
  printf ("Looking for the 'space' character in \"%s\"...\n",str);
  pch=str;
  lenstr1=strlen(str);
  while (pch!=NULL && (pch-str)<=length)
  {
    strncpy ( str1, str, pch-str ); /* Save previous string */
    lenstr1=(pch-str); /* Save previous length of string */
    pch=strchr(pch+1,' '); /* Find next space */
    if (pch!=NULL) /* Only print when a space was actually found */
      printf ("found at %d\n pch is %s\n",(int)(pch-str+1),pch);

  }
  if (lenstr1==0) { /* i.e. Not found any spaces */
      lenstr1=strlen(str);
      if (lenstr1>length) lenstr1=length; /* No space to break at: hard truncate */
      strncpy ( str1, str, lenstr1 );
  }
  str1[lenstr1] = '\0'; /* add Null as required */
  printf("string is:\"%s\"\n",str1);
  return 0;
}
For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com
