[liblouis-liblouisxml] Re: liblouis treats valid UTF-8 sequences as invalid

From: James Teh <jamie@xxxxxxxxxxxx>
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Date: Mon, 10 Sep 2012 13:25:06 +1000

Do you have a reference for the UTF-8 parsing algorithm you used? I don't understand the code, so debugging it is tricky.


Jamie

On 10/09/2012 1:03 PM, John J. Boyer wrote:

I think this will have to be traced with the Windows debugger. There is
no error in Linux. Use compileError as the breakpoint. The loop appears
to be going too far.

John

On Mon, Sep 10, 2012 at 09:30:06AM +1000, James Teh wrote:

Hi all,

When I try to use the UEBC-g1.utb table in Windows, I get the following
problems:
uebc-g1.utb:309: warning: invalid UTF-8. Assuming Latin-1.
uebc-g1.utb:309: error: Character '\x0017' is not defined
uebc-g1.utb:310: warning: invalid UTF-8. Assuming Latin-1.
uebc-g1.utb:310: error: Character '\x0017' is not defined
uebc-g1.utb:311: warning: invalid UTF-8. Assuming Latin-1.
uebc-g1.utb:311: warning: invalid UTF-8. Assuming Latin-1.
uebc-g1.utb:311: warning: invalid UTF-8. Assuming Latin-1.
uebc-g1.utb:311: error: Character '\x0007' is not defined
uebc-g1.utb:312: warning: invalid UTF-8. Assuming Latin-1.
uebc-g1.utb:312: warning: invalid UTF-8. Assuming Latin-1.
uebc-g1.utb:312: warning: invalid UTF-8. Assuming Latin-1.
uebc-g1.utb:312: error: Character '\x0007' is not defined
8 warnings issued
4 errors found.

Lines 309 and 310 are for the × character, while 311 and 312 are for the
÷ character. The file is correctly UTF-8 encoded; × is encoded as
\xc3\x97 and ÷ is encoded as \xc3\xb7, which are both correct.
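
To make the byte arithmetic above easy to check, here is a minimal sketch of a standard UTF-8 decoder. It is not the liblouis parsing code; the function name decode_utf8 and its signature are made up for this illustration. It takes unsigned char on purpose, so bytes at or above 0x80 are never sign-extended on platforms where plain char is signed. Whether that has anything to do with the Windows failure is untested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only, not taken from liblouis.
 * Decodes one UTF-8 sequence from s (n bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 if the sequence is invalid. */
static size_t decode_utf8(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point allowed for each sequence length (rejects overlongs). */
    static const uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len, i;
    uint32_t c;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                    /* 0xxxxxxx: plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {   /* 110xxxxx: 2-byte sequence */
        len = 2;
        c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {   /* 1110xxxx: 3-byte sequence */
        len = 3;
        c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {   /* 11110xxx: 4-byte sequence */
        len = 4;
        c = s[0] & 0x07;
    } else {
        return 0;                         /* stray continuation or invalid lead byte */
    }

    if (n < len)
        return 0;                         /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)        /* continuation bytes must be 10xxxxxx */
            return 0;
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}
```

With this sketch, the bytes \xc3\x97 decode to U+00D7 (×) and \xc3\xb7 decode to U+00F7 (÷), each consuming two bytes, which agrees with the table file being valid UTF-8.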

Any ideas as to what's going on here?

Thanks,
Jamie

--
James Teh
Director, NV Access Limited
Email: jamie@xxxxxxxxxxxx
Web site: http://www.nvaccess.org/
Phone: +61 7 5667 8372
For a description of the software, to download it and links to
project pages go to http://www.abilitiessoft.com


