Well, I agree that the input is incorrect. The problem is how to handle it in
liblouis and liblouisutdml to get something more readable than
'\x0080''\x0091'
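One possible way to get more readable output is to clean the text before it reaches the translator. The following is only a sketch of that idea, not something liblouis or liblouisutdml does: the function name `remap_c1_controls` is hypothetical, and it assumes the stray code points in the range U+0080 to U+009F are really Windows-1252 byte values that were never converted.

```python
def remap_c1_controls(text: str) -> str:
    """Replace C1 control code points (U+0080..U+009F) with the characters
    the same byte values denote in Windows-1252, where one is defined."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                ch = bytes([cp]).decode("cp1252")
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in cp1252;
                # leave those code points unchanged.
                pass
        out.append(ch)
    return "".join(out)


# Assumed example input: an apostrophe that arrived as U+0092.
print(remap_c1_controls("John\x92s"))
```

If the assumption about the source encoding is wrong, this remapping would produce the wrong characters, so it is best applied only to input known to come from a Windows-1252 export.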
On Mon, Nov 23, 2020 at 11:59:32AM +0000, James Bowden wrote:
Hi John,
No.
If you're getting 0x0080 or 0x009x characters, this is incorrect and invalid
input.
If it is the right single quote, which often doubles as an apostrophe these
days, the correct Unicode value is U+2019.
Encoding this as UTF-8, it is 0xE2 0x80 0x99.
So, if you are truly exporting as UTF-8, you would get the correct sign in
Liblouis; even if your encoding of UTF-8 is bad, you'd get three unrelated
characters.
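The byte values above can be checked with a few lines of Python. This is a small illustration, not part of the original message; the two "wrong" decodings (Windows-1252 and Latin-1) are chosen here as assumed examples of what a bad reading of the UTF-8 bytes could look like.

```python
quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Correct UTF-8 encoding of U+2019 is three bytes.
utf8_bytes = quote.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 80 99

# Reading those three bytes with a single-byte encoding gives
# three unrelated characters instead of one quote.
as_cp1252 = utf8_bytes.decode("cp1252")
as_latin1 = utf8_bytes.decode("latin-1")

print([f"U+{ord(c):04X}" for c in as_cp1252])  # ['U+00E2', 'U+20AC', 'U+2122']
print([f"U+{ord(c):04X}" for c in as_latin1])  # ['U+00E2', 'U+0080', 'U+0099']
```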
My suspicion is that you are exporting as ANSI encoding and nothing is
converting this to Unicode.
Solution: open your text file in Notepad, do a Save As, and choose UTF-8
encoding. Then translate.
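For anyone who prefers a script to Notepad, the same re-save step could be sketched as follows. The function name `resave_as_utf8` and the file names are hypothetical, and the default source encoding of Windows-1252 is an assumption based on the suspicion above; it should be changed if the file was exported differently.

```python
from pathlib import Path


def resave_as_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Read src using the assumed legacy encoding and write dst as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")


# Hypothetical usage:
# resave_as_utf8("exported.txt", "exported-utf8.txt")
```

The resulting file can then be passed to the translator in place of the original export.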
I trust this helps.
With best regards,
James.
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
<liblouis-liblouisxml-bounce@xxxxxxxxxxxxx> On Behalf Of John J. Boyer
Sent: 23 November 2020 11:49
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Re: [EXTERNAL] Untranslated Unicode
characters in UEB
CAUTION: External. Do not click links or open attachments unless you know the
content is safe.
________________________________
Hello James,
I think Adobe Acrobat Reader does save texts in UTF-8. Liblouis understands

that. 0x0080 seems to be some sort of prefix.
I think the UEB tables should be modified to ignore 0x0080 and translate the
various 0x009h characters appropriately.
John
On Mon, Nov 23, 2020 at 09:23:11AM +0000, James Bowden wrote:
Hi John,
To add to what Neil correctly suggests:
Assuming you are using Windows Code Page 1252 (yes, remember those things),
0x80 is likely to be a Euro sign.
0x9h is not a valid character, but could be:
0x91: left single quote
0x92: right single quote
0x93: left double quote
0x94: right double quote
... and there are several others.
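The mapping listed above can be reproduced with Python's built-in `cp1252` codec. This is only an illustration of the code page values, under the same assumption that the input is Windows Code Page 1252.

```python
import unicodedata

# Decode each byte value as Windows-1252 and show the Unicode character it denotes.
cp1252_map = {b: bytes([b]).decode("cp1252") for b in (0x80, 0x91, 0x92, 0x93, 0x94)}

for byte, ch in cp1252_map.items():
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```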
Try saving your PDF file as UTF-8 or any other Unicode encoding.
Neil is correct in saying that U+0080 to U+009F are control codes. They are
not handled.
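The control-code status of that range can be confirmed from the Unicode character database shipped with Python. This short check is an illustration added here, not something from the original thread.

```python
import unicodedata

# Every code point from U+0080 to U+009F has general category "Cc" (control).
c1_range = range(0x80, 0xA0)
all_controls = all(unicodedata.category(chr(cp)) == "Cc" for cp in c1_range)
print(all_controls)  # True
```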
I trust this helps.
With best regards,
James.
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx
<liblouis-liblouisxml-bounce@xxxxxxxxxxxxx> On Behalf Of John J. Boyer
Sent: 21 November 2020 09:30
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [EXTERNAL] [liblouis-liblouisxml] Untranslated unicode characters
in UEB
Hello,
When I translate texts from PDF documents I find a lot of characters for
which the Unified English Braille (UEB) tables do not have translations and
that are represented as '\xhhhh'.
The most frequent are \x0080 and various \x009h, where h is a hex digit. I
don't know how these characters should be translated.
Will some of them be fixed in the upcoming release?
Thanks,
John
--
John J. Boyer
Email: john.boyer@xxxxxxxxxxxxxxxxx
website: http://www.abilitiessoft.org
Status: Company dissolved but website and email addresses live.
Location: Madison, Wisconsin, USA
Mission: developing assistive technology software and providing STEM services
that are available at no cost
For a description of the software, to download it and links to
project pages go to http://liblouis.org
Donate: http://liblouis.org/sponsoring
--
The RNIB See Differently Awards - Coronavirus Heroes celebrates the heroic
actions displayed both by and for blind and partially sighted people.
Read the incredible stories at https://www.rnib.org.uk/vote-heroes
--
DISCLAIMER:
NOTICE: The information contained in this email and any attachments is
confidential and may be privileged. If you are not the intended recipient
you should not use, disclose, distribute or copy any of the content of it
or of any attachment; you are requested to notify the sender immediately of
your receipt of the email and then to delete it and any attachments from
your system.
RNIB endeavours to ensure that emails and any attachments generated by its
staff are free from viruses or other contaminants. However, it cannot
accept any responsibility for any such which are transmitted.
We therefore recommend you scan all attachments.
Please note that the statements and views expressed in this email and any
attachments are those of the author and do not necessarily represent those
of RNIB.
RNIB Registered Charity Number: 226227
Website: https://www.rnib.org.uk