#14674: StyledEdit Misreads UTF-8 Files as Something Else
---------------------------------------+----------------------------
Reporter: AGMS | Owner: nobody
Type: bug | Status: new
Priority: normal | Milestone: Unscheduled
Component: Applications/StyledEdit | Version: R1/Development
Resolution: | Keywords: Encoding
Blocked By: | Blocking:
Has a Patch: 0 | Platform: All
---------------------------------------+----------------------------
Comment (by Pete):
It turns out I was confused by using two different revs of StyledEdit. My
work partition is about 4 years old, but I initially tested in my latest
hrev51670 from last December. The differences weren't obvious, but I
think I have them sorted now.
I created some short test files: UTF8.txt, ascii.txt, and Windows.txt
(attached). I initially stripped them of all attributes, then loaded them
into (both versions of) StyledEdit.
My older version behaves fairly sanely. If I load either UTF8.txt or
ascii.txt, it is displayed correctly. If I quit without re-saving, no
encoding attribute is added. If I save, the numeric attribute 65535 gets
added.
If I load Windows.txt, with its non-Unicode characters, the attribute gets
immediately set to 'iso-8859-1' (no saving required).
The new version is just weird. The attribute is set on loading in all
cases (I never re-saved), but totally arbitrarily! Here's what I got
with "catattr be:encoding *.txt", immediately after loading each into SE,
with no saving:
{{{
UTF8.txt : string : UTF-8
Windows.txt : string : ISO-8859-1
ascii.txt : string : ISO-8859-2
BeShareDocs.txt : string : ISO-8859-1
}}}
Notice that a) UTF8.txt and BeShareDocs.txt, which have pretty much the
same extended characters, get different encodings, and b) it thinks
ascii.txt -- which has no extended characters -- is East European!!
(BTW I did truncate all those strings as displayed by catattr, because
they are not null-terminated and get trailing garbage. Attribute length is
correct)
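To illustrate the trailing-garbage point, here is a minimal sketch of how a reader could take only the reported attribute length from a larger buffer rather than relying on a terminating NUL. The function name attr_to_text and the sample buffers are hypothetical and are not taken from catattr or StyledEdit.

```python
def attr_to_text(buffer: bytes, length: int) -> str:
    """Return the attribute value as text, honouring the stored length.

    Hypothetical helper: 'buffer' stands for whatever memory the reader
    holds, 'length' for the attribute size reported by the file system.
    """
    # Keep only the bytes that belong to the attribute.
    value = buffer[:length]
    # If a NUL happens to be stored inside the attribute, stop there too.
    value = value.split(b"\x00", 1)[0]
    # Undecodable bytes are replaced instead of raising an error.
    return value.decode("utf-8", errors="replace")
```

With a 5-byte attribute followed by leftover buffer contents, attr_to_text(b"UTF-8\x7f\x13junk", 5) yields just "UTF-8".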
This all seems to show that it is at least trying somehow to decipher the
encoding, but it did rather better a few years ago!
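For comparison, a conservative guess along the lines of what the test files suggest could look roughly like the sketch below. This is an assumption about a reasonable heuristic, not StyledEdit's actual detection code; the function name guess_encoding, the "US-ASCII" label, and the ISO-8859-1 fallback are all chosen here for illustration.

```python
def guess_encoding(data: bytes) -> str:
    """Guess an encoding label for raw file bytes (illustrative heuristic only)."""
    # No byte above 0x7F: the file is plain ASCII, so there is no reason
    # to pick any particular 8-bit code page.
    if all(b < 0x80 for b in data):
        return "US-ASCII"
    # Random 8-bit text is very unlikely to be valid UTF-8, so a clean
    # strict decode is strong evidence for UTF-8.
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Assumed fallback; which legacy encoding to pick is a guess.
        return "ISO-8859-1"
```

Under this sketch a file like ascii.txt would never be labelled ISO-8859-2, and two files with the same UTF-8 extended characters would always get the same label.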
I agree that the main window menu seems superfluous -- especially as it's
not a selectable menu! (I'd never really noticed the Save Panel one! The
one in the Load Panel, BTW, is ignored if be:encoding is set. Not sure if
that's correct behaviour.)
How about replacing the main menu with a field in the menubar that
displays the encoding used?
--
Ticket URL: <https://dev.haiku-os.org/ticket/14674#comment:4>
Haiku <https://dev.haiku-os.org>
The Haiku operating system.