[haiku-development] StyledEdit and STXT translator. Handling bogus data in the input streams.

  • From: Siarzhuk Zharski <zharik@xxxxxx>
  • To: <haiku-development@xxxxxxxxxxxxx>
  • Date: Sun, 25 Nov 2012 12:09:55 +0100

Hi,

Let me speak a bit more about yet another ticket promoted to tasks for GCI 2012.

This one is: "StyledEdit doesn't check for valid utf-8" http://dev.haiku-os.org/ticket/6447. As far as I can see, StyledEdit relies on the STXT translator. In the mentioned case (#6447) the translator silently puts invalid data into the output stream. There are also some other cases of this problem:

a) The application complains about invalid data in the input stream and opens nothing: "[StyledEdit] Bad argument type passed to function" http://dev.haiku-os.org/ticket/3045
Reason: some bytes from the 0x01-0x1F range occurred in the file.

b) The application shows only part of the loaded file:
"[StyledEdit] shows only small part of file" http://dev.haiku-os.org/ticket/7954
Reason: 0x00 bytes occurred in the text file.

I propose the following as the solution for the issues mentioned above:

1) Improve the STXT translator to check the streams for valid UTF-8 characters and fail if some bogus data is found (case #6447);
2) Fix the handling of 0x00 bytes that appear in the STXT translator input before the end of the stream (case #7954);
3) Add an option "replace non-valid UTF-8 data with readable codes" to the STXT translator configuration settings. This option should replace codes in the 0x00-0x1F range with their canonical names like NUL, SOH, STX, ETX and put a replacement character like '?' in place of other invalid UTF-8 data.
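
To make the strict check and the replace option more concrete, here is a minimal sketch in plain C++. It is not the actual STXT translator code. The names Utf8SequenceLength, IsValidUtf8 and ReplaceBogusData are hypothetical. Keeping tab, line feed and carriage return untouched, and wrapping the names in angle brackets, are assumptions of this sketch; whether control codes should also make the strict check fail is left open here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the length (1-4) of the well-formed UTF-8 sequence that starts at
// data[pos], or 0 if the bytes at that position are not valid UTF-8.
static size_t Utf8SequenceLength(const std::string& data, size_t pos)
{
    const unsigned char lead = static_cast<unsigned char>(data[pos]);
    size_t length = 0;
    uint32_t codePoint = 0;
    uint32_t minimum = 0;  // smallest code point allowed for this length

    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0) {
        length = 2; codePoint = lead & 0x1F; minimum = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
        length = 3; codePoint = lead & 0x0F; minimum = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
        length = 4; codePoint = lead & 0x07; minimum = 0x10000;
    } else {
        return 0;  // stray continuation byte or 0xF8-0xFF
    }

    if (pos + length > data.size())
        return 0;  // sequence is cut off by the end of the stream
    for (size_t i = 1; i < length; i++) {
        const unsigned char next = static_cast<unsigned char>(data[pos + i]);
        if ((next & 0xC0) != 0x80)
            return 0;  // not a continuation byte
        codePoint = (codePoint << 6) | (next & 0x3F);
    }
    if (codePoint < minimum || codePoint > 0x10FFFF)
        return 0;  // overlong encoding or out of the Unicode range
    if (codePoint >= 0xD800 && codePoint <= 0xDFFF)
        return 0;  // UTF-16 surrogates are not valid in UTF-8
    return length;
}

// Strict check (point 1): true only if the whole buffer is well-formed UTF-8.
bool IsValidUtf8(const std::string& data)
{
    for (size_t pos = 0; pos < data.size();) {
        const size_t length = Utf8SequenceLength(data, pos);
        if (length == 0)
            return false;
        pos += length;
    }
    return true;
}

static const char* const kControlNames[32] = {
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US"
};

// Replace option (point 3): control codes become their names, any other
// invalid byte becomes '?'. Assumption: tab, LF and CR are kept as they are.
std::string ReplaceBogusData(const std::string& data)
{
    std::string result;
    for (size_t pos = 0; pos < data.size();) {
        const unsigned char byte = static_cast<unsigned char>(data[pos]);
        if (byte < 0x20 && byte != '\t' && byte != '\n' && byte != '\r') {
            result += '<';
            result += kControlNames[byte];
            result += '>';
            pos++;
            continue;
        }
        const size_t length = Utf8SequenceLength(data, pos);
        if (length == 0) {
            result += '?';  // skip one bogus byte and try again
            pos++;
            continue;
        }
        result.append(data, pos, length);
        pos += length;
    }
    return result;
}
```

Because the buffer is passed together with its size, a 0x00 byte in the middle of the data is handled like any other control code instead of ending the text early.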

With these features in the STXT translator we can build a more robust and user-friendly algorithm for loading text data: try to load the text in strict mode with the replace option "off"; in case it fails, either nag the user with an alert asking about using the replace option, or instantiate the STXT translator configuration view for fine tuning the loading process.
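
The two-stage loading described above could look roughly like the following sketch. Everything in it is hypothetical: TranslateText is only a stand-in for the STXT translator (it treats nothing but 0x00 bytes as bogus data, to keep the example short), and the askUserToReplace callback stands in for the alert or the configuration view.

```cpp
#include <functional>
#include <string>

enum class LoadMode { kStrict, kReplace };

// Hypothetical stand-in for the translator. In strict mode it fails on the
// first bogus byte; in replace mode it substitutes a readable code.
// Simplification: only 0x00 bytes count as bogus data here.
bool TranslateText(const std::string& raw, LoadMode mode, std::string& text)
{
    text.clear();
    for (char c : raw) {
        if (c != '\0') {
            text += c;
            continue;
        }
        if (mode == LoadMode::kStrict)
            return false;
        text += "<NUL>";
    }
    return true;
}

// Try strict mode first. Only if that fails, ask the user whether to load
// the file again with the replace option switched on.
bool LoadText(const std::string& raw,
    const std::function<bool()>& askUserToReplace, std::string& text)
{
    if (TranslateText(raw, LoadMode::kStrict, text))
        return true;
    if (!askUserToReplace())
        return false;  // the user declined, nothing is opened
    return TranslateText(raw, LoadMode::kReplace, text);
}
```

A clean file never triggers the question, so the user is bothered only when the input really contains something the strict mode rejects.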

Please share your opinion about this problem and my suggestions.

Thank you for your attention!

--
Kind Regards,
   S.Zharski
