[haiku-development] Need new UTF-16 to UTF-8 conversion functions in exfat, GPL okay?

  • From: John Scipione <jscipione@xxxxxxxxx>
  • To: "haiku-development@xxxxxxxxxxxxx" <haiku-development@xxxxxxxxxxxxx>
  • Date: Tue, 11 Feb 2014 13:40:57 -0500

The UTF-16 to UTF-8 string conversion functions currently used in the
exfat module don't convert UTF-16 surrogate pairs (characters that
take 4 bytes in UTF-16) to UTF-8 correctly. You can see this yourself
by giving a file on an exfat volume, or the volume itself, a name that
contains such characters.

You can find a bunch of characters that require 4 bytes to be
represented in UTF-16 (and UTF-8 too) here:
http://www.i18nguy.com/unicode/supplementary-test.html
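For reference, here is a minimal sketch of what a correct conversion
has to do with a surrogate pair. It is not the code in the exfat module
or in libntfs; utf16_to_utf8() is a hypothetical helper, and it assumes
the UTF-16 code units have already been swapped to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from exfat or libntfs).
 * Converts srcLen UTF-16 code units (host byte order assumed) to a
 * NUL-terminated UTF-8 string in dst. Returns the number of bytes
 * written, not counting the NUL, or -1 if dst is too small or the
 * input contains an unpaired surrogate. */
static long
utf16_to_utf8(const uint16_t* src, size_t srcLen, char* dst, size_t dstSize)
{
	size_t out = 0;

	if (dstSize == 0)
		return -1;

	for (size_t i = 0; i < srcLen; i++) {
		uint32_t c = src[i];

		if (c >= 0xD800 && c <= 0xDBFF) {
			/* High surrogate: combine with the low surrogate that
			 * must follow to get a code point above U+FFFF. */
			if (i + 1 >= srcLen)
				return -1;
			uint32_t low = src[i + 1];
			if (low < 0xDC00 || low > 0xDFFF)
				return -1;
			c = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
			i++;
		} else if (c >= 0xDC00 && c <= 0xDFFF) {
			/* Low surrogate without a preceding high surrogate. */
			return -1;
		}

		size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
		if (out + need >= dstSize)
			return -1;	/* keep one byte free for the NUL */

		if (need == 1) {
			dst[out++] = (char)c;
		} else if (need == 2) {
			dst[out++] = (char)(0xC0 | (c >> 6));
			dst[out++] = (char)(0x80 | (c & 0x3F));
		} else if (need == 3) {
			dst[out++] = (char)(0xE0 | (c >> 12));
			dst[out++] = (char)(0x80 | ((c >> 6) & 0x3F));
			dst[out++] = (char)(0x80 | (c & 0x3F));
		} else {
			dst[out++] = (char)(0xF0 | (c >> 18));
			dst[out++] = (char)(0x80 | ((c >> 12) & 0x3F));
			dst[out++] = (char)(0x80 | ((c >> 6) & 0x3F));
			dst[out++] = (char)(0x80 | (c & 0x3F));
		}
	}

	dst[out] = '\0';
	return (long)out;
}
```

The point is that the two code units of a pair have to be combined into
one code point first and then written out as a single 4-byte UTF-8
sequence. Encoding each half on its own as a 3-byte sequence is the
usual way this goes wrong.
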

There is a function in libntfs that does the conversion correctly. It's
called ntfs_ucstombs() and it lives in unistr.c in the libntfs
directory. I plugged libntfs into the exfat module as a shared library
and it did the conversions the way exfat expects. Our current code
makes a couple of bad assumptions that this library handles for us.
The problem is that this library is GPL licensed, which means that the
exfat module would effectively become GPL licensed as well.

I could port a different Unicode conversion library instead. There is
one used by the llvm project that is licensed under a BSD-like license
(Illinois Open Source License) that I could possibly use, but this
would be a lot of work, and the libntfs driver is already set up to
work nicely with the way Microsoft interfaces with Unicode.

So, I wanted to ask if using a GPL library was okay before I go off
and do a bunch of porting work.

Also... I ran into another problem: filenames can be longer in exfat
than they can in Haiku. 255 UTF-16 characters can take up many more
than 255 bytes after conversion to UTF-8, which is more than the
maximum number of bytes allowed in a filename in Haiku
(B_FILE_NAME_LENGTH = 256). Right now we just throw the file or folder
on the floor (it doesn't appear in Tracker or Terminal). Other
operating systems with similar limitations, like Linux, do little
better.
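To put numbers on it, here is a rough sketch that computes how many
UTF-8 bytes a UTF-16 name needs without converting it. The helper
names and the kMaxNameBytes constant are made up for illustration; the
constant just mirrors the B_FILE_NAME_LENGTH value above, and I'm
assuming it includes the terminating NUL.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for B_FILE_NAME_LENGTH (256); assumed to include the NUL. */
enum { kMaxNameBytes = 256 };

/* Hypothetical helper: number of UTF-8 bytes needed for srcLen UTF-16
 * code units, not counting a terminating NUL. An unpaired surrogate is
 * counted as 3 bytes, the size of a replacement character. */
static size_t
utf8_length_of_utf16(const uint16_t* src, size_t srcLen)
{
	size_t bytes = 0;

	for (size_t i = 0; i < srcLen; i++) {
		uint16_t c = src[i];

		if (c < 0x80) {
			bytes += 1;
		} else if (c < 0x800) {
			bytes += 2;
		} else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < srcLen
			&& src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
			/* Surrogate pair: two code units, four UTF-8 bytes. */
			bytes += 4;
			i++;
		} else {
			bytes += 3;
		}
	}

	return bytes;
}

/* Nonzero if the converted name plus its NUL fits in kMaxNameBytes. */
static int
name_fits(const uint16_t* src, size_t srcLen)
{
	return utf8_length_of_utf16(src, srcLen) + 1 <= kMaxNameBytes;
}
```

The worst case is 3 bytes per code unit (a surrogate pair is 4 bytes
for 2 units, so only 2 per unit), which means a 255-unit name can need
up to 765 bytes.
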

Unfortunately, I don't really know how to do better here. I could
perhaps truncate the file names, but that would be destructive. I
could refuse to mount, but then I'd have to iterate through all the
files in the fs to check. Any advice here?
