[haiku-development] Re: Need new UTF-16 to UTF-8 conversion functions in exfat, GPL okay?

  • From: John Scipione <jscipione@xxxxxxxxx>
  • To: "haiku-development@xxxxxxxxxxxxx" <haiku-development@xxxxxxxxxxxxx>
  • Date: Thu, 13 Feb 2014 21:12:09 -0500

On Tuesday, February 11, 2014, Axel Dörfler <axeld@xxxxxxxxxxxxxxxx> wrote:

> On 02/11/2014 07:40 PM, John Scipione wrote:
>
>> So, I wanted to ask if using a GPL library was okay before I go off
>> and do a bunch of porting work.
>>
>
> Usually GPL is okay in an add-on (like NTFS), but since the exfat code
> currently is not GPL, it would be sad to change this for such a small
> function.
> And there is no iconv in the kernel, btw.


So, I refactored the functions from LLVM to work with the Haiku API. The
license is the University of Illinois/NCSA license, which is BSD-like. Some
have complained about an advertising clause, but we should be able to use
this code if we put a note in About System.

I've also refactored some UTF conversion code that js kindly relicensed as
MIT from his ObjFW project. The MIT-licensed code is nicer license-wise, but
the LLVM code is probably faster because it unrolls its loops. The LLVM code
also covers every conversion between UTF-8, UTF-16, and UTF-32, while the
code from js currently only handles converting between UTF-8 and UTF-16.
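
For illustration, here is a minimal sketch of the UTF-16 to UTF-8 direction.
It is neither the LLVM nor the ObjFW code. The name utf16_to_utf8 and its
signature are hypothetical, and it uses a plain fixed-size output buffer on
the assumption that kernel code cannot rely on iconv or std::string.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not the LLVM or ObjFW code). Converts inLength UTF-16
// code units into NUL-terminated UTF-8 in out. Returns the number of bytes
// written, not counting the NUL, or -1 if the input contains an unpaired
// surrogate or the result does not fit into outSize bytes.
long
utf16_to_utf8(const uint16_t* in, size_t inLength, char* out, size_t outSize)
{
	assert(in != NULL && out != NULL);
	if (outSize == 0)
		return -1;

	size_t pos = 0;
	for (size_t i = 0; i < inLength; i++) {
		uint32_t c = in[i];
		if (c >= 0xD800 && c <= 0xDBFF) {
			// High surrogate: a low surrogate must follow.
			if (i + 1 >= inLength || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
				return -1;
			c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00);
			i++;
		} else if (c >= 0xDC00 && c <= 0xDFFF) {
			// Low surrogate without a preceding high surrogate.
			return -1;
		}

		// Number of UTF-8 bytes this code point needs.
		size_t needed = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
		if (pos + needed + 1 > outSize)
			return -1;

		if (needed == 1) {
			out[pos++] = (char)c;
		} else if (needed == 2) {
			out[pos++] = (char)(0xC0 | (c >> 6));
			out[pos++] = (char)(0x80 | (c & 0x3F));
		} else if (needed == 3) {
			out[pos++] = (char)(0xE0 | (c >> 12));
			out[pos++] = (char)(0x80 | ((c >> 6) & 0x3F));
			out[pos++] = (char)(0x80 | (c & 0x3F));
		} else {
			out[pos++] = (char)(0xF0 | (c >> 18));
			out[pos++] = (char)(0x80 | ((c >> 12) & 0x3F));
			out[pos++] = (char)(0x80 | ((c >> 6) & 0x3F));
			out[pos++] = (char)(0x80 | (c & 0x3F));
		}
	}

	out[pos] = '\0';
	return (long)pos;
}
```

Code units from U+0800 up to U+FFFF take three bytes each in UTF-8, so a
255-unit exfat name made entirely of such characters needs 765 bytes and
cannot fit into a B_FILE_NAME_LENGTH buffer. This sketch reports that case
as -1 rather than truncating.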

>> Also... I ran into another problem where filenames can be longer in
>> exfat than they can in Haiku, 255 UTF-16 characters can take up many
>> more than 255 bytes after converting to UTF-8 which is the max number
>> of bytes allowed in a filename in Haiku (B_FILE_NAME_LENGTH = 256).
>> Right now we just throw the file or folder on the floor (it doesn't
>> appear in Tracker or Terminal). Other Operating systems like Linux
>> with similar limitations do little better.
>>
>> Unfortunately, I don't really know how to do better here. I could
>> perhaps truncate the file names, but that would be destructive, I
>> could refuse to mount, but, I'd have to iterate through all the files
>> in the fs to check. Any advice here?
>>
>
> Why would truncation be destructive? It's one thing you report, and
> another that is on the disk. Only when you would change the name from
> Haiku, you could lose something (but then the user changed the name on his
> own action, and he will see that the name is going to be truncated before
> he presses enter).
> With file systems (and any on-disk format in general) one should always
> aim to be able to read anything, so not mounting is certainly not an option.
>
> Anyway, it's actually quite complicated to do this right. When you cut off
> a path name, two entries might end up with the same name.
> So you would always need a preflight over the directory to determine if it
> has too long names or not, and then check if there would be duplicates. You
> would then need to store (and resolve) those duplicates, and then present
> the entries to the user. And you not only need to do this when iterating
> over a directory, but also when opening a file (which, depending on how
> exfat does that, may slow down opening such files a bit).
>

This is the next task on my todo list for exfat. I assume that BFS probably
does the work of handling name collisions already so perhaps I could copy
that code or export it to the vfs layer. If not, I may need to look at
importing yet more UTF handling code into the kernel. I've looked into what
is required to compare 2 Unicode strings for equality and the process is
far from trivial.
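
To make the preflight idea concrete, here is a rough userland sketch of
truncating names and resolving the resulting duplicates. Everything in it is
an assumption for illustration: the names truncate_utf8 and
presentable_names, the "~N" suffix scheme, and the use of std::string and
std::set, which kernel code would have to replace. It compares names byte
for byte and does not attempt Unicode-aware equality.

```cpp
#include <assert.h>
#include <stddef.h>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: cut name down to at most maxBytes bytes without
// splitting a multi-byte UTF-8 sequence.
std::string
truncate_utf8(const std::string& name, size_t maxBytes)
{
	if (name.size() <= maxBytes)
		return name;

	size_t end = maxBytes;
	// If the first dropped byte is a continuation byte (10xxxxxx), the cut
	// would split a sequence, so back up to the start of that sequence.
	while (end > 0 && ((unsigned char)name[end] & 0xC0) == 0x80)
		end--;
	return name.substr(0, end);
}

// Hypothetical preflight over one directory: returns the name to present for
// each on-disk name, in the same order. Names that fit are kept unchanged;
// names that are too long are truncated and get a "~N" suffix on collision.
std::vector<std::string>
presentable_names(const std::vector<std::string>& onDisk, size_t maxBytes)
{
	assert(maxBytes >= 8);	// leave room for a suffix in this sketch

	// Pass 1: names that already fit keep their exact on-disk spelling.
	std::set<std::string> used;
	for (size_t i = 0; i < onDisk.size(); i++) {
		if (onDisk[i].size() <= maxBytes)
			used.insert(onDisk[i]);
	}

	// Pass 2: truncate the long names and resolve duplicates.
	std::vector<std::string> result;
	for (size_t i = 0; i < onDisk.size(); i++) {
		const std::string& name = onDisk[i];
		if (name.size() <= maxBytes) {
			result.push_back(name);
			continue;
		}

		std::string candidate = truncate_utf8(name, maxBytes);
		for (unsigned n = 1; used.count(candidate) != 0; n++) {
			std::string suffix = "~" + std::to_string(n);
			candidate = truncate_utf8(name, maxBytes - suffix.size()) + suffix;
		}
		used.insert(candidate);
		result.push_back(candidate);
	}
	return result;
}
```

For the real limit, maxBytes would be B_FILE_NAME_LENGTH - 1 to leave room
for the terminating NUL. Opening a file by its presented name needs the
reverse mapping as well, which is the extra lookup cost Axel mentions.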
