[haiku-development] Re: Need new UTF-16 to UTF-8 conversion functions in exfat, GPL okay?

  • From: John Scipione <jscipione@xxxxxxxxx>
  • To: "haiku-development@xxxxxxxxxxxxx" <haiku-development@xxxxxxxxxxxxx>
  • Date: Thu, 13 Feb 2014 21:36:04 -0500

On Thursday, February 13, 2014, John Scipione <jscipione@xxxxxxxxx> wrote:

> On Tuesday, February 11, 2014, Axel Dörfler <axeld@xxxxxxxxxxxxxxxx>
> wrote:
>
>> On 02/11/2014 07:40 PM, John Scipione wrote:
>>
>>> So, I wanted to ask if using a GPL library was okay before I go off
>>> and do a bunch of porting work.
>>>
>>
>> Usually GPL is okay in an add-on (like NTFS), but since the exfat code
>> currently is not GPL, it would be sad to change this for such a small
>> function.
>> And there is no iconv in the kernel, btw.
>
>
> So, I refactored the functions from LLVM to work with the Haiku API. The
> license is the University of Illinois/NCSA license, which is BSD-like.
> Some have complained about an advertising clause, but we should be able
> to use this code if we put a note in About System.
>
> I've also refactored some UTF conversion code from his ObjFW project that
> js kindly relicensed under MIT. The MIT-licensed code is nicer
> license-wise, but the LLVM code is probably faster because it unrolls
> loops. The LLVM code also covers all conversions between UTF-8, UTF-16,
> and UTF-32, while the code from js currently only converts between UTF-8
> and UTF-16.
>
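For readers who have not written one, a UTF-16 to UTF-8 converter mostly comes down to pairing surrogates and emitting 1 to 4 bytes per code point. The C sketch below is neither the LLVM nor the ObjFW code discussed here: `utf16_to_utf8` is a hypothetical name, and it assumes the exfat name has already been read into host-order 16-bit code units. It does no loop unrolling, and it rejects unpaired surrogates instead of replacing them, which is one design choice among several.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert srcLen UTF-16 code units (host order) into a
 * NUL-terminated UTF-8 string. Returns the byte count written (excluding the
 * NUL), or -1 if the input is malformed or dst is too small. */
static long utf16_to_utf8(const uint16_t *src, size_t srcLen,
                          char *dst, size_t dstSize)
{
    size_t out = 0;
    if (dstSize == 0)
        return -1;

    for (size_t i = 0; i < srcLen; i++) {
        uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= srcLen || src[i + 1] < 0xDC00 || src[i + 1] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1; /* unpaired low surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need + 1 > dstSize) /* +1 keeps room for the NUL */
            return -1;

        if (need == 1) {
            dst[out++] = (char)cp;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[out] = '\0';
    return (long)out;
}
```

Note that -1 conflates "malformed" with "does not fit"; a real driver would probably want to tell those apart.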
>>> Also... I ran into another problem: filenames can be longer in exfat
>>> than they can be in Haiku. 255 UTF-16 characters can take up many more
>>> than 255 bytes after conversion to UTF-8, and 255 bytes is the maximum
>>> allowed in a Haiku filename (B_FILE_NAME_LENGTH = 256, including the
>>> terminating NUL). Right now we just throw the file or folder on the
>>> floor (it doesn't appear in Tracker or Terminal). Other operating
>>> systems with similar limitations, like Linux, do little better.
>>>
>>> Unfortunately, I don't really know how to do better here. I could
>>> perhaps truncate the file names, but that would be destructive. I
>>> could refuse to mount, but then I'd have to iterate through all the
>>> files in the fs to check. Any advice here?
>>>
>>
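The size mismatch is easy to quantify: a UTF-16 code unit in the Basic Multilingual Plane becomes at most 3 UTF-8 bytes (a surrogate pair, two units, becomes 4 bytes), so a 255-unit exfat name can need up to 255 × 3 = 765 bytes against Haiku's 255. A minimal C sketch of the length check a preflight could run follows; the function names and the `SKETCH_FILE_NAME_LENGTH` constant (standing in for Haiku's `B_FILE_NAME_LENGTH`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Haiku's B_FILE_NAME_LENGTH (256 bytes, including the NUL). */
#define SKETCH_FILE_NAME_LENGTH 256

/* UTF-8 bytes needed for a UTF-16 name, without the NUL. Lone surrogates
 * are counted as 3 bytes, as if they were replaced by U+FFFD. */
static size_t utf8_length(const uint16_t *src, size_t len)
{
    size_t bytes = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t u = src[i];
        if (u < 0x80) {
            bytes += 1;
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len
                   && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            bytes += 4; /* surrogate pair -> one 4-byte sequence */
            i++;
        } else {
            bytes += 3; /* rest of the BMP, including lone surrogates */
        }
    }
    return bytes;
}

/* True if the converted name plus its NUL fits in a Haiku name buffer. */
static bool name_fits(const uint16_t *src, size_t len)
{
    return utf8_length(src, len) + 1 <= SKETCH_FILE_NAME_LENGTH;
}
```

For example, 85 copies of U+3042 (a 3-byte character) need exactly 255 bytes and fit, while 86 copies do not.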
>> Why would truncation be destructive? What you report is one thing, and
>> what is on the disk is another. You could only lose something when you
>> change the name from Haiku (but then the user changed the name by his
>> own action, and he will see that the name is going to be truncated
>> before he presses Enter).
>> With file systems (and any on-disk format in general) one should always
>> aim to be able to read anything, so not mounting is certainly not an option.
>>
>> Anyway, it's actually quite complicated to do this right. When you cut
>> off a path name, two entries might end up with the same name.
>> So you would always need a preflight pass over the directory to
>> determine whether it contains names that are too long, and then check
>> whether truncating them would create duplicates. You would then need to
>> store (and resolve) those duplicates, and only then present the entries
>> to the user. And you need to do this not only when iterating over a
>> directory, but also when opening a file (which, depending on how exfat
>> does that, may slow down opening such files a bit).
>>
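The preflight-and-resolve scheme described above can be sketched in C for a single directory. This is an illustration under stated assumptions, not BFS or exfat code: the function names, the two-pass structure, and the "~N" collision suffix (loosely modeled on FAT short names) are all hypothetical, names are assumed to be valid UTF-8 already, and comparison is byte for byte, ignoring exFAT's case-insensitive matching. Names that fit are shown unchanged; only long names are cut at a UTF-8 boundary and, on a collision, shortened further to make room for a suffix.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Haiku allows 255 name bytes plus a NUL (B_FILE_NAME_LENGTH = 256). */
#define MAX_NAME_BYTES 255

/* Bytes of `s` to keep so the result is <= max bytes and does not end
 * in the middle of a UTF-8 sequence. */
static size_t utf8_cut(const char *s, size_t max)
{
    size_t len = strlen(s);
    if (len <= max)
        return len;
    /* Back up while s[max] is a continuation byte (10xxxxxx). */
    while (max > 0 && ((unsigned char)s[max] & 0xC0) == 0x80)
        max--;
    return max;
}

/* Is `name` already used by another resolved entry (byte-wise compare)? */
static int is_taken(char out[][MAX_NAME_BYTES + 1], size_t count,
                    size_t self, const char *name)
{
    for (size_t j = 0; j < count; j++)
        if (j != self && out[j][0] != '\0' && strcmp(out[j], name) == 0)
            return 1;
    return 0;
}

/* Preflight over one directory: fill out[i] with a display name for the
 * UTF-8 on-disk name onDisk[i]. Returns 0, or -1 if no suffix was free. */
static int resolve_names(const char *const *onDisk, size_t count,
                         char out[][MAX_NAME_BYTES + 1])
{
    /* Pass 1: names that already fit are shown unchanged. */
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(onDisk[i]);
        if (len <= MAX_NAME_BYTES)
            memcpy(out[i], onDisk[i], len + 1);
        else
            out[i][0] = '\0'; /* resolved in pass 2 */
    }

    /* Pass 2: truncate long names; append "~N" on a collision. */
    for (size_t i = 0; i < count; i++) {
        if (strlen(onDisk[i]) <= MAX_NAME_BYTES)
            continue;
        size_t keep = utf8_cut(onDisk[i], MAX_NAME_BYTES);
        memcpy(out[i], onDisk[i], keep);
        out[i][keep] = '\0';
        for (unsigned n = 1; is_taken(out, count, i, out[i]); n++) {
            if (n > 999)
                return -1;
            char suffix[8];
            int slen = snprintf(suffix, sizeof suffix, "~%u", n);
            keep = utf8_cut(onDisk[i], MAX_NAME_BYTES - (size_t)slen);
            memcpy(out[i], onDisk[i], keep);
            memcpy(out[i] + keep, suffix, (size_t)slen + 1);
        }
    }
    return 0;
}
```

A driver would also need to keep the mapping from `out[i]` back to on-disk entry `i`, so that opening a displayed name finds the right file; that lookup is the extra cost on open mentioned above.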
>
> This is the next task on my todo list for exfat. I assume BFS probably
> already handles name collisions, so perhaps I could copy that code or
> export it to the VFS layer. If not, I may need to look at importing yet
> more UTF handling code into the kernel. I've looked into what is
> required to compare two Unicode strings for equality, and the process is
> far from trivial.
>

Actually, scratch that: I probably don't need to worry about the details of
Unicode string equality as long as I truncate intelligently, taking into
account things like combining characters...
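Truncating "intelligently" can be sketched as choosing a cut point that is both a UTF-8 sequence boundary and not directly in front of a combining mark, so a base character never loses its accents. This C sketch is hypothetical: `smart_cut` and its helpers are illustrative names, it assumes valid UTF-8 input (for example, the output of a UTF-16 converter), and it treats only the Combining Diacritical Marks block (U+0300 to U+036F) as combining. A fuller version would follow the grapheme cluster rules in Unicode UAX #29.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode the UTF-8 sequence at s[i] and store its byte length in *len.
 * Assumes valid UTF-8 input. */
static uint32_t decode_at(const unsigned char *s, size_t i, size_t *len)
{
    unsigned char c = s[i];
    if (c < 0x80) {
        *len = 1;
        return c;
    }
    if (c < 0xE0) {
        *len = 2;
        return ((uint32_t)(c & 0x1F) << 6) | (s[i + 1] & 0x3F);
    }
    if (c < 0xF0) {
        *len = 3;
        return ((uint32_t)(c & 0x0F) << 12)
            | ((uint32_t)(s[i + 1] & 0x3F) << 6) | (s[i + 2] & 0x3F);
    }
    *len = 4;
    return ((uint32_t)(c & 0x07) << 18) | ((uint32_t)(s[i + 1] & 0x3F) << 12)
        | ((uint32_t)(s[i + 2] & 0x3F) << 6) | (s[i + 3] & 0x3F);
}

/* Simplification: only the Combining Diacritical Marks block counts. */
static int is_combining(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Bytes of `str` to keep so the result fits in `max` bytes, ends on a
 * UTF-8 boundary, and does not strip combining marks from their base. */
static size_t smart_cut(const char *str, size_t max)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t len = strlen(str);
    if (len <= max)
        return len;

    size_t best = 0;
    /* Walk character boundaries up to max; len > max, so s[i] is valid. */
    for (size_t i = 0; i <= max;) {
        size_t n;
        uint32_t cp = decode_at(s, i, &n);
        if (!is_combining(cp))
            best = i; /* cutting before a base char keeps clusters whole */
        i += n;
    }
    return best;
}
```

For example, `smart_cut("ae\xCC\x81" "b", 3)` (an "e" followed by a combining acute accent) returns 1, keeping `"a"` rather than an unaccented `"ae"`. Two names can still share the same kept prefix, so this does not remove the need for the duplicate check.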
