On 2011-02-16 at 11:23:55 [+0100], François Revol <revol@xxxxxxx> wrote:
> On 16 Feb 2011 at 10:56, Ingo Weinhold wrote:
> >>
> >> iconv (and ICU) need to open then close the context. With this API, we
> >> don't know when to close the context, which has two problems:
> >
> > But that is, as Michael said, just a problem of our implementation. No one
> > forces us to use iconv or ICU to convert between UTF-* and UTF-8. We need
> > only 21 bits to represent a Unicode code point and have 32 state bits
> > available. So there should be sufficient space for the algorithm to cache
> > the not-yet-processed bits of the current/next character, which I believe
> > is all that's needed to convert between different Unicode encodings.
>
> This doesn't solve the pending chars issue though...
> We discussed this some time ago but didn't find a solution.

Simply don't leave more than one char pending? I don't see a fundamental
problem with that solution at least.

> Maybe we could agree that calling the thing with a NULL input buffer means
> close the context and flush the remaining chars?

Requiring a final call with NULL input (at least when state is != 0) seems
reasonable. Though a new three-phase API (init, convert, finish) with an
arbitrarily complex context would be even more reasonable, I suppose.

On 2011-02-16 at 11:30:19 [+0100], pulkomandy
<pulkomandy@xxxxxxxxxxxxxxxxx> wrote:
> > But that is, as Michael said, just a problem of our implementation. No one
> > forces us to use iconv or ICU to convert between UTF-* and UTF-8. We need
> > only 21 bits to represent a Unicode code point and have 32 state bits
> > available. So there should be sufficient space for the algorithm to cache
> > the not-yet-processed bits of the current/next character, which I believe
> > is all that's needed to convert between different Unicode encodings.
>
> We need to know that a given call to convert_from_utf8 is the last one, so
> that we can insert an end marker in the resulting UTF-7 string.
> I don't see how we can automagically guess that a given call to the
> function will be the last one. Unless we insert the end marker at each
> call, and then remove it on the next one.

Yes, I would always insert an end marker, if necessary. Removing it in a
later call is not possible, but also not necessary. This is potentially less
space efficient, but would be correct at least. Space efficiency is a
concern only when very small input chunks are used, anyway.

So, yes, the API is not optimal, but AFAICT it should be possible to
implement it to work correctly for pretty much all conversions.

CU, Ingo
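
To make the "cache the pending bits in the state word" idea concrete, here
is a minimal sketch in C, not the actual Haiku implementation. It converts
UTF-16 code units to UTF-8 in chunks and keeps at most one pending high
surrogate in a 32-bit state, and it treats a NULL input buffer as the final
"flush" call discussed above. The function name sketch_utf16_to_utf8, its
signature, and the use of U+FFFD for unpaired surrogates are all assumptions
made for illustration. UTF-7 end markers are not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one code point as UTF-8 into out[0..3]; returns the byte count. */
static size_t put_utf8(uint32_t cp, uint8_t *out)
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Hypothetical chunked converter (name and signature are illustrative).
 * *state must be 0 before the first call; afterwards it is either 0 or the
 * one pending high surrogate. Passing src == NULL marks the final call and
 * flushes that pending unit. Returns the number of bytes written to dst and
 * stores the number of input units read in *consumed.
 */
size_t sketch_utf16_to_utf8(const uint16_t *src, size_t srcLen,
                            size_t *consumed, uint8_t *dst, size_t dstLen,
                            uint32_t *state)
{
    size_t in = 0, out = 0;
    uint8_t tmp[4];

    assert(consumed != NULL && state != NULL);

    if (src == NULL) {
        /* Final call: a high surrogate that never got its partner. */
        if (*state != 0 && dstLen >= 3) {
            out = put_utf8(0xFFFD, dst);
            *state = 0;
        }
        *consumed = 0;
        return out;
    }

    while (in < srcLen) {
        uint16_t u = src[in];
        uint32_t cp;
        size_t take = 1;    /* input units consumed by this step */
        size_t n;

        if (*state != 0) {
            if (u >= 0xDC00 && u <= 0xDFFF) {
                cp = 0x10000 + ((*state - 0xD800) << 10) + (u - 0xDC00u);
            } else {
                cp = 0xFFFD;    /* pending high surrogate was unpaired */
                take = 0;       /* look at u again on the next iteration */
            }
        } else if (u >= 0xD800 && u <= 0xDBFF) {
            *state = u;         /* remember it, emit nothing yet */
            in++;
            continue;
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            cp = 0xFFFD;        /* low surrogate without a high one */
        } else {
            cp = u;
        }

        n = put_utf8(cp, tmp);
        if (out + n > dstLen)
            break;              /* no room: leave state and input untouched */
        memcpy(dst + out, tmp, n);
        out += n;
        in += take;
        *state = 0;
    }

    *consumed = in;
    return out;
}
```

A caller would feed chunks in order, reuse the same state variable for every
call, and finish with one call where src is NULL. If state is still nonzero
after that last call, the output buffer was too small for the flush.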