On 2011-02-16 at 10:30:30 [+0100], pulkomandy
<pulkomandy@xxxxxxxxxxxxxxxxx> wrote:
> On Wed, 16 Feb 2011 10:13:29 +0100, Ingo Weinhold <ingo_weinhold@xxxxxx>
> wrote:
> > On 2011-02-16 at 09:51:02 [+0100], pulkomandy
> > <pulkomandy@xxxxxxxxxxxxxxxxx> wrote:
> >> If you can implement it in a way that works in all cases, we'll
> >> accept the patch. But I know I can't do it without a context token
> >> for utf-7 or utf-16.
> >
> > The convert_{from,to}_utf8() functions do have an "int32* state"
> > parameter. I'm not familiar with UTF-7, but for UTF-16 that definitely
> > suffices to store the first surrogate of a pair. Am I missing
> > something else?
>
> iconv (and ICU) need to open then close the context. With this API, we
> don't know when to close the context, which has two problems:

But that is, as Michael said, just a problem of our implementation. No one
forces us to use iconv or ICU to convert between UTF-* and UTF-8. We need
only 21 bits to represent a Unicode code point and have 32 state bits
available. So there should be sufficient space for the algorithm to cache
the not-yet-processed bits of the current/next character, which I believe
is all that's needed to convert between different Unicode encodings.

CU, Ingo
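
To illustrate the point about the state bits, here is a rough sketch of a chunked UTF-16 to UTF-8 converter that parks a pending high surrogate in an int32 between calls. The names utf16_to_utf8_chunk() and put_utf8() are made up for this example and are not the existing convert_{from,to}_utf8() code. The sketch also assumes host-order 16-bit code units, that *state is 0 before the first call, and that unpaired surrogates are replaced with U+FFFD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out,
   returning the number of bytes written (1 to 4). */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical sketch: convert srcLen UTF-16 code units to UTF-8.
   A high surrogate at the end of a chunk is stored in *state and
   combined with the first unit of the next chunk. dst must have room
   for 3 * srcLen + 3 bytes. Returns the number of bytes written. */
size_t utf16_to_utf8_chunk(const uint16_t *src, size_t srcLen, char *dst,
    int32_t *state)
{
    size_t written = 0;

    for (size_t i = 0; i < srcLen; i++) {
        uint32_t unit = src[i];

        if (*state != 0) {
            /* A high surrogate is pending from an earlier unit or call. */
            uint32_t high = (uint32_t)*state;
            *state = 0;
            if (unit >= 0xDC00 && unit <= 0xDFFF) {
                uint32_t cp = 0x10000 + ((high - 0xD800) << 10)
                    + (unit - 0xDC00);
                written += put_utf8(cp, dst + written);
                continue;
            }
            /* Unpaired high surrogate: substitute, then handle unit. */
            written += put_utf8(0xFFFD, dst + written);
        }

        if (unit >= 0xD800 && unit <= 0xDBFF) {
            /* Remember the high surrogate; nothing to emit yet. */
            *state = (int32_t)unit;
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            /* Low surrogate without a preceding high surrogate. */
            written += put_utf8(0xFFFD, dst + written);
        } else {
            written += put_utf8(unit, dst + written);
        }
    }
    return written;
}
```

The caller never has to "close" anything: a non-zero *state after the last chunk simply means the input ended on an unpaired high surrogate, which the caller can report or ignore.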