On Wed, 16 Feb 2011 10:13:29 +0100, Ingo Weinhold <ingo_weinhold@xxxxxx>
wrote:
> On 2011-02-16 at 09:51:02 [+0100], pulkomandy
> <pulkomandy@xxxxxxxxxxxxxxxxx> wrote:
>> If you can implement it in a way that works in all cases, we'll accept
>> the patch. But I know I can't do it without a context token for utf-7
>> or utf-16.
>
> The convert_{from,to}_utf8() functions do have an "int32* state"
> parameter.
> I'm not familiar with UTF-7, but for UTF-16 that definitely suffices to
> store the first surrogate of a pair. Do I miss something else?
>
> CU, Ingo

iconv (and ICU) need to open and then close a conversion context. With
this API, we don't know when to close the context, which causes several
problems:

 * We don't know when to insert the end marker, if the encoding has one
   (UTF-7 seems to need a closing "-" after a base64 run).
 * We leak an open context for every conversion.
 * There may also be an incomplete code point at the end of a conversion
   for multibyte encodings. In this case, closing will raise an error,
   whereas the input might be valid if there are more bytes to come. And
   if we keep the context open instead, we can't warn the caller that
   there are stray bytes.

So the current solution is to open and close the context at each call.
This at least avoids memory leaks.

-- 
Adrien.
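
To make the "int32* state" idea and the stray-bytes problem concrete, here is a minimal sketch in plain C. It does not use iconv or ICU, and it is not the Haiku implementation: the function name utf16_chunk_to_utf8() and its signature are hypothetical, chosen only for illustration. It assumes host-endian UTF-16 code units and keeps a pending high surrogate in *state between calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the Haiku API): converts one chunk of UTF-16
 * code units to UTF-8. *state carries a pending high surrogate from one
 * call to the next, or 0 when nothing is pending.
 * Returns the number of bytes written, or -1 on a malformed surrogate
 * sequence or when the output buffer is too small. */
static int utf16_chunk_to_utf8(const uint16_t *in, size_t inCount,
                               char *out, size_t outSize, int32_t *state)
{
    size_t written = 0;

    for (size_t i = 0; i < inCount; i++) {
        uint32_t unit = in[i];
        uint32_t cp;

        if (*state != 0) {
            /* A high surrogate is pending: this unit must be a low one. */
            if (unit < 0xDC00 || unit > 0xDFFF)
                return -1;
            cp = 0x10000 + (((uint32_t)*state - 0xD800) << 10)
                + (unit - 0xDC00);
            *state = 0;
        } else if (unit >= 0xD800 && unit <= 0xDBFF) {
            /* First half of a pair: remember it and wait for the rest. */
            *state = (int32_t)unit;
            continue;
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            return -1;  /* low surrogate without a preceding high one */
        } else {
            cp = unit;
        }

        /* Encode the code point as 1 to 4 UTF-8 bytes. */
        unsigned char buf[4];
        size_t n;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 4;
        }

        if (written + n > outSize)
            return -1;
        memcpy(out + written, buf, n);
        written += n;
    }

    return (int)written;
}
```

A pair split across two calls converts correctly, which is the case Ingo describes. The open question from the message remains, though: if *state is still non-zero after what turns out to be the caller's last chunk, nothing in an API of this shape ever reports the dangling surrogate, because there is no "finish" call at which to check it.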