On Sun, Feb 7, 2010 at 2:35 PM, Travis D. Reed <tdreed@xxxxxxxxx> wrote:
> This is the problem. (Ironically, I used to fight against this with PHP, now
> it does it by itself.) I'm trying to peel off just 8 bits at a time from the
> string, but the way I'm doing it, it counts ellipsis as a char...*sigh*
> Still working on it.

By the way, I was looking at the problem of generating the catkeys signature, and the algorithm is pretty simple. I was trying to code it up in Ruby, but the main problem there is that the Haiku C++ code for calculating the fingerprint relies on 32-bit integer overflow arithmetic, whereas Ruby automatically promotes large results to Bignum, which makes it harder to reproduce what the C++ code does.

The main point of all this, though, is that you should be able to reproduce this algorithm in PHP, assuming you can treat the UTF-8 strings as a plain stream of bytes and do 32-bit unsigned calculations.

If you want to look, the fingerprint code is in this file (in CatKey::HashFun and BHashMapCatalog::ComputeFingerprint):

http://dev.haiku-os.org/browser/haiku/trunk/src/kits/locale/HashMapCatalog.cpp

--
Regards,
Ryan
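
P.S. To illustrate the two points above (walking a UTF-8 string byte by byte, and forcing 32-bit unsigned wraparound in a language with automatic big integers), here is a minimal sketch in Python. It is not the Haiku algorithm: the function name wrap32_hash and the multiplier 31 are hypothetical stand-ins chosen for illustration, and the real constants and steps should be taken from CatKey::HashFun in the file linked above. The only part meant to carry over is the masking with 0xFFFFFFFF after every step.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like an unsigned 32-bit integer


def wrap32_hash(text, multiplier=31, start=0):
    """Illustrative multiplicative hash with 32-bit wraparound.

    ASSUMPTION: the multiplier and the update rule are placeholders,
    not the values used by the Haiku C++ code.
    """
    h = start & MASK32
    # Encode first so we iterate over raw bytes, not characters:
    # a single ellipsis character becomes three separate bytes here.
    for byte in text.encode("utf-8"):
        # Without the mask, the integer would grow without bound
        # instead of overflowing the way 32-bit arithmetic does.
        h = (h * multiplier + byte) & MASK32
    return h


if __name__ == "__main__":
    print(len("\u2026"), len("\u2026".encode("utf-8")))  # characters vs. bytes
    print(wrap32_hash("hello"))
```

The same masking trick should work in Ruby or PHP. One thing worth checking when porting is whether the C++ code reads each byte as a signed or an unsigned char, since bytes of 0x80 and above would then contribute different values.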