Hi, On 05/26/2013 07:51 PM, Diego Biurrun wrote:
On 2013-05-26 18:16, Diego Biurrun wrote:On Sat, May 25, 2013 at 01:37:18PM -0000, noreply@xxxxxxxxxxxxx wrote:------------------------------------------------------------ revno: 6412 fixes bug: https://launchpad.net/bugs/1184142 committer: Miika Komu <miika@xxxxxx> branch nick: hipl timestamp: Sat 2013-05-25 16:36:48 +0300 message: Fixed a checksumming issue As described in lp:1184142, initiating base exchange using a 64-bit initiator and 32-bit responder with IPv6 addresses (e.g. Teredo addresses) results in a checksum error: error(lib/core/builder.c:1884@hip_verify_network_header) HIP checksum failed. I traced the error to hip_checksum_packet() where the variable "p" seems behave oddly. After some testing, I figured out that the issue does not repeat itself when turning gcc optimizations off. I tested that the problem could have been solved with __attribute__((optimize("O0"))) in the function signature, but declaring the problematic variable to static seems to do the trick. Feel free to suggest better solutions to the issue. modified: libcore/gpl/checksum.c--- libcore/gpl/checksum.c 2012-05-12 06:54:33 +0000 +++ libcore/gpl/checksum.c 2013-05-25 13:36:48 +0000 @@ -228,7 +228,7 @@ uint16_t checksum = 0; unsigned long sum = 0; int count = 0, length = 0; - unsigned short *p = NULL; /* 16-bit */ + static unsigned short *p = NULL; /* 16-bit */ struct pseudo_header pseudoh = { { 0 } }; struct pseudo_header6 pseudoh6 = { { 0 } }; uint32_t src_network, dst_network;I think this makes the situation worse, not better. This code is broken and now you rely on some gcc optimization (or lack thereof) to hide the fact. Have you verified this even works with more than one gcc version (not to mention other compilers)?
I didn't test my fix with multiple gcc versions or clang.
Since the bug appears with 32 and 64 bit platforms, one needs to find out why different checksums are generated in the first place. Some parts hip_checksum_packet() must have hidden assumptions about pointer or integer type sizes. Also note the rampant casting that is used to paper over design issues. My money is on the pointer dereferences sum += *p++; count -= 2; should depend on the size of the pointer, which can be 32 or 64 bits, so count cannot be decremented by the fixed size of '2', which is only correct for 32 bits.Hmmm, no, that analysis is not correct. The decrement should work. I suggest starting to debug this by getting rid of all the short and long integer types used in that file and replace them by POSIX types like uint16_t that have guaranteed sizes.
I tried this, but it didn't help, so the problem originates from gcc optimizations.
I agree that this code should be rewritten with integers that have clear sizes and possibly with a union structure. I have reopened the bug, but meanwhile the only working fix will remain in the trunk.