[hipl-dev] Re: [Branch ~hipl-core/hipl/trunk] Rev 6412: Fixed a checksumming issue

  • From: Miika Komu <mkomu@xxxxxxxxx>
  • To: hipl-dev@xxxxxxxxxxxxx
  • Date: Sat, 01 Jun 2013 18:43:20 +0300

Hi,

On 05/31/2013 06:58 PM, Diego Biurrun wrote:
On 2013-05-31 09:43, Miika Komu wrote:
On 05/26/2013 07:51 PM, Diego Biurrun wrote:
On 2013-05-26 18:16, Diego Biurrun wrote:
On Sat, May 25, 2013 at 01:37:18PM -0000, noreply@xxxxxxxxxxxxx wrote:
------------------------------------------------------------
revno: 6412
fixes bug: https://launchpad.net/bugs/1184142
committer: Miika Komu <miika@xxxxxx>
branch nick: hipl
timestamp: Sat 2013-05-25 16:36:48 +0300
message:
   Fixed a checksumming issue

   As described in lp:1184142, initiating base exchange using a 64-bit
   initiator and 32-bit responder with IPv6 addresses (e.g. Teredo
   addresses) results in a checksum error:

   error(lib/core/builder.c:1884@hip_verify_network_header) HIP
checksum
   failed.

   I traced the error to hip_checksum_packet() where the variable "p"
seems
   behave oddly. After some testing, I figured out that the issue
does not
   repeat itself when turning gcc optimizations off.

   I tested that the problem could have been solved with
   __attribute__((optimize("O0"))) in the function signature, but
declaring
   the problematic variable to static seems to do the trick. Feel
free to
   suggest better solutions to the issue.
modified:
   libcore/gpl/checksum.c

--- libcore/gpl/checksum.c    2012-05-12 06:54:33 +0000
+++ libcore/gpl/checksum.c    2013-05-25 13:36:48 +0000
@@ -228,7 +228,7 @@
      uint16_t               checksum = 0;
      unsigned long          sum      = 0;
      int                    count    = 0, length = 0;
-    unsigned short        *p        = NULL; /* 16-bit */
+    static unsigned short  *p       = NULL; /* 16-bit */
      struct pseudo_header   pseudoh  = { { 0 } };
      struct pseudo_header6  pseudoh6 = { { 0 } };
      uint32_t               src_network, dst_network;

I think this makes the situation worse, not better.  This code is
broken and now you rely on some gcc optimization (or lack thereof)
to hide the fact.  Have you verified this even works with more than
one gcc version (not to mention other compilers)?

I didn't test my fix with multiple gcc versions or clang.

So are you confident that the issue is actually fixed and not just
papered over?  I'm convinced that it works on your compiler version out
of sheer good luck.  It may well break on the next system for the next
user.

actually, I tried a few ways to solve the problem, so it's not just sheer luck. And I did verify against the checksum code in wireshark too.

Btw, this is not a new bug. Earlier, I thought that there was just
something strange in sending raw HIP messages over Teredo (=IPv6
tunneling over IPv4). The problem didn't manifest itself because people mostly just run HIP on top of UDP+IPv4 (and requires a specific set up, so the problem has been occurring on specific circumstances.

Since the bug appears with 32 and 64 bit platforms, one needs to find
out why different checksums are generated in the first place.  Some
parts hip_checksum_packet() must have hidden assumptions about pointer
or integer type sizes.  Also note the rampant casting that is used to
paper over design issues.

My money is on the pointer dereferences

         sum   += *p++;
         count -= 2;

should depend on the size of the pointer, which can be 32 or 64 bits,
so count cannot be decremented by the fixed size of '2', which is only
correct for 32 bits.

Hmmm, no, that analysis is not correct.  The decrement should work.

I suggest starting to debug this by getting rid of all the short and
long integer types used in that file and replace them by POSIX types
like uint16_t that have guaranteed sizes.

I tried this, but it didn't help, so the problem originates from gcc
optimizations.

No, the problem is in the code, not in gcc.

the code works if I disable gcc optimizations. If you find it more
suitable, I can replace the static variable by adding
__attribute__((optimize("O0"))) around the function?

I have verified the issue and my fix with gcc version 4.6.3 (Ubuntu LTS) and 4.7.2 (Debian wheezy).

Why not commit your changes or send as a patch?

What do you mean?

I'm convinced it is an improvement to what is in that file right now...

I am because I have tested that the original code breaks
interoperability between 32-bit and 64-bit systems. The I tested the
code using my fix and interoperability works in all the following configurations:

* 64bit <-> 64bit
* 32bit <-> 64bit
* 32bit <-> 32bit

Also note that this code is taken from Boeing - maybe they have updated
or fixed it since?

I have now tested the latest function in our code, but I didn't help.

Also, might there be a problem with struct padding?

Possibly, any concrete suggestions what to try and test?

Diego

P.S.: Have you tried asking Boeing to relicense their code for us?

I think we don't need an Aalto/RWTH-specific copyright file. We could just add a new file (e.g. hip_checksum.c), copy the BSD/MIT
boilerplate from Boeing along with the function to the new
file.

Other related posts: