[nanomsg] Re: [PATCH] Bug fix for the clean-up of removed poll fds, when NN_USE_POLL is selected.

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Thu, 23 Oct 2014 13:02:56 +0200

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Applied to the master.

Thanks!
Martin

On 23/10/14 12:10, Aram Santogidis wrote:
> Hi Martin,
> 
> yes I confirm that the patch is under MIT license. Sorry, missed to
> mention it to my first email.
> 
> Regards, Aram
> 
> -----Original Message----- From: nanomsg-bounce@xxxxxxxxxxxxx
> [mailto:nanomsg-bounce@xxxxxxxxxxxxx] On Behalf Of Martin Sustrik 
> Sent: 23 October 2014 08:09 To: nanomsg@xxxxxxxxxxxxx Subject:
> [nanomsg] Re: [PATCH] Bug fix for the clean-up of removed poll fds,
> when NN_USE_POLL is selected.
> 
> Hi Aram,
> 
> Can you confirm that you are submitting the patch under MIT
> license?
> 
> Thanks! Martin
> 
> On 22/10/14 14:15, Aram Santogidis wrote:
>> Hi,
> 
>> I found a bug and I submit a patch for the fix.
> 
>> Here is the descritption:
> 
>> I selected to use poll as the polling mechanism (i.e. 
>> NN_USE_POLL=1) and then executed the unit tests (make check).
> 
>> The ipc_shutdown and tcp_shutdown failed with the following
>> messages
> 
>> ./test-driver: line 107: 18064 Segmentation fault      "$@" > 
>> $log_file 2>&1 FAIL: tests/ipc_shutdown ./test-driver: line 107: 
>> 20233 Segmentation fault      "$@" > $log_file 2>&1 FAIL: 
>> tests/tcp_shutdown
> 
>> 1) I investigated the issue, and discovered that the following 
>> segmentation fault was the source of the failure.
> 
>> nn_poller_wait (self=self@entry=0x7ffff7dd9608 <self+136>, 
>> timeout=70) at src/aio/poller_poll.inc:130 130 self->hndls
>> [i].hndl->index = i;
> 
>> The statement at line 130 fails because hndl can be NULL in the
>> case when *the last element of the pollset belongs to the list of
>> removed fds, itself*. I solved this by adding a check for NULL
>> before the assignment.
> 
>> 2) I discovered a second bug, also in the nn_poller_wait, however
>> more subtle than the NULL pointer problem.
> 
>> Assume that we add N pollsets to the list of removed fds. If it 
>> happens to be that the N-1 element is at the end of the pollsets,
>> it will take the position of N after the replacement. In this
>> case the .prev index is the same with the i index. The adjustment
>> that follows
> 
>> /*  The fd from the end of the pollset may have been on removed
>> fds list itself. If so, adjust the list. */ [...]
> 
>> corrupts the list because both .next and .prev are equal to i
>> now.
> 
> 
>> The symptoms of this error are not deterministic. Sometimes The
>> test wouldn't terminate at all, some other times it would
>> terminate with a segmentation fault because the *removed* field
>> would have larger value than the size of the array, or the *size*
>> field would take negative values because the loop wouldn't
>> terminate.
> 
>> I solved this problem by adding the following code just before
>> the replacement.
> 
>> +            if (self->hndls [i].next != -1) + self->hndls
>> [self->hndls [i].next].prev = -1;
> 
>> After every replacement, the next element is at the top of the
>> stack, therefore the .prev should be equal to -1. This fixes the
>> problem 2)
> 
>> After these two changes, the tests pass.
> 
> 
>> Best Regards, Aram Santogidis
> 
> 
>> ---------------------------------------- Marie Curie Fellow, CERN
>>  ICE-DIP project Tel: +49 17677192498
> 
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJUSOBgAAoJENTpVjxCNN9Y5sEH/36KKbKYe2cizMVOWWVxDj1a
HZCyRjbxzxyGVN+aqj5aaGAPhgI+I+tSFlQbENtz0pVXRpK8kwX1OCyCK6jR5Uim
X5bsc7mDVsyJMuBo6anhXW06bDhYCBhtRKG48uKgoikj1EdsknQgXnvHQjIKqfOS
pJVGIQ4Pn2JyTOvaI1M21wccovprbV7PrUVfqhFx2sghiF+hbNkLpmyBhS+N3OKX
0tV3KKVJhc4DSeXm9yJvFHzabHW3OcfXkU1WU8cx+TeEIRv3gI9jyF881lWIFKhf
JEs+9Kir2EERNw1Jnrdj6VGmTyPeM5BGzir/ZVjaUpuAo6UayLCI2296mwuNO24=
=YQ0p
-----END PGP SIGNATURE-----

Other related posts: