[nanomsg] [PATCH] Bug fix for the clean-up of removed poll fds, when NN_USE_POLL is selected.

  • From: Aram Santogidis <aram.santogidis@xxxxxxx>
  • To: "nanomsg@xxxxxxxxxxxxx" <nanomsg@xxxxxxxxxxxxx>
  • Date: Wed, 22 Oct 2014 12:15:33 +0000

Hi,

I found a bug and I submit a patch for the fix. 

Here is the descritption:

I selected to use poll as the polling mechanism (i.e. NN_USE_POLL=1) and then 
executed the unit tests (make check).

The ipc_shutdown and tcp_shutdown failed with the following messages

./test-driver: line 107: 18064 Segmentation fault      "$@" > $log_file 2>&1
FAIL: tests/ipc_shutdown
./test-driver: line 107: 20233 Segmentation fault      "$@" > $log_file 2>&1
FAIL: tests/tcp_shutdown

1)
I investigated the issue, and discovered that the following segmentation fault 
was the source of the failure.

nn_poller_wait (self=self@entry=0x7ffff7dd9608 <self+136>, timeout=70) at 
src/aio/poller_poll.inc:130
130                 self->hndls [i].hndl->index = i;

The statement at line 130 fails because hndl can be NULL in the case when *the 
last element of the pollset belongs to the list of removed fds, itself*. I 
solved this by adding a check for NULL before the assignment.

2)
I discovered a second bug, also in the nn_poller_wait, however more subtle than 
the NULL pointer problem.

Assume that we add N pollsets to the list of removed fds. If it happens to be 
that the N-1 element is at the end of the pollsets, it will take the position 
of N after the replacement. In this case the .prev index is the same with the i 
index. The adjustment that follows

     /*  The fd from the end of the pollset may have been on removed fds
            list itself. If so, adjust the list. */
[...]

corrupts the list because both .next and .prev are equal to i now. 

The symptoms of this error are not deterministic. Sometimes The test wouldn't 
terminate at all, some other times it would terminate with a segmentation fault 
because the *removed* field would have larger value than the size of the array, 
or the *size* field would take negative values because the loop wouldn't 
terminate.

I solved this problem by adding the following code just before the replacement.

+            if (self->hndls [i].next != -1)
+                self->hndls [self->hndls [i].next].prev = -1;

After every replacement, the next element is at the top of the stack, therefore 
the .prev should be equal to -1. This fixes the problem 2)

After these two changes, the tests pass. 


Best Regards,
Aram Santogidis


                
----------------------------------------
Marie Curie Fellow, CERN
ICE-DIP project
Tel: +49 17677192498
From de3813a43d451c94e90f668dd061de124501fa8d Mon Sep 17 00:00:00 2001
From: Aram Santogidis <aram.santogidis@xxxxxxx>
Date: Wed, 22 Oct 2014 11:38:50 +0200
Subject: [PATCH] Bug fix for the clean-up of removed poll fds, when
 NN_USE_POLL is selected.

---
 src/aio/poller_poll.inc | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/aio/poller_poll.inc b/src/aio/poller_poll.inc
index 0aa3868..a4b0a76 100644
--- a/src/aio/poller_poll.inc
+++ b/src/aio/poller_poll.inc
@@ -126,8 +126,11 @@ int nn_poller_wait (struct nn_poller *self, int timeout)
         --self->size;
         if (i != self->size) { 
             self->pollset [i] = self->pollset [self->size];
+            if (self->hndls [i].next != -1)
+                self->hndls [self->hndls [i].next].prev = -1;
             self->hndls [i] = self->hndls [self->size];
-            self->hndls [i].hndl->index = i;
+            if (nn_fast (self->hndls [self->size].hndl != NULL))
+                self->hndls [i].hndl->index = i;
         }
 
         /*  The fd from the end of the pollset may have been on removed fds
-- 
1.9.1

Other related posts: