Hi, I found a bug and I submit a patch for the fix. Here is the descritption: I selected to use poll as the polling mechanism (i.e. NN_USE_POLL=1) and then executed the unit tests (make check). The ipc_shutdown and tcp_shutdown failed with the following messages ./test-driver: line 107: 18064 Segmentation fault "$@" > $log_file 2>&1 FAIL: tests/ipc_shutdown ./test-driver: line 107: 20233 Segmentation fault "$@" > $log_file 2>&1 FAIL: tests/tcp_shutdown 1) I investigated the issue, and discovered that the following segmentation fault was the source of the failure. nn_poller_wait (self=self@entry=0x7ffff7dd9608 <self+136>, timeout=70) at src/aio/poller_poll.inc:130 130 self->hndls [i].hndl->index = i; The statement at line 130 fails because hndl can be NULL in the case when *the last element of the pollset belongs to the list of removed fds, itself*. I solved this by adding a check for NULL before the assignment. 2) I discovered a second bug, also in the nn_poller_wait, however more subtle than the NULL pointer problem. Assume that we add N pollsets to the list of removed fds. If it happens to be that the N-1 element is at the end of the pollsets, it will take the position of N after the replacement. In this case the .prev index is the same with the i index. The adjustment that follows /* The fd from the end of the pollset may have been on removed fds list itself. If so, adjust the list. */ [...] corrupts the list because both .next and .prev are equal to i now. The symptoms of this error are not deterministic. Sometimes The test wouldn't terminate at all, some other times it would terminate with a segmentation fault because the *removed* field would have larger value than the size of the array, or the *size* field would take negative values because the loop wouldn't terminate. I solved this problem by adding the following code just before the replacement. + if (self->hndls [i].next != -1) + self->hndls [self->hndls [i].next].prev = -1; After every replacement, the next element is at the top of the stack, therefore the .prev should be equal to -1. This fixes the problem 2) After these two changes, the tests pass. Best Regards, Aram Santogidis ---------------------------------------- Marie Curie Fellow, CERN ICE-DIP project Tel: +49 17677192498
From de3813a43d451c94e90f668dd061de124501fa8d Mon Sep 17 00:00:00 2001 From: Aram Santogidis <aram.santogidis@xxxxxxx> Date: Wed, 22 Oct 2014 11:38:50 +0200 Subject: [PATCH] Bug fix for the clean-up of removed poll fds, when NN_USE_POLL is selected. --- src/aio/poller_poll.inc | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/aio/poller_poll.inc b/src/aio/poller_poll.inc index 0aa3868..a4b0a76 100644 --- a/src/aio/poller_poll.inc +++ b/src/aio/poller_poll.inc @@ -126,8 +126,11 @@ int nn_poller_wait (struct nn_poller *self, int timeout) --self->size; if (i != self->size) { self->pollset [i] = self->pollset [self->size]; + if (self->hndls [i].next != -1) + self->hndls [self->hndls [i].next].prev = -1; self->hndls [i] = self->hndls [self->size]; - self->hndls [i].hndl->index = i; + if (nn_fast (self->hndls [self->size].hndl != NULL)) + self->hndls [i].hndl->index = i; } /* The fd from the end of the pollset may have been on removed fds -- 1.9.1