Re: Separating protocols from I/O

  • From: Daurnimator <quae@xxxxxxxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Tue, 25 Aug 2015 00:22:32 +1000

On 24 August 2015 at 23:58, Aleksey Demakov <ademakov@xxxxxxxxx> wrote:


On 24 Aug 2015, at 18:37, Daurnimator <quae@xxxxxxxxxxxxxxx> wrote:

Have two coroutines that try to read from the same socket => who will
get the data?
And furthermore, consider the common form of a header containing a
length followed by that much data: what if another coroutine tries to
read from the socket in between?

Alternatively, consider writing:
If I need to perform two writes (e.g. header then payload), you don't
want another thread jumping in and writing something in the middle.


This problem could be handled at a higher level. Just don’t do socket
I/O from many threads concurrently. Either do it always from a single
thread, or introduce some sort of synchronisation mechanism.

Furthermore, issuing concurrent read/write syscalls on real sockets
is not safe from the get-go. The system guarantees that, barring a
hard error, it will read/write the requested number of bytes only
when dealing with regular files. For sockets it is legal for the
system to make a partial read/write for no apparent reason. The
application is then required to repeat the syscall in order to handle
the remaining bytes. And as you might imagine, when you repeat a
syscall, a concurrent thread might issue its own call first. So
concurrent I/O on sockets is not recommended.

This is where the buffering system comes into play.
It allows you to do multiple writes knowing another thread cannot 'preempt' you.
I was trying to illustrate why a buffering layer is so often bundled
with an IO layer and/or scheduler.

However there is another problem. If a coroutine- or thread-based I/O
multiplexing layer sits between the system and application layers,
and you do your reads/writes through this layer but open and close
sockets independently of it, then you can run into a problem. The
intermediate multiplexing layer might buffer received data (or an
error event). If you close a socket and then accept another one that
happens to get the same file descriptor, your multiplexing layer will
deliver stale data related to the previous connection.

This is easy enough to code for.
But it possibly demonstrates why it's important that your IO layer
knows about your scheduler.
e.g. cqueues has the function `cqueues.cancel`, which removes the
given fd from all schedulers.
Certain functions, like socket:shutdown(), call this internally.

So in addition to the read/write calls it is advisable to also have
a close or flush call that notifies the multiplexing layer that any
pending data or events related to a socket need to be discarded,
and that the socket should possibly also be removed from the
underlying select/poll/epoll or whatever mechanism is used by the
I/O multiplexer.

Indeed, cqueues gives you the tools to solve all these issues; and is
possibly the high level solution you mentioned in the first paragraph.
(The reason I use it instead of other event frameworks is that it's composable)
