Michael Phipps <mphipps1@xxxxxxxxxxxxxxxx> wrote:
> My question is really this - if we get a packet of N bytes, isn't the
> data actually delivered to the application N - W bytes, where W is the
> "waste" of the delivery info, etc? If that is so, don't we really know,
> in advance, how much to allocate? This seems overly simplistic, so I am
> sure that it isn't the case, and I look forward to the reason why. :-)
>
> Likewise, in the other direction - if app X says "send data N", isn't it
> predictable how much buffer space you would need?

We certainly know how much space is prepended in front of a buffer to be
sent. And we can also determine the maximum size we could need for
buffers we receive - and that's exactly what we intend to do (that's what
I suggested in our net_buffer discussion).

> Tom wrote:
> > A good first read on the subject is here http://lwn.net/Articles/169961/
> > [...]
> > This design seems like the BeOS way, it gives much higher throughput and
> > allows for way better scalability on N-way systems.

His solution and the problems he observed are partially specific to the
Linux/BSD implementation. Removing the softirq overhead, simplifying the
driver's interrupt routine, and his "channels" are all good ideas in my
opinion, but only the latter is applicable to us.

Having the networking stack in userland (per app) doesn't remove the
copying of data as long as you're using the standard BSD API, so all you
could potentially save is one syscall per socket operation. It would
definitely be nice for userland networking file systems, though :)
Collecting the interface buffers and distributing them to the registered
channels would still need to run in a kernel thread, though.

Also, the kernel would have no control over the channels: it couldn't
enforce any limits to keep the system alive, and it couldn't drop packets
in a channel when it's low on memory.
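For reference, a "channel" in that article is, roughly, a fixed-size
ring buffer with one producer (the driver side) and one consumer (the
protocol or application side). The following is a minimal C11 sketch
under that assumption; all names (net_channel, channel_push(),
channel_pop(), CHANNEL_SIZE) are hypothetical and not taken from any
existing stack. It shows where a drop-when-full policy would sit: on the
producer side, which is exactly the control the kernel gives up once the
consumer end lives in userland.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capacity; a power of two so indices can be masked. */
#define CHANNEL_SIZE 256

/* Single-producer/single-consumer ring of packet pointers.
 * head is only written by the producer, tail only by the consumer;
 * both grow without bound and are masked when indexing slots. */
struct net_channel {
	_Atomic size_t head;
	_Atomic size_t tail;
	void *slots[CHANNEL_SIZE];
	size_t dropped;	/* producer-side count of dropped packets */
};

static void
channel_init(struct net_channel *channel)
{
	atomic_init(&channel->head, 0);
	atomic_init(&channel->tail, 0);
	channel->dropped = 0;
}

/* Producer side: returns false and counts a drop when the ring is full. */
static bool
channel_push(struct net_channel *channel, void *packet)
{
	/* NULL is reserved to signal "empty" in channel_pop() */
	assert(packet != NULL);

	size_t head = atomic_load_explicit(&channel->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&channel->tail, memory_order_acquire);

	if (head - tail == CHANNEL_SIZE) {
		channel->dropped++;
		return false;
	}

	channel->slots[head & (CHANNEL_SIZE - 1)] = packet;
	/* release: the slot write becomes visible before the new head */
	atomic_store_explicit(&channel->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns NULL when the ring is empty. */
static void *
channel_pop(struct net_channel *channel)
{
	size_t tail = atomic_load_explicit(&channel->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&channel->head, memory_order_acquire);

	if (head == tail)
		return NULL;

	void *packet = channel->slots[tail & (CHANNEL_SIZE - 1)];
	/* release: the slot read completes before the producer may reuse it */
	atomic_store_explicit(&channel->tail, tail + 1, memory_order_release);
	return packet;
}
```

Since the indices are unsigned and CHANNEL_SIZE is a power of two,
head - tail stays correct even when the counters wrap around.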
He didn't really go into enough detail about what the buffer memory
management looks like for us to evaluate it, though.

Finally, having the networking in userland might be a good solution for a
static stack, but it's certainly not so nice for a modular stack. And you
would still need to have the whole thing duplicated in the kernel as well.

To be honest, I would rather introduce an API that gives you direct access
to your socket's buffer queue (effectively giving you zero-copy access)
than move the whole stack into userland. And we still have R2 for trying
changes like that :-)

Bye,
   Axel.