[haiku-development] Re: Plan for NetServices Kit (v2)

From: Niels Sascha Reedijk <niels.reedijk@xxxxxxxxx>
To: Haiku Development <haiku-development@xxxxxxxxxxxxx>
Date: Wed, 1 Sep 2021 07:13:09 +0100

Hi,

On Mon, Aug 30, 2021 at 9:35 PM Adrien Destugues <pulkomandy@xxxxxxxxxxxxx>
wrote:

(I have not looked at your code yet, I'll do later)

I have read the full proposal now. Here are my notes as I went through it.
Probably not very clear and more questions than answers.

- Definitely don't go with both modern C++ and legacy API. It is hard
enough to get this right with one API, let's not have two completely
different ones. If you want to use modern C++, do a thing using only modern
C++. Otherwise, stay with gcc2 compatible and blend in with the existing
API.

I would actually be okay with that. The only reason that I felt we did need
some legacy API is that the package kit currently depends on it, but you
mention elsewhere that the package kit is a potential candidate to go to
modern C++ only.

- About the error handling: I find the Expected and BError thing a bit
strange. In existing APIs we use InitCheck and in some more recent
developments (package kit, mainly, we use exceptions). Why not use
exceptions here? It would remove the need for factory methods, you just
call the constructor and catch the exception if something goes wrong. I
think this results in clearer code, and this is how I work when doing
modern C++.

I will see if we can spin out this discussion separately.

- The current design has a BUrlContext. You introduce a BUrlSession that
does largely the same thing. I think that's a naming error. A session is a
connection with a server. In HTTP 1.0, a session would have one single
request. In FTP, and in HTTP1.1 and later, a session can have multiple
requests. But still, a session is basically a TCP or SSL socket. And we
still need a BUrlContext for more global things (cookie jar for HTTP, for
example, since that can span multiple sessions).

The initial thought of using B*Session was to clearly differentiate it from
the currently used BUrlProtocol and BUrlContext. The logic is that it is a
session with persistent data (like cookies and authentication data). I am
open to renaming.

- In section B5, I find the naming of the methods a bit confusing. My
understanding is that AddRequest actually immediately performs the request
and blocks until it is done? If so, it should be named ExecuteRequest or
something like that?

Obviously it is unclear, because you got it wrong :-) AddRequest() is
really AddRequestToQueue(). By default, all requests are executed
asynchronously. You can always opt in to wait for a result - i.e.
result->Body() - which will block until the body becomes available.

- About thread safety: I'm not sure internal locking is the way to go. It
makes sense for some things (for example the cookie jar) that are shared by
everything and require fine grained locking. But I'm not sure it makes
sense to want to use a BUrlSession (a connection to a server) accross
multiple threads. Internal locking also results in constant
locking/unlocking which isn't great for performance. I would consider
something like what we do for BView/BWindow: explicit locking, and asserts
in all methods that need a lock, to make sure it is indeed held.

So again, instead of understanding the B*Session as a connection to a
single server, understand it as an orchestrator of various requests to
different servers, but with a common context. Imagine I have a multi-window
application that connects to an API with a HTTP basic authentication that
is set up in the first window that opens. If I then open my other windows,
I can give them a copy of the BHttpSession with the authentication data
(and/or cookies, though they are uncommon in REST services), and each
Window can happily create requests and consume the results. Similarly,
WebPositive could open with one BHttpSession (which would again hold all
the auth, cookies, certificate exceptions, etc) and share that among all
the windows/tabs, while keeping cookie data consistent. Does that make
sense?

- I am a bit confused by B6. Your intro says that there will be no
standard API similar to BUrlRequest, but here you proceed to describe one.
Which is it, then? This API is for example not going to work for
Bittorrent, where there would be a lot of hostname resolutions and
connections opened . Also, this does not tackle one of the problems with
the current API: some things (in particular, headers received for HTTP) are
kind of "cancellation points" for the request. Ideally, the request would
block there until the application signifies that it has processed the
headers and that the download should go on. Quite often, if the request
results in an error, there is no point in continuing and handling the
content. This has proven a bit complicated to do with the existing code.

So to (hopefully) clear it up:
* Since the B*Request classes no longer hold the logic for executing the
requests, they are nothing more than data structures that hold properties
of the request. So they no longer need a common API.
* For a protocol like BitTorrent, one would create the BBitTorrentSession
(open to name suggestions) which would embody the logic of connecting to
multiple hosts.
* The problem you mention about doing partial requests: all requests are
handled by the BHttpSession in an asynchronous way, and the logic for these
partial requests can be handled there without getting back to the user.

- Section B7 says that the way to wait for a request to complete is to
call Body(), but this is specific to HTTP. Wouldn't it be nice to have a
generic way to do this for any protocol?

I think ultimately yes, I think ultimately every protocol will have some
form of a body (or maybe content is more generic?), so I expect BFtpResult,
BBitTorrentResult and BIPFSResult all to have a member Body() that acts
like a future and waits for the result to become available.

- B8: in the current kit, the BHttpResult is available as soon as the
headers are received. So there is an extra "request is still in progress"
phase. If we are to have independant APIs for each protocols, I think the
HTTP protocol should really follow how the requests actually work: there is
an initial phase with an exchange of data (open connection, send request +
request headers, receive status code + response headers) and then there is
an unidirectional body phase (either send or receive, depending on the
request). What to do in the body phase may depends on the headers, for
example you may need to check the content-encoding to decide if you need to
transcode the data before storing it. In the case of receiving data, it's
possible to do this after the fact, but in the case of sending data, not so
much. I think it is required to really have this 2-step process in the API.

I think this is covered by the fact that this logic is handled in the
BHttpSession, which executes the requests. The user just holds a
BHttpResponse, which is essentially a future-like object that will
eventually hold the final results (or an error!) of the request.

- Part C: you would be surprised how hard it is to have an HTTP download
that "just works". Ideally it would be the case, but some websites will
lockup if you don't set an accept-language header. You also need to handle
content-encoding, you are probably interested in the MIME type of the data
you're getting (we're the one OS where we have special places for that kind
of metadata). You probably also want to manage cookies, user agent, etc.
These things would normally be in the global BUrlContext but this appears
to be gone in your proposal.

I think there are two comments from my side:
* I expect a BUrlDownload to use/hold a global static BHttpSession that is
used whenever you try to open a BUrlDownload with a http/https protocol.
This is not entirely unproblematic as it puts the onus on the user to be
careful not to mix BUrlDownload with a custom BHttpSession instance, as
this would mean they would have double the internal worker threads and
double all data like cookies, cert exceptions and sessions. Maybe
BHttpSession should explicitly support a global static instance, with the
user being able to opt in to creating an individual BHttpSession (for
example when an application supports multi-user and needs individual
authentication parameters for each of the users).

- Part C: I also have the same problem with this API as I mentionned for
the advanced one: it is not clear when the download is actually started. Is
it when creating a BUrlDownload object? Or is it when calling the Body()
method? I guess it has to be the latter, because this gives you some time
to access the prepared (but not running) request, and customize it before
it starts running.

As soon as you add it to the queue, the request may start at any moment.
The only thing you have is a BHttpResult object, which is essentially a
future that will eventually contain the full response (or an error). One
thing that is not clear from this proposal is that it actually might make
sense to have the BHttpSession implement some limits like the max number of
concurrent requests and the max requests to a server. This might mean that
your request actually gets queued before it actually starts. But I am
looking for some input from the real world of WebKit whether that is a
useful feature.

I hope this clears up the main thoughts on the project.

Regards,

N>

[haiku-development] Re: Plan for NetServices Kit (v2)

Other related posts: