REST is an architectural style commonly used for web APIs. It generally means each resource is identified by a URL path such as /resource1, and each resource endpoint supports the RESTful operations: GET, POST, PUT, PATCH and DELETE.
A common problem occurs when the client needs to aggregate resources from a RESTful API, or to compose together operations on multiple resources. This is the composition & aggregation problem.
There are actually a number of distinct use cases here:
- The client wants to run a single operation on multiple aggregated resources: GET2 /resource1, /resource2 or POST2 /resource1, /resource2. (Applicative.)
- The client wants to compose sequential operations on a single resource: PUT /resource1 >> GET /resource1. (Monadic.)
- The client wants to pipeline sequential operations on multiple resources: GET /resource1 >>= POST /resource2. (Monadic.)
Basically, we have potentially n operations on m resources, where some are sequential, some are pipelined, and some can be made parallel.
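The applicative/monadic distinction above can be sketched in a few lines. This is a minimal illustration with a stand-in `fetch` function (the resource paths and return shape are hypothetical, not a real API):

```python
# Sketch: applicative (independent, parallelisable) vs monadic (dependent,
# sequenced) request composition. `fetch` stands in for an HTTP GET.
from concurrent.futures import ThreadPoolExecutor

def fetch(resource):
    # Stand-in for an HTTP call; echoes the path as a fake body.
    return {"path": resource}

# Applicative: the two operations are independent, so they can run in parallel.
with ThreadPoolExecutor() as pool:
    r1, r2 = pool.map(fetch, ["/resource1", "/resource2"])

# Monadic: the second operation depends on the first's result,
# so the calls must be sequenced.
first = fetch("/resource1")
second = fetch(first["path"] + "/child")
```

The scheduler's job, discussed later, is exactly this: run applicative parts concurrently while preserving order inside monadic chains.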
We also need to understand that there is no such thing as a pure RESTful operation. All RESTful operations are side-effectful, even GET! (Even though the spec says it should be safe and idempotent.) The point is that sometimes the effects are invisible to the client.
We can address RESTful composition & aggregation in 2 ways: on the server side, or on the client side.
On the server side, there are 2 potential solutions:
- Gateway Pattern - This is where you set up a proxy gateway and define custom routes that perform the aggregate or composed operations. You can do this if you don't control or don't want to change the endpoints.
- Aggregator resource - This is where you set up an aggregator resource like /aggregated_resource that applies operations on /resource2. A variation on this is a hierarchical resource; that is, if resource2 can be classified in a hierarchical relationship such as /parent/resource2, you can then apply the operation on the wrapper/parent resource /parent and gain the same semantics.
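The aggregator-resource pattern boils down to one route that fans out to the granular routes server-side. This is a toy sketch with an in-memory store; the paths, data, and handler names are all hypothetical:

```python
# Sketch of the aggregator-resource pattern: one aggregate route composes
# GETs over the granular resources it wraps, all on the server side.

store = {
    "/resource1": {"name": "first"},
    "/resource2": {"name": "second"},
}

def get(path):
    # Granular GET on a single resource.
    return store[path]

def get_aggregated(children):
    # Aggregator: one client request fans out to many granular GETs.
    return {path: get(path) for path in children}

aggregated = get_aggregated(["/resource1", "/resource2"])
```

The client makes one round trip; the server pays the cost of composing the granular reads.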
These solutions, however, are fairly inflexible: you'll always rely on the server to implement the wrapper resources, and if you don't control the server, you are out of luck. Ideally the server should just present granular resources so they can be manipulated in detail, which allows more flexible abstractions ("services") to be built on top.
The key to client side solutions is managing requests that can be concurrent alongside requests that have to be synchronous; that is, we need to selectively apply ordering. Client side solutions need to exploit the HTTP protocol in order to gain:
- Concurrent HTTP connections, using HTTP keepalive with optional HTTP 1.1 pipelining or HTTP 2 multiplexing.
- A DSL to express request dependencies (synchronous chains) and request pipelines.
- A request scheduler that understands the DSL.
Let's break this down:
During any kind of unordered aggregate/composed request, you'll need to contact multiple different endpoints, possibly on different hosts. This implies the use of multiple concurrent TCP connections. An interesting fact is that browsers generally limit per-host HTTP 1.1 connections to 6. The number of concurrent connections that you can use on the client side is limited by various OS resources such as file descriptors, ports, memory and IP interfaces. You are also limited by the number of concurrent connections that your endpoint supports: you may inadvertently DDOS your own endpoint.
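One practical consequence: the client should cap its fan-out rather than open a connection per resource. A minimal sketch, using a bounded thread pool and a stand-in `fetch` (the cap of 6 just mirrors the browser limit mentioned above):

```python
# Sketch: bound client-side fan-out so an aggregate request cannot open an
# unbounded number of connections (and accidentally flood the endpoint).
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 6  # mirrors the typical per-host browser limit

def fetch(resource):
    # Stand-in for an HTTP GET.
    return resource

resources = [f"/resource{i}" for i in range(20)]

# max_workers bounds how many requests are in flight at once; the remaining
# requests queue until a worker frees up.
with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as pool:
    results = list(pool.map(fetch, resources))
```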
HTTP Keepalive (Persistent Connections)
Spinning up new HTTP connections incurs a lot of overhead (sometimes 150ms), especially if you're establishing new SSL connections. Since you're using lots of concurrent connections for your aggregate/composed request, you want to reuse connections rather than establish new ones each time you initiate a new batch.
By default HTTP 1.1 connections are persistent, and most well designed HTTP clients will reuse connections and maintain a connection cache pool.
The limit on your connection cache pool is related to the number of concurrent connections you want to hold. If you want to maintain a large number of open concurrent connections, you'll generally need to increase the limit on your connection cache pool as well; otherwise bursts of concurrent connections will need to spend time spinning up new connections.
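The mechanics of such a pool are simple: check a connection out per host, check it back in for reuse, cap how many idle connections are kept. A toy sketch (with a `DummyConnection` standing in for a real persistent HTTP connection; real clients do this for you):

```python
# Sketch of a keep-alive connection pool: connections are checked out per
# host and returned for reuse, up to a fixed cap per host.
from collections import defaultdict

class DummyConnection:
    # Stand-in for a persistent HTTP connection.
    def __init__(self, host):
        self.host = host

class ConnectionPool:
    def __init__(self, max_per_host=6):
        self.max_per_host = max_per_host
        self.idle = defaultdict(list)
        self.created = 0  # counts how often we paid the setup cost

    def checkout(self, host):
        if self.idle[host]:
            return self.idle[host].pop()  # reuse: no new handshake
        self.created += 1                 # cache miss: new connection
        return DummyConnection(host)

    def checkin(self, conn):
        # Keep the connection only if the pool for its host has room.
        if len(self.idle[conn.host]) < self.max_per_host:
            self.idle[conn.host].append(conn)

pool = ConnectionPool()
for _ in range(3):
    conn = pool.checkout("api.example.com")
    pool.checkin(conn)
```

Three sequential requests reuse one connection; only the first pays the setup cost.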
Please note that PHP HTTP clients generally use curl, which maintains a connection cache pool for a single PHP lifecycle. However, most cache pools are discarded once the PHP request exits, whereas a long-running daemon can keep its pool alive. If you are using PHP, you'll want to maintain the pool across PHP lifecycles; HHVM, for example, provides support for this.
HTTP 1.1 Pipelining vs HTTP 2 Multiplexing
Most of the time you're using one connection per request, but it's also possible to use one connection for multiple requests. This reduces the need for many concurrent connections, which can exhaust OS resources on the client side or the server side. HTTP 1.1 supports this with HTTP pipelining. While it works, it has some disadvantages: most implementations do not do pipelining correctly, which has meant that pipelining is not used by default. See these issues:
Furthermore, according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html:
Clients SHOULD NOT pipeline requests using non-idempotent methods or non-idempotent sequences of methods (see section 9.1.2). Otherwise, a premature termination of the transport connection could lead to indeterminate results. A client wishing to send a non-idempotent request SHOULD wait to send that request until it has received the response status for the previous request.
HTTP 2, however, changes the game. It supports multiplexing by default, and its multiplexing is far more advanced than HTTP pipelining, so it doesn't have the same problems as HTTP 1.1.
The advances in HTTP 2 mean there's no actual need for multiple concurrent connections per host, as concurrent HTTP requests can use the same connection. However, each host requires its own connection.
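The essence of multiplexing is that many logical streams share one connection, each tagged with a stream id, so responses can complete out of order. A rough simulation of that idea with asyncio (the stream ids and paths are illustrative; this models the concept, not the HTTP 2 wire protocol):

```python
# Rough sketch of multiplexing: many logical streams share one "connection",
# each response arriving on its own stream id, possibly out of order.
import asyncio

async def request(conn_streams, stream_id, path, delay):
    await asyncio.sleep(delay)           # simulate server processing time
    conn_streams[stream_id] = path       # response arrives on its stream

async def main():
    conn_streams = {}                    # one connection, many streams
    await asyncio.gather(
        request(conn_streams, 1, "/slow", 0.02),
        request(conn_streams, 3, "/fast", 0.01),
    )
    return conn_streams

streams = asyncio.run(main())
```

The fast response lands before the slow one, yet both travel over the same connection, which is what HTTP 1.1 pipelining (strictly ordered responses) cannot do.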
DSL to Express Request Dependencies
In general, this is basically what monadic and applicative do-notation are for.
Here are some other examples:
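To make the idea concrete, here is a toy DSL of my own devising (not a real library): `Par` children may run concurrently, while `Seq` sequences a plan with a function from its result to the next plan, exactly the applicative/monadic split from earlier. A small interpreter doubles as the scheduler:

```python
# A toy request-dependency DSL plus an interpreter/scheduler for it.
# `Par` children run concurrently; `Seq` threads a result into the next plan.
from concurrent.futures import ThreadPoolExecutor

class Req:
    def __init__(self, path):
        self.path = path

class Par:
    def __init__(self, *children):
        self.children = children

class Seq:
    def __init__(self, first, then):
        self.first = first
        self.then = then  # function: result -> next plan

def fetch(path):
    # Stand-in for an HTTP call.
    return {"path": path}

def run(plan):
    if isinstance(plan, Req):
        return fetch(plan.path)
    if isinstance(plan, Par):
        with ThreadPoolExecutor() as pool:
            return list(pool.map(run, plan.children))
    if isinstance(plan, Seq):
        return run(plan.then(run(plan.first)))

# Fetch two resources in parallel, then POST-like follow-up using the results.
plan = Seq(Par(Req("/resource1"), Req("/resource2")),
           lambda results: Req("/resource3?count=%d" % len(results)))
result = run(plan)
```

The point is that the plan is a data structure: the scheduler is free to run `Par` branches over concurrent connections (or multiplexed streams) while still honouring every `Seq` ordering constraint.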
There's another possibility, which is to create a query language DSL that gets carried in the body of HTTP requests. This can be more suitable if your system doesn't really fit into the REST paradigm. See for example GraphQL by Facebook; it is more of a content protocol and behaves more like RPC.
Update with related DSLs for this approach:
The request scheduler will depend on your programming runtime and your HTTP client library. For example, if you are using curl, then you're relying on curl's scheduler. On the other hand, if you're using NodeJS, you're probably just doing asynchronous IO expressed with callbacks. If you're using Haskell or Erlang, you can run each request in its own lightweight thread or process.
Once you start talking about RESTful composition, the topic will eventually lead to concurrency control. That is, how do you deal with multiple concurrent REST API operations that could potentially lead to a race condition or corruption of data? One common technique in concurrency control is transactions. RESTful transactions are usually implemented by reifying the locks as an addressable resource. See these links for more:
A better alternative to pessimistic transactions is HTTP based optimistic concurrency control: http://fideloper.com/etags-and-optimistic-concurrency-control See our comments on the article: http://fideloper.com/etags-and-optimistic-concurrency-control#comment-2200615457 It is far simpler to implement, but not as powerful.
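The optimistic scheme is a read-modify-write loop: GET the resource along with its ETag, send the update with If-Match, and retry from the top if the server answers 412 Precondition Failed. A sketch against an in-memory store (a real client would carry the ETag in HTTP headers; the store and function names here are illustrative):

```python
# Sketch of ETag-based optimistic concurrency control: writes succeed only
# if the caller's ETag is still current, otherwise the client retries.

store = {"body": {"n": 0}, "etag": 1}

def get():
    # GET: returns the representation plus its current ETag.
    return dict(store["body"]), store["etag"]

def put_if_match(body, etag):
    # PUT with If-Match: False models an HTTP 412 on a stale ETag.
    if etag != store["etag"]:
        return False
    store["body"] = body
    store["etag"] += 1
    return True

def increment():
    # Read-modify-write loop: retry until our ETag is current.
    while True:
        body, etag = get()
        body["n"] += 1
        if put_if_match(body, etag):
            return

increment()
increment()
```

No locks are ever held between requests, which is why this composes well with stateless REST; the trade-off is that contended writers must redo their work.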
Until HTTP 2 becomes mainstream, sending multiple requests over to the server in order to allow elegant composition and aggregation is too costly: HTTP pipelining is not widely supported and can have broken implementations. Once HTTP 2 becomes available, you just need a DSL to express request dependencies. I suggest using some sort of monadic HTTP client library.
- http://monkey.org/~marius/funsrv.pdf: Your Server as a Function