Description
Hi! Before I start asking I have to say that it was my pleasure to read through the code of grpc-node library. Splendid work, thank you!
Is your feature request related to a problem? Please describe.
tl;dr: We noticed that a synchronous API is significantly faster in our scenario.
We have a grpc-js server with a single BiDi stream method:
```proto
message Request {
  oneof payload {
    HttpRequest http_request = 1; // plain HTTP request the API Gateway got from the user
    BackendA backend_a_response = 2;
    BackendB backend_b_response = 3;
    // ... other backends
  }
}

message Response {
  optional Metadata metadata = 1; // headers
  optional string content = 2;    // response body, HTML
}

service Frontend {
  rpc render (stream Request) returns (stream Response) {}
}
```
Our NodeJS backend receives chunks of data from the API Gateway. Once it receives a chunk, it starts a template-rendering operation. When a chunk's rendering is done, the server writes it into the grpc-js response stream. Sequence diagram of the process, if one is interested.
We noticed that this server loses a lot of client-side speed to our old plain HTTP/1.1 server in an A/B experiment.
I spent about a week or two digging around and concluded that there is only one significant difference: grpc-js uses streams as a proxy to `http2`'s `write`, instead of the good old `Response.write` that our old server uses.
Streams are asynchronous by their nature. When you call `write`, the actual writing happens on `nextTick`, as far as I understand. So the writing operation stands apart from its corresponding rendering operation. Roughly, something like this happens:
There are a couple of new operations before each write that were not present before. In our case there are about 700 chunks and 100 rendering operations (one rendering operation can produce several chunks). This adds a delay of about 280 ms at the 75th percentile (p75) and 211 ms at p50 for Largest Contentful Paint (LCP). Our old website's LCP is about 1000 ms (p50) to 1500 ms (p75), so losing 20% of speed is a lot. While LCP is in general a good metric for websites, we prefer to rely on Hero Element Rendering: we track the arrival time of the most important element. According to this metric, grpc-js loses about 370 ms at p75 and 300 ms at p50. Again, about a 20% degradation =(
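The deferral described above can be observed with a plain Node.js `Writable` (a minimal sketch, not grpc-js code): even when the stream's internal `_write` completes immediately, the user-supplied write callback is pushed to a later tick by the stream machinery, detaching "write finished" from the code that produced the chunk.

```typescript
import { Writable } from 'node:stream';

const order: string[] = [];

// A sink whose _write completes synchronously:
const sink = new Writable({
  write(_chunk, _encoding, done) {
    done(); // complete the write immediately
  },
});

sink.write('chunk', () => order.push('write callback'));
order.push('synchronous code after write()');

setImmediate(() => {
  // The callback only ran after the synchronous code finished:
  console.log(order);
  // → [ 'synchronous code after write()', 'write callback' ]
});
```

Node guarantees that the `write()` callback is invoked asynchronously, so every chunk pays at least one extra tick compared to a direct synchronous hand-off.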
Describe alternatives you've considered
I've been trying a lot of things during the last couple of weeks. First of all, I proved to myself and my colleagues, using CPU profiles and code listings, that there is no reason for the degradation other than the one described above.
The second thing I tried was chunk compression using zlib and brotli. It was a failed effort. First of all, zlib does CPU-intensive work, which delays the moment a chunk is ready. More importantly, zlib uses streams too, so I got even more nextTicks, and the result was actually worse than without compression. So we went back to compression at the balancer.
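For completeness, zlib does also expose one-shot synchronous calls that avoid the stream machinery entirely; a minimal sketch (not the path we ultimately took, since the CPU cost of compressing still blocks the event loop):

```typescript
import * as zlib from 'node:zlib';

const chunk = Buffer.from('<div>rendered template chunk</div>'.repeat(100));

// One-shot synchronous compression: no stream, no extra nextTicks.
const compressed = zlib.brotliCompressSync(chunk);
const restored = zlib.brotliDecompressSync(compressed);

console.log(restored.equals(chunk));           // true
console.log(compressed.length < chunk.length); // true
```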
The last effort was successful, but it is a nasty one: I started using grpc-js private methods. The most important one is `sendMessage` (https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/server-interceptors.ts#L837), which synchronously calls `this.stream.write`, which is actually `http2Response.write`.
```typescript
write(data: ResponseType, _: string = 'utf-8', callback: (error: null | undefined | Error) => void = () => {}) {
  try {
    this._sendMetaData();
    // NASTY ONE!
    this._call.call.sendMessage(data, callback as () => void);
    return true;
  } catch (error: any) {
    // ANOTHER ONE. Actually it is a copy-paste from ServerDuplexStreamImpl
    this.pendingStatus = serverErrorToStatus(error);
    this.end();
    return false;
  }
}
```
And this effort made a significant difference!
We deployed this version to production yesterday (about 21:00 our local time).
As one can see, we gained about 200-300 ms on each metric (LCP and Hero Element). Now we are losing only about 50 ms against the plain HTTP version, and that is acceptable: packing gRPC messages is not free, and we also have one new proxy in front of our server.
Describe the solution you'd like
In general, I want a synchronous way of using the http2 response. For example, it could be a new `writeSync` method on the `ServerDuplexStreamImpl<RequestType, ResponseType>` class.
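To make the request concrete, here is a hypothetical sketch of what such a method might look like. `writeSync`, `SyncSink`, and `DuplexStreamSketch` are all illustrative names invented for this sketch, not grpc-js APIs; the real method would live on `ServerDuplexStreamImpl` and call the underlying call's `sendMessage` directly, as the private-API workaround above does.

```typescript
// Minimal stand-in for "whatever can deliver a message synchronously"
// (in grpc-js this would be the underlying call object).
interface SyncSink<T> {
  sendMessage(data: T, cb: (err?: Error | null) => void): void;
}

class DuplexStreamSketch<T> {
  constructor(private sink: SyncSink<T>) {}

  // Hand the message to the underlying sink synchronously, bypassing
  // the Writable buffering / nextTick machinery.
  writeSync(data: T, callback: (err?: Error | null) => void = () => {}): boolean {
    try {
      this.sink.sendMessage(data, callback);
      return true;
    } catch (err) {
      callback(err as Error);
      return false;
    }
  }
}

// Stand-in sink that records messages as they are handed off:
const sent: string[] = [];
const stream = new DuplexStreamSketch<string>({
  sendMessage: (data, cb) => { sent.push(data); cb(null); },
});

stream.writeSync('<html>chunk</html>');
console.log(sent.length); // 1 — handed off before this line ran
```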
Additional context
¯\\\_(ツ)\_/¯
Thanks for reading all of this! Ready for your questions.