
[Feature request] Please add synchronous access to http2 write #2899

Open
@re-gor

Description

Hi! Before I start asking, I have to say that it was a pleasure to read through the code of the grpc-node library. Splendid work, thank you!

Is your feature request related to a problem? Please describe.

tl;dr: We noticed that the synchronous API is significantly faster in our scenario.

We have a grpc-js server with a single BiDi streaming method:

```proto
message Request {
    oneof payload {
        HttpRequest http_request    = 1; // the plain HTTP request the API Gateway got from the user
        BackendA backend_a_response = 2;
        BackendB backend_b_response = 3;
        // ... other backends
    }
}

message Response {
    optional Metadata metadata = 1; // Headers
    optional string content    = 2; // Response body. HTML
}

service Frontend {
    rpc render (stream Request) returns (stream Response) {}
}
```

Our Node.js backend receives chunks of data from the API Gateway. Once it gets a chunk, it starts a template rendering operation. Once a chunk's rendering is done, the server writes it into the grpc-js response stream. (There is a sequence diagram of the process if one is interested.)
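
For context, here is a minimal sketch of the handler shape described above; `Request`, `Response`, and `renderTemplate` are hypothetical stand-ins, not our actual code:

```ts
import * as grpc from '@grpc/grpc-js';

// Hypothetical message types generated from the proto above.
import type { Request, Response } from './generated/frontend';

// Stand-in for our actual rendering pipeline.
declare function renderTemplate(request: Request): Promise<string>;

function render(call: grpc.ServerDuplexStream<Request, Response>): void {
  call.on('data', async (request: Request) => {
    // Each incoming chunk kicks off a rendering operation...
    const content = await renderTemplate(request);
    // ...and each rendered chunk goes back through the Duplex stream.
    call.write({ content });
  });
  call.on('end', () => call.end());
}
```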

We noticed that this server loses a lot of client-side speed to our old plain HTTP/1.1 server in an A/B experiment.
I dug around for a week or two and concluded that there is only one significant difference: grpc-js uses streams as a proxy to http2's write, instead of the good old Response.write that our old server calls.

Streams are asynchronous by nature. Once you call write, the actual writing happens on a later tick, as far as I understand. So the writing operation becomes decoupled from its corresponding rendering operation. Roughly, something like this happens:

[Image: event loop tasks]
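
To illustrate the deferral independently of grpc-js, here is a minimal sketch using a plain Node `Writable`: once a write is in flight, subsequent chunks are buffered and only reach the sink on a later tick.

```ts
import { Writable } from 'node:stream';

const sink = new Writable({
  write(chunk, _encoding, callback) {
    console.log('_write:', chunk.toString());
    callback(); // completion is propagated via process.nextTick
  },
});

sink.write('a'); // first write: _write runs synchronously
sink.write('b'); // a write is in flight, so this chunk is buffered
console.log('synchronous code done');

// Prints: "_write: a", "synchronous code done", "_write: b".
// The second chunk reaches the sink only on a later tick, after the
// surrounding synchronous work (in our case, the next render) has run.
```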

There are a couple of new operations before each write that were not there before. In our case there are about 700 chunks and 100 rendering operations (one rendering operation can produce several chunks), and this adds a delay: about 280ms at the 75th percentile (p75) and 211ms at p50 for Largest Contentful Paint (LCP). Our old website's LCP is about 1000ms (p50) to 1500ms (p75), so losing ~20% of speed is a lot. While LCP is generally a good metric for websites, we prefer to rely on Hero Element Rendering: we track the arrival time of the most important element. According to this metric, grpc-js loses about 370ms at p75 and 300ms at p50. Again, it is roughly a 20% degradation =(

Describe alternatives you've considered

I've tried a lot of things over the last couple of weeks. First of all, using CPU profiles and code listings, I convinced myself and my colleagues that there is no cause for the degradation other than the one described above.

The second thing I tried was compressing chunks with zlib and brotli. It was a failed effort. First of all, zlib does CPU-intensive work, which delays the moment a chunk is ready. More importantly, zlib's API uses streams too, so I got even more nextTicks, and the result was actually worse than without compression. So we went back to compressing at the balancer.
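
For reference, here is the trade-off sketched with Node's zlib API: the streaming variant (the one we used) adds yet another stream to the pipeline, while the one-shot variant avoids the extra ticks but blocks the event loop.

```ts
import { createGzip, gzipSync } from 'node:zlib';

// Streaming variant: createGzip() returns a Transform — one more stream
// in the path, so each chunk crosses one more deferred write queue.
const gzip = createGzip();

// One-shot variant: no extra ticks, but the CPU-intensive compression
// runs synchronously on the event loop for every chunk.
const compressed = gzipSync(Buffer.from('rendered chunk'));
```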

The last effort was successful, but it is a nasty one: I started using grpc-js private methods. The most important one is sendMessage (https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/server-interceptors.ts#L837), which synchronously calls this.stream.write, which is in turn the http2 response's write.

```ts
// Method on our wrapper around the grpc-js server call.
write(
    data: ResponseType,
    _encoding: string = 'utf-8',
    callback: (error: null | undefined | Error) => void = () => {}
): boolean {
    try {
        this._sendMetaData();

        // NASTY ONE! sendMessage is a private grpc-js method; it calls
        // this.stream.write (the http2 response write) synchronously.
        this._call.call.sendMessage(data, callback as () => void);

        return true;
    } catch (error: any) {
        // ANOTHER ONE. Actually a copy-paste from ServerDuplexStreamImpl.
        this.pendingStatus = serverErrorToStatus(error);
        this.end();

        return false;
    }
}
```

And this effort made a significant difference!

We deployed this version to production yesterday (around 21:00 local time):
[Image: LCP]
[Image: Hero element velocity]

As one can see, we win about 200-300ms on each metric (LCP and Hero Element). Now we are losing only about 50ms against the plain HTTP version, which is acceptable: it is not much, we accept that packing gRPC messages is not free, and we also have one more proxy in front of our server.

Describe the solution you'd like

In general, I want a synchronous way of writing to the http2 response. For example, it could be a new writeSync method on the ServerDuplexStreamImpl<RequestType, ResponseType> class; a sketch of what such an API might look like is below.
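
To make the request concrete, here is a hypothetical shape for such a method; the name and signature are illustrative only, not an existing grpc-js API:

```ts
// Hypothetical addition to ServerDuplexStreamImpl (illustrative only).
interface SyncWritable<ResponseType> {
  /**
   * Serialize `message` and hand it to the underlying http2 stream in the
   * same tick, bypassing the Duplex write queue. Returns http2's
   * backpressure signal, as http2stream.write does.
   */
  writeSync(
    message: ResponseType,
    callback?: (error?: Error | null) => void
  ): boolean;
}
```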

Additional context

¯\_(ツ)_/¯
Thanks for reading all of this! I'm ready to answer your questions.
