Skip to content

Explicit error message when failing to create the health check stream due to failing per-RPC credentials calls #8030

Open
@atollena

Description

Use case(s) - what problem will this feature solve?

When health check is enabled (via "healthCheckConfig": {"serviceName": ""} in the service config) and the client fails to create a stream, the channel remains in CONNECTING state and no error message is output. This makes the task of troubleshooting problems caused by creating the health check watch stream hard, as there are neither logs nor RPC errors, and RPC typically fail with deadline exceeded (if a deadline is set).

The code that swallows errors when creating the health check Watch stream is here:

grpc-go/health/client.go

Lines 74 to 78 in fbff2ab

rawS, err := newStream(healthCheckMethod)
if err != nil {
continue retryConnection
}
. Errors when creating a stream can come from:

  1. The transport is not ready or nil (

    grpc-go/balancer_wrapper.go

    Lines 362 to 365 in 38a8b9a

    if transport == nil {
    return nil, status.Errorf(codes.Unavailable, "SubConn state is not Ready")
    }
    and

    grpc-go/stream.go

    Lines 1233 to 1236 in e912015

    if t == nil {
    // TODO: return RPC error here?
    return nil, errors.New("transport provided is nil")
    }
    ) IIUC it doesn't apply in this case, since the health check watch is only started when the transport is ready
  2. A CallOption returns an error (I don't think this can happen here, since we don't provide the ability to customize call options for health checks)

    grpc-go/stream.go

    Lines 1253 to 1255 in e912015

    if err := o.before(c); err != nil {
    return nil, toRPCErr(err)
    }
  3. We fail to get codec or compressor for the call (again not applicable to health check since gRPC controls that part)
  4. We fail to get a transport or create a stream on the transport (

    grpc-go/stream.go

    Lines 353 to 358 in e912015

    if err := a.getTransport(); err != nil {
    return err
    }
    if err := a.newStream(); err != nil {
    return err
    }
    ). This can happen when we fail to create header fields, e.g. the call to GetRequestMetadata to get per-RPC credentials fails (that is the case I ran into).

Proposed Solution

Transition the subchannel to TRANSIENT_FAILURE if we failed to create the watch stream. Continue the retry loop.

Alternatives Considered

If changing the behavior of of health checks when per RPC credentials fail is not desirable, at least logging through channelz would be nice.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

Labels

Area: ClientIncludes Channel/Subchannel/Streams, Connectivity States, RPC Retries, Dial/Call Options and more.Type: Behavior ChangeBehavior changes not categorized as bugsType: FeatureNew features or improvements in behavior

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions