-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(spanner): implement generation and propagation of "x-goog-spanner-request-id" Header #11048
base: main
Are you sure you want to change the base?
Conversation
1938e60
to
463cd50
Compare
7785a60
to
921c239
Compare
Kindly cc-ing @olavloite @tharoldD @willpoint |
3d50ba7
to
3c79e77
Compare
066caaa
to
ef110da
Compare
Kindly cc-ing you @harshachinta. |
…est-id" Header This change adds sending over the "x-goog-spanner-request-id" header for every unary and streaming call, in the form: <processId>.<clientId>.<requestCountForClient>.<channelId>.<rpcCountForRequest> where: * processId is a randomly generated uint32 singleton for the lifetime of a process * clientId is the monotonically increasing id/number of gRPC Spanner clients created * requestCountForClient is the monotonically increasing number of requests made by the client * channelId currently at 1 is the Id of the client for Go * rpcCountForRequest is the number of RPCs/retries within a specific request This header is to be sent on both unary and streaming calls and it'll help debug latencies for customers. After this change, the next phase shall be providing a mechanism for customers to consume the requestID and log it along with the documentation for how to accomplish that. Updates googleapis#11073
e9edb17
to
18cd8d4
Compare
@odeke-em can you please fix the vet and tests issues here? |
@@ -209,7 +209,13 @@ func (t *BatchReadOnlyTransaction) partitionQuery(ctx context.Context, statement | |||
ParamTypes: paramTypes, | |||
} | |||
sh.updateLastUseTime() | |||
|
|||
// PartitionQuery does not retry automatically so we don't need to retrieve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean with this? PartitionQuery
retries if it receives an UNAVAILABLE
error (same as most unary RPCs). See https://github.com/googleapis/googleapis/blob/master/google/spanner/v1/spanner_grpc_service_config.json for the default RPC configuration.
client := sh.getClient() | ||
gcl, ok := client.(*grpcSpannerClient) | ||
if ok { | ||
gcl.setRPCID(nRPCs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to assume that there will be only one active request for a gRPC client at the same time. That does not seem correct for two reasons:
- For multiplexed sessions, we keep a pool of 4 (or more correct:
numChannels
)grpcSpannerClient
s. These clients are shared across all goroutines that use multiplexed sessions. - Regular sessions can also execute requests in parallel. Those requests would also use the same
grpcSpannerClient
, meaning that keeping track of for example the number of (retry) attempts at thegrpcSpannerClient
level won't work.
func (g *grpcSpannerClient) prepareRequestIDTrackers() { | ||
g.id = nGRPCClient.Add(1) | ||
g.nthRequest = new(atomic.Uint32) | ||
g.channelID = 1 // Assuming that .raw.Connection() never changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be fixed at 1
. For regular sessions, we are setting the channel that should be used here: https://github.com/googleapis/google-cloud-go/blob/main/spanner/sessionclient.go#L404
For multiplexed sessions, we do that here:
google-cloud-go/spanner/session.go
Line 1102 in 45e1ce7
p.multiplexSessionClientCounter = p.multiplexSessionClientCounter % len(p.clientPool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above also shows why the current strategy of assuming that a grpcSpannerClient
is not used in parallel by multiple goroutines is incorrect, as the the client library just keeps a pool of numChannels
(default: 4) grpcSpannerClient
instances for multiplexed sessions. These will be handed out in round-robin fashion to application goroutines that want to execute a query or transaction.
raw *vkit.Client | ||
metricsTracerFactory *builtinMetricsTracerFactory | ||
|
||
// These fields are used to uniquely track x-goog-spanner-request-id | ||
// grpc.ClientConn is presumed to be the channel, hence channelID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The raw *vkit.Client
is the channel, so in that sense, this could be said to be redundant. But that property does not have a simple number or other simple string representation, which means that it is probably better/easier to just use the channel pool index that was used to fetch the channel as the channel ID here. Which again means that this property is not redundant and should be assigned a value.
@@ -274,6 +274,8 @@ func (sc *sessionClient) executeBatchCreateSessions(client spannerClient, create | |||
break | |||
} | |||
var mdForGFELatency metadata.MD | |||
// Each invocation of client.BatchCreateSessions is not automatically retried |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here as for PartitionQuery; I don't quite understand what you mean with 'not automatically retried' in these cases. Could you elaborate a bit on that?
// Firstly set the number of retries as the RPCID. | ||
gcl, ok := client.(*grpcSpannerClient) | ||
if ok { | ||
gcl.setRPCID(nRPCs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to confuse two different types of retries:
- Single RPCs are retried if they receive a retryable error (e.g.
UNAVAILABLE
). This is handled by the gRPC libraries and transparent to the Spanner client. This attempt number should be based on the number of retries that the RPC is being retried by the gRPC libraries. - Read/write transactions are retried if Spanner returns an
ABORTED
error. That will cause the entire transaction to be retried. That should not affect the attempt number that is used for an RPC.
This change adds sending over the "x-goog-spanner-request-id" header
for every unary and streaming call, in the form:
where:
This header is to be sent on both unary and streaming calls and it'll
help debug latencies for customers. After this change, the next phase shall
be providing a mechanism for customers to consume the requestID and log it
along with the documentation for how to accomplish that.
Updates #11073