feat(spanner): implement generation and propagation of "x-goog-spanner-request-id" Header #11048

odeke-em · 2024-10-29T12:47:19Z

This change adds sending over the "x-goog-spanner-request-id" header
for every unary and streaming call, in the form:

    <processId>.<clientId>.<requestCountForClient>.<channelId>.<rpcCountForRequest>

where:

* processId is a randomly generated uint32 singleton for the lifetime of a process
* clientId is the monotonically increasing id/number of gRPC Spanner clients created
* requestCountForClient is the monotonically increasing number of requests made by the client
* channelId is the ID of the client's channel; for Go it is currently fixed at 1
* rpcCountForRequest is the number of RPCs/retries within a specific request

This header is to be sent on both unary and streaming calls and it'll
help debug latencies for customers. After this change, the next phase shall
be providing a mechanism for customers to consume the requestID and log it
along with the documentation for how to accomplish that.
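
For illustration only, here is a minimal sketch of how a value in the form described above could be assembled. It is not the code in this PR: `requestIDClient`, `newRequestIDClient`, `newRequest` and `formatRequestID` are hypothetical names, and the sketch assumes Go 1.20 or later so that `math/rand` is seeded randomly at startup.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// processID is picked once per process. Assumption: Go 1.20+, where the
// top-level math/rand functions are randomly seeded.
var processID = rand.Uint32()

// clientCounter counts the clients created in this process (hypothetical name).
var clientCounter atomic.Uint32

// requestIDClient is a hypothetical stand-in for a Spanner gRPC client.
type requestIDClient struct {
	id         uint32
	channelID  uint32
	nthRequest atomic.Uint32
}

// newRequestIDClient assigns the next monotonically increasing client ID.
func newRequestIDClient(channelID uint32) *requestIDClient {
	return &requestIDClient{id: clientCounter.Add(1), channelID: channelID}
}

// newRequest reserves the next request number for this client.
func (c *requestIDClient) newRequest() uint32 { return c.nthRequest.Add(1) }

// formatRequestID joins the five components with dots.
func formatRequestID(processID, clientID, requestNum, channelID, attempt uint32) string {
	return fmt.Sprintf("%d.%d.%d.%d.%d", processID, clientID, requestNum, channelID, attempt)
}

func main() {
	c := newRequestIDClient(1)
	req := c.newRequest()
	// First attempt of the first request on this client.
	fmt.Println(formatRequestID(processID, c.id, req, c.channelID, 1))
}
```

The printed value changes between runs because processId is random; only the last four components are predictable.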

Updates #11073

odeke-em · 2024-11-03T19:43:25Z

Kindly cc-ing @olavloite @tharoldD @willpoint

odeke-em · 2024-11-05T02:41:16Z

Kindly cc-ing you @harshachinta.


rahul2393 · 2024-11-08T05:49:38Z

@odeke-em can you please fix the vet and tests issues here?

olavloite · 2024-11-07T14:21:45Z

spanner/batch.go

@@ -209,7 +209,13 @@ func (t *BatchReadOnlyTransaction) partitionQuery(ctx context.Context, statement
 		ParamTypes:       paramTypes,
 	}
 	sh.updateLastUseTime()
+
+	// PartitionQuery does not retry automatically so we don't need to retrieve


What do you mean by this? PartitionQuery retries if it receives an UNAVAILABLE error (same as most unary RPCs). See https://github.com/googleapis/googleapis/blob/master/google/spanner/v1/spanner_grpc_service_config.json for the default RPC configuration.

olavloite · 2024-11-07T14:28:20Z

spanner/client.go

+		client := sh.getClient()
+		gcl, ok := client.(*grpcSpannerClient)
+		if ok {
+			gcl.setRPCID(nRPCs)


This seems to assume that there will be only one active request for a gRPC client at the same time. That does not seem correct for two reasons:

For multiplexed sessions, we keep a pool of 4 (or, more correctly, numChannels) grpcSpannerClients. These clients are shared across all goroutines that use multiplexed sessions.

Regular sessions can also execute requests in parallel. Those requests would also use the same grpcSpannerClient, meaning that keeping track of for example the number of (retry) attempts at the grpcSpannerClient level won't work.
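
One possible way around this, sketched under assumptions, is to keep only the request counter on the shared client and move the attempt counter into a per-request value. `sharedClient`, `requestState`, `newRequest` and `nextAttemptID` are hypothetical names used for illustration; they are not part of the Spanner client.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sharedClient is a hypothetical stand-in for a grpcSpannerClient that is
// used by many goroutines at the same time.
type sharedClient struct {
	id         uint32
	channelID  uint32
	nthRequest atomic.Uint32
}

// requestState holds per-request values. Every request gets its own copy,
// so parallel requests cannot overwrite each other's attempt counter.
type requestState struct {
	client  *sharedClient
	request uint32
	attempt uint32
}

// newRequest atomically reserves the next request number on the client.
func (c *sharedClient) newRequest() *requestState {
	return &requestState{client: c, request: c.nthRequest.Add(1)}
}

// nextAttemptID bumps the attempt counter of this request only and
// returns the formatted request ID for that attempt.
func (r *requestState) nextAttemptID(processID uint32) string {
	r.attempt++
	return fmt.Sprintf("%d.%d.%d.%d.%d",
		processID, r.client.id, r.request, r.client.channelID, r.attempt)
}

func main() {
	c := &sharedClient{id: 1, channelID: 1}
	ids := make([]string, 4)
	var wg sync.WaitGroup
	for i := range ids {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			req := c.newRequest()
			req.nextAttemptID(42)          // first attempt
			ids[i] = req.nextAttemptID(42) // one retry of the same request
		}(i)
	}
	wg.Wait()
	for _, id := range ids {
		fmt.Println(id) // every ID ends in attempt 2, with a distinct request number
	}
}
```

Because the attempt counter lives in `requestState`, it is only touched by the goroutine that owns the request, while the request number is still handed out atomically by the shared client.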

olavloite · 2024-11-08T10:54:57Z

spanner/grpc_client.go

+func (g *grpcSpannerClient) prepareRequestIDTrackers() {
+	g.id = nGRPCClient.Add(1)
+	g.nthRequest = new(atomic.Uint32)
+	g.channelID = 1 // Assuming that .raw.Connection() never changes.


This should not be fixed at 1. For regular sessions, we are setting the channel that should be used here: https://github.com/googleapis/google-cloud-go/blob/main/spanner/sessionclient.go#L404

For multiplexed sessions, we do that here:

google-cloud-go/spanner/session.go

Line 1102 in 45e1ce7

p.multiplexSessionClientCounter = p.multiplexSessionClientCounter % len(p.clientPool)

The above also shows why the current strategy of assuming that a grpcSpannerClient is not used in parallel by multiple goroutines is incorrect, as the client library just keeps a pool of numChannels (default: 4) grpcSpannerClient instances for multiplexed sessions. These will be handed out in round-robin fashion to application goroutines that want to execute a query or transaction.

olavloite · 2024-11-08T11:04:06Z

spanner/grpc_client.go

 	raw                  *vkit.Client
 	metricsTracerFactory *builtinMetricsTracerFactory
+
+	// These fields are used to uniquely track x-goog-spanner-request-id
+	// grpc.ClientConn is presumed to be the channel, hence channelID


The raw *vkit.Client is the channel, so in that sense, this could be said to be redundant. But that property does not have a simple number or other simple string representation, which means that it is probably better/easier to just use the channel pool index that was used to fetch the channel as the channel ID here. Which again means that this property is not redundant and should be assigned a value.

olavloite · 2024-11-08T11:07:45Z

spanner/sessionclient.go

@@ -274,6 +274,8 @@ func (sc *sessionClient) executeBatchCreateSessions(client spannerClient, create
 			break
 		}
 		var mdForGFELatency metadata.MD
+		// Each invocation of client.BatchCreateSessions is not automatically retried


Same question here as for PartitionQuery; I don't quite understand what you mean by 'not automatically retried' in these cases. Could you elaborate a bit on that?

olavloite · 2024-11-08T11:10:30Z

spanner/transaction.go

+		// Firstly set the number of retries as the RPCID.
+		gcl, ok := client.(*grpcSpannerClient)
+		if ok {
+			gcl.setRPCID(nRPCs)


This seems to confuse two different types of retries:

Single RPCs are retried if they receive a retryable error (e.g. UNAVAILABLE). This is handled by the gRPC libraries and is transparent to the Spanner client. The attempt number should be based on how many times the gRPC libraries have retried the RPC.

Read/write transactions are retried if Spanner returns an ABORTED error. That will cause the entire transaction to be retried. That should not affect the attempt number that is used for an RPC.

product-auto-label bot added the api: spanner Issues related to the Spanner API. label Oct 29, 2024

odeke-em force-pushed the spanner-request-id-header branch 5 times, most recently from 1938e60 to 463cd50 Compare November 1, 2024 13:39

odeke-em marked this pull request as ready for review November 1, 2024 13:53

odeke-em requested review from a team as code owners November 1, 2024 13:53

odeke-em mentioned this pull request Nov 1, 2024

spanner: propagate x-goog-spanner-request-id header on every call and increment it appropriately per retry #11073

Open

5 tasks

odeke-em changed the title ~~spanner: prototype and lay down foundation for x-spanner-request-id header~~ spanner: implement generation and propagation of "x-spanner-request-id" Header Nov 1, 2024

odeke-em force-pushed the spanner-request-id-header branch from 7785a60 to 921c239 Compare November 1, 2024 14:08

odeke-em force-pushed the spanner-request-id-header branch from 3d50ba7 to 3c79e77 Compare November 5, 2024 01:55

odeke-em changed the title ~~spanner: implement generation and propagation of "x-spanner-request-id" Header~~ feat(spanner): implement generation and propagation of "x-spanner-request-id" Header Nov 5, 2024

odeke-em force-pushed the spanner-request-id-header branch 2 times, most recently from 066caaa to ef110da Compare November 5, 2024 01:59

odeke-em changed the title ~~feat(spanner): implement generation and propagation of "x-spanner-request-id" Header~~ feat(spanner): implement generation and propagation of "x-goog-spanner-request-id" Header Nov 5, 2024

odeke-em force-pushed the spanner-request-id-header branch from e9edb17 to 18cd8d4 Compare November 5, 2024 04:22

rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 8, 2024

kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 8, 2024

olavloite reviewed Nov 8, 2024
