Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner): implement generation and propagation of "x-goog-spanner-request-id" Header #11048

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

odeke-em
Copy link
Contributor

@odeke-em odeke-em commented Oct 29, 2024

This change adds sending over the "x-goog-spanner-request-id" header
for every unary and streaming call, in the form:

    <processId>.<clientId>.<requestCountForClient>.<channelId>.<rpcCountForRequest>

where:

  • processId is a randomly generated uint32 singleton for the lifetime of a process
  • clientId is the monotonically increasing id/number of gRPC Spanner clients created
  • requestCountForClient is the monotonically increasing number of requests made by the client
  • channelId currently at 1 is the Id of the client for Go
  • rpcCountForRequest is the number of RPCs/retries within a specific request

This header is to be sent on both unary and streaming calls and it'll
help debug latencies for customers. After this change, the next phase shall
be providing a mechanism for customers to consume the requestID and log it
along with the documentation for how to accomplish that.

Updates #11073

@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Oct 29, 2024
@odeke-em odeke-em force-pushed the spanner-request-id-header branch 5 times, most recently from 1938e60 to 463cd50 Compare November 1, 2024 13:39
@odeke-em odeke-em marked this pull request as ready for review November 1, 2024 13:53
@odeke-em odeke-em requested review from a team as code owners November 1, 2024 13:53
@odeke-em odeke-em changed the title spanner: prototype and lay down foundation for x-spanner-request-id header spanner: implement generation and propagation of "x-spanner-request-id" Header Nov 1, 2024
@odeke-em
Copy link
Contributor Author

odeke-em commented Nov 3, 2024

Kindly cc-ing @olavloite @tharoldD @willpoint

@odeke-em odeke-em changed the title spanner: implement generation and propagation of "x-spanner-request-id" Header feat(spanner): implement generation and propagation of "x-spanner-request-id" Header Nov 5, 2024
@odeke-em odeke-em force-pushed the spanner-request-id-header branch 2 times, most recently from 066caaa to ef110da Compare November 5, 2024 01:59
@odeke-em
Copy link
Contributor Author

odeke-em commented Nov 5, 2024

Kindly cc-ing you @harshachinta.

@odeke-em odeke-em changed the title feat(spanner): implement generation and propagation of "x-spanner-request-id" Header feat(spanner): implement generation and propagation of "x-goog-spanner-request-id" Header Nov 5, 2024
…est-id" Header

This change adds sending over the "x-goog-spanner-request-id" header
for every unary and streaming call, in the form:

	<processId>.<clientId>.<requestCountForClient>.<channelId>.<rpcCountForRequest>

where:
* processId is a randomly generated uint32 singleton for the lifetime of a process
* clientId is the monotonically increasing id/number of gRPC Spanner clients created
* requestCountForClient is the monotonically increasing number of requests made by the client
* channelId currently at 1 is the Id of the client for Go
* rpcCountForRequest is the number of RPCs/retries within a specific request

This header is to be sent on both unary and streaming calls and it'll
help debug latencies for customers. After this change, the next phase shall
be providing a mechanism for customers to consume the requestID and log it
along with the documentation for how to accomplish that.

Updates googleapis#11073
@rahul2393
Copy link
Contributor

@odeke-em can you please fix the vet and tests issues here?

@rahul2393 rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 8, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 8, 2024
@@ -209,7 +209,13 @@ func (t *BatchReadOnlyTransaction) partitionQuery(ctx context.Context, statement
ParamTypes: paramTypes,
}
sh.updateLastUseTime()

// PartitionQuery does not retry automatically so we don't need to retrieve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean with this? PartitionQuery retries if it receives an UNAVAILABLE error (same as most unary RPCs). See https://github.com/googleapis/googleapis/blob/master/google/spanner/v1/spanner_grpc_service_config.json for the default RPC configuration.

client := sh.getClient()
gcl, ok := client.(*grpcSpannerClient)
if ok {
gcl.setRPCID(nRPCs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to assume that there will be only one active request for a gRPC client at the same time. That does not seem correct for two reasons:

  1. For multiplexed sessions, we keep a pool of 4 (or more correct: numChannels) grpcSpannerClients. These clients are shared across all goroutines that use multiplexed sessions.
  2. Regular sessions can also execute requests in parallel. Those requests would also use the same grpcSpannerClient, meaning that keeping track of for example the number of (retry) attempts at the grpcSpannerClient level won't work.

func (g *grpcSpannerClient) prepareRequestIDTrackers() {
g.id = nGRPCClient.Add(1)
g.nthRequest = new(atomic.Uint32)
g.channelID = 1 // Assuming that .raw.Connection() never changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be fixed at 1. For regular sessions, we are setting the channel that should be used here: https://github.com/googleapis/google-cloud-go/blob/main/spanner/sessionclient.go#L404

For multiplexed sessions, we do that here:

p.multiplexSessionClientCounter = p.multiplexSessionClientCounter % len(p.clientPool)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above also shows why the current strategy of assuming that a grpcSpannerClient is not used in parallel by multiple goroutines is incorrect, as the the client library just keeps a pool of numChannels (default: 4) grpcSpannerClient instances for multiplexed sessions. These will be handed out in round-robin fashion to application goroutines that want to execute a query or transaction.

raw *vkit.Client
metricsTracerFactory *builtinMetricsTracerFactory

// These fields are used to uniquely track x-goog-spanner-request-id
// grpc.ClientConn is presumed to be the channel, hence channelID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The raw *vkit.Client is the channel, so in that sense, this could be said to be redundant. But that property does not have a simple number or other simple string representation, which means that it is probably better/easier to just use the channel pool index that was used to fetch the channel as the channel ID here. Which again means that this property is not redundant and should be assigned a value.

@@ -274,6 +274,8 @@ func (sc *sessionClient) executeBatchCreateSessions(client spannerClient, create
break
}
var mdForGFELatency metadata.MD
// Each invocation of client.BatchCreateSessions is not automatically retried
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here as for PartitionQuery; I don't quite understand what you mean with 'not automatically retried' in these cases. Could you elaborate a bit on that?

// Firstly set the number of retries as the RPCID.
gcl, ok := client.(*grpcSpannerClient)
if ok {
gcl.setRPCID(nRPCs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to confuse two different types of retries:

  1. Single RPCs are retried if they receive a retryable error (e.g. UNAVAILABLE). This is handled by the gRPC libraries and transparent to the Spanner client. This attempt number should be based on the number of retries that the RPC is being retried by the gRPC libraries.
  2. Read/write transactions are retried if Spanner returns an ABORTED error. That will cause the entire transaction to be retried. That should not affect the attempt number that is used for an RPC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants