-
Notifications
You must be signed in to change notification settings - Fork 913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vcsim: Synchronize calls to OptionsManager internal data structures #2907
base: main
Are you sure you want to change the base?
Conversation
As described in vmware#2906, `OptionsManager` is racy as calls to its internal data structures are unsynchronized. When multiple threads are making calls to `OptionsManager`, they can race against each other. This change protects these data structures with a mutex, to avoid the race conditions. Closes: vmware#2906
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @brakthehack . Each vcsim ManagedObject already has its own lock by default, where the lock is held during any API invoked method call here:
govmomi/simulator/simulator.go
Lines 217 to 219 in d8bd5f6
ctx.Map.WithLock(ctx, handler, func() { | |
res = m.Call(args) | |
}) |
For example, the m.Call()
above would be a method such as UpdateOptions
. And WithLock()
does a Mutex Lock()
+ Unlock()
around the call.
I noticed the test in your issue creates 2 vcsim instances:
vcsim := vcsimulator.NewSimulator()
defer vcsim.Stop()
vcsim2 := vcsimulator.NewSimulator()
defer vcsim2.Stop()
Are you able to reproduce the issue with a single vcsim instance? If not, I wonder if we actually need to deep copy here:
govmomi/simulator/option_manager.go
Line 47 in d8bd5f6
m.Setting = vpx.Setting |
As the 2 vcsim instances my actually be racing on vpx.Setting
? I will take a closer look.
This Pull Request is stale because it has been open for 90 days with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @brakthehack!
@@ -30,6 +31,7 @@ import ( | |||
|
|||
type OptionManager struct { | |||
mo.OptionManager | |||
lock sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrite this as:
type OptionManager struct {
sync.Mutex
mo.OptionManager
}
That way:
func (m *OptionManager) init(r *Registry) {
m.Lock()
defer m.Unlock()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, having read the rest of the change, please rewrite further:
type OptionManager struct {
sync.RWMutex
mo.OptionManager
}
There's zero reason this should be an exclusive write lock. It's far more performant to rely on shared locks for read-only operations.
@@ -55,6 +60,9 @@ func (m *OptionManager) QueryOptions(req *types.QueryOptions) soap.HasFault { | |||
body := &methods.QueryOptionsBody{} | |||
res := &types.QueryOptionsResponse{} | |||
|
|||
m.lock.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Move the lock calls the top of this function. The only thing currently above the calls are non-expensive variable declarations, so it is well-worth moving the lock entrance to the top of the function where it's more evident.
- Rewrite to use the
RLock
/RUnlock
methods:m.RLock() defer m.RUnlock()
@@ -70,6 +78,7 @@ func (m *OptionManager) QueryOptions(req *types.QueryOptions) soap.HasFault { | |||
return body | |||
} | |||
|
|||
// find requires holding the lock. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either:
- Update the GoDoc to be more clear:
// find is not thread-safe and requires callers to already hold either // a shared or exclusive lock from the OptionsManager
or
- Move the function into
UpdateOptions
so only it can invokefind
, ex:func (m *OptionManager) UpdateOptions(req *types.UpdateOptions) soap.HasFault { m.Lock() defer m.Unlock() find := func(key string) *types.OptionValue { for _, opt := range m.Setting { setting := opt.GetOptionValue() if setting.Key == key { return setting } } return nil } // ...
For some reason @dougm 's comments didn't show up when I read this PR. GH must have been brain-farting, because I swore this PR showed up as "new." Anyway, Doug is correct in that each MO has its own lock domain, so my suggestions should not actually be necessary. Still, they're worth considering as general guidelines for handling concurrency. |
Description
As described in #2906,
OptionsManager
is racy as calls to its internaldata structures are unsynchronized. When multiple threads are making
calls to
OptionsManager
, they can race against each other.This change protects these data structures with a mutex, to avoid the
race conditions.
Closes: #2906
Type of change
Please mark options that are relevant:
not work as expected)
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide
instructions so we can reproduce. If applicable, please also list any relevant
details for your test configuration.
Wrote a simple test and ran it with the
-race
flag. I consistently hit race conditions without my change and none with my change:Snip:
Checklist:
CONTRIBUTION
guidelines of
this project