vcsim: Synchronize calls to OptionsManager internal data structures #2907

brakthehack · 2022-07-19T21:13:43Z

Description

As described in #2906, OptionsManager is racy as calls to its internal
data structures are unsynchronized. When multiple threads are making
calls to OptionsManager, they can race against each other.

This change protects these data structures with a mutex, to avoid the
race conditions.

Closes: #2906

Type of change

Please mark options that are relevant:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to
not work as expected)
This change requires a documentation update
Build related change

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide
instructions so we can reproduce. If applicable, please also list any relevant
details for your test configuration.

Wrote a simple test and ran it with the -race flag. I consistently hit race conditions without my change and none with my change:

Snip:

	vcsim := vcsimulator.NewSimulator()
	defer vcsim.Stop()

	vcsim2 := vcsimulator.NewSimulator()
	defer vcsim2.Stop()

	t.Run("test", func(t *testing.T) {
		client := vcsim.Client.Client
		client2 := vcsim2.Client.Client
		m := object.NewOptionManager(client, client.ServiceContent.Setting.Reference())
		m2 := object.NewOptionManager(client2, client2.ServiceContent.Setting.Reference())
		ctx, cancel := context.WithTimeout(context.Background(), time.Second*2)
		defer cancel()
		for {
			select {
			case <-ctx.Done():
				return
			default:
			}
			// adding the new options to the simulated VC
			opt := fmt.Sprintf("config.%d", rand.Int())
			go m.Update(ctx, []vimtypes.BaseOptionValue{
				&vimtypes.OptionValue{Key: opt, Value: "bar"},
			})

			// adding the new options to the simulated VC
			opt2 := fmt.Sprintf("config.%d", rand.Int())
			go m2.Update(ctx, []vimtypes.BaseOptionValue{
				&vimtypes.OptionValue{Key: opt2, Value: "bar"},
			})

			m.Query(ctx, opt)
		}

Checklist:

My code follows the CONTRIBUTION
guidelines of
this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged

As described in vmware#2906, `OptionsManager` is racy as calls to its internal data structures are unsynchronized. When multiple threads are making calls to `OptionsManager`, they can race against each other. This change protects these data structures with a mutex, to avoid the race conditions. Closes: vmware#2906

dougm

Thanks @brakthehack . Each vcsim ManagedObject already has its own lock by default, where the lock is held during any API invoked method call here:

govmomi/simulator/simulator.go

Lines 217 to 219 in d8bd5f6

    
           ctx.Map.WithLock(ctx, handler, func() { 
        
           	res = m.Call(args) 
        
           })

For example, the m.Call() above would be a method such as UpdateOptions. And WithLock() does a Mutex Lock() + Unlock() around the call.

I noticed the test in your issue creates 2 vcsim instances:

        vcsim := vcsimulator.NewSimulator()
	defer vcsim.Stop()

	vcsim2 := vcsimulator.NewSimulator()
	defer vcsim2.Stop()

Are you able to reproduce the issue with a single vcsim instance? If not, I wonder if we actually need to deep copy here:

govmomi/simulator/option_manager.go

Line 47 in d8bd5f6

m.Setting = vpx.Setting

As the 2 vcsim instances my actually be racing on vpx.Setting ? I will take a closer look.

github-actions · 2022-10-20T01:31:23Z

This Pull Request is stale because it has been open for 90 days with
no activity. It will automatically close after 30 more days of
inactivity. Mark as fresh by adding the comment /remove-lifecycle stale.

akutz

Thanks for the PR @brakthehack!

akutz · 2022-12-15T06:49:40Z

simulator/option_manager.go

@@ -30,6 +31,7 @@ import (

 type OptionManager struct {
 	mo.OptionManager
+	lock sync.Mutex


Rewrite this as:

type OptionManager struct { sync.Mutex mo.OptionManager }

That way:

func (m *OptionManager) init(r *Registry) { m.Lock() defer m.Unlock()

Also, having read the rest of the change, please rewrite further:

type OptionManager struct { sync.RWMutex mo.OptionManager }

There's zero reason this should be an exclusive write lock. It's far more performant to rely on shared locks for read-only operations.

akutz · 2022-12-15T06:55:23Z

simulator/option_manager.go

@@ -55,6 +60,9 @@ func (m *OptionManager) QueryOptions(req *types.QueryOptions) soap.HasFault {
 	body := &methods.QueryOptionsBody{}
 	res := &types.QueryOptionsResponse{}

+	m.lock.Lock()


Move the lock calls the top of this function. The only thing currently above the calls are non-expensive variable declarations, so it is well-worth moving the lock entrance to the top of the function where it's more evident.

Rewrite to use the RLock/RUnlock methods:
m.RLock() defer m.RUnlock()

akutz · 2022-12-15T07:02:46Z

simulator/option_manager.go

@@ -70,6 +78,7 @@ func (m *OptionManager) QueryOptions(req *types.QueryOptions) soap.HasFault {
 	return body
 }

+// find requires holding the lock.


Either:

Update the GoDoc to be more clear:
// find is not thread-safe and requires callers to already hold either // a shared or exclusive lock from the OptionsManager

or

Move the function into UpdateOptions so only it can invoke find, ex:
func (m *OptionManager) UpdateOptions(req *types.UpdateOptions) soap.HasFault { m.Lock() defer m.Unlock() find := func(key string) *types.OptionValue { for _, opt := range m.Setting { setting := opt.GetOptionValue() if setting.Key == key { return setting } } return nil } // ...

akutz · 2022-12-15T07:05:26Z

For some reason @dougm 's comments didn't show up when I read this PR. GH must have been brain-farting, because I swore this PR showed up as "new."

Anyway, Doug is correct in that each MO has its own lock domain, so my suggestions should not actually be necessary. Still, they're worth considering as general guidelines for handling concurrency.

vmwclabot added the cla-not-required label Jul 19, 2022

dougm reviewed Jul 21, 2022

View reviewed changes

github-actions bot added the lifecycle/stale label Oct 20, 2022

github-actions bot closed this Nov 20, 2022

dougm reopened this Nov 29, 2022

dougm added lifecycle/frozen and removed lifecycle/stale labels Nov 29, 2022

akutz requested changes Dec 15, 2022

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vcsim: Synchronize calls to OptionsManager internal data structures #2907

vcsim: Synchronize calls to OptionsManager internal data structures #2907

brakthehack commented Jul 19, 2022

dougm left a comment

github-actions bot commented Oct 20, 2022

akutz left a comment

akutz Dec 15, 2022

akutz Dec 15, 2022

akutz Dec 15, 2022

akutz Dec 15, 2022

akutz commented Dec 15, 2022

vcsim: Synchronize calls to OptionsManager internal data structures #2907

Are you sure you want to change the base?

vcsim: Synchronize calls to OptionsManager internal data structures #2907

Conversation

brakthehack commented Jul 19, 2022

Description

Type of change

How Has This Been Tested?

Checklist:

dougm left a comment

Choose a reason for hiding this comment

github-actions bot commented Oct 20, 2022

akutz left a comment

Choose a reason for hiding this comment

akutz Dec 15, 2022

Choose a reason for hiding this comment

akutz Dec 15, 2022

Choose a reason for hiding this comment

akutz Dec 15, 2022

Choose a reason for hiding this comment

akutz Dec 15, 2022

Choose a reason for hiding this comment

akutz commented Dec 15, 2022