-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix UDP template expiry in collector process #381
base: main
Are you sure you want to change the base?
Fix UDP template expiry in collector process #381
Conversation
Prior to this change, for every template received by the UDP collector, a new goroutine with a ticker was created, responsible for "expiring" the template after the proper timeout. After expiration, the ticker was stopped and te goroutine would complete (the ticker would only ever tick once, so really a timer should have been used?). If a client were to send a refreshed template, the old goroutine / ticker would not be stopped, and the new template would be deleted by the old goroutine, which is invalid. Additionally, starting a new goroutine every time a template message is received is a bit expensive. With the new implementation, we use a timer created with time.AfterFunc. All the timers are handled by the go runtime with a single goroutine, making this very lightweight. The timers are stored as part of the template map. When a template is refreshed, we reset the existing timer. When a template is deleted, we stop the timer. This is the first in a series of patch to improve template management in the collector process. Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #381 +/- ##
==========================================
+ Coverage 67.12% 67.53% +0.40%
==========================================
Files 25 25
Lines 4465 4512 +47
==========================================
+ Hits 2997 3047 +50
+ Misses 1273 1271 -2
+ Partials 195 194 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
pkg/collector/process_test.go
Outdated
@@ -425,20 +426,49 @@ func TestCollectingProcess_DecodeDataRecord(t *testing.T) { | |||
assert.NotNil(t, err, "Error should be logged for malformed data record") | |||
} | |||
|
|||
// testClock is a wrapper around clocktesting.FakeClock. Unfortunately, FakeClock does not support | |||
// calling clock.Now() when executing an AfterFunc() function, which is require in our case. So we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// calling clock.Now() when executing an AfterFunc() function, which is require in our case. So we | |
// calling clock.Now() when executing an AfterFunc() function, which is required in our case. So we |
pkg/collector/process.go
Outdated
// scheduled for deletion by checking expiryTime. We cannot just | ||
// automatically delete the template. | ||
|
||
// No reason to try to stop the timer in this case, even though it would be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to follow the logic, is it because the expiryTimer is already nil, so there's no point to set stopTimer to true? Because I was thinking about L444, wondering if there's a case when stopTimer is true but expiryTimer is nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code executes as part of the function invoked by the timer. Broadly speaking, if the function has been invoked by the timer, then the timer has already fired and there is no need to stop it. There are some edge cases with AfterFunc
which are described in the documentation snippet above.
Essentially there are 2 cases:
- the timer fires because we need to expire the template. This is the "normal" case. The timer has fired and there is no need to stop it.
- the timer fires, but there is a concurrent call to
addTemplate
(and toAfterFunc
). This is the edge case described in the doc. We rely on the condition below (!tpl.expiryTime.After(now)
) to determine whether we should just ignore this firing of the timer. The synchronization provided bycp.mutex
is very important here. We will wait until the call toaddTemplate
returns and releases the lock. If the condition is false (which will be the case), we will not callexpiryTimer.Stop()
indeleteTemplateWithConds
(even ifstopTimer
was set to true), as the template entry has been updated and the current run of the timer function should be ignored and be a no-op.
Calling expiryTimer.Stop()
even if the timer has already been stopped is harmless. But it doesn't work well with the fake clock unfortunately. If the fake clock implementation is fixed, I may remove stopTimer
.
The nil
check at line L444 is more for defensive programming and for the TCP case.
func (cp *CollectingProcess) deleteTemplateWithConds(obsDomainID uint32, templateID uint16, stopTimer bool, condFns ...func(*template) bool) bool { | ||
cp.mutex.Lock() | ||
defer cp.mutex.Unlock() | ||
template, ok := cp.templatesMap[obsDomainID][templateID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the templateID be deleted before, and the timer cannot be stopped when we can't find the templateID in templatesMap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which case / code path are you referring to?
the only case where stopTimer
should be false
is if the caller knows for certain than the timer has already expired (taking into account that condFns
should all be true).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, when we remove the templateID
from a templatesMap
, shouldn't we call the expiryTimer.Stop()
? otherwise; wouldn't expiryTimer keep running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also remind me where we call deleteTemplate
? If we call deleteTemplate
after a template is expired, deleteTemplate will return false and the expiryTimer. Stop ()
won't be called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, when we remove the templateID from a templatesMap, shouldn't we call the expiryTimer.Stop()? otherwise; wouldn't expiryTimer keep running?
We do this a few lines below:
if stopTimer && template.expiryTimer != nil {
template.expiryTimer.Stop()
}
Could you also remind me where we call deleteTemplate? If we call deleteTemplate after a template is expired, deleteTemplate will return false and the expiryTimer. Stop () won't be called?
deleteTemplate
is not called right now actually. I have a follow up patch which deletes templates when TCP connections are closed, but I wanted to split PRs to keep them small. Sorry for the confusion. At the moment we call deleteTemplateWithConds
when expiring a template. During this call, we do not need to Stop the timer, as we are guaranteed that the timer has already fired (see comments in addTemplate
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. So stopTimer is mainly for TCP when we want to delete the template before it is expired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The fake clock provided as part of k8s.io/utils has some limitations: it does not conform to the specifications of the standard library time package when it comes to timers (e.g., Stop() does not return false if the timer has already been stopped), and it prevents (deadlock) clock methods such as Now() (but also the Stop() method of timers) from being called while executing a timer function (timer created with AfterFunc). To avoid non-intuitive workarounds in our code caused by these limitations, we define our own clock interface and fake clock implementation. This does not represent too much code, as the CollectingProcess only uses a limited amount of clock functions. Signed-off-by: Antonin Bas <antonin.bas@broadcom.com>
@heanlan @yuntanghsu Because there was some legitimate confusion around the use of |
Prior to this change, for every template received by the UDP collector, a new goroutine with a ticker was created, responsible for "expiring" the template after the proper timeout. After expiration, the ticker was stopped and te goroutine would complete (the ticker would only ever tick once, so really a timer should have been used?). If a client were to send a refreshed template, the old goroutine / ticker would not be stopped, and the new template would be deleted by the old goroutine, which is invalid. Additionally, starting a new goroutine every time a template message is received is a bit expensive.
With the new implementation, we use a timer created with time.AfterFunc. All the timers are handled by the go runtime with a single goroutine, making this very lightweight. The timers are stored as part of the template map. When a template is refreshed, we reset the existing timer. When a template is deleted, we stop the timer.
This is the first in a series of patch to improve template management in the collector process.