When many routes in the consumer.m_toSync of the ROUTE TABLE are retried continuously (for example, because they are blocked by a non-existent neighbor), the Consumer is not able to pops() any new routes by calling the Consumer::execute() function. The number of retrying routes needed to trigger this issue depends on the shortest Timer whose priority is higher than that of the ROUTE TABLE Consumer. The priority of the ROUTE TABLE Consumer is 5.
Steps to reproduce the issue
Distribute routes referencing NHG 5822, which does not exist or was deleted earlier
Deliver NHG 16518
Update all the routes to reference NHG 16518
Describe the results you received
The old routes keep retrying and the new routes cannot be consumed. RouteOrch gets stuck here.
Describe the results you expected
New routes are able to be consumed and processed by route orch properly.
Output of show version
Output of show techsupport
(paste your output here or download and attach the file here)
Root cause of this issue
In OrchDaemon::start(), a Selectable is selected and its execute() function is called. After that, doTask() of every orch is triggered, which retries all the remaining tasks. Therefore, if enough routes are being retried, and there is a Timer whose priority is higher than that of the ROUTE TABLE Consumer, and the interval of this Timer is shorter than the duration of the retry pass, the Timer is ready again by the time each loop iteration finishes, so the ROUTE TABLE Consumer is never selected. In other words, new routes are never consumed.
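The starvation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the real OrchDaemon code: `simulateLoop`, `LoopStats`, and the millisecond cost model are hypothetical names introduced here. It assumes the Timer is already due when the loop starts, the ROUTE TABLE Consumer always has new routes pending, and the only time spent per iteration is the retry pass.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters for a simplified model of the select loop.
struct LoopStats {
    int timerSelected = 0;     // iterations in which the higher-priority Timer won
    int consumerSelected = 0;  // iterations in which the ROUTE TABLE Consumer won
};

// Simplified model (an assumption, not the real implementation):
// each iteration selects the highest-priority ready Selectable,
// then spends retryCostMs retrying pending tasks in all orchs.
LoopStats simulateLoop(int64_t timerIntervalMs, int64_t retryCostMs, int iterations)
{
    assert(timerIntervalMs > 0 && retryCostMs >= 0 && iterations >= 0);

    LoopStats stats;
    int64_t nowMs = 0;
    int64_t nextTimerFireMs = 0;  // assume the Timer is already due at start

    for (int i = 0; i < iterations; ++i) {
        if (nowMs >= nextTimerFireMs) {
            // The Timer is ready and has higher priority, so it is selected.
            ++stats.timerSelected;
            nextTimerFireMs = nowMs + timerIntervalMs;
        } else {
            // Only when the Timer is not ready does the Consumer get a turn.
            ++stats.consumerSelected;
        }
        // The retry pass over all orchs runs after every selection.
        nowMs += retryCostMs;
    }
    return stats;
}
```

In this model, calling `simulateLoop(50, 60, 100)` (a 50 ms Timer and a 60 ms retry pass) never selects the Consumer, while `simulateLoop(50, 10, 10)` lets the Consumer run in most iterations.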
Additional information you deem important (e.g. issue happens only occasionally):
This was triggered occasionally in our testbed while BGP was flapping and some interfaces were being shut down and started up. It may contribute to this issue that we have an additional Timer whose interval is 50 ms.
Possible solution
Modify the retry mechanism. For example, we can perform the retry operation only every two loops. We can also limit this change to the route orch to narrow its impact.
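The effect of retrying only every two loops can be sketched with a similar simulation. This is an illustration under assumptions, not a proposed patch: `simulateThrottledRetry`, `ThrottledStats`, and the `retryEvery` parameter are hypothetical, the Timer is assumed to be due at start, the Consumer always has routes pending, and an iteration that skips the retry pass is assumed to take negligible time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters for the throttled-retry model.
struct ThrottledStats {
    int timerSelected = 0;     // iterations won by the higher-priority Timer
    int consumerSelected = 0;  // iterations won by the ROUTE TABLE Consumer
};

// Simplified model (an assumption): the retry pass runs only on every
// retryEvery-th iteration; other iterations cost no time in this sketch.
ThrottledStats simulateThrottledRetry(int64_t timerIntervalMs, int64_t retryCostMs,
                                      int retryEvery, int iterations)
{
    assert(timerIntervalMs > 0 && retryCostMs >= 0);
    assert(retryEvery >= 1 && iterations >= 0);

    ThrottledStats stats;
    int64_t nowMs = 0;
    int64_t nextTimerFireMs = 0;  // assume the Timer is already due at start

    for (int i = 0; i < iterations; ++i) {
        if (nowMs >= nextTimerFireMs) {
            ++stats.timerSelected;
            nextTimerFireMs = nowMs + timerIntervalMs;
        } else {
            ++stats.consumerSelected;
        }
        // Retry pending tasks only on every retryEvery-th iteration.
        if ((i + 1) % retryEvery == 0) {
            nowMs += retryCostMs;
        }
    }
    return stats;
}
```

With a 50 ms Timer and a 60 ms retry pass, `retryEvery = 1` reproduces the starvation in this model, while `retryEvery = 2` lets the Consumer be selected in every other iteration.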
Another problem is that the priority currently does not take effect. As shown below, the priority of the ROUTE TABLE Consumer is 0, not 5 as defined. In this situation, the above issue won't happen.
To make the priority valid, the following changes can be applied.