Regression: crash on startup #74
My working thesis is that https://github.com/facebook/fboss/blob/master/fboss/agent/hw/bcm/oss/BcmControlPlane.cpp is missing an initialization of queueManager_. EDIT:
Thanks for the report. Can you confirm that everything was working correctly before the update? (Sorry if it's a dumb question; I'm trying not to sound too surprised :-)
Also, unfortunately it's not trivial for me to test the OSS build internally, so if you could compile with debugging options (similar to what's described here: https://stackoverflow.com/questions/7724569/debug-vs-release-in-cmake), we can get a stack trace with line numbers and make a better guess as to what broke. Thanks for the interest!
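A minimal C++ sketch, assuming a GCC/Clang toolchain and CMake's default per-configuration flags, of one way to confirm that a Debug configuration actually reached the compiler. CMake's default Debug flags (`-g`) do not define `NDEBUG`, while Release (`-O3 -DNDEBUG`), RelWithDebInfo (`-O2 -g -DNDEBUG`) and MinSizeRel (`-Os -DNDEBUG`) all do. The function names `buildFlavor` and `assertionsEnabled` are hypothetical, and the check only detects `NDEBUG`: it cannot tell whether `-g` (debug symbols) was passed, since RelWithDebInfo has both symbols and `NDEBUG`.

```cpp
#include <string>

// Hypothetical helper: reports which flavor of flags this translation unit
// was compiled with, based solely on whether NDEBUG is defined.
// It cannot detect -g, so it does not prove debug symbols are present.
inline std::string buildFlavor() {
#ifdef NDEBUG
  return "NDEBUG defined: assert() is compiled out (Release-style flags)";
#else
  return "NDEBUG not defined: assert() is active (Debug-style flags)";
#endif
}

// Hypothetical helper: true when assert() checks are compiled in.
inline bool assertionsEnabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}
```

Compiled into a target that shares the agent's flags and logged at startup, `buildFlavor()` would show whether the Debug configuration took effect; for symbols specifically, the binary itself still needs to be checked.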
We were running at the stated commit, or at least one around that commit, in June and it certainly worked. The same switch was unboxed from storage and the only thing we did was:
We can certainly compile a version at the previously stated commit and run it to verify the bisection window, and we will change the config back to the one we ran successfully in June.
And we will get back to you with the result of the
This is the log for the startup based on:
That should be the state where we ran it in June.
The transceiver error is due to my not having
Building from master:
Not sure why the debug build didn't take; do I need to do anything more than pass that to cmake?
Sorry for the slow reply to this. I'm also surprised that the debug symbols didn't show up; let me figure that out in parallel. Looking at the code and the error, I think the patch below may help. I'm trying to test it locally, but our ability to emulate a non-Facebook setup inside Facebook is limited :-( If you get a chance to test this before I do, please let me know whether it fixes the problem.
Also, could you please create a new issue for the qsfp_service segfault? Even if it is caused by a funky optic, it still shouldn't segfault. I won't promise I can fix it promptly, but it's still good to track. Thank you again!
Done: #77. I'm compiling with your patch right now. I had to pull my OpenNSL 3.5.0.1 changes as well, but I guess they should be orthogonal to this issue.
@capveg Sadly, the patch doesn't seem to work: https://gist.github.com/bluecmd/bd16185170dff642de197e34349aa14c I wish the stack trace were more useful. Granted, I did not build that particular binary with the cmake debug settings; I can redo that and spend some time trying to get line numbers if you feel that would be the next logical step.
Quick note to say that I see the same behavior (and the same trace) on a Wedge 100, using master as of about a month ago. The stack trace is similarly not very useful. I did try applying the patch described above, with no useful results. In our case, we also don't have a working FBOSS version to revert to, because the infrastructure for FBOSS on a Wedge 100 running Open Network Linux seems to be pretty broken at the moment (for quite a few other reasons). I will try a build of e440fcd and see if that gets us going, though.
Still seeing the crash on startup on
Xcvr Asic trace info fixes and script fixes to generate the right lane and polarity swaps
Hi,
We just updated from e440fcd0e1abfbcea5a08b18494d54bb78c561d6 to current master. With the same configuration we now get:
To me it looks like queueManager_ has not been initialized yet when the call to getMulticastQueueSettings, introduced in de9ad1f, runs.
We will try to debug this and will post any findings.
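To make the hypothesis concrete, here is a minimal C++ sketch, assuming the crash is a call through a never-initialized `std::unique_ptr` member. `ControlPlane`, `QueueManager`, `MulticastQueueSettings` and the `initQueueManager` flag are hypothetical simplifications, not FBOSS's actual code; only the names `queueManager_` and `getMulticastQueueSettings` come from this thread.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for illustration only; these are not the real
// FBOSS classes, members, or signatures.
struct MulticastQueueSettings {
  int numQueues = 0;
};

class QueueManager {
 public:
  MulticastQueueSettings getMulticastQueueSettings() const {
    MulticastQueueSettings settings;
    settings.numQueues = 4;  // placeholder value
    return settings;
  }
};

class ControlPlane {
 public:
  // initQueueManager == false models the suspected bug: an OSS constructor
  // that never creates queueManager_, leaving the unique_ptr null.
  explicit ControlPlane(bool initQueueManager) {
    if (initQueueManager) {
      queueManager_ = std::make_unique<QueueManager>();
    }
  }

  // Models the newly added call path. Calling through a null queueManager_
  // would be undefined behavior (typically a segfault at startup), so this
  // sketch checks first and throws a descriptive error instead.
  MulticastQueueSettings getMulticastQueueSettings() const {
    if (!queueManager_) {
      throw std::logic_error("queueManager_ used before initialization");
    }
    return queueManager_->getMulticastQueueSettings();
  }

 private:
  std::unique_ptr<QueueManager> queueManager_;
};
```

The actual fix would presumably be to create queueManager_ in the OSS constructor before anything calls getMulticastQueueSettings; the explicit null check only turns a bare segfault into an error that names the uninitialized member.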