-
-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
problem with USB device reboot loop #970
Comments
|
From the PCap I see two (similar) occurrences:
Same happens on lines: 13140, 13141, 13170, 13171, 13172, 13173 The only thing weird here is the first line: the IN pipe errors with EPIPE. I don't know why what is. What I do know is that EPIPE is a translation of a VBoxUSB driver specific return code. Maybe it should be translated to something else. |
What I can say is that on v2.4.1 it worked but here I can get a BSOD on the server side very soon. So the difference should be between v2.4.1 and v3.0.0. I also tried to use v4.2.0 and put the Dirvers from v2.4.1 in the build but that didn't solve the issue so I think it's not a driver related thing. As client I'm using https://github.com/vadimgrn/usbip-win2 latest version. In the usbip-ude.txt log file you can see the "endpoint_purge" and I think these are the 6 endpoints of the USB device. It seems when stop rotating the knob all 6 endpoints get purged and perhaps this leads to the reboot. Here are further investigations concerning the client code vadimgrn/usbip-win2#75 (comment). Here are the logs: 2.) usbipd v4.2.0 - here the device was attached and started with rotating a knob - the device is running while rotating the knob - stop rotating the knob after 2 seconds it reboots - then I detached it 3.) usbbipd v2.4.1 - here the device was attached and started normally - device works normal without reboot - giving some inputs rotating a knob - then I detached it - as mentioned v2.4.1 can lead to BSOD on the usbipd server side. |
I also need the logs from The "translation" from the endpoint error to EPIPE is done by |
I've done the logging with |
I need the console output. |
Here is the output copied from console (usbipd v4.2.0 with 3 reboots): And here is console output (usbipd v2.4.1 without reboots): |
Nothing that I can find so far. Do you also have a "usbip-ude.txt" for 2.4.2, so I can maybe find the difference? |
Thank you so far. Here are the logs with v2.4.1 (start capture - attaching the device - rotating a knob - detaching - stop capture):. another idea / guess: |
No, not all endpoints are purge (despite what the usbip-ude files tells you). Only input endpoint 3, and only after it first errors (probably due to a inactivity/timeout). Which happens on both 2.4.1 and on 4.2.0. The only difference is that on 2.4.1, the sequence was always:
And on 4.2.0 it is improved to:
There is always a race condition when canceling/unlinking an URB: either the URB completes ("wins") before you could actually cancel it, or the cancelation "wins". The last option is actually the better one, and the first can still sporadically happen but causes a lot of logging in syslog. In 2.4.1, we always reported like the first option happened, which caused a lot of people to complain about excessive logging. In 4.2.0, option 2 is used (if we can), unless the race really was won by the URB. The Pcaps for 2.4.1 and 4.2.0 are identical, except for the fact that 2.4.1 did not yet log UNLINK. This is the only difference. |
So what can I try in the code to test v4.2.0 with the option used in v2.4.1. Or can I define a condition if this "deviceid:vendorid" is used then try to use option 1 and if not option 2. At the moment I can't understand exactly the two options. I would implement the code v4.2.0. if you could give a code snippet to test and recompile if that would work. And if it is a timeout issue can this be temporarily bridged in the code? Perhaps here? usbipd-win/Usbipd/AttachedClient.cs Line 54 in 514abfa
Or here? usbipd-win/Usbipd/AttachedEndpoint.cs Line 424 in 514abfa
|
You can try replacing the following line: usbipd-win/Usbipd/AttachedClient.cs Line 88 in 986bf84
with simply:
This will make UNLINK believe that it lost the race condition. It will also make every URB return its result, even if it was canceled (as it won't be removed from the queue). That will mimic old behavior. Not exactly, though, as the ordering of URBs and UNLINKs may be different. Can you also post the syslog of:
syslog reports UNLINKs quite well, so it could give some more hints as to what is happening. I still don't understand what is going wrong exactly. |
I changed to Can it be that also here is the problem? Is it possible to control which endpoints are canceled? usbipd-win/Usbipd/AttachedClient.cs Line 68 in 986bf84
Perhaps it has also to do something with handling the endpoints and here it purges somehow an endpoint which reboots the device. So can you define somehow which endpoints could be influenced or not? Perhaps one endpoint is used for firmware updates and when this is triggered it goes into a boot mode. On the other hand the device works on v2.4.1 for about 30 seconds then on the usbipd side Windows gets a BSOD und crashes. |
Only endpoint 3 is canceled. The comment is about canceling (unlinking) individual requests (URBs) on that endpoint. This is not possible with the VBoxUSB driver. But in your case there is only 1 outstanding URB on endpoint 3, so this does not apply.
No, this is not it. Only endpoint 3 is canceled, as it is requested by the client. I know that usbip-ude.txt mentions purging the other endpoints, but in practice no such calls are being made (they do not show up in the Pcaps).
Never mind. I was thinking about Linux. On Windows, you currently have all the logging that is available.
I still don't know what is the root cause of the problem. It starts with an EPIPE on the input endpoint after you stop providing input to the device. But this also shows up in the 2.4.1 logs, and nothing has changed there. Then it unlinks the next URB, which leads to an AbortPipe (also on both 2.4.1 and 4.2.0); that too has not changed. The only change between 2.4.1 and 4.2.0 is how the cancelation is reported back to the client, and you have tried both methods now. For the time being, you could checkout the 2.4.1 source, and backport the BSOD fix: #602 |
I will try to do that. Also the developer of usbip-win mentioned that it could be something with power management and selective suspend: But the device normally doesn't go into a powersaving mode. As you can see in the video it goes directly after 1 second in reboot after stop doing somthing on the device. You told that endpoint 3 is canceled and this endpoint is used as "Application Sepcific Interface/Device Firmware Update". I think that's the reason for the reboot. So how could you prevent from canceling endpoint no. 3? Thanks for your investigation. I'm not giving up to finding out the reason yet. I want to know what it is. |
Endpoint 3 seems to be used for everything. It is also the one used to report the knob data. For some reason (unknown to me), the device STALLs the URB if there is no actual input. This is weird, but not necessarily a problem. Next (after the EPIPE), a new URB is requested, apparently to replace the previously EPIPEd input URB. This is also normal. Next, it gets unlinked. This is really weird, I don't know why the client does that. But it also does it using 2.4.1, so it shouldn't be a problem either. You could just not cancel the URB upon unlink. But that goes completely against the specs. UNLINK should really complete or cancel the URB as soon as possible. The cancelation happens here: usbipd-win/Usbipd/AttachedEndpoint.cs Line 499 in 986bf84
You could skip that by replacing: usbipd-win/Usbipd/AttachedEndpoint.cs Line 478 in 986bf84
with:
That simply makes UNLINK a no-op. But I doubt it is going to work correctly. |
I've found something new. I checkout to v2.4.1 and replaced exactly these lines #602: I think I can narrow it down further:
With these lines I don't get a BSOD here, but the reboot issue begins. So it has to do with that code. When I remove these lines again and go back to v2.4.1 then the USB device works as it should but I get than the BSOD. So do you know how to solve this dilemma? And can now also something be changed in v4.2.0 with this? |
It is related, but not the problem.
It looks like the device is supposed to work like this: there is always one outstanding read request. And in the meantime there are a few write requests. Always with data. The read requests all return within around 20 ms. Then something goes wrong (reason unknown). The last successful read (with data) is line 1836. The next request is issued at line 1838. This gets completed at line 1882. However, there is no data (0 bytes read). It also took 32 ms to complete (longer than usual). The next read request gets scheduled immediately (line 1883). From this moment on, the writes are also 0-length, starting at line 1884. Things are now not working correctly. (NOTE: the BSOD fix actually fixes these 0-length writes). But I think the problem is that everything is now 0-length; there is no actual data communicated anymore. It ends with one of the read requests actually STALLING (that's a USB-term, the Linux error code is EPIPE), at line 1906. Another read is scheduled at line 1907, which never completes. The device driver asks for cancellation after 2 seconds by sending an UNLINK. Now the device reboots. Maybe (just maybe) we can insert an extra STALL recovery right after the first stall is detected. Normally the device driver should do that, but for some reason it just asks for another read (while a stall has already occurred). That one of course never completes, and after 2 seconds the driver gives up, purges everything and reconfigures the device (which you see as a reboot). I'll try to add something like "auto-stall-recovery" in NOTE: once the stall occurs, all writes are 0-length, which (under circumstances/depending on bad luck) could previously lead to BSOD. The fix and your reboot are not related; they are just happening around the same time. |
That would be great if this will work. I tested to modify something in the else statement after that line (in v2.4.1 the code is in AttachedClients.cs, in v4.2.0 the code is found here in AttachedEndpoint.cs) here usbipd-win/Usbipd/AttachedEndpoint.cs Line 335 in 514abfa
|
When do you think you could find the solution for this? |
I've looked into it, but if I add an automatic pipe reset, everything fails. It is not supposed to happen like that. The client (or the client hub) is supposed to do the pipe reset. This happens normally for all USB devices, every now and then (especially with poor cables): PIPE stalls, device reports EPIPE, client (or hub, I don't know) resets pipe. For some reason (in your logs), instead of clearing the pipe after receiving an EPIPE stall indicator, another read request is made. This is not supposed to happen on a stalled pipe. The client (or the hub) should have reset the pipe first. What client are you using? You said something about a Windows client? |
I'm using this USBIP client https://github.com/vadimgrn/usbip-win2. As said this USB device works normally with this USBIP client from Windows to Linux USBIP server. The developer told that he probably will begin to develop a USBIP server. But for now your usbipd is the only way for Windows. If you have any idea what to do please let me know to test it out. Here is also the driver code of USBIP client and eventually you can see if here is something which causes the Pipe stall / EPIPE https://github.com/vadimgrn/usbip-win2/tree/master/drivers/ude. |
What would really help is a "known good" capture when the device is physically connected to a native Linux host. You can use wireshark or tshark to capture a PcapNg file from the Linux USB monitoring device. |
Here is a fresh usbip capture. Used USBIP server from Ubuntu 24.04 and following command following steps were captured:
I hope you can read it out and come to a result. Thanks a lot. |
Maybe the info is in there somewhere, but better capture the USB traffic itself (and not the USBIP network transport layer). See https://serverfault.com/questions/1104416/how-to-capture-usb-traffic-using-wireshark-in-linux-cli. You may need to load a module (modprobe xxx) for the usbmon device.. |
Ok, here is the capture from the USB device directly connected to the linux system (no USBIP). You have also some infos of the USB device done with following steps were captured:
|
What is most striking, is that with I need to figure out why that is. No time now, but this is very interesting! Will get back to you. |
Can you please say if you can look again into the issue because I can test this device only for this week. |
I just compared the USB capture from Did you do a firmware upgrade in-between? Because the device configuration is different (they are the same device, right?): Windows: (NOTE: bcdUSB: 0x0210, this is supposed to be the USB version) Linux: (NOTE: bcdUSB: 0x0200) Also, the reported interfaces and endpoints are different, which explains my earlier comment about the pipe numbers. Windows: (NOTE: pipe IN and OUT are both 3 & additional interface 5.0) Linux: (NOTE: pipe IN 3 and OUT 4 & no interface 5.0) We are comparing apples and oranges... |
I can't remember which file was from which date. Yes there was a firmware update but the last send files should be on the same firmware. The device is the same. So what I can do is to make it more simpler for you? The FILE_2.zip was from 08.06.2024 and the usb_capture.zip from 24.06.2024 and in between there was a firmware upgrade. The last two send files usbip_capture.zip and usb_capture.zip are created on 24.06.2024 and had the same firmware, usbip_capture.zip captured the USBIP traffic and usb_capture.zip captured the local attached USB traffic. |
I am trying to compare the Linux native USB traffic (your last usb_capture.zip) with the one from Windows via Can you make another |
Here are the USBIPD Logs. If you could achieve something today it would be great because the device is only available for today and I will probably only get hands on the device again in 1 or 2 month. |
These look similar to the previous Windows logs: 5 interfaces, interface 4 has both IN 3 and OUT 3 pipes. And EPIPE after effective communication has stopped (0 byte payloads). Unfortunately, I cannot compare them to the Linux captures (which shows a different endpoint configuration). It still looks like the captures are either from different devices, or from different firmware revisions; sorry. |
I will look at it. I just noticed that when connected locally to Linux the device asks for a firmware update and then it asks for firmware v2.0 update. If you connect it locally to Windows, then it also wants to update to v2.1. So I have to try to set the device to a fixed firmware and not change it ignoring the update questions. I'll then do all the logs again and post them here. |
So, here are the new logs. Firtsly the firmware of the device was updated to version 2.2.1 on Windows. Than the capture (file and console) was done for USBIPD. Than with that actual firmware version 2.2.1 the device was connected locally on Ubuntu and captured the USB (with tcpdump of usbmon) and lastly it was captured the USBIP traffic (with tcpdump of tcp port 3240). I hope you can now compare the results. |
I just did another capture of the locally connected USB device on Windows with wireshark USBPcap. It's added in the win folder. Update: |
And could you make any progress with my last sent log files? What do you think about it? |
With usbipd v4.2.0 this device reboots when you stop giving input signals as seen in the video. You can see it here if you stop rotating the controller the device reboots. As long as you rotate a knob or operate other controllers the device stays on. If you do nothing, the device will reboot about 1 second later. The device should stay stable without rebooting. When I go back to usbipd v3.0.0 the device works fine (it stays stable without reboot loops) but usbipd v3.0.0 can lead to other troubles (BSOD) so that's no alternative. So I guess that something changed in v4.2.0 which causes this issue for this USB device. The device also works for Windows (Client) to Linux (Server). Only Windows (Client) to Windows (Server) has this issue. I also tried
usbipd bind -b 1-4 --force
without an effect.usbip_boot_loop.mp4
Here are captures of usbipd server (start capture - attaching the device - 3 reboots - detaching - stop capture):
FILE.zip
Here are captures of USBIP client (start capture - attaching the device - rotating knob - stopped rotating and device reboots - again rotation knob pressing a button - stopped doing anything and the device reboots again - stop capture):
usbip-ude.txt
usbip-flt.txt
What is the problem and how can I get the USB device to work?
Update:
Comparing usbipd v3.0.0 with v3.1.0 leads to the assumption that AttachedClient.cs may be the cause:
usbipd-win/Usbipd/AttachedClient.cs
Line 67 in 514abfa
I guess that here the AttachedEndpoint class closes all endpoints and that leads to a reboot of the device.
The text was updated successfully, but these errors were encountered: