Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

6.0.0-beta.7 regression, browser does not open with "spawn EBADF" error #18527

Closed
7 tasks done
sai-cb opened this issue Oct 31, 2024 · 14 comments · Fixed by #18659
Closed
7 tasks done

6.0.0-beta.7 regression, browser does not open with "spawn EBADF" error #18527

sai-cb opened this issue Oct 31, 2024 · 14 comments · Fixed by #18659
Labels
bug: upstream Bug in a dependency of Vite p4-important Violate documented behavior or significantly improves performance (priority) regression The issue only appears after a new release
Milestone

Comments

@sai-cb
Copy link

sai-cb commented Oct 31, 2024

Describe the bug

Starting with Vite 6.0.0-beta.7, the browser intermittently no longer opens when running "vite".

The console includes the following error:

Re-optimizing dependencies because lockfile has changed
Re-optimizing dependencies because lockfile has changed (x2)

  VITE v6.0.0-beta.7  ready in 10514 ms

  ➜  Local:   https://localhost:3000/
  ➜  Network: use --host to expose
  ➜  press h + enter to show help

Error: spawn EBADF
    at ChildProcess.spawn (node:internal/child_process:421:11)
    at Object.spawn (node:child_process:762:9)
    at baseOpen (file:///.../node_modules/vite/dist/node/chunks/dep-Vd7gFLqj.js:27331:34)
    at open (file:///.../node_modules/vite/dist/node/chunks/dep-Vd7gFLqj.js:27358:9)
    at file:///.../node_modules/vite/dist/node/chunks/dep-Vd7gFLqj.js:28212:7
    at new Promise (<anonymous>)
    at startBrowserProcess (file:///.../node_modules/vite/dist/node/chunks/dep-Vd7gFLqj.js:28211:5)

--

Conditions

This bug is intermittent and making a small reproducible case is difficult:

  • It only occurs apparently on a large codebase
  • It is more likely to occur the more optimizeDeps.include is used in the config
  • It only occurs when yarn.lock was modified, forcing dependencies re-optimization

--

Details about the crash

The crash is a synchronous crash on exec and spawn calls such as this one:
https://github.com/vitejs/vite/blob/main/packages/vite/src/node/server/openBrowser.ts#L144

What is peculiar is that the callback of "exec" is never called. The exec call is throwing.

function execAsync(command, options) {
  return new Promise((resolve, reject) => {
    console.log('exec is being called with', JSON.stringify({command, options}));
    try {
      exec(command, options, (error, stdout) => {
	console.log('exec callback called');
        if (error) {
          reject(error);
        } else {
          resolve(stdout.toString());
        }
      });
    } catch (e) {
      console.log('exec did throw');
      throw e;
    }
    console.log('exec was called');
  });
}

Result:

exec is being called with {"command":"ps cax"}
exec did throw

--

The regression commit

I ran a git bisect on the Vite repo and identified the following commit as the source for the regression:

feat!: update to chokidar v4

Because the bug is intermittent, the above finding is not guaranteed to be reliable, although I did re-run enough time to have a relatively high level of confidence in the finding.

--

I am very sorry for not providing a reproducible case. I was not able to create one but felt the above was likely still precise enough to be valuable information (particularly the git bisect).

This bug seems to be a scalability issue that is more likely to happen in large codebases where a lot of dependencies need to go through optimizeDeps.

The way node.js internal utils are crashing also seems indicative of a larger issue (e.g. is chokidar used in a manner that saturates node.js somehow?) which could potentially manifest in other ways and cause additional bugs elsewhere...

Reproduction

Steps to reproduce

Intermittent, to increase the chances of reproducing:

  1. Test on a very large codebase
  2. Use a configuration that is demanding on optimizeDeps such as the following:
  environments: {
    client: {
      optimizeDeps: {
        include: ['lodash/*']
      },
    },
    ssr: {
      optimizeDeps: {
        include: ['lodash/*']
      },
      resolve: {
        noExternal: [/lodash\//]
      }
    }
  }

To reproduce:

  1. Edit yarn.lock (to trigger a re-optimization on next Vite run)
  2. Run vite

System Info

System:
    OS: macOS 15.1
    CPU: (10) arm64 Apple M2 Pro
    Memory: 2.59 GB / 32.00 GB
    Shell: 5.9 - /bin/zsh
  Binaries:
    Node: 20.17.0 - ~/.nvm/versions/node/v20.17.0/bin/node
    Yarn: 4.1.0 - ~/.nvm/versions/node/v20.17.0/bin/yarn
    npm: 10.8.2 - ~/.nvm/versions/node/v20.17.0/bin/npm
    Watchman: 2024.08.26.00 - /opt/homebrew/bin/watchman
  Browsers:
    Chrome: 130.0.6723.91
    Firefox: 128.0
    Safari: 18.1
  npmPackages:
    vite: 6.0.0-beta.7 => 6.0.0-beta.7

Used Package Manager

yarn

Logs

No response

Validations

@sapphi-red
Copy link
Member

Does it work if you set server.watch.usePolling: true?

@sai-cb
Copy link
Author

sai-cb commented Oct 31, 2024

I just tested and yes, server.watch.usePolling: true does seem to be one workaround for the issue.

Tested six attempts, three with and three without.

@sapphi-red
Copy link
Member

chokidar v4 stopped using the fsevents package for macos and uses fs.watch provided by Node.js instead. usePolling option tells chokidar to use fs.watchFile. So it seems it's caused by fs.watch somehow.

Does it work if you increase the file descriptor limit? I'm thinking whether fs.watch uses more file descriptors than fsevents and that is causing the problem..

@bluwy
Copy link
Member

bluwy commented Oct 31, 2024

Yeah it does seem like a node issue. I wonder if using node 22 helps a bit. To workaround this, perhaps we can also do a retry for the exec if it fails with this error, but it looks like it could affect other code that calls exec too.

@sai-cb
Copy link
Author

sai-cb commented Oct 31, 2024

Increasing ulimit

Does it work if you increase the file descriptor limit?

Unless I'm doing it wrong, this didn't seem to help.

Default ulimit:

ulimit -n
256

I did override it like this, and the issue persisted:

ulimit -n 65536
ulimit -n
65536

--

Node v22

I wonder if using node 22 helps a bit

Just switched to v22 with nvm, it didn't seem to make a difference.

--

Retrying

To workaround this, perhaps we can also do a retry for the exec if it fails with this error, but it looks like it could affect other code that calls exec too.

At a first glance, this doesn't seem to be possible. The current crash is already a crashing spawn that's a fallback to a crashing exec. It appears like any call to exec or spawn anywhere would be affected.

I've also just tested an async spawn, 10s after the first crash, and it does crash again.

try {
  const subprocess = childProcess.spawn(command, cliArguments, childProcessOptions);
} catch(e) {
  setTimeout(() => {
    console.log('Trying again');
    childProcess.spawn(command, cliArguments, childProcessOptions);
  }, 10000);
  throw e
}
Trying again
node:internal/child_process:420
    throw new ErrnoException(err, 'spawn');
    ^

Error: spawn EBADF
    at ChildProcess.spawn (node:internal/child_process:420:11)
    at Object.spawn (node:child_process:759:9)
    at Timeout._onTimeout (file:///.../node_modules/vite/dist/node/chunks/dep-Vd7gFLqj.js:27365:18)
    at listOnTimeout (node:internal/timers:594:17)
    at process.processTimers (node:internal/timers:529:7) {
  errno: -9,
  code: 'EBADF',
  syscall: 'spawn'
}

Node.js v22.11.0

 >  NX   Command failed with exit code 1: vite

--

Number of fs.watch calls

To provide extra information, I wrapped the fs.watch call in createFsWatchInstance in the dep-Vd7gFLqj.js chunk to see how many concurrent fs.watch were active.

At the moment the crash is happening, we have around 16723!

And as I navigate around in the app, the number slowly increases to 17268. But seems relatively stable. It does not increase further when reaching twice for the same modules from the dev server.

On the surface, all files being watched seem appropriate. Only 20-30 files at the root of the repo could be skipped, otherwise watched files are all typescript files that are part of the app. Nothing in node_modules is being watched.

--

@bluwy
Copy link
Member

bluwy commented Oct 31, 2024

Thanks for the very thorough checks here. The consistent crash with spawn looks quite concerning to me. I think fsevents in chokidar v3 worked better on macos before as it watches by directories rather than files.

It looks like fs.watch is able to watch by directories too if the path is passed a directory. I don't know if it'll make a difference though and that'll have to be something implemented in chokidar. The recursive option is also a no go as it doesn't work on linux.

If all doesn't work and we don't have a reasonable workaround, I'm not sure what alternatives we'd have besides downgrading chokidar for now.

@sai-cb
Copy link
Author

sai-cb commented Oct 31, 2024

It looks like fs.watch is able to watch by directories too if the path is passed a directory

One interesting observation about this; That I see, fs.watch is called with the root directories of the app first, then it is called for every files in them thereafter. Maybe this is deliberate for some sophisticated reason, but that seemed odd.

@sai-cb
Copy link
Author

sai-cb commented Oct 31, 2024

Some extra food for thought;

Since this issue only happens on dep re-optimization and is more likely to happen with certain optimizeDeps configs, I suspect that in our case it happens a result of chokidar and tinyglobby competing for file descriptors.

In other words, the chokidar update may be the one that pushed us over the edge for the first time, but the scalability issue may be kinda holistically larger in nature.

It is also hard to know how many Vite users would be impacted for that reason.

@sapphi-red
Copy link
Member

Unless I'm doing it wrong, this didn't seem to help.

Hmm, would it work if you run ulimit with sudo as well?

sudo ulimit -n 65536
ulimit -n 65536
npx vite

The recursive option is also a no go as it doesn't work on linux.

It seems it's implemented in Node 20.13+, 22.1+ (paulmillr/chokidar#1304 (comment)).

It looks like fs.watch is able to watch by directories too if the path is passed a directory

One interesting observation about this; That I see, fs.watch is called with the root directories of the app first, then it is called for every files in them thereafter. Maybe this is deliberate for some sophisticated reason, but that seemed odd.

IIUC, the fs.watch for directories is needed to detect a new file created in that directory and the fs.watch for each files is needed to detect the file change.

@bluwy
Copy link
Member

bluwy commented Nov 1, 2024

It seems it's implemented in Node 20.13+, 22.1+

Ah you're right. I was reading the node 18 docs. Even with recursive support, it looks like the linux implementation still isn't the most efficient, and it doesn't support ignoring when watching recursively.

IIUC, the fs.watch for directories is needed to detect a new file created in that directory and the fs.watch for each files is needed to detect the file change.

I tested locally and watching the directory also seem to detect its file changes. Could be specific to the machine though

@sapphi-red
Copy link
Member

I tested locally and watching the directory also seem to detect its file changes. Could be specific to the machine though

Ah, it worked for me on Windows as well. Maybe it's worth exploring.

@sai-cb
Copy link
Author

sai-cb commented Nov 1, 2024

Hmm, would it work if you run ulimit with sudo as well?

Just tested and no luck. The bug remains reproducible.

@ghiscoding
Copy link
Contributor

I wonder if @parcel/watcher would still be out of the question in this case? As per #13593

@bluwy
Copy link
Member

bluwy commented Nov 6, 2024

I think @parcel/watcher has always been on the table, but it's not a direct replacement of chokidar. IMO it's closer to Node's fs.watch and missing more of chokidar's "manager"-like API. And in any case if we do support some form of it, we may want to support it on the side like lightningcss first before committing to replacing chokidar I think.

@bluwy bluwy added bug: upstream Bug in a dependency of Vite p4-important Violate documented behavior or significantly improves performance (priority) regression The issue only appears after a new release and removed pending triage labels Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug: upstream Bug in a dependency of Vite p4-important Violate documented behavior or significantly improves performance (priority) regression The issue only appears after a new release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants