The runner has received a shutdown signal. #7188
61 comments · 43 replies
-
@JakubMosakowski we cannot do any investigation without additional info.
-
The interesting part is that it doesn't seem to be related to any of our changes. I created a branch that reverts the last X commits (back to the point in history where our builds were smooth), and the builds are still not passing.
-
We are also seeing this after upgrading our self-hosted runners from 20.04 to 22.04, with no other seemingly related changes. Do the 22.04 runners have more conservative limits even when self-hosted?
-
The same happens to us in a private repo. We didn't make any significant changes to our workflows.
-
Hi @ihor-panasiuk95, please send me links to workflow runs with both positive and negative results.
-
@erik-bershel will you be able to view them, given that they are in a private repo?
-
@ihor-panasiuk95 it's not a problem. There is no need to check what is going on in your private repository in the first step. I want to check the load on the agents and compare successful and failed jobs. If that information is not enough, then we will discuss repro steps. For example: https://github.com/owner/repo/actions/runs/runID or https://github.com/erik-bershel/erik-tests/actions/runs/3680567148.
-
@erik-bershel …
-
I find that this issue only occurs when using …
-
We are also seeing these errors regularly. Link to one of our most recent runs: https://github.com/rstudio/connect/actions/runs/3912304431/jobs/6687076068 The output from the job (a …
-
I've been seeing similar issues where I either get …
-
I just recently began experiencing this issue; I have never experienced it before. Here's the error I receive: …
Here's a link to one of our recent runs: …
-
We are seeing this issue consistently on pr/branch workflows at the step run with configure-aws-credentials on …
-
Hey @chrisui! …
-
As previously stated, this issue was becoming more prevalent for us. Downgrading the runner image to … A couple of links: …
-
I also recently have the same issue in my workflow; jobs keep failing for no apparent reason. I've changed the runner OS from ubuntu-latest to ubuntu-20, but no luck.
-
I'm also suddenly experiencing this issue. I can no longer build my application. What are the enforced RAM limits? I cannot find any information on this. Until now I was using ubuntu-latest, without playing around with other images, but no luck so far. Also, not sure how that would make any sense. Or is there a form of soft rate limit for public repositories where workflow runs get limited after exceeding a threshold?
-
I too am suddenly experiencing this issue when running automated tests, even without any changes to the tests. I haven't tested exhaustively, but it seems Ubuntu and Windows images are failing for me while macOS might be working, which suggests a memory issue, since macOS images have 14 GB of RAM as opposed to 7 (see here). I agree it would be helpful to know if the memory allotment or other VM specs have changed recently, or if there is some other rate-limiting for public repos. For reference, my tests started failing about two weeks ago.
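Where memory exhaustion is suspected, a quick sanity check is to log the runner's actual memory and disk at the start of the job and compare across images. A minimal sketch, assuming a matrix over the hosted images (the job and step names are illustrative, not from this thread):

```yaml
jobs:
  diagnose:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Log available memory and disk
        shell: bash
        run: |
          # free exists on Linux, vm_stat on macOS, systeminfo on Windows
          free -h || vm_stat || systeminfo
          df -h
```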
-
I'm also suddenly experiencing this issue; I can no longer build my application reliably. We have tried changing runners and modifying the pipeline so that some commands don't execute in parallel, but nothing seems to fix the root cause.
-
Hi …
-
I had similar issues occurring on both ubuntu-latest and self-hosted EC2 instances running Amazon Linux 2023. In the end I managed to investigate the issue by benchmarking Gradle. In my case, improper initialization / lifecycle management of Testcontainers drained all the resources. If you have a similar issue, I recommend profiling / benchmarking your build.
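For anyone who wants to try the same approach, Gradle's built-in `--profile` flag is a low-effort way to see where time and resources go. The steps below are only a sketch under that assumption, not the exact setup described above:

```yaml
      - name: Run the build with profiling enabled
        run: |
          # Writes an HTML report under build/reports/profile/
          ./gradlew build --profile

      - name: Upload the profile report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: gradle-profile
          path: build/reports/profile/
```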
-
Also experiencing the same issues with ubuntu-latest and ubuntu-20.04. Usually takes 3-6 attempts to get a build.
2023-07-27T12:16:55.9472717Z > Task :apptv:dexBuilderTsnAndroidtvQaRelease
2023-07-27T12:18:31.3463035Z > Task :apptv:mergeExtDexTsnAndroidtvQaRelease
2023-07-27T12:18:41.7472315Z > Task :apptv:mergeTsnAndroidtvQaReleaseJavaResource
2023-07-27T12:18:59.2463021Z > Task :apptv:mergeDexTsnAndroidtvQaRelease
2023-07-27T12:19:00.2463262Z > Task :apptv:buildTsnAndroidtvQaReleasePreBundle
2023-07-27T12:19:01.4471900Z > Task :apptv:compileTsnAndroidtvQaReleaseArtProfile
2023-07-27T12:19:02.6652087Z > Task :apptv:packageTsnAndroidtvQaReleaseBundle
2023-07-27T12:19:11.4704449Z ##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
2023-07-27T12:19:11.6687468Z Cleaning up orphan processes
2023-07-27T12:19:11.7638576Z Terminate orphan process: pid (2017) (java)
-
We could fix this problem by increasing the size of Ubuntu's swapfile: https://stackoverflow.com/a/76921482/1185087
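For reference, extra swap can be added on a hosted Ubuntu runner with plain shell at the start of the job. This is a generic sketch based on standard Linux tooling (the file name and size are arbitrary), not the exact steps from the linked answer:

```yaml
      - name: Add extra swap space
        run: |
          sudo fallocate -l 8G /extra-swapfile
          sudo chmod 600 /extra-swapfile
          sudo mkswap /extra-swapfile
          sudo swapon /extra-swapfile
          free -h   # confirm the new swap is active
```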
-
I recently started experiencing this problem in one of my repos. I tried limiting the number of jobs cargo uses to build my project, but it seems that resource usage skyrockets at link time, so limiting cargo jobs doesn't help. Normally I'd use mold or bump up the size of the swapfile to mitigate this, but I can't do that on Windows!
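For completeness, capping cargo's parallelism and trimming debug info (one of the few link-memory knobs that also works on Windows) only needs a couple of environment variables in the workflow. The values below are illustrative, and as noted above they may not be enough on their own:

```yaml
      - name: Build with limited parallelism and less debug info
        env:
          CARGO_BUILD_JOBS: 2        # caps parallel compilation (not the final link step)
          CARGO_PROFILE_DEV_DEBUG: 0 # dev builds carry full debug info by default; dropping it reduces linker memory
        run: cargo build
```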
-
I have a kinda related question regarding the job "status": is there a way to get a job/workflow result flag that states explicitly that the runner received a shutdown signal while the job was running? Looking at https://docs.github.com/en/rest/actions/workflow-jobs?apiVersion=2022-11-28 gives me a bunch of API information like …
Context: we use custom runners on Google spot VM instances, and once such an instance is taken away during a workflow run, we want to trigger a new workflow to run the job again on a new runner. FYI @erik-bershel
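As far as I know there is no dedicated "runner shutdown" flag in the REST API; the job only ends up with a `conclusion` such as `failure` or `cancelled`. One way to approximate the desired re-trigger is a small follow-up workflow that reacts to the completed run, inspects the job conclusions, and re-runs the failed jobs. A rough sketch — the monitored workflow name "build" and the retry policy are assumptions, not something from this thread:

```yaml
name: retry-on-preempt
on:
  workflow_run:
    workflows: ["build"]   # assumed name of the workflow running on the spot VMs
    types: [completed]

permissions:
  actions: write           # needed for "gh run rerun" with the default GITHUB_TOKEN

jobs:
  check-and-retry:
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    runs-on: ubuntu-latest
    steps:
      - name: Show job conclusions of the failed run
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          gh api "repos/${{ github.repository }}/actions/runs/${{ github.event.workflow_run.id }}/jobs" \
            --jq '.jobs[] | {name, status, conclusion}'
      - name: Re-run the failed jobs
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh run rerun ${{ github.event.workflow_run.id }} --failed
```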
-
We've started hitting this issue as well within the last couple of weeks. Always the same action, always the same intermittent problem; after re-running, the failed test works more often than not. I wouldn't expect resource exhaustion, as we're running on … Here's an example of failure and an example of success: … As of this post it's been ~24 hours since we've been able to get a reliable build out. Thanks, and happy to provide more info if it helps.
-
Is there a fix for this? We are seeing this as well after applying some of the recommendations here: https://developer.android.com/build/optimize-your-build Our …
We started encountering this when running unit tests with a standard runner ( …
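Since several of the Gradle reports in this thread point at memory pressure on the 7 GB hosted runners, one commonly suggested mitigation is to cap the Gradle daemon heap and worker count for the CI run. A sketch with illustrative values (they are not tuned recommendations, and the same properties can equally go into gradle.properties):

```yaml
      - name: Run unit tests with capped Gradle memory
        run: |
          ./gradlew test \
            -Dorg.gradle.jvmargs="-Xmx3g -XX:MaxMetaspaceSize=512m" \
            -Dorg.gradle.workers.max=2
```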
-
In our case, we solved this error when we understood that we had created multiple Ansible Automation Platform installations concurrently reading and writing to the same Postgres database. We believe that the "The running ansible process received a shutdown signal." error was linked to this concurrency of processes against the same database; it could be related to some timeouts of calls. We stopped all the services on those machines and kept only the original process running. After stopping the services, we saw no interruptions in AAP project updates and project launches.
-
I had the same issue starting with the recent update of the Ubuntu 24.04 LTS runner image. My hunch was also that I was exhausting resources, but the logs were not helpful. The runner is using Makefile with the …
-
Description
Since yesterday, our GitHub Actions builds started to randomly fail (we didn't change anything in our configuration). The error is not very precise, unfortunately.
The process is stopped at random stages of the build (but always after at least 15 minutes or so). Even if the build passes, it takes much longer than before (a clean build went from ~25 min to ~35 min).
Sometimes before the shutdown signal, there is also such log:
Idle daemon unexpectedly exit. This should not happen.
Workflow passes normally on builds that are shorter (for example those from cache).
Platforms affected
Runner images affected
Image version and build link
Image: ubuntu-22.04
Version: 20221127.1
Current runner version: '2.299.1'
Unfortunately, it happens on a private repo.
Is it regression?
No
Expected behavior
Job should pass
Actual behavior
Job fails
Repro steps
Looks similar to: #6680