Adding health check for ocp shared cluster #86

Merged
merged 3 commits into from
Oct 9, 2024

Conversation

agonzalezrh
Contributor

No description provided.

@agonzalezrh agonzalezrh changed the title [WIP] Adding health check for ocp shared cluster Adding health check for ocp shared cluster Oct 9, 2024
@agonzalezrh agonzalezrh merged commit 03dff42 into main Oct 9, 2024
5 checks passed
agonzalezrh added a commit that referenced this pull request Oct 27, 2024
* Add secret to generate a token (#82)

* Add secret to generate a token

* Add secret to generate a token

* Use new fork of aws-nuke (#83)

* Use new fork of aws-nuke

  https://github.com/rebuy-de/aws-nuke is no longer maintained.

  The official fork is https://github.com/ekristen/aws-nuke, as mentioned in the readme.

This change updates the conan image and playbook:

- use the binary from the new fork
- to be on the safe side, keep running the old binary as a last step

* Update readme

* Fix helm chart for conan and update readme

* Fix command line with new version of aws nuke

* tool: add login to hurl file (mark for cleanup)

* aws-nuke config: Use new keys in conf file

fixes:

WARN[0000] deprecated configuration key 'account-blocklist' - please use 'blocklist' instead  component=config
WARN[0000] deprecated configuration key 'feature-flags' - please use 'settings' instead  component=config
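
The two renames reported by these warnings can be sketched as a small migration over an already-parsed config. This is a hypothetical helper, not code from this repository: it assumes the config has been loaded into a plain dict, and it only renames the top-level keys. Whether the values under `settings` need a different structure than they had under `feature-flags` is not covered here and should be checked against the aws-nuke documentation.

```python
# Hypothetical helper: only the key names below come from the warnings
# above; the function and its behavior are illustrative assumptions.
DEPRECATED_KEYS = {
    "account-blocklist": "blocklist",
    "feature-flags": "settings",
}


def rename_deprecated_keys(config: dict) -> dict:
    """Return a copy of `config` with deprecated top-level keys renamed.

    Values are moved unchanged; nested structure is not translated.
    """
    migrated = {}
    for key, value in config.items():
        new_key = DEPRECATED_KEYS.get(key, key)
        # Refuse to silently overwrite a key that is already set.
        if new_key != key and new_key in config:
            raise ValueError(f"both {key!r} and {new_key!r} are set")
        migrated[new_key] = value
    return migrated
```

Raising on a conflict, rather than merging, keeps the sketch from guessing which of the two values should win.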

* aws-nuke: Add separate config for legacy + fixes

* aws-nuke: fail playbook if legacy aws-nuke failed

* Fix typo

* Fix ansible deprecation warnings (#84)

* OcpSandbox: Fix credentials output when empty/nil (#87)

* Adding health check for ocp shared cluster (#86)

* Adding health check for ocp shared cluster

* Adding health check for ocp shared cluster

* Adding health check for ocp shared cluster

* conan: fix ongoing cleanup errors (#85)

- bump aws-nuke to v3.26.0
- Instances set up with the disable-stop-protection were not deleted by aws-nuke.
  => Enable the DisableStopProtection option for aws-nuke.
- Add a 'debug' environment variable to better control conan's output.
  By default, conan is now a little less verbose.
- EC2Images: include disabled and deprecated images, and disable deregistration protection.
  Disabled images, deprecated images, and images with deregistration protection weren't deleted by aws-nuke.
- `manual_cleanup.py`: Release EIPs that are in a NetworkBorderGroup; aws-nuke misses them.
- `manual_cleanup.py`: A VPC can't be deleted while it has a VPC Lattice target group registered. Delete VPC Lattice target groups and targets, and deregister them from the VPC.
- Improve output of the ansible playbook by reducing noise:
  * add the `--quiet` option to the aws-nuke command
  * do not include `stdout` and `stderr` in the output of the register for the aws-nuke task
    `stdout_lines` and `stderr_lines` are enough and more readable.
- `requirements.txt`: do not pin versions of Python modules. Instead, use the latest version of each module;
  those will be baked into the container image.
  This gives us the DeletionMode option of the `delete_stack()` function, which is needed to delete faulty CloudFormation stacks.
- Add duration of the "cleanup" run at the end for each sandbox.
  ```
  2024-10-09T06:39:11+00:00 sandbox123 reset took 30m20s
  ```
- CloudFormation stacks are sometimes stuck in DELETE_FAILED because a resource that is part of the stack has already been deleted.
  In `manual_cleanup.py`, use the `FORCE_DELETE_STACK` option.
- Fix some Ansible deprecation warnings
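
The `FORCE_DELETE_STACK` item above could look roughly like the following. This is a minimal sketch, not the code in `manual_cleanup.py`: the function name is hypothetical, and `cfn` is assumed to be a boto3 CloudFormation client (for example `boto3.client("cloudformation")`) from a boto3 version recent enough to accept `DeletionMode`.

```python
def force_delete_failed_stacks(cfn) -> list:
    """Force-delete every stack currently in DELETE_FAILED.

    `cfn` is assumed to be a boto3 CloudFormation client. Returns the
    names of the stacks for which deletion was requested.
    """
    requested = []
    paginator = cfn.get_paginator("list_stacks")
    # Only stacks already in DELETE_FAILED are candidates for a forced delete.
    for page in paginator.paginate(StackStatusFilter=["DELETE_FAILED"]):
        for summary in page["StackSummaries"]:
            name = summary["StackName"]
            cfn.delete_stack(StackName=name, DeletionMode="FORCE_DELETE_STACK")
            requested.append(name)
    return requested
```

Passing the client in, instead of creating it inside the function, keeps the sketch independent of credentials and region handling.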

* conan script: fix test with empty var

Fixes the error:
./wipe_sandbox.sh: line 122: [: : integer expression expected

* Conan performance improvements (#88)

Before: **35+ minutes** to clean up a sandbox
After, without `aws-nuke-legacy`: **~5 minutes**

* Throw an error if aws-nuke-legacy deletes resource(s)
* Add a flag to disable/enable aws-nuke legacy
   Once we're sure that aws-nuke legacy never cleans up any resource after the new aws-nuke fork has run, we can easily disable it.
* Add ansible log when debug is on
* Give aws-nuke up to 1h
* Target group cleanup is done in `manual_cleanup.py`; remove it from the ansible tasks
* Disassociation of EIPs is done in `manual_cleanup.py`; remove it from the ansible tasks
* RDS: disabling deletion protection is done by aws-nuke; remove it from the ansible tasks
* Disabling termination protection is done by aws-nuke; remove it from the ansible tasks
* Reduce noise in logs
* Print `aws-nuke` summary, including the number of resources nuked.
   ```
   reset_sandbox939.log:Nuke complete: 0 failed, 2495 skipped, 3 finished.
   ```
* Do not run `manual_cleanup.py` first but only after running aws-nuke once.
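
The summary line shown above, together with the "throw an error if aws-nuke-legacy deletes resource(s)" item, can be illustrated with a small parser. This is a sketch under assumptions, not the playbook's logic: the function names are hypothetical, the line format is taken only from the sample above, and `finished` is assumed to be the number of resources that were removed.

```python
import re

# Pattern derived from the single sample line in this commit message.
SUMMARY_RE = re.compile(
    r"Nuke complete: (\d+) failed, (\d+) skipped, (\d+) finished\."
)


def parse_nuke_summary(log_text: str) -> dict:
    """Return the counters from the last summary line found in `log_text`."""
    matches = SUMMARY_RE.findall(log_text)
    if not matches:
        raise ValueError("no 'Nuke complete' summary line found")
    failed, skipped, finished = (int(value) for value in matches[-1])
    return {"failed": failed, "skipped": skipped, "finished": finished}


def check_legacy_run(log_text: str) -> dict:
    """Raise if the legacy run failed or still had something to remove."""
    summary = parse_nuke_summary(log_text)
    if summary["failed"] or summary["finished"]:
        raise RuntimeError(f"aws-nuke legacy was not a no-op: {summary}")
    return summary
```

Applied to the sample line, `check_legacy_run` would raise, because three resources are reported as finished.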

* Enable profiling (http/pprof) (#89)

- create debug routes behind *admin* authentication

* Fix loop to get secret (#90)

* Fix loop to get secret, add a stop condition
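
The commit message does not describe the loop itself, so the following is only a generic illustration of a polling loop with a stop condition. Every name here (`wait_for_secret`, `fetch`, `max_attempts`, `delay_seconds`) is hypothetical and not taken from the project.

```python
import time


def wait_for_secret(fetch, max_attempts=10, delay_seconds=1.0, sleep=time.sleep):
    """Call `fetch()` until it returns a value other than None.

    The stop condition is `max_attempts`: once it is reached, the loop
    gives up with TimeoutError instead of spinning forever.
    """
    for attempt in range(1, max_attempts + 1):
        secret = fetch()
        if secret is not None:
            return secret
        # Do not sleep after the final attempt.
        if attempt < max_attempts:
            sleep(delay_seconds)
    raise TimeoutError(f"secret not available after {max_attempts} attempts")
```

The `sleep` parameter exists only so the delay can be replaced in tests.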

---------

Co-authored-by: Guillaume Coré <gucore@redhat.com>
2 participants