Running from cron no longer possible on Derecho #998

mkavulich · 2024-01-02T17:24:59Z

Expected behavior

Previously the USE_CRON_TO_RELAUNCH=true option worked on all Tier-1 platforms (as far as I know).

Current behavior

Due to a policy change with the new machine, there are now special procedures for setting up cron tables on Derecho. These procedures require logging in to a separate machine to edit the crontab, so the workflow cannot modify the crontab automatically. As a result, it is not feasible to support this mode of running the workflow (USE_CRON_TO_RELAUNCH=true) on Derecho.

Machines affected

Derecho

Steps To Reproduce

1. Generate an experiment with the USE_CRON_TO_RELAUNCH=true option on Derecho
2. Observe that the workflow does not run.

So far, it looks like cron jobs do still work on Derecho. But I have been told by CISL that we need to migrate away from this system ASAP.

Detailed Description of Fix

The User's Guide run instructions will need to be updated with Derecho-specific instructions, since the crontab functionality is currently the recommended way to run the workflow.
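Since the crontab can no longer be edited automatically, Derecho-specific instructions would likely have users install the entry by hand on whichever machine CISL designates. A minimal sketch of a helper that prints such an entry for copy-and-paste, assuming the relaunch command is `cd EXPTDIR && ./launch_FV3LAM_wflow.sh` (the script name is inferred from `log.launch_FV3LAM_wflow`; the exact entry the SRW App generates may differ). The function name `crontab_line` and the example path are hypothetical.

```python
import shlex


def crontab_line(exptdir: str, interval_minutes: int = 3) -> str:
    """Build a crontab entry that relaunches the workflow every N minutes.

    Hypothetical helper: assumes the experiment directory contains a
    launch_FV3LAM_wflow.sh script that advances the workflow one step.
    """
    if not 1 <= interval_minutes <= 59:
        raise ValueError("interval_minutes must be between 1 and 59")
    if "%" in exptdir:
        # cron treats an unescaped % as a newline, so refuse such paths.
        raise ValueError("experiment path must not contain '%'")
    # Quote the path so directories with spaces survive the shell.
    return (f"*/{interval_minutes} * * * * "
            f"cd {shlex.quote(exptdir)} && ./launch_FV3LAM_wflow.sh")


if __name__ == "__main__":
    # Hypothetical experiment path; print the line for manual installation.
    print(crontab_line("/glade/derecho/scratch/USER/expt_dirs/my_expt"))
```

The printed line would then be added manually (e.g. via `crontab -e`) on the designated cron host rather than by the workflow itself.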

Possible Implementation

One way to get around this would be to take the existing WE2E functionality for running and monitoring experiments and make it available for general use. This would require some tweaking of the current setup to make it more user-friendly outside of the WE2E context.
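A cron-free alternative along these lines is a long-running process that calls the launch script at a fixed interval until the experiment finishes, which is roughly what cron does today. The sketch below is not the WE2E implementation; the function `relaunch_until_done` and its parameters are hypothetical, and how completion is detected is left to the caller as the `is_done` callback.

```python
import subprocess
import time
from typing import Callable, Sequence


def relaunch_until_done(cmd: Sequence[str],
                        is_done: Callable[[], bool],
                        interval_s: float = 180.0,
                        max_iters: int = 1000) -> int:
    """Run `cmd` every `interval_s` seconds until `is_done()` returns True.

    Returns the number of launches performed. Raises RuntimeError if the
    workflow is still not done after `max_iters` launches.
    """
    for n in range(1, max_iters + 1):
        # Each call advances the workflow one step, as a cron entry would.
        # check=False: a failed launch is retried on the next iteration.
        subprocess.run(list(cmd), check=False)
        if is_done():
            return n
        time.sleep(interval_s)
    raise RuntimeError(f"workflow not finished after {max_iters} launches")
```

Usage might look like `relaunch_until_done(["./launch_FV3LAM_wflow.sh"], is_done=my_check, interval_s=180)`, run from the experiment directory, where `my_check` is a hypothetical function that inspects the workflow state. The `max_iters` cap keeps a stalled experiment from polling forever.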

Output (optional)

Currently, the error message that appears in log.launch_FV3LAM_wflow is:

Running "module reset". Resetting modules to system default. The following $MODULEPATH directories have been removed: None
ERROR:
Loading of platform-specific module file (WFLOW_MOD_FN) for the workflow 
task failed:
  WFLOW_MOD_FN = "wflow_derecho"

But that may change in the future, as I was informed by CISL that the crontab functionality will stop working altogether at some point.


SarahLu-NOAA · 2024-01-12T19:54:15Z

@mkavulich
My SRW experiments on Derecho have been sitting in the queue for 12 hours, since last night. The experiment with USE_CRON_TO_RELAUNCH=true ran fine until last night. You posted this issue last week. Are my jobs sitting in the queue because Derecho is very busy, or because the cron option in UFS/SRW is no longer working/supported?

mkavulich · 2024-01-18T22:35:33Z

@SarahLu-NOAA It looks like the cron jobs have not yet been disabled (though this change is "imminent"), so this is likely unrelated to the issues you saw.

mkavulich added the bug (Something isn't working) label on Jan 2, 2024.

mkavulich mentioned this issue on Jan 2, 2024 in merged pull request #997: "[develop] Fixing several issues, including 966 (bash octal issue); add new winter weather verification test with staged data".

gspetro-NOAA added this to ufs-community User Support Aug 7, 2024
