Dev #372

Merged
merged 27 commits into from
Oct 2, 2024
Commits
e3f9539
export convert_cv_index_rset; skip_on_ci; expect_warning
mitchellmanware Sep 24, 2024
106e194
Merge pull request #360 from NIEHS/mm-tests-0820
kyle-messier Sep 25, 2024
d922834
rebasing + gitignore for tokens
kyle-messier Sep 30, 2024
819dcd2
remove old test data
kyle-messier Sep 30, 2024
524fcd7
Merge pull request #365 from NIEHS/pre-dev-fix
kyle-messier Sep 30, 2024
fc275d2
set up container, def, and archive old bash scripts
kyle-messier Oct 1, 2024
bad244e
add dev to branches triggering GH actions
kyle-messier Oct 1, 2024
45a2170
Merge pull request #368 from NIEHS/container-01
kyle-messier Oct 1, 2024
30b0f89
def files, bash scipts for containers, testing yaml
kyle-messier Oct 1, 2024
4c42f5a
fixing [on] typo
kyle-messier Oct 1, 2024
969565b
GH action path correction
kyle-messier Oct 1, 2024
66c65aa
name change
kyle-messier Oct 1, 2024
27d64e3
def file was wrong name
kyle-messier Oct 1, 2024
f4db895
rel path update for container build
kyle-messier Oct 1, 2024
89a2f95
path fix, fakeroot
kyle-messier Oct 1, 2024
8af11cd
path change for test functions
kyle-messier Oct 1, 2024
7afc578
cache fixing attempt
kyle-messier Oct 1, 2024
ab3dfb7
path fix attempts
kyle-messier Oct 1, 2024
775c98d
change order of yaml names to include new R test file in mnt
kyle-messier Oct 1, 2024
691aa16
remove _targets/ path from mount
kyle-messier Oct 1, 2024
ddee948
cache updating and check sections
kyle-messier Oct 1, 2024
ee9da0b
test cache of sif
kyle-messier Oct 1, 2024
401a78a
Merge branch 'dev' into container-01
kyle-messier Oct 1, 2024
e460622
Merge pull request #369 from NIEHS/container-01
kyle-messier Oct 1, 2024
51a6733
_targets.R, test and run bash scripts
kyle-messier Oct 2, 2024
8790b40
Merge branch 'dev' into container-01
kyle-messier Oct 2, 2024
7d17aba
Merge pull request #370 from NIEHS/container-01
kyle-messier Oct 2, 2024
4 changes: 2 additions & 2 deletions .github/workflows/check-standard.yaml
@@ -2,9 +2,9 @@
 # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
 on:
   push:
-    branches: [main, master]
+    branches: [main, master, dev]
   pull_request:
-    branches: [main, master]
+    branches: [main, master, dev]

 name: R-CMD-check
4 changes: 2 additions & 2 deletions .github/workflows/lint.yaml
@@ -2,9 +2,9 @@
 # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
 on:
   push:
-    branches: [main, master]
+    branches: [main, master, dev]
   pull_request:
-    branches: [main, master]
+    branches: [main, master, dev]

 name: lint
4 changes: 2 additions & 2 deletions .github/workflows/pkgdown.yaml
@@ -2,9 +2,9 @@
 # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
 on:
   push:
-    branches: [main, master]
+    branches: [main, master, dev]
   pull_request:
-    branches: [main, master]
+    branches: [main, master, dev]
   release:
     types: [published]
   workflow_dispatch:
73 changes: 73 additions & 0 deletions .github/workflows/test-container-dl-calc.yaml
@@ -0,0 +1,73 @@
+name: Test Coverage for Download and Calculate via Apptainer
+
+on:
+  push:
+    branches: [main, master, dev]
+  pull_request:
+    branches: [main, master, dev]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v2
+
+      - name: Create run_dl_calc_tests.R dynamically
+        run: |
+          mkdir -p tests/testthat  # Ensure the directory exists
+          echo 'testthat::test_file("tests/testthat/test_download.R")' > tests/testthat/run_dl_calc_tests.R
+          echo 'testthat::test_file("tests/testthat/test_calculate.R")' >> tests/testthat/run_dl_calc_tests.R
+
+      - name: Install Apptainer dependencies
+        run: |
+          sudo apt-get update && sudo apt-get install -y \
+            libseccomp-dev \
+            squashfs-tools \
+            cryptsetup \
+            wget
+
+      - name: Install Apptainer
+        run: |
+          wget https://github.com/apptainer/apptainer/releases/download/v1.1.0/apptainer-1.1.0.tar.gz
+          tar -xvf apptainer-1.1.0.tar.gz
+          cd apptainer-1.1.0
+          ./mconfig && make -C builddir && sudo make -C builddir install
+
+      - name: Restore .sif file from cache
+        id: cache-sif
+        uses: actions/cache@v3
+        with:
+          path: beethoven_dl_calc.sif
+          key: sif-cache-${{ runner.os }}-${{ hashFiles('container/beethoven_dl_calc.def') }}
+          restore-keys: |
+            sif-cache-${{ runner.os }}-
+
+      - name: Build the Apptainer container (if cache miss)
+        if: steps.cache-sif.outputs.cache-hit != 'true'
+        run: |
+          apptainer build --fakeroot beethoven_dl_calc.sif container/beethoven_dl_calc.def
+
+      - name: Cache the .sif file
+        if: steps.cache-sif.outputs.cache-hit != 'true'
+        uses: actions/cache@v3
+        with:
+          path: beethoven_dl_calc.sif
+          key: sif-cache-${{ runner.os }}-${{ hashFiles('container/beethoven_dl_calc.def') }}
+
+      - name: Check if .sif file exists
+        run: |
+          if [ ! -f beethoven_dl_calc.sif ]; then
+            echo "Error: .sif file not found!"
+            exit 1
+          fi
+
+      - name: Run R tests
+        run: |
+          apptainer exec \
+            --bind $PWD/inst:/pipeline \
+            --bind $PWD/input:/input \
+            --bind $PWD:/mnt \
+            beethoven_dl_calc.sif \
+            Rscript /mnt/tests/testthat/run_dl_calc_tests.R
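
Two bookkeeping steps in this workflow, generating the R runner script and failing early when the image is missing, can be tried locally without Apptainer. The following is a minimal shell sketch and is not part of the PR. The function names `write_runner` and `require_sif` are hypothetical; only the file names and the two `testthat::test_file()` lines are taken from the workflow above.

```shell
# Write the R runner script that the workflow generates on the fly.
# $1: directory to write into (the workflow uses tests/testthat).
# write_runner is a hypothetical helper name, not something in the repository.
write_runner() {
  local dir="$1"
  mkdir -p "$dir"  # Ensure the directory exists, as in the workflow step
  {
    echo 'testthat::test_file("tests/testthat/test_download.R")'
    echo 'testthat::test_file("tests/testthat/test_calculate.R")'
  } > "$dir/run_dl_calc_tests.R"
}

# Fail early when the container image is missing, like the
# "Check if .sif file exists" step. require_sif is also hypothetical.
# $1: path to the .sif file.
require_sif() {
  local sif="$1"
  if [ ! -f "$sif" ]; then
    echo "Error: $sif not found!" >&2
    return 1
  fi
}
```

Calling `write_runner tests/testthat && require_sif beethoven_dl_calc.sif` mirrors the order of checks the workflow performs before it reaches `apptainer exec`.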
4 changes: 2 additions & 2 deletions .github/workflows/test-coverage.yaml
@@ -3,9 +3,9 @@
 # Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
 on:
   push:
-    branches: [main, master]
+    branches: [main, master, dev]
   pull_request:
-    branches: [main, master]
+    branches: [main, master, dev]

 name: test-coverage-local
6 changes: 5 additions & 1 deletion .gitignore
@@ -106,4 +106,8 @@ slurm_error.log
 /doc/

 # interactive targets
-targets_start.Rout
+targets_start.Rout
+
+.netrc
+.urs_cookies
+beethoven_branching_notes.txt
1 change: 1 addition & 0 deletions NAMESPACE
@@ -9,6 +9,7 @@ export(calc_geos_strict)
 export(calc_gmted_direct)
 export(calc_narr2)
 export(calculate)
+export(convert_cv_index_rset)
 export(feature_raw_download)
 export(fit_base_learner)
 export(fit_base_tune)
1 change: 1 addition & 0 deletions R/base_learner.R
@@ -506,6 +506,7 @@ assign_learner_cv <-
 #' training-test data.frames and a column of labels.
 #' @author Insang Song
 #' @importFrom rsample make_splits manual_rset
+#' @export
 convert_cv_index_rset <-
 function(
 cvindex,
209 changes: 108 additions & 101 deletions _targets.R
@@ -1,123 +1,80 @@
library(targets)
library(tarchetypes)
library(future)
library(future.batchtools)
library(dplyr)
library(
beethoven,
lib.loc = "/ddn/gs1/home/manwareme/R/x86_64-pc-linux-gnu-library/4.3"
)
library(tidymodels)
library(bonsai)
# library(
# torch,
# lib.loc = "/ddn/gs1/biotools/R/lib64/R/library"
# )
library(crew)
library(future)
library(beethoven)
library(amadeus)

Sys.setenv("LD_LIBRARY_PATH" = paste("/ddn/gs1/biotools/R/lib64/R/customlib", Sys.getenv("LD_LIBRARY_PATH"), sep = ":"))

# replacing yaml file.
# targets store location corresponds to _targets/ in the root of the project
tar_config_set(
store = "/ddn/gs1/home/manwareme/beethoven/beethoven_targets"
store = "/opt/_targets"
)

# crew controllers
# For now, one is set, but we can explore the use of multiple controllers
# Can also explore making the workers input for bash script or Rscript
geo_controller <- crew_controller_local(
name = "geo_controller",
workers = 16L,
launch_max = 8L,
seconds_idle = 120
)

# maximum future exportable object size is set 50GB
# TODO: the maximum size error did not appear until recently
# and suddenly appeared. Need to investigate the cause.
# Should be removed after the investigation.
# options(future.globals.maxSize = 50 * 2^30)
options(future.globals.maxSize = 60 * 1024^3) # 60 GiB


generate_list_download <- FALSE
# Setting up the NASA Earthdata token inside the container
# This needs to be tested
if (!nzchar(Sys.getenv("NASA_EARTHDATA_TOKEN"))){
tar_source("/mnt/NASA_token_setup.R")
file.exists(".netrc")
file.exists(".urs_cookies")
file.exists(".dodsrc")
}


arglist_download <-
set_args_download(
char_period = c("2018-01-01", "2022-12-31"),
char_input_dir = "input",
nasa_earth_data_token = NULL,#Sys.getenv("NASA_EARTHDATA_TOKEN"),
mod06_filelist = "inst/targets/mod06_links_2018_2022.csv",
export = generate_list_download,
path_export = "inst/targets/download_spec.qs"
char_input_dir = "/input",
nasa_earth_data_token = Sys.getenv("NASA_EARTHDATA_TOKEN"),
mod06_filelist = "/pipeline/targets/mod06_links_2018_2022.csv",
export = TRUE,
path_export = "/pipeline/targets/download_spec.qs"
)

generate_list_calc <- FALSE

arglist_common <-
set_args_calc(
char_siteid = "site_id",
char_timeid = "time",
char_period = c("2018-01-01", "2022-12-31"),
num_extent = c(-126, -62, 22, 52),
char_user_email = paste0(Sys.getenv("USER"), "@nih.gov"),
export = generate_list_calc,
path_export = "inst/targets/calc_spec.qs",
char_input_dir = "/ddn/gs1/group/set/Projects/NRT-AP-Model/input"
)

tar_source("inst/targets/targets_initialize.R")
tar_source("inst/targets/targets_download.R")
tar_source("inst/targets/targets_calculate_fit.R")
tar_source("inst/targets/targets_calculate_predict.R")
tar_source("inst/targets/targets_baselearner.R")
tar_source("inst/targets/targets_metalearner.R")
tar_source("inst/targets/targets_predict.R")


# bypass option
Sys.setenv("BTV_DOWNLOAD_PASS" = "TRUE")

#
# bind custom built GDAL
# Users should export the right path to the GDAL library
# by export LD_LIBRARY_PATH=.... command.
### NOTE: It is important to source the scripts after the global variables are defined by the set_args functions
#tar_source("/pipeline/targets/targets_aqs.R")
tar_source("/pipeline/targets/targets_download.R")

# arglist_common is generated above
plan(
list(
tweak(
future.batchtools::batchtools_slurm,
template = "inst/targets/template_slurm.tmpl",
resources =
list(
memory = 8,
log.file = "slurm_run.log",
ncpus = 1, partition = "geo", ntasks = 1,
email = arglist_common$char_user_email,
error.file = "slurm_error.log"
)
),
multicore
)
)
# Toy test files - note we will not have functions defined like this directly in
# the _targets.R file
my_fun_a <- function(n) {
rnorm(n)
}

my_fun_b <- function(x) {
x^2
}

# # invalidate any nodes older than 180 days: force running the pipeline
# tar_invalidate(any_of(tar_older(Sys.time() - as.difftime(180, units = "days"))))


# # nullify download target if bypass option is set
if (Sys.getenv("BTV_DOWNLOAD_PASS") == "TRUE") {
target_download <- NULL
}

# targets options
# For GPU support, users should be aware of setting environment
# variables and GPU versions of the packages.
# TODO: check if the controller and resources setting are required
tar_option_set(
packages = c(
"beethoven", "amadeus", "chopin", "targets", "tarchetypes",
"data.table", "sf", "terra", "exactextractr",
#"crew", "crew.cluster",
"tigris", "dplyr",
"future.batchtools", "qs", "collapse", "bonsai",
"tidymodels", "tune", "rsample", "torch", "brulee",
"glmnet", "xgboost",
"future", "future.apply", "future.callr", "callr",
"stars", "rlang", "parallelly"
),
library = c("/ddn/gs1/group/set/isong-archive/r-libs"),
repository = "local",
packages =
c( "amadeus", "targets", "tarchetypes",
"data.table", "sf", "terra", "exactextractr",
"dplyr", "qs", "callr", "stars", "rlang"),
controller = crew_controller_group(geo_controller),
resources = tar_resources(
crew = tar_resources_crew(controller = "geo_controller")
),
error = "abridge",
memory = "transient",
format = "qs",
@@ -127,15 +84,65 @@ tar_option_set(
seed = 202401L
)

# should run tar_make_future()
list(
tar_target(name = A, command = my_fun_a(100)),
tar_target(name = B, command = my_fun_b(A), pattern = A),
tar_target(name = save_input, command = saveRDS(B, "/input/input.rds")),
tar_target( # Test download data with amadeus
download_test,
amadeus::download_narr(
variables = c("weasd", "omega"),
year = c(2023, 2023),
directory_to_save = "/input/narr_monolevel",
acknowledgement = TRUE,
download = TRUE,
remove_command = TRUE
)
),
target_download
)


# Style below that uses sources scripts for targets by pipeline step
# Note that variables created in _targets.R are in the same local
# environment as the sourced scripts

# list(
# target_init,
# target_download
# target_calculate_fit,
# target_baselearner#,
# target_metalearner,
# target_calculate_predict,
# target_predict,
# # documents and summary statistics
# targets::tar_target(
# summary_urban_rural,
# summary_prediction(
# grid_filled,
# level = "point",
# contrast = "urbanrural"))
# ,
# targets::tar_target(
# summary_state,
# summary_prediction(
# grid_filled,
# level = "point",
# contrast = "state"
# )
# )
# )

# targets::tar_visnetwork(targets_only = TRUE)
# END OF FILE

list(
target_init,
target_download,
target_calculate_fit,
target_baselearner,
target_metalearner,
target_calculate_predict#,
# list(
# target_init,
# target_download,
# target_calculate_fit,
# target_baselearner,
# target_metalearner,
# target_calculate_predict#,
# target_predict,
# # documents and summary statistics
# targets::tar_target(
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
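
The path changes in `_targets.R` line up with the three `--bind` mounts in the container workflow (`$PWD/inst` as `/pipeline`, `$PWD/input` as `/input`, `$PWD` as `/mnt`). As a rough aid for reading those paths, here is a shell sketch that translates a container path back to a host path under exactly those three binds. The function name `container_to_host` is hypothetical and not part of the repository. Paths outside the three mounts, such as the `/opt/_targets` store, are reported as unmapped rather than guessed.

```shell
# Translate a path as seen inside the container to the host path it is
# bound to, assuming only the three --bind mounts from the "Run R tests" step:
#   $PWD/inst  -> /pipeline
#   $PWD/input -> /input
#   $PWD       -> /mnt
# $1: container path; $2: host project root (defaults to the current directory).
# container_to_host is a hypothetical helper name used for illustration.
container_to_host() {
  local path="$1" root="${2:-$PWD}"
  case "$path" in
    /pipeline|/pipeline/*) printf '%s\n' "$root/inst${path#/pipeline}" ;;
    /input|/input/*)       printf '%s\n' "$root/input${path#/input}" ;;
    /mnt|/mnt/*)           printf '%s\n' "$root${path#/mnt}" ;;
    *)
      # Not covered by the three binds; do not guess a host location.
      echo "not under a bound mount: $path" >&2
      return 1
      ;;
  esac
}
```

For instance, `container_to_host /pipeline/targets/download_spec.qs` resolves the `path_export` value used in `set_args_download()` to the matching file under `inst/targets/` in the checkout.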