Merge pull request #314 from NIEHS/kpm-readme
Kpm readme
kyle-messier authored Mar 29, 2024
2 parents 00d9043 + 979b266 commit 0967639
Showing 3 changed files with 147 additions and 58 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -10,6 +10,9 @@
output/
manuscript/

# README html
README.html

# User-specific files
.Ruserdata

@@ -78,6 +81,7 @@ code/mitchell_tests/
# Insang's negative value exploration script
tools/negative_exploration
tools/metalearner_test
tools/shiny_explore_pm/missingness_exploration_pm_quarto_shiny_data/

# NASA Earthdata login credentials
.dodsrc
175 changes: 117 additions & 58 deletions README.md
@@ -7,18 +7,18 @@ experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](h
# Building an Extensible, rEproducible, Test-driven, Harmonized, Open-source, Versioned, ENsemble model for air quality
Group Project for the Spatiotemporal Exposures and Toxicology group with help from friends :smiley: :cowboy_hat_face: :earth_americas:

## GitHub Push/Pull Workflow
1) Each collaborator has a local copy of the GitHub repo; the suggested location is ddn/gs1/username/home
2) Work locally
3) Push to remote
4) Kyle [or delegate] will pull to MAIN local copy on SET group ddn location

## Repo Rules
1) To PUSH changes to the repo, the changes must be made to a non-MAIN branch
2) Then a PULL request must be made
3) Then it requires the REVIEW of 1 person (can be anyone)
4) Then the change from the branch is MERGED to the MAIN branch
## Installation

```r
remotes::install_github("NIEHS/beethoven")
```

## Getting Started

```r
TODO
```
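Until this section is written, a minimal sketch of a likely entry point, assuming the pipeline is orchestrated with the `targets` package as described below (any beethoven-specific wrappers are not documented here):

```r
# Hypothetical quick start, run from the repository root.
# tar_make() and tar_visnetwork() are standard targets functions;
# that beethoven uses them directly is an assumption.
library(targets)

tar_make()        # build all pipeline targets
tar_visnetwork()  # inspect the target dependency graph
```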

## Overall Project Workflow

Targets: Make-like Reproducible Analysis Pipeline
@@ -28,74 +28,133 @@ Targets: Make-like Reproducible Analysis Pipeline
4) Fit Meta Learners
5) Predictions
6) Summary Stats

**Placeholder for up-to-date rendering of targets**

```r
targets::tar_visnetwork()
```
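As a sketch of how such a pipeline is declared with `targets` (the target and helper names here are illustrative, not the actual beethoven targets):

```r
# _targets.R sketch: each tar_target() declares one pipeline step, and
# targets rebuilds a step only when its upstream dependencies change.
# fit_meta_learners(), predict_grid(), and summarize_results() are
# hypothetical helpers used only for illustration.
library(targets)

list(
  tar_target(meta_model, fit_meta_learners(base_models)),  # 4) Fit Meta Learners
  tar_target(prediction, predict_grid(meta_model)),        # 5) Predictions
  tar_target(summary_stats, summarize_results(prediction)) # 6) Summary Stats
)
```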

## Unit and Integration Testing

We will utilize various testing approaches to ensure the functionality and quality of the code.
## Project Organization

Here, we describe the structure of the project and the naming conventions used. The most up-to-date file paths and names are recorded here for reference.

### File Structure

#### Folder Structure
Root Directory
- R/
- input/
- output/
- tests/
- inst/
- docs/
- tools/
- man/
- vignettes/

#### input/

#### output/

Currently, as of 3/29/24, the output folder contains .rds files for each
of the covariates/features used for model development, e.g.:

- NRTAP_Covars_NLCD.rds
- NRTAP_Covars_TRI.rds


#### tests/

##### testthat/

##### testdata/

#### Relevant files

- LICENSE
- DESCRIPTION
- NAMESPACE
- README.md

### Naming Conventions

For `tar_target` functions, we use the following naming conventions:

Example from CF conventions:
[surface] [component] standard_name [at surface] [in medium] [due to process] [assuming condition]

Naming conventions for tar objects:

Long Version:
[R object type]_[role]_[stage]_[source]_[spacetime]

- R object type: sf, datatable, tibble, SpatRaster, SpatVector

- role: detailed description of the role of the object in the pipeline. Allowable keywords:

  - PM25
  - feature (i.e. geographic covariate)
  - base_model
    - base_model suffix types: linear, random_forest, xgboost, neural_net, etc.
  - meta_model
  - prediction
  - plot
    - plot suffix types: scatter, map, time_series, histogram, density, etc.

- stage: the stage of the pipeline the object is used in. Object transformations
  are also articulated here. Allowable keywords:

  - raw
  - process
  - fit: ready for base/meta learner fitting
  - result: final result
  - log
  - log10

- source: the original data source

  - AQS
  - MODIS
  - GMTED
  - NLCD
  - NARR
  - GEOSCF
  - TRI

- spacetime: relevant spatial or temporal information

  - spatial:
    - site_id
    - census_tract
    - grid
  - time:
    - daily [optional YYYYMMDD]
    - annual [optional YYYY]

Short Version:

The cross-walk lives on the punchcard.

### Function Naming Conventions

TBD

- download
- calc

### Processes to test or check
1) data type
2) data name
3) data size
4) relative paths
5) output of one module is the expectation of the input of the next module

### Test Driven Development
Starting from the end product, we work backwards while articulating the tests needed at each stage.

#### Key Points of Unit and Integration Testing

File Type
1. NetCDF
2. Numeric, double precision
3. NA
4. Variable names exist
5. Naming convention

Stats
1. Non-negative variance ($\sigma^2$)
2. Mean is reasonable ($\mu$)
3. SI units

Domain
1. In the US (+ buffer)
2. In time range (2018-2022)

Geographic
1. Projections
2. Coordinate names (e.g. lat/lon)
3. Time in an acceptable format

### Frameworks for Testing of this project (with help from ChatGPT)

#### Test Driven Development (TDD): Key Steps

1. **Write a Test**: Before you start writing any code, write a test case for the functionality you want to implement. This test should fail initially because the code to make it pass does not exist yet. The test defines the expected behavior of your code.

2. **Run the Test**: Run the test to ensure it fails. This step confirms that your test is correctly assessing the functionality you want to implement.

3. **Write the Minimum Code**: Write the minimum amount of code required to make the test pass. Don't worry about writing perfect or complete code at this stage; the goal is just to make the test pass.

4. **Run the Test Again**: After writing the code, run the test again. If it passes, your code now meets the specified requirements.

5. **Refactor (if necessary)**: If your code works and the test passes, refactor it to improve its quality, readability, or performance. The key here is that test coverage ensures you don't introduce new bugs while refactoring.

6. **Repeat**: Continue this cycle of writing a test, making it fail, writing the code to make it pass, and refactoring as needed. Each cycle should be short and focused on a small piece of functionality.

7. **Complete the Feature**: Keep repeating the process until your code meets all the requirements for the feature you're working on.

TDD helps ensure that your code is reliable and that it remains functional as you make changes and updates. It also encourages a clear understanding of the requirements and promotes better code design.
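As a hypothetical illustration of the long-version tar object naming convention described above, a datatable of NLCD-derived features processed at monitoring sites might be declared as (`dt` for datatable; `calc_nlcd_features()` is an illustrative helper, not an actual beethoven function):

```r
# [R object type]_[role]_[stage]_[source]_[spacetime]
library(targets)

tar_target(
  dt_feature_process_nlcd_site_id,
  calc_nlcd_features(dt_pm25_raw_aqs_site_id)
)
```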
### Punchcard

The punchcard is a single list of paths, variables, and functions that are used throughout the pipeline.

If a path is used in multiple places, it is only listed once. Then updates only require changing the path in one place.

`tools/pipeline/punchcard.csv`
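For illustration only (the actual column schema of `punchcard.csv` is not shown here), a punchcard-style entry pairing a logical name with a single canonical path might look like:

```csv
name,path,description
covars_nlcd,output/NRTAP_Covars_NLCD.rds,NLCD-derived covariates
```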

## Base Learners
Potential base learners we can use:
1) PrestoGP (lasso + GP)
2) XGBOOST
3) RF
4) CNN
5) UMAP covariates
6) Encoder NN covariates



26 changes: 26 additions & 0 deletions contributing_guide.qmd
@@ -0,0 +1,26 @@
---
title: "Contributing Guide for beethoven"
format: html
---


## GitHub Push/Pull Workflow
1) Each collaborator has a local copy of the GitHub repo
2) Work locally
3) Push to remote
4) Admin will pull to MAIN local copy on SET group ddn location

## Repo Rules
1) To PUSH changes to the repo, the changes must be made to a non-MAIN branch
2) Then a PULL request must be made
3) Then it requires the REVIEW of 1 person (can be anyone)
4) Then the change from the branch is MERGED to the MAIN branch
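The rules above correspond to a standard branch-and-PR flow; as a sketch (the branch name and commit message are illustrative, and `gh` is the optional GitHub CLI):

```sh
git checkout -b fix-readme        # 1) work on a non-MAIN branch
git add -A && git commit -m "Fix README typos"
git push -u origin fix-readme     # push the branch to the remote
gh pr create --base main          # 2) open a pull request
# 3) one approving review is required, then
# 4) the branch is merged into MAIN
```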


## Areas for Contribution

1. **Documentation**: We are always looking for help in improving our documentation. This includes the README, contributing guide, and any other documentation that you think could be improved.
2. **Code**: We are always looking for help in improving our code. This includes fixing bugs, adding new features, and improving existing features.
3. **Testing**: We are always looking for help in improving our testing. This includes writing new tests, improving existing tests, and ensuring that all tests pass.
4. **Base Learners**: Novel and efficient base learners are always welcome and can be integrated into the overall model workflow
5. **Model Workflow**: Improvements to the model workflow are always welcome. This includes improving the efficiency of the workflow, adding new features, and improving existing features.
