new code for data download and filtering, updated folder structure #35

Merged
merged 1 commit on Sep 19, 2023
4 changes: 3 additions & 1 deletion .Rbuildignore
@@ -1,2 +1,4 @@
 ^\.github$
-tools/
+tools/
+^.*\.Rproj$
+^\.Rproj\.user$
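The `.Rbuildignore` entries above are Perl-compatible regular expressions matched against file paths, which is why the dots in the `Rproj` patterns are escaped. As a sketch, the same entries can be added with the `usethis` helper, which escapes literal paths for you (assuming `usethis` is installed; the `NRTAPmodel.Rproj` file name is taken from this PR's diff):

```r
# Appends escaped patterns such as "^NRTAPmodel\\.Rproj$" and
# "^\\.Rproj\\.user$" to .Rbuildignore (escape = TRUE is the default).
usethis::use_build_ignore(c("NRTAPmodel.Rproj", ".Rproj.user"))
```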
4 changes: 4 additions & 0 deletions NRTAPmodel.Rproj
@@ -11,3 +11,7 @@ Encoding: UTF-8

RnwWeave: Sweave
 LaTeX: pdfLaTeX
+
+BuildType: Package
+PackageUseDevtools: Yes
+PackageInstallArgs: --no-multiarch --with-keep.source
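The new `.Rproj` fields tell RStudio to build the project as a package via devtools, passing the listed arguments through to `R CMD INSTALL`. A minimal sketch of the equivalent call from the R console (assuming `devtools` is installed):

```r
# Equivalent of RStudio's build-and-install with the configured arguments:
# skip multi-arch builds and keep source references for debugging.
devtools::install(args = c("--no-multiarch", "--with-keep.source"))
```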
Binary file modified R/.DS_Store
Binary file not shown.
44 changes: 44 additions & 0 deletions R/data_download_preprocess/aqs_download_filter.R
@@ -0,0 +1,44 @@
# Download EPA AQS data and filter to a single POC per site
# Author: Mariana Alifa Kassien
# September 2023

# Load packages and source functions
library(dplyr)
source("R/data_download_preprocess/download_aqs_data.R")
source("R/data_download_preprocess/filter_minimum_poc.R")

# Define parameters of interest for download
parameter_code <- 88101
year_start <- 2018
year_end <- 2022
resolution_temporal <- "daily"
directory_to_download <- "./input/aqs/"
directory_to_save <- "./input/aqs/"
url_aqs_download <- "https://aqs.epa.gov/aqsweb/airdata/"

# Run data download function
download_aqs_data(parameter_code, year_start, year_end, resolution_temporal,
                  directory_to_download, directory_to_save,
                  url_aqs_download, remove_zips = FALSE)

# Read in sites dataframe for filtering
input_df <- read.csv(
  paste0(directory_to_save, resolution_temporal, "_", parameter_code, "_",
         year_start, "-", year_end, ".csv"),
  stringsAsFactors = FALSE
)

# Optional: view column names to determine right inputs for filtering
#colnames(input_df)

# Create a site ID column that does not include the POC number
input_df <- input_df |>
  mutate(ID.Site = substr(ID.Monitor, 1, 17))

site_id <- "ID.Site"
poc_name <- "POC"

# Filter sites with multiple POCs (select lowest-numbered POC per site)
data_filtered <- filter_minimum_poc(input_df, site_id, poc_name)

# Save the filtered data, using the input file name prefixed with "filtered_"
output_name <- paste0(directory_to_save, "filtered_", resolution_temporal, "_",
                      parameter_code, "_", year_start, "-", year_end, ".csv")
# output_name
# [1] "./input/aqs/filtered_daily_88101_2018-2022.csv"
write.csv(data_filtered, output_name)
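The `filter_minimum_poc()` implementation lives in `R/data_download_preprocess/filter_minimum_poc.R`, which is not shown in this diff. As a hypothetical sketch of the behavior the script relies on (keep only the lowest-numbered POC at each site), assuming a dplyr-based implementation:

```r
library(dplyr)

# Sketch only: group records by the site ID column and keep rows whose
# POC value equals the minimum POC observed at that site.
filter_minimum_poc <- function(df, site_id, poc_name) {
  df |>
    group_by(across(all_of(site_id))) |>
    filter(.data[[poc_name]] == min(.data[[poc_name]])) |>
    ungroup()
}
```

Because each AQS site can host several collocated monitors distinguished by POC, selecting one POC per site avoids double-counting observations downstream.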