RNHANES
RNHANES is an R package for accessing and analyzing CDC NHANES (National Health and Nutrition Examination Survey) data. It was developed by Silent Spring Institute.
- Download and search NHANES variable and data file lists
- Download and cache NHANES data files
- Compute survey-weighted detection frequencies, quantiles, and geometric means
- Plot weighted histograms
You can install the latest stable version from GitHub:
library(devtools)
install_github("silentspringinstitute/RNHANES")
The version on CRAN is older and may raise errors when working with more recent NHANES cycles. It can be installed with:
install.packages("RNHANES")
You can browse the package's documentation on the RNHANES website: http://silentspringinstitute.github.io/RNHANES/.
library(RNHANES)
# Download environmental phenols & parabens data from the 2011-2012 survey cycle
dat <- nhanes_load_data("EPH", "2011-2012")
# Download the same data, but this time include demographics data (which includes sample weights)
dat <- nhanes_load_data("EPH", "2011-2012", demographics = TRUE)
# Find the sample size for urinary triclosan
nhanes_sample_size(dat,
  column = "URXTRS",
  comment_column = "URDTRSLC",
  weights_column = "WTSA2YR")
# Compute the detection frequency of urinary triclosan
nhanes_detection_frequency(dat,
  column = "URXTRS",
  comment_column = "URDTRSLC",
  weights_column = "WTSA2YR")
# Compute the 95th and 99th percentiles for urinary triclosan
nhanes_quantile(dat,
  column = "URXTRS",
  comment_column = "URDTRSLC",
  weights_column = "WTSA2YR",
  quantiles = c(0.95, 0.99))
# Compute geometric mean of urinary triclosan
nhanes_geometric_mean(dat,
  column = "URXTRS",
  weights_column = "WTSA2YR")
# Plot a histogram of the urinary triclosan distribution
nhanes_hist(dat,
  column = "URXTRS",
  comment_column = "URDTRSLC",
  weights_column = "WTSA2YR")
# Build a survey design object for use with survey package
design <- nhanes_survey_design(dat, weights_column = "WTSA2YR")
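To make the role of the sample weights more concrete, here is a minimal base-R sketch of what a weighted detection frequency measures: the share of the total sample weight carried by observations at or above the detection limit. It is a conceptual illustration on made-up numbers, not RNHANES's internal implementation, and it ignores every aspect of the survey design other than the weights. The data frame `toy` and the helper `weighted_detection_frequency` are hypothetical names introduced for this example; the comment coding (0 = at or above the detection limit, 1 = below it) follows the NHANES convention.

```r
# Toy data: hypothetical values, not real NHANES measurements
toy <- data.frame(
  value   = c(1.6, 4.2, 10.5, 1.6, 55.0, 7.3),
  comment = c(1, 0, 0, 1, 0, 0),  # 1 = below detection limit, 0 = detected
  weight  = c(1000, 2500, 1500, 3000, 500, 1500)
)

# Weighted detection frequency: weight of detected samples over total weight
# (hypothetical helper, not part of RNHANES)
weighted_detection_frequency <- function(comment, weight) {
  sum(weight[comment == 0]) / sum(weight)
}

weighted_detection_frequency(toy$comment, toy$weight)  # 6000 / 10000 = 0.6
```

Four of the six toy samples are detected, so the unweighted frequency would be about 0.67; the weighted value is lower because the two non-detects carry relatively large weights.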
An easy way to calculate geometric means is now built into RNHANES via the nhanes_geometric_mean function, but the version on CRAN hasn't been updated yet. If you are using the CRAN version, you can still compute them by taking the survey-weighted arithmetic mean of the log-transformed variable and exponentiating the result. Here's an example:
library(survey)
library(RNHANES)
library(tidyverse)
dat <- nhanes_load_data("EPHPP_H", "2013-2014", demographics = TRUE) %>%
  filter(!is.na(URXBPH))
des <- nhanes_survey_design(dat, "WTSB2YR")
logmean <- svymean(~log(URXBPH), des, na.rm = TRUE)
# Lower bound of the 95% confidence interval for the geometric mean
exp(logmean[1] - 1.96 * sqrt(attr(logmean, "var")))
# Geometric mean
exp(logmean)[1]
# Upper bound of the 95% confidence interval for the geometric mean
exp(logmean[1] + 1.96 * sqrt(attr(logmean, "var")))
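The log-then-exponentiate step can be checked by hand on a small example. The sketch below uses base R only and made-up values and weights; `weighted_geometric_mean` is a hypothetical helper written for this illustration, not an RNHANES function, and it returns only the point estimate (no design-based confidence interval).

```r
# Weighted geometric mean: exponentiate the weighted arithmetic mean of the logs
# (hypothetical helper, not part of RNHANES)
weighted_geometric_mean <- function(x, weight) {
  exp(sum(weight * log(x)) / sum(weight))
}

# Made-up values and weights, for illustration only
x <- c(1, 10, 100)
w <- c(1, 1, 2)

weighted_geometric_mean(x, w)          # 10^1.25, about 17.78
weighted_geometric_mean(x, rep(1, 3))  # equal weights: ordinary geometric mean, 10
```

The confidence bounds in the example above follow the same pattern: the interval is built on the log scale and each bound is then exponentiated, which is why it is not symmetric around the geometric mean.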
I recommend using the svycor function from the jtools package to compute survey-weighted Pearson correlations between NHANES variables:
library(RNHANES)
library(tidyverse)
library(jtools)
# Download PAH dataset
nhanes_dat <- nhanes_load_data("PAH_H", "2013-2014", demographics = TRUE)
# Build the survey design object
des <- nhanes_survey_design(nhanes_dat)
svycor(~log(URXP01) + log(URXP04) + log(URXP06) + log(URXP10), design = des, na.rm = TRUE)
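For intuition about what a weighted Pearson correlation is, the following base-R sketch computes one with `stats::cov.wt()` on made-up numbers. It is not how `svycor` is implemented and it uses only the weights, not the strata or sampling units of a survey design. The vectors `x`, `y`, `w` and the helper `weighted_cor` are hypothetical names introduced for this example.

```r
# Made-up measurements and sample weights, for illustration only
x <- c(0.5, 1.2, 2.3, 3.1, 4.8)
y <- c(1.1, 1.9, 3.2, 3.0, 5.5)
w <- c(1200, 800, 1500, 600, 900)

# cov.wt() rescales the weights to sum to 1 and, with cor = TRUE,
# also returns the weighted correlation matrix
# (weighted_cor is a hypothetical helper, not part of RNHANES or jtools)
weighted_cor <- function(x, y, weight) {
  cov.wt(cbind(x, y), wt = weight, cor = TRUE)$cor[1, 2]
}

# Weighted correlation of the log-transformed values
weighted_cor(log(x), log(y), w)
```

With equal weights, `weighted_cor(x, y, rep(1, 5))` matches `cor(x, y)`, which is a quick sanity check on the helper.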
Thanks to the following people for contributing pull requests: