- HTML: https://www.opencasestudies.org/ocs-bp-school-shootings-dashboard
- GitHub: https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard
- Bloomberg American Health Initiative: https://americanhealth.jhu.edu/open-case-studies
- Dashboard: https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/
- GitHub repo for dashboard: https://github.com/opencasestudies/ocs-bp-school-shootings-flexdashboard
The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.
This case study is part of the OpenCaseStudies project. This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.
To cite this case study:
Wright, Carrie and Ontiveros, Michael and Meng, Qier and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard. School Shootings in the United States (Version v1.0.0).
We would like to acknowledge Elizabeth Stuart for assisting in framing the major direction of the case study.
We would like to acknowledge Michael Breshock for his contributions to this case study and developing the OCSdata package.
We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.
The total reading time for this case study was calculated with koRpus: About 110 minutes
The Flesch-Kincaid Readability Index was also calculated with koRpus: Grade 9, Age 14
School Shootings in the United States
According to this report, school shootings can have long-lasting impacts on those who witness them. This article states that:
Over 240,000 American students experienced a school shooting in the last two decades.
Therefore, as the number of school shootings appears to be increasing, it is useful to examine the characteristics of these shootings to better understand why they happen and how to avoid them in the future. To that end, we will make a dashboard to display these data.
The dashboard created in this case study can be found here.
Our main questions:
- What has been the yearly rate of school shootings and where in the country have they occurred in the last 50 years (from January 1970 to June 2020)?
- How many individuals are typically killed in a shooting?
- What were the characteristics of the shooters: How often was a shooter male? How often did a shooter attempt or commit suicide?
In this case study we will be using data related to school shootings in the US from 1970 to June 2020 from the Center for Homeland Defense and Security (CHDS) K-12 School Shooting Database.
Their methods for identifying and authenticating incidents are outlined here.
Previously, according to their website:
“The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. All of the information is based on open-source information and 3rd party reporting… and may include reporting errors.”
The skills, methods, and concepts that students will be familiar with by the end of this case study are:
Data Science Learning Objectives:
- Importing text from a Google Sheets document (`googlesheets4`)
- Converting date formats (`lubridate`)
- Geocoding data (`ggmap`) and creating a jitter for geocoded data on a map (`sf`)
- How to reshape data by pivoting between “long” and “wide” formats and drop rows with `NA` values (`tidyr`)
- How to create data visualizations with `ggplot2`
- An introductory understanding of R Markdown
- How to create an interactive table (`DT`)
- How to create a map (`leaflet`)
- How to create an interactive dashboard with `flexdashboard` and `shiny`
Statistical Learning Objectives:
- Calculating percentages for data with missing values
In this case study we demonstrate how to import data from Google Sheets; however, we have also downloaded the data as a CSV file, and we demonstrate how to import the data in this format as well.
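As a rough illustration of the CSV route, the sketch below writes a tiny stand-in file and reads it back with `readr`. The file contents and column names (`Date`, `State`) are made up for this example and are not the real CHDS columns. The Google Sheets route is shown only as commented-out code because it needs network access and a sheet URL, which is left as a placeholder.

```r
library(readr)

# Made-up stand-in data; the real file comes from the CHDS database.
csv_path <- tempfile(fileext = ".csv")
write_csv(
  data.frame(Date = c("1/5/1970", "3/14/2019"), State = c("MD", "CA")),
  csv_path
)

# Read every column as character first; types can be converted later.
shootings_csv <- read_csv(
  csv_path,
  col_types = cols(.default = col_character())
)

# The Google Sheets route would look similar (placeholder left unfilled):
# library(googlesheets4)
# gs4_deauth()  # skip authentication for a publicly shared sheet
# shootings_gs <- read_sheet("<sheet URL or ID>")
```

Reading everything as character is a deliberate choice in this sketch: it avoids surprises from automatic type guessing on messy columns.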
This case study covers the differences between the various `*_join()` functions of the `dplyr` package, as well as use of the `case_when()` function to recode data based on particular evaluations of existing values.
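To make the difference between the joins concrete, here is a minimal sketch on two toy tibbles. All names and values (`incidents`, `populations`, `killed`, `pop_millions`) are invented for illustration and do not come from the case study data.

```r
library(dplyr)

# Toy data: four incidents and a lookup table that lacks TX but has NY.
incidents <- tibble(
  id     = 1:4,
  state  = c("MD", "CA", "TX", "MD"),
  killed = c(0, 1, 3, NA)
)
populations <- tibble(
  state        = c("MD", "CA", "NY"),
  pop_millions = c(6, 39, 19)  # made-up round numbers
)

left  <- left_join(incidents, populations, by = "state")   # all 4 incidents kept
inner <- inner_join(incidents, populations, by = "state")  # TX dropped: 3 rows
full  <- full_join(incidents, populations, by = "state")   # NY added: 5 rows
unmatched <- anti_join(incidents, populations, by = "state")  # only the TX row

# Recode a numeric column into labels; conditions are checked in order,
# so the NA case is handled first.
recoded <- incidents %>%
  mutate(severity = case_when(
    is.na(killed) ~ "unknown",
    killed == 0   ~ "no deaths",
    killed == 1   ~ "one death",
    TRUE          ~ "multiple deaths"
  ))
```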
We also cover removing `NA` values with the `drop_na()` function of the `tidyr` package, and selecting the last few variables of a tibble using the `last_col()` function. We cover using `tidyr` functions such as `pivot_wider()` and `pivot_longer()` for reshaping data, as well as arranging levels of factors using the `forcats` package.
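The reshaping functions named above can be sketched on a small made-up tibble. The column names (`year`, `outcome`, `n`) are assumptions for this example only.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Toy counts in "long" format, with one missing value.
counts <- tibble(
  year    = c(2018, 2018, 2019, 2019),
  outcome = c("killed", "wounded", "killed", "wounded"),
  n       = c(5, 12, NA, 7)
)

# Long -> wide: one column per outcome.
wide <- pivot_wider(counts, names_from = outcome, values_from = n)

# Wide -> long again.
long <- pivot_longer(
  wide,
  cols      = c(killed, wounded),
  names_to  = "outcome",
  values_to = "n"
)

# Drop rows that contain any NA.
complete <- drop_na(long)

# Select the last two columns by position from the end.
last_two <- select(wide, last_col(1):last_col())

# Move one factor level to the front, e.g. to control plotting order.
outcome_fct <- fct_relevel(factor(long$outcome), "wounded")
```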
Finally, this case study also covers a few of the `stringr` functions to manipulate character strings, including `str_c()`, `str_detect()`, and `str_remove()`, as well as some of the functions of the `lubridate` package for working with data related to dates.
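A short sketch of these string and date helpers follows. The input strings are invented for illustration, and the month/day/year date format is an assumption about how dates might be stored.

```r
library(stringr)
library(lubridate)

city  <- c("Baltimore", "Austin")
state <- c("MD", "TX")

# Collapse two character vectors element-wise.
location <- str_c(city, state, sep = ", ")

# Detect which strings end in "MD".
is_md <- str_detect(location, "MD$")

# Remove a literal annotation; parentheses and the dot are escaped.
cleaned <- str_remove(c("12 (approx.)", "3"), " \\(approx\\.\\)")

# Parse month/day/year strings into Date objects, then pull out the year.
dates <- mdy(c("1/5/1970", "6/30/2020"))
years <- year(dates)
```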
We also cover how to geocode data using the `ggmap` package and how to modify duplicated locations using the `sf` package so as to avoid overlapping points on a map.
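One way to separate duplicated points is sketched below, assuming the coordinates have already been obtained. The coordinates, the column names, and the jitter amount are all made up, and the jitter is applied directly in longitude/latitude degrees for simplicity; this is not necessarily the exact procedure the case study follows. Geocoding itself is shown only as a comment because `ggmap::geocode()` needs a registered Google API key.

```r
library(sf)

# Geocoding would happen first (requires an API key, so not run here):
# ggmap::register_google(key = "<your key>")
# coords <- ggmap::geocode("Baltimore, MD")

# Toy coordinates: rows 1 and 2 share the same location.
pts <- data.frame(
  id  = 1:3,
  lon = c(-76.61, -76.61, -97.74),
  lat = c(39.29, 39.29, 30.27)
)
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# Flag rows whose coordinates repeat an earlier row.
dup <- duplicated(st_coordinates(pts_sf))

# Jitter only the duplicated points by up to 0.01 degrees.
set.seed(1)
geom <- st_geometry(pts_sf)
geom[dup] <- st_jitter(geom[dup], amount = 0.01)
st_geometry(pts_sf) <- geom
```

Jittering only the flagged rows keeps every unique location exactly where it was geocoded.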
In this case study we show how to make faceted plots where each plot has its own y-axis label (which is actually a bit tricky), we show how to make pie charts with `ggplot2`, and we demonstrate how to use the `waffle` package to create a waffle plot. We also discuss why in some cases a pie chart might not be a good choice.
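For readers who have not seen it before, a pie chart in `ggplot2` is usually built as a stacked bar drawn in polar coordinates. The sketch below uses made-up counts and category names, and draws the same data as a bar chart for comparison; it does not reproduce the case study's own plots.

```r
library(ggplot2)

# Made-up counts for illustration only.
outcomes <- data.frame(
  category = c("Male", "Female", "Unknown"),
  n        = c(80, 5, 15)
)

# A single stacked bar, wrapped around a circle, becomes a pie chart.
pie <- ggplot(outcomes, aes(x = "", y = n, fill = category)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()

# The same data as bars, where small differences are easier to compare.
bars <- ggplot(outcomes, aes(x = category, y = n)) +
  geom_col()

# A waffle plot takes a named vector of counts (not run here):
# waffle::waffle(c(Male = 80, Female = 5, Unknown = 15), rows = 10)
```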
We also show how to create an interactive table with the `DT` package, as well as how to create an interactive map with the `leaflet` package.
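A minimal sketch of both widgets is shown below. The data frame, its column names, and the option values are placeholders chosen for this example.

```r
library(DT)
library(leaflet)

# Toy incidents with coordinates.
incidents <- data.frame(
  school = c("School A", "School B"),
  lng    = c(-76.61, -97.74),
  lat    = c(39.29, 30.27)
)

# Interactive table with per-column filters and a caption.
tbl <- datatable(
  incidents,
  filter  = "top",
  caption = htmltools::tags$caption("Toy incidents table"),
  options = list(pageLength = 5)
)

# Interactive map with one marker per row; the formula syntax (~)
# refers to columns of the data passed to leaflet().
m <- leaflet(incidents) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lng, lat = ~lat, popup = ~school)
```

Printing `tbl` or `m` in an interactive session or an R Markdown document renders the widget.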
This case study does not really include an analysis like other case studies, but it does demonstrate how to create simple percentage statistics using data with missing values, as well as how to properly report such percentages.
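The key point when computing a percentage from data with missing values is to be explicit about the denominator. The sketch below uses a made-up `gender` column and reports the percentage both among known values and among all rows, along with the share that is missing.

```r
library(dplyr)

# Made-up data: 8 shooters, 2 with unknown gender.
shooters <- tibble(
  gender = c("Male", "Male", "Female", NA, "Male", NA, "Male", "Male")
)

pct <- shooters %>%
  summarise(
    n_total           = n(),
    n_known           = sum(!is.na(gender)),
    n_male            = sum(gender == "Male", na.rm = TRUE),
    pct_male_of_known = 100 * n_male / n_known,   # denominator: known values
    pct_male_of_all   = 100 * n_male / n_total,   # denominator: all rows
    pct_missing       = 100 * (n_total - n_known) / n_total
  )
```

With these toy numbers, 5 of the 6 shooters with known gender are male (about 83%), which is quite different from 5 of all 8 (62.5%), so a report should state which denominator was used and how much data was missing.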
The dashboard created in this case study can be found here.
RStudio
Cheatsheet on RStudio IDE
Other RStudio cheatsheets
RStudio projects
String manipulation cheatsheet
Table formats
Geocoding
Coordinate reference system (CRS)
EPSG
World Geodetic System (WGS) version 84, also called EPSG:4326
Albers equal-area conic projection
crs 102008
To learn more about geospatial coordinate systems see here and here.
`ggplot2` package
Please see this case study for more details on using `ggplot2`
grammar of graphics
`ggplot2` themes
Motivating article for this case study about school shootings
Also see this article to learn more about the impacts of school shootings.
Lightweight markup languages (LML)
Markdown
R markdown
knitr
rmarkdown (package)
See this book for more information on working with R Markdown files.
The RStudio cheatsheet for R Markdown and this tutorial are great for getting started.
See here for a video about `flexdashboard` and here for more information on how to use this package.
See here for a list of other packages that are useful for adding elements to dashboards created with the `flexdashboard` package.
See here for a list of R Markdown themes which can be used with `flexdashboard`.
See Font Awesome for icons.
To learn more about using `shiny` with the `flexdashboard` package to create interactive dashboards, see this tutorial.
leaflet (R package)
Leaflet (JavaScript Library)
shiny
See here for a gallery of `shiny` examples.
See this website to learn about a more flexible and slightly more challenging option for creating dashboards in R using a package called `shinydashboard`.
Packages used in this case study:
Package | Use in this case study |
---|---|
here | to easily load and save data |
readr | to import the data as a csv file |
googlesheets4 | to import directly from Google Sheets |
tibble | to create tibbles (the tidyverse version of dataframes) |
dplyr | to filter, subset, join, add rows to, and modify the data |
stringr | to manipulate character strings within the data (collapsing strings together, replacing values, and detecting values) |
magrittr | to pipe sequential commands |
tidyr | to change the shape or format of tibbles to wide and long, to drop rows with NA values, and to see the last few columns of a tibble |
ggmap | to geocode the data (which means get the latitude and longitude values) |
sf | to modify the geocoded data so that points at the same location do not overlap |
lubridate | to work with the date-time data |
DT | to create the interactive table |
htmltools | to add a caption to our interactive table |
ggplot2 | to create plots |
forcats | to reorder factor levels for plots |
waffle | to make waffle proportion plots |
poliscidata | to get population values for the states |
flexdashboard | to create the dashboard |
shiny | to allow our dashboard to be interactive |
leaflet | to use Leaflet (a JavaScript library for maps) to create the map for our dashboard |
There is a `Makefile` in this folder that allows you to type `make` to knit the case study contained in `index.Rmd` to `index.html`; it will also knit `README.Rmd` to a markdown file (`README.md`). Note that you may need to press the “Q” key to close the documentation about flexdashboard.
Users can skip the Data Import and Data Wrangling sections to start with the Data Analysis and Visualization section if they wish. Alternatively users can also start at the Dashboard Basics or Our Dashboard sections.
Instructors who only wish to demonstrate the basics of how to create a dashboard with `flexdashboard` can simply use the Dashboard Basics section; this would likely take only one or two class sessions to cover.
Instructors can skip the Data Import and Data Wrangling sections to start with the Data Analysis section if they wish.
This case study is appropriate for those new to R programming. It is also appropriate for more advanced R users who are new to the Tidyverse. This particular case study may require some introductory knowledge of R programming.
Create another dashboard with graphs and statistics featuring other elements within this dataset. For example, students may create graphs that explore which school events are reported to have more shootings. Students could be asked to use one of the pages of the dashboard that we created as an example.
About 29 to 39 seconds
This compilation time was measured on a PC running Windows 10. This range should only be used as an estimate, as compilation time will vary with different machines and operating systems.