opencasestudies/ocs-bp-school-shootings-dashboard

OpenCaseStudies

Important links

Disclaimer

The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.

License

This case study is part of the OpenCaseStudies project. This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.

Citation

To cite this case study:

Wright, Carrie and Ontiveros, Michael and Meng, Qier and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). https://github.com//opencasestudies/ocs-bp-school-shootings-dashboard. School Shootings in the United States (Version v1.0.0).

Acknowledgments

We would like to acknowledge Elizabeth Stuart for assisting in framing the major direction of the case study.

We would like to acknowledge Michael Breshock for his contributions to this case study and developing the OCSdata package.

We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.

Reading Metrics

The total reading time for this case study was calculated with koRpus: About 110 minutes

The Flesch-Kincaid Readability Index was also calculated with koRpus: Grade 9, Age 14

Title

School Shootings in the United States

Motivation

According to this report, school shootings can have long-lasting impacts on those who witness them. This article states that:

Over 240,000 American students experienced a school shooting in the last two decades.

Because the number of school shootings appears to be increasing, it is useful to better understand the characteristics of these shootings, why they happen, and how to avoid them in the future. To that end, we will make a dashboard to display these data.

The dashboard created in this case study can be found here.

Motivating questions

Our main questions:

  1. What has been the yearly rate of school shootings and where in the country have they occurred in the last 50 years (from January 1970 to June 2020)?

  2. How many individuals are typically killed in a shooting?

  3. What were the characteristics of the shooters: How often was a shooter male? How often did a shooter attempt or commit suicide?

Data

In this case study we will be using data related to school shootings in the US from 1970 to June 2020 from the Center for Homeland Defense and Security (CHDS) K-12 School Shooting Database.

Their methods for identifying and authenticating incidents are outlined here.

Previously, according to their website:

“The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. All of the information is based on open-source information and 3rd party reporting… and may include reporting errors.”

Learning Objectives

The skills, methods, and concepts that students will be familiar with by the end of this case study are:

Data Science Learning Objectives:

  1. Importing text from a Google Sheets document (googlesheets4)
  2. Converting date formats (lubridate)
  3. Geocoding data (ggmap) and creating a jitter for geocoded data on a map (sf)
  4. How to reshape data by pivoting between “long” and “wide” formats and drop rows with NA values (tidyr)
  5. How to create data visualizations with ggplot2
  6. An introductory understanding of R Markdown
  7. How to create an interactive table (DT)
  8. How to create a map (leaflet)
  9. How to create an interactive dashboard with flexdashboard and shiny

Statistical Learning Objectives:

  1. Calculating percentages for data with missing values

Data import

In this case study we demonstrate how to import data from Google Sheets. We have also downloaded the data as a CSV file, and we demonstrate how to import the data in this format as well.
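As a rough illustration of the two import routes, the sketch below writes a tiny CSV to a temporary file and reads it back with readr. The column names and values are hypothetical and are not taken from the CHDS data. The Google Sheets route is shown only as commented-out code because it needs network access and a real sheet URL or ID, which is left as a placeholder here.

```r
library(readr)

# Hypothetical two-row file standing in for the downloaded CSV
csv_path <- tempfile(fileext = ".csv")
write_lines(
  c("Date,State,Killed",
    "1/5/1970,MD,0",
    "6/30/2020,TX,1"),
  csv_path
)

# Import the CSV as a tibble
shootings_csv <- read_csv(csv_path, show_col_types = FALSE)

# Google Sheets route (not run): assumes a publicly readable sheet
# googlesheets4::gs4_deauth()   # skip authentication for public sheets
# shootings_gs <- googlesheets4::read_sheet("<sheet URL or ID>")
```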

Data wrangling

This case study covers the differences between the various *_join() functions of the dplyr package, as well as use of the case_when() function to recode data based on particular evaluations of existing values.
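A minimal sketch of how two of the join functions differ, and of recoding with case_when(), is shown here. The tables, column names, and values are made up for illustration and do not come from the case study's data.

```r
library(dplyr)

# Hypothetical toy tables
shootings <- tibble(
  state  = c("MD", "TX", "ZZ"),
  killed = c(0, 3, NA)
)
populations <- tibble(
  state = c("MD", "TX", "CA"),
  pop   = c(6.0, 29.0, 39.5)  # made-up values, in millions
)

# left_join() keeps every row of the left table; unmatched rows get NA
left <- left_join(shootings, populations, by = "state")

# inner_join() keeps only rows that have a match in both tables
inner <- inner_join(shootings, populations, by = "state")

# case_when() checks conditions in order; the first TRUE one is used
recoded <- left %>%
  mutate(severity = case_when(
    is.na(killed) ~ "unknown",
    killed == 0   ~ "no deaths",
    TRUE          ~ "one or more deaths"
  ))
```

Here `left` has three rows (with a missing `pop` for "ZZ"), while `inner` has only the two rows whose state appears in both tables.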

We also cover removing NA values with the drop_na() function of the tidyr package, and selecting the last few variables of a tibble using the last_col() function. We cover using the tidyr functions such as pivot_wider() and pivot_longer() for reshaping data, as well as arranging levels of factors using the forcats package.
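The reshaping steps can be sketched on a small, hypothetical tibble as follows. The column names are illustrative only, and the code is one possible way to combine these functions, not necessarily how the case study does it.

```r
library(dplyr)
library(tidyr)

# Hypothetical yearly counts in "wide" format
wide <- tibble(
  year    = c(2018, 2019),
  killed  = c(5, NA),
  wounded = c(10, 4)
)

# Wide to long: one row per year and outcome
long <- wide %>%
  pivot_longer(cols = c(killed, wounded),
               names_to = "outcome",
               values_to = "count")

# Drop rows that contain any NA value
complete <- drop_na(long)

# Long back to wide: the dropped combination reappears as NA
back <- pivot_wider(complete, names_from = outcome, values_from = count)

# Select the last two columns using last_col()
last_two <- select(wide, last_col(offset = 1):last_col())
```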

Finally, this case study also covers a few of the stringr functions for manipulating character strings, including str_c(), str_detect(), and str_remove(), as well as some of the functions of the lubridate package for working with dates.
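For a quick feel of these functions, here is a small sketch on made-up strings and dates. The month/day/year date format is an assumption for the example, not a statement about how dates are stored in the database.

```r
library(stringr)
library(lubridate)

# Hypothetical values
city    <- c("Baltimore", "Austin")
state   <- c("MD", "TX")
schools <- c("Lincoln High School", "Oak Elementary")

# Collapse strings together
location <- str_c(city, state, sep = ", ")

# Detect a pattern (returns TRUE or FALSE for each element)
is_high <- str_detect(schools, "High")

# Remove the first match of a pattern
cleaned <- str_remove(schools, " School$")

# Parse month/day/year text into Date objects, then extract the year
dates <- mdy(c("1/5/1970", "6/30/2020"))
years <- year(dates)
```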

We also cover how to geocode data using the ggmap package and how to modify duplicated locations using the sf package so as to avoid overlapping points on a map.
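The sketch below shows one way to separate duplicated locations with sf, using st_jitter() on hypothetical coordinates. Whether the case study uses exactly this function and jitter amount is an assumption. The geocoding step itself is not shown because ggmap's Google geocoding requires an API key.

```r
library(sf)

set.seed(1)  # st_jitter() adds random noise; fix the seed for reproducibility

# Hypothetical geocoded rows: two incidents share the same coordinates
incidents <- data.frame(
  school = c("School A", "School A", "School B"),
  lon    = c(-76.61, -76.61, -97.74),
  lat    = c(39.29, 39.29, 30.27)
)

# Convert to an sf object in EPSG:4326 (longitude/latitude, WGS 84)
points <- st_as_sf(incidents, coords = c("lon", "lat"), crs = 4326)

# Shift each point by a small random amount (here up to 0.01 degrees)
jittered <- st_jitter(points, amount = 0.01)

# Extract the jittered coordinates as a matrix with columns X and Y
coords <- st_coordinates(jittered)
```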

Data Visualization

In this case study we show how to make faceted plots where each plot has its own y-axis label (which is actually a bit tricky), how to make pie charts with ggplot2, and how to use the waffle package to create a waffle plot. We also discuss why, in some cases, a pie chart might not be a good choice.

We also show how to create an interactive table with the DT package, as well as how to create an interactive map with the leaflet package.

Analysis

This case study does not really include an analysis like other case studies, but it does demonstrate how to create simple percentage statistics using data with missing values, as well as how to properly report such percentages.
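To make the effect of missing values concrete, the following sketch computes a percentage two ways on a made-up vector. The values are hypothetical, and this is one reasonable way to do the calculation rather than the case study's exact code.

```r
# Hypothetical shooter gender values; NA means the value was not reported
gender <- c("Male", "Male", "Female", NA, "Male", NA)

n_total <- length(gender)                     # all incidents
n_known <- sum(!is.na(gender))                # incidents with a reported value
n_male  <- sum(gender == "Male", na.rm = TRUE)

# Percentage among incidents where gender is known
pct_of_known <- 100 * n_male / n_known

# Percentage among all incidents, counting missing values in the denominator
pct_of_all <- 100 * n_male / n_total

# Share of incidents with a missing value
pct_missing <- 100 * (n_total - n_known) / n_total
```

The two percentages differ (75 versus 50 in this toy example), so a reported figure should state which denominator was used and how many values were missing.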

Other notes and resources

The dashboard created in this case study can be found here.

RStudio
Cheatsheet on RStudio IDE
Other RStudio cheatsheets
RStudio projects

Tidyverse

Piping in R

String manipulation cheatsheet
Table formats

Geocoding
Coordinate reference system (CRS) EPSG World Geodetic System (WGS) version 84, also called EPSG:4326
Albers equal-area conic projection
crs 102008

To learn more about geospatial coordinate systems see here and here.

ggplot2 package
Please see this case study for more details on using ggplot2
grammar of graphics
ggplot2 themes

Motivating article for this case study about school shootings

Also see this article to learn more about the impacts of school shootings.

Lightweight markup languages (LML)
Markdown
R markdown
knitr
rmarkdown (package)

See this book for more information on working with R Markdown files.

The RStudio cheatsheet for R Markdown and this tutorial are great for getting started.

Pandoc

YAML
Configuration

flexdashboard

See here for a video about flexdashboard and here for more information on how to use this package.
See here for a list of other packages that are useful for adding elements to dashboards created with the flexdashboard package.
See here for a list of R Markdown themes which can be used with flexdashboard.
See Font Awesome for icons.

To learn more about using shiny with the flexdashboard package to create interactive dashboards, see this tutorial.

leaflet (R package)
Leaflet (JavaScript Library)

shiny
See here for a gallery of shiny examples.

See this website to learn about a more flexible and slightly more challenging option for creating dashboards in R using a package called shinydashboard.

Packages used in this case study:

Package Use in this case study
here to easily load and save data
readr to import the data as a csv file
googlesheets4 to import directly from Google Sheets
tibble to create tibbles (the tidyverse version of dataframes)
dplyr to filter, subset, join, add rows to, and modify the data
stringr to manipulate character strings within the data (collapsing strings together, replace values, and detect values)
magrittr to pipe sequential commands
tidyr to change the shape or format of tibbles to wide and long, to drop rows with NA values, and to see the last few columns of a tibble
ggmap to geocode the data (which means get the latitude and longitude values)
sf to modify the geocoded data so that overlapping points did not overlap
lubridate to work with the date-time data
DT to create the interactive table
htmltools to add a caption to our interactive table
ggplot2 to create plots
forcats to reorder factor levels for plots
waffle to make waffle proportion plots
poliscidata to get population values for the states
flexdashboard to create the dashboard
shiny to allow our dashboard to be interactive
leaflet to use Leaflet (a JavaScript library for maps) to create the map for our dashboard

For users

There is a Makefile in this folder that allows you to type make to knit the case study contained in the index.Rmd to index.html and it will also knit the README.Rmd to a markdown file (README.md). Note that you may need to press the “Q” key to close the documentation about flexdashboard.

Users can skip the Data Import and Data Wrangling sections to start with the Data Analysis and Visualization section if they wish. Alternatively users can also start at the Dashboard Basics or Our Dashboard sections.

For instructors

Instructors who only wish to demonstrate the basics of how to create a dashboard with flexdashboard can simply use the Dashboard Basics section; this would likely take only one or two class sessions to cover.

Instructors can skip the Data Import and Data Wrangling sections to start with the Data Analysis section if they wish.

Target audience

This case study is appropriate for those new to R programming. It is also appropriate for more advanced R users who are new to the Tidyverse. This particular case study may require some introductory knowledge of R programming.

Suggested homework

Create another dashboard with graphs and statistics featuring other elements within this dataset. For example, students may create graphs that explore which school events are reported to have more shootings. Students could be asked to use one of the pages of the dashboard that we created as an example.

Estimate of RMarkdown Compilation Time:

About 29 to 39 seconds

This compilation time was measured on a PC running Windows 10. This range should only be used as an estimate, as compilation time will vary with different machines and operating systems.
