Merge pull request #8 from olincollege/SAN-29-Collect-population-density
SAN-29 collect population density + SAN-36 vulnerable population
cory0417 authored Nov 13, 2024
2 parents 99352b8 + 81b284a commit 3f95e24
Showing 12 changed files with 657 additions and 6 deletions.
33 changes: 30 additions & 3 deletions docs/source/explanations/datasets.rst
@@ -5,18 +5,45 @@ Since creating a synthesized dataset based on multiple datasets is a critical pa


Crosswalk Dataset
-----------------

The crosswalk dataset should contain polygons that represent the boundaries of the crosswalks. The dataset ideally should accurately reflect the real world, but there is an implicit understanding that the dataset may not be perfect since most cities do not have a comprehensive dataset of crosswalks. The dataset should be in a format that can be easily read by the software, such as GeoJSON. In the testing module, you'll find the tests for fetching the crosswalk dataset and mapping it for the city of Boston. We've chosen Boston for its relative ease of access to various datasets and physical proximity to the team at Olin College of Engineering.

UMass Amherst has been developing a dataset of all crosswalks in Massachusetts using a computer vision model (YOLOv8) and aerial imagery. The dataset is not perfect, but it is a good starting point for our project, and it has the potential to be applicable to other states that do not have a thorough catalog of their crosswalk assets. The dataset can be viewed at the `following link <https://www.arcgis.com/apps/mapviewer/index.html?url=https://gis.massdot.state.ma.us/arcgis/rest/services/Assets/Crosswalk_Poly/FeatureServer/0&source=sd>`_.

Traffic Dataset
---------------

Since our project is focused on pedestrian safety at nighttime on crosswalks, we need a dataset that contains information about the volume of traffic. MassDOT provides a convenient dataset that includes average annual daily traffic (AADT) counts for most roads in Massachusetts. The counts will be used to inform the risk of a pedestrian being hit by a car at a given crosswalk. The dataset can be viewed at the `following link <https://www.arcgis.com/apps/mapviewer/index.html?url=https://gis.massdot.state.ma.us/arcgis/rest/services/Roads/VMT/FeatureServer/10&source=sd>`_.

Population Density Dataset
--------------------------

The population density data is sourced from the American Community Survey (ACS) 5-year estimates, which provide reliable demographic information on U.S. populations. The ACS data is integrated with geographic data from census tract boundaries to compute the area of each tract and derive population density values.

Density Calculation Methodology
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. **Geographic Area Calculation**:
   Each census tract is represented by a polygon geometry, and its area is calculated in square kilometers. Geometries are reprojected from geographic coordinates to a projected coordinate system (EPSG:3857) before areas are calculated. Note that EPSG:3857 (Web Mercator) is not an equal-area projection: it overstates areas away from the equator (by roughly 1.8 times at Massachusetts' latitude), so the resulting densities are understated by the same factor. An equal-area projection would give more accurate values.

2. **Population Segments**:
   The dataset includes specific population segments based on age and disability
   status:

   - **Total Population**: Total number of individuals residing within each tract.
   - **Senior Population (65+)**: The population aged 65 years and older.
   - **Youth Population (0-17)**: The population aged 17 years and younger.
   - **Disabled Population**: Individuals identified as having a disability.

3. **Density Calculation**:
   Population density for each segment is calculated by dividing the total
   population or segment-specific population by the area of each tract in
   square kilometers. This results in values representing the density of each
   population segment (e.g., seniors per km²).
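
The effect of the projection choice on the area and density steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation: the square near Boston and its population are made up, and ``to_web_mercator``, ``shoelace_area``, and ``mercator_area_inflation`` are hypothetical helper names. It computes a planar area in Web Mercator coordinates, then divides by the approximate inflation factor ``1 / cos(latitude)**2`` to recover the ground area.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def to_web_mercator(lon_deg, lat_deg):
    """Project a lon/lat pair (degrees) to Web Mercator coordinates in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def shoelace_area(points):
    """Planar polygon area from a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def mercator_area_inflation(lat_deg):
    """Approximate factor by which Web Mercator overstates area at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# A hypothetical 0.01-degree square near Boston (not a real census tract).
lat, lon, d = 42.36, -71.06, 0.01
corners = [(lon, lat), (lon + d, lat), (lon + d, lat + d), (lon, lat + d)]

# Area as measured in projected coordinates, converted from m² to km².
mercator_km2 = shoelace_area([to_web_mercator(x, y) for x, y in corners]) / 1e6
# Area after undoing the projection's inflation at the square's mid-latitude.
corrected_km2 = mercator_km2 / mercator_area_inflation(lat + d / 2)

population = 4000  # made-up population for the square
density_mercator = population / mercator_km2
density_corrected = population / corrected_km2
```

For a small polygon the corrected area agrees closely with the area on the sphere, while the uncorrected density comes out noticeably lower than the corrected one.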

Streetlights Dataset
--------------------

The most important information in the streetlights dataset is the location of each streetlight. Additional information, such as the type of bulb, last-replacement year, and wattage, is useful to have as well. After talking to Michael Donaghy, Superintendent of Street Lighting at the City of Boston Public Works Department, we learned that Boston completed a full catalog of its streetlight assets in 2023. We acknowledge that many cities might not have this data available, in which case `OpenStreetMap features <https://wiki.openstreetmap.org/wiki/Tag:highway%3Dstreet_lamp>`_ could be used to roughly estimate the streetlight locations. The Boston streetlight dataset can be viewed at the `following link <https://sdmaps.maps.arcgis.com/apps/dashboards/84e1553e754b424f9c544ab5079ed99f>`_.

5 changes: 3 additions & 2 deletions docs/source/references/index.rst
@@ -1,9 +1,10 @@
References
==========


.. toctree::
   :maxdepth: 2
   :caption: Contents:

   utils/index
   socioeconomic/index
7 changes: 7 additions & 0 deletions docs/source/references/socioeconomic/index.rst
@@ -0,0 +1,7 @@
Socioeconomic Module
====================

.. toctree::
   :maxdepth: 1

   population
13 changes: 13 additions & 0 deletions docs/source/references/socioeconomic/population.rst
@@ -0,0 +1,13 @@
Population Density
==================

.. automodule:: night_light.socioeconomic.population
   :members:
   :undoc-members:
   :show-inheritance:


.. automodule:: night_light.socioeconomic.population_density
   :members:
   :undoc-members:
   :show-inheritance:
3 changes: 2 additions & 1 deletion requirements.txt
@@ -14,4 +14,5 @@ geopandas~=1.0.1
folium~=0.18.0
pytest~=8.3.3
sphinx~=8.1.3
pygris~=0.1.6
aiohttp~=3.10.10
150 changes: 150 additions & 0 deletions src/night_light/socioeconomic/population.py
@@ -0,0 +1,150 @@
from pygris import tracts
from pygris.data import get_census
from night_light.utils.fips import StateFIPS

# NOTE: fetched (or read from cache) at import time; not used within this module.
mass_tracts = tracts(state="MA", cb=True, cache=True, year=2021)


def get_population(
    year: int = 2021, state: StateFIPS = StateFIPS.MASSACHUSETTS, **kwargs
):
    """
    Fetch population data for a given year and state using the ACS 5-year survey.

    This function retrieves population data for a specific state and year from the
    American Community Survey (ACS) 5-year dataset. It allows for additional filtering
    by specific population segments, including seniors, youths, and disabled
    individuals, if specified in the keyword arguments.

    Args:
        year (int, optional): The year for which to fetch the data. Defaults to 2021.
        state (StateFIPS, optional): The state for which to fetch the data, using FIPS
            code Enum. Defaults to StateFIPS.MASSACHUSETTS.

    Keyword Args:
        seniors (bool, optional): If True, fetches the senior population (ages 65+).
        youths (bool, optional): If True, fetches the youth population (ages 0-17).
        disabled (bool, optional): If True, fetches the disabled population.

    Returns:
        pd.DataFrame: A DataFrame containing the population data by census tract,
        with columns for:
            - "GEOID"
            - "total_population"
            - "senior_population" (if seniors=True)
            - "youth_population" (if youths=True)
            - "disabled_population" (if disabled=True)
    """
    seniors = kwargs.pop("seniors", False)
    youths = kwargs.pop("youths", False)
    disabled = kwargs.pop("disabled", False)
    # B01003_001E: total population.
    variables = ["B01003_001E"]
    if seniors:
        # Table B01001 (sex by age): male (020-025) and female (044-049) age
        # groups from 65 upward.
        variables.extend(
            [
                "B01001_020E",
                "B01001_021E",
                "B01001_022E",
                "B01001_023E",
                "B01001_024E",
                "B01001_025E",
                "B01001_044E",
                "B01001_045E",
                "B01001_046E",
                "B01001_047E",
                "B01001_048E",
                "B01001_049E",
            ]
        )
    if youths:
        # Table B01001: male (003-006) and female (027-030) age groups under 18.
        variables.extend(
            [
                "B01001_003E",
                "B01001_004E",
                "B01001_005E",
                "B01001_006E",
                "B01001_027E",
                "B01001_028E",
                "B01001_029E",
                "B01001_030E",
            ]
        )

    if disabled:
        # NOTE: B18101_001E is the total of table B18101 (everyone for whom
        # disability status is determined), not the count of people with a
        # disability. The table's "With a disability" cells would need to be
        # summed to obtain that count.
        variables.append("B18101_001E")

    df = get_census(
        year=year,
        variables=variables,
        params={
            "for": "tract:*",
            "in": f"state:{state.value}",
        },
        dataset="acs/acs5",
        return_geoid=True,
        guess_dtypes=True,
    )

    df["total_population"] = df["B01003_001E"]
    df.drop(columns=["B01003_001E"], inplace=True)
    if seniors:
        df["senior_population"] = (
            df["B01001_020E"]
            + df["B01001_021E"]
            + df["B01001_022E"]
            + df["B01001_023E"]
            + df["B01001_024E"]
            + df["B01001_025E"]
            + df["B01001_044E"]
            + df["B01001_045E"]
            + df["B01001_046E"]
            + df["B01001_047E"]
            + df["B01001_048E"]
            + df["B01001_049E"]
        )
        df.drop(
            columns=[
                "B01001_020E",
                "B01001_021E",
                "B01001_022E",
                "B01001_023E",
                "B01001_024E",
                "B01001_025E",
                "B01001_044E",
                "B01001_045E",
                "B01001_046E",
                "B01001_047E",
                "B01001_048E",
                "B01001_049E",
            ],
            inplace=True,
        )
    if youths:
        df["youth_population"] = (
            df["B01001_003E"]
            + df["B01001_004E"]
            + df["B01001_005E"]
            + df["B01001_006E"]
            + df["B01001_027E"]
            + df["B01001_028E"]
            + df["B01001_029E"]
            + df["B01001_030E"]
        )
        df.drop(
            columns=[
                "B01001_003E",
                "B01001_004E",
                "B01001_005E",
                "B01001_006E",
                "B01001_027E",
                "B01001_028E",
                "B01001_029E",
                "B01001_030E",
            ],
            inplace=True,
        )
    if disabled:
        df["disabled_population"] = df["B18101_001E"]
        df.drop(columns=["B18101_001E"], inplace=True)

    return df
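
The per-segment summation in `get_population` could also be written in a table-driven way. The following is only a sketch under stated assumptions, not the module's code: `SEGMENT_VARIABLES` and `summarize_row` are hypothetical names, the row is a plain dict standing in for one DataFrame row, and the counts and GEOID are made up. The variable codes are the ones listed in the function above.

```python
# Hypothetical mapping from output column to the ACS B01001 codes that are summed.
SEGMENT_VARIABLES = {
    "senior_population": [
        f"B01001_{n:03d}E" for n in (*range(20, 26), *range(44, 50))
    ],
    "youth_population": [
        f"B01001_{n:03d}E" for n in (*range(3, 7), *range(27, 31))
    ],
}


def summarize_row(row, segments):
    """Collapse the raw ACS columns of one tract into named segment totals."""
    out = {"GEOID": row["GEOID"], "total_population": row["B01003_001E"]}
    for segment in segments:
        out[segment] = sum(row[code] for code in SEGMENT_VARIABLES[segment])
    return out


# Made-up counts for a single made-up tract.
row = {"GEOID": "25000000000", "B01003_001E": 3000}
row.update({code: 10 for code in SEGMENT_VARIABLES["senior_population"]})
row.update({code: 50 for code in SEGMENT_VARIABLES["youth_population"]})

summary = summarize_row(row, ["senior_population", "youth_population"])
```

Keeping the codes in one mapping means the request list, the sum, and the column cleanup can all be derived from the same source instead of three hand-maintained lists.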
66 changes: 66 additions & 0 deletions src/night_light/socioeconomic/population_density.py
@@ -0,0 +1,66 @@
from night_light.socioeconomic.population import get_population
from night_light.utils.fips import StateFIPS
from night_light.utils.shapes import get_tract_shapes


def get_population_density(
    year: int = 2021, state: StateFIPS = StateFIPS.MASSACHUSETTS, **kwargs
):
    """
    Calculate population density using ACS 5-year survey data.

    This function retrieves population data and geographic shapes for census tracts
    within the specified state, calculates the area of each tract, and computes
    population density. Additional segments of the population (seniors, youths, and
    disabled individuals) can be included if specified, providing corresponding density
    values.

    Args:
        year (int, optional): The year for which to fetch the population data. Defaults
            to 2021.
        state (StateFIPS, optional): The state for which to calculate population
            density, specified using FIPS code Enum. Defaults to
            `StateFIPS.MASSACHUSETTS`.

    Keyword Args:
        seniors (bool, optional): If True, includes senior population density (ages
            65+).
        youths (bool, optional): If True, includes youth population density (ages 0-17).
        disabled (bool, optional): If True, includes disabled population density.

    Returns:
        gpd.GeoDataFrame: A GeoDataFrame containing population density data by census
        tract, with columns for:
            - "GEOID"
            - "area_km2"
            - "total_population_density"
            - "senior_population_density" (if seniors=True)
            - "youth_population_density" (if youths=True)
            - "disabled_population_density" (if disabled=True)
    """
    df_pop = get_population(year, state, **kwargs)
    # Use tract boundaries from the same year as the population data.
    tract_shapes = get_tract_shapes(state, year)
    # NOTE: EPSG:3857 (Web Mercator) is not an equal-area projection, so the
    # areas below are overstated away from the equator.
    tract_shapes.to_crs("EPSG:3857", inplace=True)
    tract_shapes["area_km2"] = tract_shapes.area / 1e6  # m² to km²

    df_pop_density = tract_shapes.merge(df_pop, how="inner", on="GEOID")
    # Divide by the merged frame's own area column so that rows stay aligned
    # even if the inner merge drops or reorders tracts.
    area_km2 = df_pop_density["area_km2"]
    if kwargs.get("seniors"):
        df_pop_density["senior_population_density"] = (
            df_pop_density["senior_population"] / area_km2
        )

    if kwargs.get("youths"):
        df_pop_density["youth_population_density"] = (
            df_pop_density["youth_population"] / area_km2
        )

    if kwargs.get("disabled"):
        df_pop_density["disabled_population_density"] = (
            df_pop_density["disabled_population"] / area_km2
        )

    df_pop_density["total_population_density"] = (
        df_pop_density["total_population"] / area_km2
    )

    return df_pop_density.to_crs("EPSG:4326")
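
Why the density should be computed from values that belong to the same tract after the inner join can be shown with plain dictionaries. This is a simplified sketch, not how pandas aligns rows internally: `density_by_geoid` is a hypothetical helper and the tract keys, areas, and populations are made up.

```python
def density_by_geoid(areas_km2, populations):
    """Inner-join two GEOID-keyed dicts and return people per km² per tract."""
    shared = areas_km2.keys() & populations.keys()
    return {geoid: populations[geoid] / areas_km2[geoid] for geoid in sorted(shared)}


# Made-up tracts: "B" has a shape but no population row, and the population
# rows arrive in a different order than the shapes.
areas_km2 = {"A": 2.0, "B": 5.0, "C": 0.5}
populations = {"C": 1000, "A": 3000}

# Joined on the key: each population is divided by its own tract's area.
densities = density_by_geoid(areas_km2, populations)

# Paired by position instead: populations are divided by the wrong areas.
positional = [p / a for p, a in zip(populations.values(), areas_km2.values())]
```

Here the key-based result is 1500 and 2000 people per km² for tracts "A" and "C", whereas the positional pairing yields 500 and 600, neither of which corresponds to a real tract.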
55 changes: 55 additions & 0 deletions src/night_light/utils/fips.py
@@ -0,0 +1,55 @@
from enum import Enum


class StateFIPS(Enum):
    ALABAMA = "01"
    ALASKA = "02"
    ARIZONA = "04"
    ARKANSAS = "05"
    CALIFORNIA = "06"
    COLORADO = "08"
    CONNECTICUT = "09"
    DELAWARE = "10"
    DISTRICT_OF_COLUMBIA = "11"
    FLORIDA = "12"
    GEORGIA = "13"
    HAWAII = "15"
    IDAHO = "16"
    ILLINOIS = "17"
    INDIANA = "18"
    IOWA = "19"
    KANSAS = "20"
    KENTUCKY = "21"
    LOUISIANA = "22"
    MAINE = "23"
    MARYLAND = "24"
    MASSACHUSETTS = "25"
    MICHIGAN = "26"
    MINNESOTA = "27"
    MISSISSIPPI = "28"
    MISSOURI = "29"
    MONTANA = "30"
    NEBRASKA = "31"
    NEVADA = "32"
    NEW_HAMPSHIRE = "33"
    NEW_JERSEY = "34"
    NEW_MEXICO = "35"
    NEW_YORK = "36"
    NORTH_CAROLINA = "37"
    NORTH_DAKOTA = "38"
    OHIO = "39"
    OKLAHOMA = "40"
    OREGON = "41"
    PENNSYLVANIA = "42"
    RHODE_ISLAND = "44"
    SOUTH_CAROLINA = "45"
    SOUTH_DAKOTA = "46"
    TENNESSEE = "47"
    TEXAS = "48"
    UTAH = "49"
    VERMONT = "50"
    VIRGINIA = "51"
    WASHINGTON = "53"
    WEST_VIRGINIA = "54"
    WISCONSIN = "55"
    WYOMING = "56"
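
A short sketch of how such an enum can be used: members can be looked up by name or by FIPS code, and the string value keeps its leading zero. The three-member enum below is a cut-down copy for illustration only, and `census_geography_params` is a hypothetical helper that mirrors the `params` dict passed to `get_census` in `get_population`.

```python
from enum import Enum


class StateFIPS(Enum):
    # Three members copied from the full enum, enough for the illustration.
    CALIFORNIA = "06"
    MASSACHUSETTS = "25"
    NEW_YORK = "36"


def census_geography_params(state):
    """Build the tract-level geography filter for one state (hypothetical helper)."""
    return {"for": "tract:*", "in": f"state:{state.value}"}


by_name = StateFIPS["MASSACHUSETTS"]  # lookup by member name
by_code = StateFIPS("25")  # lookup by FIPS code
params = census_geography_params(by_code)
```

Storing the codes as strings rather than integers matters here: `StateFIPS.CALIFORNIA.value` stays `"06"`, which is the form the geography filter expects.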
8 changes: 8 additions & 0 deletions src/night_light/utils/mapping.py
@@ -16,6 +16,14 @@
""",
)

Choropleth = partial(
    folium.Choropleth,
    fill_color="YlOrRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    line_weight=0.1,
)

LAYER_STYLE_DICT = {
    "fillColor": "cyan",
    "color": "black",
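
The `Choropleth` definition in this hunk relies on `functools.partial` to pre-bind styling keywords. A minimal sketch of that pattern, using a hypothetical stand-in function instead of `folium.Choropleth` so that it runs without folium installed:

```python
from functools import partial


def draw_layer(data, fill_color="blue", fill_opacity=0.6, line_opacity=1.0):
    """Hypothetical stand-in for a map layer constructor; returns its settings."""
    return {
        "data": data,
        "fill_color": fill_color,
        "fill_opacity": fill_opacity,
        "line_opacity": line_opacity,
    }


# Pre-bind a house style, in the same way the Choropleth partial pre-binds
# keywords for folium.Choropleth.
StyledLayer = partial(
    draw_layer, fill_color="YlOrRd", fill_opacity=0.7, line_opacity=0.2
)

default_style = StyledLayer("tracts")
overridden = StyledLayer("tracts", fill_opacity=0.3)
```

Keywords given at call time take precedence over the pre-bound ones, so callers can still override a single style setting without repeating the rest.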
14 changes: 14 additions & 0 deletions src/night_light/utils/shapes.py
@@ -0,0 +1,14 @@
from pygris import tracts
from pygris import places

from night_light.utils.fips import StateFIPS


def get_tract_shapes(state: StateFIPS, year: int = 2021):
    """Get tract shapes data"""
    return tracts(state=state.value, cb=True, cache=True, year=year)


def get_place_shapes(state: StateFIPS, year: int = 2021):
    """Get place shapes data"""
    return places(state=state.value, cb=True, cache=True, year=year)