This project is created as part of the Data Analytics Boot Camp at University of Toronto.
Following are the team members:
- Sara Parveen
- Lailah Libay
- Gus Mendes
- Jacob Evans
Repository: https://github.com/gusmendesbh/project3-group1
GitHub Pages: https://gusmendesbh.github.io/project3-group1/html/teams.html
Flask-AWS: https://flask-service.ofeg1bv1af188.ca-central-1.cs.amazonlightsail.com/index.html
Football is one of the most popular sports in the world, with a large global viewership, and there is a growing need for a platform that caters to football enthusiasts and analysts. This project aims to create an interactive dashboard that serves as a one-stop shop for National Football League (NFL) fans seeking information; these fans are our target audience. Others who would be interested in the insights presented on the dashboard include sports analysts, NFL sports agents, and football coaches. The dashboard displays overall team and athlete details for the year 2022 and also presents visualizations of teams' performance statistics from 2002 to 2023.
The interactive dashboard has three views:
- Main Dashboard
This view is the default view that loads when the dashboard is launched. It displays a static view of some key details of the 2022 season and an interactive map plotting team venues.
- Teams Dashboard
The Teams view has a dropdown listing all the teams that played in the 2022 season. The dashboard displays the team logo, basic team data, and results metrics for 2022. It also includes three additional visualizations showing scores, yards, and attempts statistics from 2002 to 2023. The dashboard updates every time a different team is selected from the dropdown.
- Athletes Dashboard
The Athletes view has a dropdown listing all the athletes who played in the 2022 season. The dashboard displays the athlete's headshot and demographic information, which update every time a different athlete is selected from the dropdown. This view also includes two static charts showing Height vs. Weight and Age vs. Experience statistics for all athletes.
The data for this project comes from a combination of API web scraping, CSV downloads, and mapping sources. Following are the key data sources we used:
- ESPN NFL API: these endpoints contain 2022 data only. We mainly used the Athletes and Teams endpoints to get team and athlete details and overall statistics data.
- Kaggle dataset: downloaded as a CSV containing detailed match statistics for teams from 2002 to 2023 (5,642 rows and 39 columns).
- GeoPy: used for geocoding, providing latitude and longitude coordinates for team venues.
We also consulted and reviewed some other data sources for ideas about the required data:
The team members collaborated on different steps of the project to optimize the workflow. Some steps were completed together, while others were assigned to individual teammates to make the best use of the available resources and time.
Following are some of the key steps that were involved in building the dashboard:
The team data was scraped from the ESPN API using requests. We reviewed the available data and extracted only the key-value pairs we needed. For loops were used to create a request URL for each of the 32 teams.
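The team extraction loop can be sketched as follows. This is a hedged illustration, not the project's exact code: the endpoint path, field names, and sequential team IDs are assumptions.

```python
import requests

# Assumed base URL for ESPN's (undocumented) site API; the real notebook may differ.
BASE_URL = "https://site.api.espn.com/apis/site/v2/sports/football/nfl/teams/{}"

def team_urls(n_teams=32):
    """Build one request URL per team ID (32 teams played in 2022)."""
    return [BASE_URL.format(team_id) for team_id in range(1, n_teams + 1)]

def fetch_team(url):
    """Fetch one team's JSON payload."""
    return requests.get(url, timeout=10).json()

def extract_team(payload):
    """Keep only the key-value pairs we need from one team's payload."""
    team = payload.get("team", {})
    return {
        "id": team.get("id"),
        "name": team.get("displayName"),
        "abbreviation": team.get("abbreviation"),
    }

# teams = [extract_team(fetch_team(url)) for url in team_urls()]
```

Extracting a fixed set of keys up front keeps the downstream DataFrames small and predictable.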
A similar methodology was used to extract athlete data from the API using nested loops. There were three categories of athletes: offensive, defensive, and special teams. The extracted data included the player type so it could be used for filtering when needed. This produced a total of 2,158 records.
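The nested loops can be sketched like this; the roster URL pattern and the category labels are assumptions used only to show the team-by-category structure:

```python
# Assumed roster endpoint; in the project each (team, category) pair drove one request.
ROSTER_URL = "https://site.api.espn.com/apis/site/v2/sports/football/nfl/teams/{}/roster"
CATEGORIES = ["offense", "defense", "specialTeam"]

def athlete_request_plan(team_ids, categories=CATEGORIES):
    """Build one request descriptor per (team, category), tagging the player type."""
    plan = []
    for team_id in team_ids:          # outer loop: the 32 teams
        for category in categories:   # inner loop: the 3 player types
            plan.append({"url": ROSTER_URL.format(team_id), "player_type": category})
    return plan
```

Tagging each record with its `player_type` at extraction time is what makes the later filtering possible without re-querying the API.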
The statistics data was loaded from the CSV file saved in the Resources folder.
The team venue data was created by scraping the ESPN API and merging the results with location coordinates extracted using GeoPy. This produced 32 records, one for each team that participated in the 2022 NFL season.
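The geocoding-and-merge step can be sketched as below. The venue names and coordinates here are illustrative placeholders; in the project the coordinates came from GeoPy rather than being hard-coded:

```python
import pandas as pd

# In the project, coordinates were obtained with GeoPy, roughly:
#   from geopy.geocoders import Nominatim
#   location = Nominatim(user_agent="nfl-dashboard").geocode("Lambeau Field")
#   lat, lon = location.latitude, location.longitude
# They are hard-coded below so the sketch runs without network access.

venues = pd.DataFrame({
    "team": ["Packers", "Bears"],
    "venue": ["Lambeau Field", "Soldier Field"],
})
coords = pd.DataFrame({
    "venue": ["Lambeau Field", "Soldier Field"],
    "lat": [44.5013, 41.8623],
    "lon": [-88.0622, -87.6167],
})

# One record per team: venue joined to its geocoded coordinates.
team_venues = venues.merge(coords, on="venue", how="left")
```

A left merge keeps all 32 teams even if a venue fails to geocode, leaving its coordinates as NaN for inspection.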
All the extracted data was converted to pandas DataFrames and then saved in CSV and JSON formats for use in creating the visualizations later. The extraction and transformation process is documented in the Jupyter notebooks stored in the python folder.
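The save step amounts to two pandas calls; the tiny DataFrame below is a placeholder standing in for the real extracted data:

```python
import pandas as pd

# Placeholder data standing in for the extracted team records.
df = pd.DataFrame([{"team": "Packers", "wins": 8}, {"team": "Bears", "wins": 3}])

# In the project these were written to files; returning strings keeps the sketch self-contained.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")  # list-of-objects layout, convenient for D3/JS
```

The `orient="records"` layout serializes each row as one JSON object, which is the shape D3 and most charting libraries expect.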
An ERD was created for the database using QuickDBD.
A PostgreSQL database was then created to load the data for easy retrieval and manipulation, and the CSV files were imported into its tables. Following are some screenshots showing the tables in SQL:
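The CSV-to-table import step can be sketched as follows. The project used PostgreSQL; the standard library's sqlite3 stands in here so the sketch is self-contained, and the table and column names are illustrative:

```python
import csv
import io
import sqlite3

# sqlite3 stands in for PostgreSQL here; the SQL itself has the same shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (abbreviation TEXT PRIMARY KEY, name TEXT)")

# A two-row stand-in for one of the exported CSV files.
csv_file = io.StringIO("abbreviation,name\nGB,Green Bay Packers\nCHI,Chicago Bears\n")
rows = [(r["abbreviation"], r["name"]) for r in csv.DictReader(csv_file)]

conn.executemany("INSERT INTO teams VALUES (?, ?)", rows)
conn.commit()
```

In PostgreSQL the same import is typically done with `COPY table FROM 'file.csv' CSV HEADER` or pgAdmin's import tool.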
The HTML and CSS were developed by taking inspiration from a Bootstrap template, which was customized to meet the needs of the NFL dashboard.
JavaScript was mainly used to integrate the data retrieval and manipulation logic into the HTML UI.
Three JavaScript files are developed to create visualizations for the dashboard.
- The Main dashboard has visualizations created using the D3 and Leaflet libraries. The interactive map was created mainly from the data extracted with GeoPy.
Following are some screenshots of the Main dashboard:
- The Teams and Athletes dashboards have visualizations created using D3 and Highcharts. Highcharts is an additional JS library that was not covered in class.
Following are some screenshots of the Teams dashboard:
Following are some screenshots of the Athletes dashboard:
The visualizations are customized by adding features such as labels, legends, tooltips, and interactivity.
To deploy the dashboard to the web, we created a Python Flask application. In the Flask application, a few routes were added to display the teams and athletes pages depending on the requested URL. The debugger was also enabled when running the application in development mode to speed up identifying and fixing errors.
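A minimal sketch of that routing is below; the route paths and template names are assumptions about the project's layout:

```python
from flask import Flask, render_template

app = Flask(__name__)

@app.route("/index.html")
def index():
    """Default view: the Main dashboard."""
    return render_template("index.html")

@app.route("/teams.html")
def teams():
    return render_template("teams.html")

@app.route("/athletes.html")
def athletes():
    return render_template("athletes.html")

# In development the app was run with the debugger enabled, e.g.:
#   app.run(debug=True)
```

With `debug=True`, Flask shows an interactive traceback in the browser and reloads the app on code changes, which matches the quick error-fixing workflow described above.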
After developing the Flask app, we deployed it to AWS. It is also deployed to GitHub Pages. Following are the links to access it on AWS and GitHub:
This was a very challenging project for all the team members, and it gave us many opportunities to learn new skills and technologies and to help each other troubleshoot at different steps. Through collaboration, we were able to successfully perform ETL, load the data into a SQL database, develop the HTML/CSS, write the JavaScript that powers the visualizations, and finally deploy the dashboard to the web using Flask, served by Gunicorn.