Emissions-by-Country

By Ruben Giosa, Reed Carter, Chloe (Yen Chi) Le

This repo showcases working as a team to build an ETL pipeline and create visualizations using Python, SQL, BigQuery and Looker Studio. Below is a snapshot of the visuals created by Ruben Giosa from this project, which is outlined in more detail further down in the README.md for each portion:

Technologies Used

BigQuery
Looker Studio
Python
SQL
Pandas
Git
Markdown
NumPy
.gitignore
requirements.txt

Datasets Used

Description

This repo includes an ETL pipeline leveraging the different datasets highlighted above. The team began by narrowing down our datasets and then outlining 1) the architectural structure 2) data model and 3) the full ETL pipeline to then construct each respective portion as a team:

Architectural diagram:

Data Model:

Why Snowflake vs Star Schema:

The team debated the merits of leveraging a snowflake schema vs star schema, while we acknowledged the efficiencies of a star schema that would be achieved by consolidating fact tables, we landed on a snowflake schema based on the size of the data sets not having a meaningful impact on performance (resulting from need for additional joins) and opting for the organization of data being more user friendly and intuitive.

Data Pipeline:

ETL Construction:

Reed constructed the dim_country table in the REED.ipynb notebook. REED performed profiling, cleaning and transformations on the Latitude and Longitude for Every Country and State and Land area (sq. km) datasets. Upon completion it was loaded to BigQuery.

Chloe worked on profiling, cleaning and transformations for the Emissions by Country data set create the fct_emissions table. Upon completion it was loaded to BigQuery.

Ruben performed profiling, cleaning and transformations on the World Energy Consumption to compile both the fct_gdp and fct_consump tables. Upon completion it was loaded to BigQuery. He also owned and authored the README.md.

Visualizations:

Once the datasets were cleaned and consolidated, the team created data visualizations and analysis (using Looker Studio) leveraging the constructed dimension and fact tables outlined above.

Below is a combo chart that was put together by Ruben that shows GDP compared to Population and Energy Consumption (click on image of chart to use dashboard):

While Github disables iframe, which allows embedding of the report on markdown files, I have included the code below for users that clone the project.

<iframe width="600" height="450" src="https://datastudio.google.com/embed/reporting/dbe92c8b-ccd3-41d9-b269-5964eb9717c3/page/f94CD" frameborder="0" style="border:0" allowfullscreen></iframe>

The scale of GDP (trillions of dollars), population (billions), and Energy consumption (thousands of terawatt-hours) posed an issue for the visualization, but by embedding this report the user is able to see the individual values for energy consumption which shows the consistent trend that population and energy consumption growth align with the growth of GDP. The chart is dynamic in that it allows users to filter for country and timeframe. Also the timeframe of 1965 through 2016 was chosen as consumption data prior to 1965 was missing and GDP data goes up to 2016.

Below is a line chart by Ruben that shows total global energy consumption compared to CO2 emissions (click on image of chart to use dashboard):

As called out above Github disables iframe, but I have included the code below for reference:

<iframe width="600" height="450" src="https://datastudio.google.com/embed/reporting/b7d972c6-7faf-4c78-948c-614945f42350/page/Io6CD" frameborder="0" style="border:0" allowfullscreen></iframe>

The chart leverages different scales for the left and right y-axis to better show the correlation between emissions and energy consumption over time (1965 - 2019). The chart is dynamic in that it allows users to filter for country and timeframe.

Chloe put together two line graphs that 1) plots the global CO2 emissions over time with total emissions and each type of emission producer and 2) total consumption compared to renewable consumption (click on the image of either chart to use dashboard):

Reed put together a line graph plotting the world GDP compared to coastline size and population:

Overall, the team was able to limit the amount of merge conflicts by working on independent notebooks and assigning different tasks (e.g. Each focused on constructing specific dimension and fact tables, etc.). One challenge we came across was setting up a BigQuery project and granting access to each user, this was a great learning experience for the team as we set up Service Accounts with authorization keys for each user.

Setup/Installation Requirements

Go to https://github.com/Reed-Carter/Emissions-by-Country.git to find the specific repository for this website.
Then open your terminal. I recommend going to your Desktop directory:
```
cd Desktop
```

Then clone the repository by inputting:

git clone https://github.com/Reed-Carter/Emissions-by-Country.git

Go to the new directory or open the directory folder on your desktop:
```
cd Emissions-by_Country
```
Once in the directory you will need to set up a virtual environment in your terminal:
```
python3.7 -m venv venv
```
Then activate the environment:
```
source venv/bin/activate
```
Install the necessary items with requirements.txt:
```
  pip install -r requirements.txt
```
Download the necessary csv files listed in the Datasets Used section
With your virtual environment now enabled with proper requirements, open the directory:
```
code .
```
Upon launch please update the Google Cloud client and project details to configure it to load to your project

Known Bugs

No known bugs

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Name		Name	Last commit message	Last commit date
Latest commit History 93 Commits
data		data
imgs		imgs
models		models
.gitignore		.gitignore
README.md		README.md
Reed.ipynb		Reed.ipynb
cl_edits.ipynb		cl_edits.ipynb
requirements.txt		requirements.txt
rg_work.ipynb		rg_work.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Emissions-by-Country

By Ruben Giosa, Reed Carter, Chloe (Yen Chi) Le

This repo showcases working as a team to build an ETL pipeline and create visualizations using Python, SQL, BigQuery and Looker Studio. Below is a snapshot of the visuals created by Ruben Giosa from this project, which is outlined in more detail further down in the README.md for each portion:

Technologies Used

Datasets Used

Description

Architectural diagram:

Data Model:

Why Snowflake vs Star Schema:

Data Pipeline:

ETL Construction:

Visualizations:

Setup/Installation Requirements

Known Bugs

License

About

Releases

Packages

Languages

rgiosa10/Emissions-by-Country

Folders and files

Latest commit

History

Repository files navigation

Emissions-by-Country

By Ruben Giosa, Reed Carter, Chloe (Yen Chi) Le

This repo showcases working as a team to build an ETL pipeline and create visualizations using Python, SQL, BigQuery and Looker Studio. Below is a snapshot of the visuals created by Ruben Giosa from this project, which is outlined in more detail further down in the README.md for each portion:

Technologies Used

Datasets Used

Description

Architectural diagram:

Data Model:

Why Snowflake vs Star Schema:

Data Pipeline:

ETL Construction:

Visualizations:

Setup/Installation Requirements

Known Bugs

License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages