Mexico-Gas-Prices

In Mexico, gasoline prices change constantly because they depend heavily on the price per barrel of oil, transport costs and competitors' prices. Keeping track of prices can therefore save money when looking for the best place to fill your tank.

The task

The objective of this project is to process fuel price data obtained from the CRE (Comisión Reguladora de Energía, Mexico's energy regulator) in the cloud, storing it daily in Google Cloud Storage and in the BigQuery data warehouse.

Tools that were used to accomplish this task:

- Python
  - Pandas
  - Requests
  - reverse_geocoder
  - unidecode
  - pyarrow
  - google-cloud-bigquery
  - google-cloud-storage
  - gcsfs
- Google Cloud Storage
- BigQuery
- Cloud Functions

BigQuery Data

The schema of the data is as follows:

[ 
  {
    "mode": "REQUIRED",
    "name": "Permiso",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Marca",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Nombre",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Direccion",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Producto",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Precio",
    "type": "FLOAT64"
  },
  {
    "mode": "REQUIRED",
    "name": "Estado",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Municipio",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Fecha",
    "type": "DATE"
  },
  {
    "mode": "REQUIRED",
    "name": "y",
    "type": "FLOAT64"
  },
  {
    "mode": "REQUIRED",
    "name": "x",
    "type": "FLOAT64"
  }
]
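
As a minimal sketch of how this schema could be enforced before loading, the snippet below checks a row against the field names, types and REQUIRED modes listed above. The `validate_row` helper, the mapping from BigQuery types to Python types and the sample values are assumptions made for illustration; they are not taken from the project code.

```python
import datetime

# Field names and types copied from the schema above; every field is REQUIRED.
SCHEMA = [
    ("Permiso", "STRING"), ("Marca", "STRING"), ("Nombre", "STRING"),
    ("Direccion", "STRING"), ("Producto", "STRING"), ("Precio", "FLOAT64"),
    ("Estado", "STRING"), ("Municipio", "STRING"), ("Fecha", "DATE"),
    ("y", "FLOAT64"), ("x", "FLOAT64"),
]

# Python types accepted for each BigQuery type (an assumption for this sketch).
PYTHON_TYPES = {"STRING": str, "FLOAT64": float, "DATE": datetime.date}


def validate_row(row):
    """Return a list of problems found in `row`; an empty list means it fits."""
    problems = []
    for name, bq_type in SCHEMA:
        value = row.get(name)
        if value is None:
            problems.append(f"{name}: missing REQUIRED value")
        elif not isinstance(value, PYTHON_TYPES[bq_type]):
            problems.append(f"{name}: expected {bq_type}, got {type(value).__name__}")
    # Report columns that the schema does not know about.
    extra = set(row) - {name for name, _ in SCHEMA}
    problems.extend(f"{name}: not in schema" for name in sorted(extra))
    return problems


# Made-up sample row, for illustration only.
sample_row = {
    "Permiso": "PL/0000/EXP/ES/2015", "Marca": "EJEMPLO", "Nombre": "ESTACION EJEMPLO",
    "Direccion": "AV. EJEMPLO 1", "Producto": "regular", "Precio": 22.95,
    "Estado": "Ciudad de Mexico", "Municipio": "Cuauhtemoc",
    "Fecha": datetime.date(2023, 1, 1), "y": 19.43, "x": -99.13,
}
print(validate_row(sample_row))
```

Running a check like this on a few rows before the load step makes a schema mismatch visible locally instead of as a failed BigQuery job.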

Process

The data is processed in the following steps:

- Get the XML element tree from the CRE web page
- Structure the data as JSON objects and convert them to a pandas DataFrame
- Transform the data with pandas and other libraries to add geo coordinates and new columns
- Store the transformed data in Cloud Storage
- Send the final DataFrame to BigQuery
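
The first two steps can be sketched with the standard library alone. The XML layout used here (a places document with `name`, `cre_id` and `location/x`, `location/y`, and a prices document with `gas_price` elements keyed by `place_id`) is an assumption about the CRE feed, and the sample documents, `parse_places` and `build_records` are hypothetical names and values for illustration, not the project's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents; the real CRE feed layout may differ.
SAMPLE_PLACES = """<places>
  <place place_id="1">
    <name>ESTACION EJEMPLO</name>
    <cre_id>PL/0000/EXP/ES/2015</cre_id>
    <location><x>-99.13</x><y>19.43</y></location>
  </place>
</places>"""

SAMPLE_PRICES = """<places>
  <place place_id="1">
    <gas_price type="regular">22.95</gas_price>
    <gas_price type="diesel">24.10</gas_price>
  </place>
</places>"""


def parse_places(xml_text):
    """Map each place_id to its station attributes."""
    places = {}
    for place in ET.fromstring(xml_text).iter("place"):
        places[place.get("place_id")] = {
            "Permiso": place.findtext("cre_id"),
            "Nombre": place.findtext("name"),
            "x": float(place.findtext("location/x")),
            "y": float(place.findtext("location/y")),
        }
    return places


def build_records(places_xml, prices_xml):
    """Join prices to stations: one JSON-like record per (station, product)."""
    places = parse_places(places_xml)
    records = []
    for place in ET.fromstring(prices_xml).iter("place"):
        station = places.get(place.get("place_id"))
        if station is None:
            continue  # skip prices that have no matching station
        for price in place.iter("gas_price"):
            records.append(
                {**station, "Producto": price.get("type"), "Precio": float(price.text)}
            )
    return records


records = build_records(SAMPLE_PLACES, SAMPLE_PRICES)
print(records)
```

Passing the resulting list of dictionaries to `pandas.DataFrame(records)` would give the DataFrame that the later transformation steps work on.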

There are scripts to run everything locally to test the data transformations. The process runs daily at 7:30 am, triggered by Cloud Scheduler through Pub/Sub.
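
For illustration, a Pub/Sub-triggered Cloud Function receives the scheduler's message as an `event` dictionary whose optional `data` field is base64-encoded. The sketch below assumes a background-function entry point named `main` and a placeholder `run_pipeline`; both names are hypothetical, and the real entry point in the project may differ. In Cloud Scheduler's unix-cron format, a daily 7:30 am run corresponds to `30 7 * * *` in whichever time zone the job is configured with.

```python
import base64


def run_pipeline(run_label):
    """Placeholder for the fetch, transform and store steps (hypothetical)."""
    return f"pipeline executed ({run_label})"


def main(event, context):
    """Background-function entry point for a Pub/Sub trigger.

    `event["data"]` carries the base64-encoded message body, if one was sent.
    """
    payload = event.get("data")
    label = base64.b64decode(payload).decode("utf-8") if payload else "scheduled"
    return run_pipeline(label)


# Local smoke test that mimics the message a scheduler job would publish.
fake_event = {"data": base64.b64encode(b"daily").decode("ascii")}
print(main(fake_event, None))
```

Calling `main` with a hand-built event, as in the last lines, is one way to exercise the trigger path locally without deploying.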

Architecture

The GCP architecture diagram shows the resources used by the project.
