
Mexico-Gas-Prices

In Mexico, the price of gasoline changes constantly, since it depends heavily on barrel prices, transportation costs, and competitors' prices. That is why keeping track of prices can save money when looking for the best place to fill your tank.

The task

The objective of this project is to process fuel price data obtained from the CRE (Comisión Reguladora de Energía, Mexico's energy regulator) in the cloud, storing it on a daily basis in Cloud Storage and in a BigQuery data warehouse.

Tools that were used to accomplish this task:

  • Python
    • Pandas
    • Requests
    • reverse_geocoder
    • unidecode
    • pyarrow
    • google-cloud-bigquery
    • google-cloud-storage
    • gcsfs
  • Google Cloud Storage
  • BigQuery
  • Cloud Functions

BigQuery Data

The schema of the data is as follows:

[ 
  {
    "mode": "REQUIRED",
    "name": "Permiso",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Marca",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Nombre",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Direccion",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Producto",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Precio",
    "type": "FLOAT64"
  },
  {
    "mode": "REQUIRED",
    "name": "Estado",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Municipio",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Fecha",
    "type": "DATE"
  },
  {
    "mode": "REQUIRED",
    "name": "y",
    "type": "FLOAT64"
  },
  {
    "mode": "REQUIRED",
    "name": "x",
    "type": "FLOAT64"
  }
]
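
The table can be created with this schema through the google-cloud-bigquery client. A minimal sketch, assuming hypothetical project, dataset, and table names:

  from google.cloud import bigquery

  client = bigquery.Client()

  # Mirror of the JSON schema above
  schema = [
      bigquery.SchemaField("Permiso", "STRING", mode="REQUIRED"),
      bigquery.SchemaField("Marca", "STRING", mode="REQUIRED"),
      bigquery.SchemaField("Nombre", "STRING", mode="REQUIRED"),
      bigquery.SchemaField("Direccion", "STRING", mode="REQUIRED"),
      bigquery.SchemaField("Producto", "STRING", mode="REQUIRED"),
      bigquery.SchemaField("Precio", "FLOAT64", mode="REQUIRED"),
      bigquery.SchemaField("Estado", "STRING", mode="REQUIRED"),
      bigquery.SchemaField("Municipio", "STRING", mode="REQUIRED"),
      bigquery.SchemaField("Fecha", "DATE", mode="REQUIRED"),
      bigquery.SchemaField("y", "FLOAT64", mode="REQUIRED"),
      bigquery.SchemaField("x", "FLOAT64", mode="REQUIRED"),
  ]

  # "my-project" and "gas_prices.daily_prices" are illustrative names
  table = bigquery.Table("my-project.gas_prices.daily_prices", schema=schema)
  client.create_table(table, exists_ok=True)  # no-op if the table already exists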

Process

The procedure used to process the information is as follows (a code sketch follows the list):

  • Get the XML element tree from the CRE web page
  • Structure the data as JSON objects, then convert them to a pandas DataFrame
  • Transform the data with pandas and other libraries to add geo-coordinates and new columns
  • Store the transformed data in Cloud Storage
  • Load the final DataFrame into BigQuery
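
A minimal sketch of the extract and transform steps. The endpoint URL is a placeholder and the XML layout (place elements with gas_price children) is an assumption about the CRE feed; the remaining schema columns (Marca, Nombre, Direccion) would come from a companion station catalogue and are omitted here:

  import xml.etree.ElementTree as ET
  from datetime import date

  import pandas as pd
  import requests
  import reverse_geocoder as rg
  from unidecode import unidecode

  # Placeholder URL; the real CRE endpoint is configured in the scripts
  PRICES_URL = "https://example.com/cre/prices.xml"

  def extract() -> ET.Element:
      # Get the XML element tree from the CRE web page
      response = requests.get(PRICES_URL, timeout=60)
      response.raise_for_status()
      return ET.fromstring(response.content)

  def transform(root: ET.Element) -> pd.DataFrame:
      # Structure the data as JSON-like records, then build a DataFrame
      records = [
          {
              "Permiso": place.get("permiso"),
              "Producto": price.get("type"),
              "Precio": float(price.text),
              "y": float(place.findtext("y")),  # latitude
              "x": float(place.findtext("x")),  # longitude
          }
          for place in root.iter("place")
          for price in place.iter("gas_price")
      ]
      df = pd.DataFrame(records)

      # Reverse-geocode the coordinates to add Estado and Municipio,
      # stripping accents with unidecode so names match across sources
      matches = rg.search(list(zip(df["y"], df["x"])))
      df["Estado"] = [unidecode(m["admin1"]) for m in matches]
      df["Municipio"] = [unidecode(m["admin2"]) for m in matches]
      df["Fecha"] = date.today()
      return df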

There are scripts to run the whole pipeline locally to test the data transformations. In the cloud, the process runs daily at 7:30 am, triggered by Cloud Scheduler through Pub/Sub.
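
Deployed as a Pub/Sub-triggered Cloud Function, the entry point might look like the sketch below, reusing the extract and transform helpers above. The bucket and table names are hypothetical, and Cloud Scheduler would publish the triggering message with the cron expression 30 7 * * *:

  from datetime import date

  from google.cloud import bigquery

  # Hypothetical names for illustration
  BUCKET = "mexico-gas-prices"
  TABLE_ID = "my-project.gas_prices.daily_prices"

  def main(event, context):
      # Pub/Sub-triggered entry point, invoked daily by Cloud Scheduler
      df = transform(extract())

      # Store the transformed data in Cloud Storage; gcsfs lets pandas
      # write gs:// paths directly and pyarrow encodes the Parquet file
      df.to_parquet(f"gs://{BUCKET}/prices/{date.today().isoformat()}.parquet")

      # Load the final DataFrame into BigQuery, appending to the table
      client = bigquery.Client()
      job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
      client.load_table_from_dataframe(df, TABLE_ID, job_config=job_config).result()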

Architecture

Here is the GCP architecture, showing the resources that are used: