Almost all of us have, at one time or another, taken a ride with Uber or a similar transportation service. Ridesharing services use online platforms to connect passengers with local drivers who use their personal vehicles.
In most cases they are a convenient option for door-to-door transportation, and they are generally cheaper than licensed cabs. Examples include Uber, Cabify, Beat, and Didi.
To improve the efficiency of cab dispatch systems for such services, it is important to be able to predict how long a driver will have their cab occupied. If a dispatcher knew approximately when a cab driver would finish their current trip, they could better identify which driver to assign to each pickup request.
This project works with a dataset published by the New York City Taxi and Limousine Commission, which includes pickup time, geographic coordinates, and number of passengers, among other variables. The goal is to predict the total duration of cab trips in New York City.
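As a rough illustration of how raw fields like these can become model inputs, the sketch below turns a pickup timestamp and a pair of coordinates into a few numeric features. It is a minimal sketch, not the project's actual preprocessing: the function names, the timestamp format, the availability of dropoff coordinates, and the sample trip are all assumptions made for the example.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trip_features(pickup_time, pickup, dropoff, passengers):
    """Build a feature dict for one trip (hypothetical helper).

    pickup_time is assumed to look like 'YYYY-MM-DD HH:MM:SS';
    pickup and dropoff are (latitude, longitude) tuples in degrees.
    """
    ts = datetime.strptime(pickup_time, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": ts.hour,
        "weekday": ts.weekday(),  # Monday = 0
        "distance_km": haversine_km(*pickup, *dropoff),
        "passengers": passengers,
    }


# Made-up trip in Manhattan, used only to exercise the helpers.
features = trip_features(
    "2016-03-14 17:24:55",
    (40.7679, -73.9822),
    (40.7656, -73.9646),
    1,
)
print(features)
```

Straight-line distance and time of day are common starting points for duration models, since traffic varies by hour and longer trips tend to take longer; whether they help on this dataset would need to be checked against the data.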
👉 The dataset used for this analysis was downloaded here
💻📚 Libraries used: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn.
🔬🎯 Applied models: Linear Regression, Regression Tree, XGBoost Regression, and KNN Regression.
👀📊 Previews: