
🚖Project: Uber Fare Rate Prediction in New York City using Regression Models


👉 Google Colab Code: https://colab.research.google.com/drive/1H3pNjBhPNxVNt37EQMbkHRW_-k2zOkN7?usp=sharing

👉 GitHub Code: https://github.com/Shibli-Nomani/project--Uber-Fare-Prediction-in-New-York-City-/blob/main/project_Uber_Fare_in_New_York_City_Dataset.ipynb

👉 Kaggle Code: https://www.kaggle.com/code/shiblinomani/uber-fare-rate-prediction-in-new-york-city

🐍 About Dataset:

This project focuses on Uber Inc., the world's largest ride-hailing company, and aims to predict fares for future trips. Uber serves millions of customers daily, so managing its data well is essential for developing new business ideas and achieving the best results. In particular, estimating fare prices accurately becomes very important. The dataset contains 200,000 user trips in New York City, USA.

📌 dataset link: https://www.kaggle.com/datasets/shiblinomani/uber-fare-newyorkcity

🤖 Machine Learning:

A branch of AI where systems learn from data to make decisions or predictions without explicit programming.

📚 Supervised Machine Learning:

Training a model on labeled data, where inputs are paired with corresponding outputs, to make predictions or classifications.

🔋 Regression:

Predicting continuous outcomes, like predicting house prices based on features such as size and location.

📊 Example: Predicting stock prices based on historical data.
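A regression model in this sense can be sketched in a few lines. The sketch below uses synthetic data (the 2.5 slope and 3.0 intercept are made-up values standing in for a fare-per-distance relationship), not the project's actual dataset:

```python
# Minimal regression sketch: fit a line to synthetic data and predict.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))                      # one feature, e.g. trip distance
y = 2.5 * X.ravel() + 3.0 + rng.normal(0, 0.5, size=100)   # fare = 2.5*dist + 3 + noise

model = LinearRegression().fit(X, y)
print(model.predict([[4.0]]))                              # predicted fare for a 4-unit trip
```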

🎯 Classification:

Assigning categories or labels to inputs based on their features.

🔍 Example: Classifying emails as spam or non-spam based on their content and features.

♎ Python Libraries Used in this Project

📊 Pandas: Data manipulation and analysis library.

➕ NumPy: Mathematical computing library for arrays and matrices.

📈 Matplotlib and Seaborn: Visualization libraries for creating static plots.

🗺️ Geopandas and Shapely: Libraries for working with geospatial data and geometries.

📊 Plotly: Interactive visualization library.

📍 Geopy: Library for calculating distances based on latitude and longitude.
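Geopy's distance helpers compute great-circle (geodesic) distances from coordinate pairs. As a dependency-free sketch of the same idea, the haversine formula below approximates what those helpers return; the NYC coordinates are illustrative, not taken from the dataset:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres between two (lat, lon) points.
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Times Square -> JFK Airport (illustrative pickup/dropoff), roughly 22 km
print(round(haversine_km(40.7580, -73.9855, 40.6413, -73.7781), 1))
```

In the project itself this per-trip distance, derived from the pickup and dropoff coordinates, is the kind of feature a fare model can learn from.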

📅 DateTime Conversion:

datetime: Library for handling dates and times.
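A typical use of date/time handling here is parsing pickup timestamps and deriving features such as hour of day and weekday. The sketch below uses pandas with a hypothetical `pickup_datetime` column and made-up timestamps:

```python
# Sketch: parse timestamps, then extract hour and weekday as model features.
import pandas as pd

df = pd.DataFrame({"pickup_datetime": ["2015-05-07 19:52:06", "2015-07-17 09:15:00"]})
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"])
df["hour"] = df["pickup_datetime"].dt.hour
df["weekday"] = df["pickup_datetime"].dt.dayofweek  # Monday = 0
print(df[["hour", "weekday"]].values.tolist())
```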

🔢 Data Preprocessing:

train_test_split, StandardScaler, SMOTE: Tools for splitting data into train/test sets, scaling features, and resampling.
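The split-then-scale flow can be sketched as follows on a toy feature matrix; the key point is that the scaler is fitted on the training data only and reused on the test data, so test statistics never leak into training:

```python
# Sketch: train/test split followed by standardization (fit on train only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(20, dtype=float).reshape(10, 2)   # toy feature matrix
y = X.sum(axis=1)                               # toy target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)          # learn mean/std from training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)             # reuse the same statistics on test data
print(X_train_s.mean(axis=0).round(6))          # ~0 after standardization
```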

📈 Regression Models:

LinearRegression, Lasso, Ridge, KNeighborsRegressor, XGBRegressor, RandomForestRegressor: Various regression algorithms for modeling relationships between variables.

📈 LinearRegression: Fits a straight line to the data, suitable for linear relationships.

🔍 Lasso: Performs feature selection by penalizing coefficients to zero, helpful for reducing overfitting and selecting important features.

🏞️ Ridge: Reduces model complexity and multicollinearity by adding L2 regularization term, preventing overfitting.

🤝 KNeighborsRegressor: Predicts based on the average of the 'k' nearest neighbors, robust for non-linear relationships.

🌳 XGBRegressor: Implements gradient boosting, an ensemble technique that builds trees sequentially to correct earlier errors, enhancing prediction accuracy.

🌲 RandomForestRegressor: Constructs multiple decision trees and averages predictions, robust against overfitting and noise.
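Since scikit-learn gives all of these regressors the same fit/score interface, they can be compared in a loop. The sketch below does this on synthetic data (XGBRegressor is omitted to keep the sketch to scikit-learn only); the scores shown by `print` are for the toy data, not the project's results:

```python
# Sketch: fit several of the listed regressors and compare test-set R² scores.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.3, 300)   # mildly non-linear target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
for name, m in models.items():
    print(name, round(m.fit(X_tr, y_tr).score(X_te, y_te), 3))  # R² on held-out data
```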

❌ Error Metrics:

mean_absolute_error, r2_score, explained_variance_score: Metrics for evaluating model performance.
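All three metrics compare predicted values against actual values; a minimal sketch on made-up numbers:

```python
# Sketch: evaluate toy predictions with the three metrics used in the project.
from sklearn.metrics import mean_absolute_error, r2_score, explained_variance_score

y_true = [3.0, 5.0, 7.5, 10.0]   # actual fares (made-up)
y_pred = [2.8, 5.3, 7.0, 10.4]   # model predictions (made-up)

print(mean_absolute_error(y_true, y_pred))              # average |error|; lower is better
print(round(r2_score(y_true, y_pred), 3))               # closer to 1 is better
print(round(explained_variance_score(y_true, y_pred), 3))
```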

🔍 Hyperparameter Tuning:

GridSearchCV: Tool for finding the best parameters through exhaustive search.
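For the KNN model that the summary highlights, a grid search might look like this; the parameter grid and synthetic data are illustrative, not the project's actual search space:

```python
# Sketch: exhaustive search over KNN hyperparameters with 5-fold cross-validation.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.1, 200)

grid = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
grid.fit(X, y)                    # tries every combination, keeps the best by CV score
print(grid.best_params_)
```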

🔢 Data Evaluation:

StratifiedKFold: Cross-validation splitter that preserves class proportions in each fold; it is designed for classification, so for a continuous regression target, plain KFold is the usual choice.

💾 Model Saving:

joblib: Library for saving and loading models.
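Saving and reloading a fitted model with joblib can be sketched as follows; the file name `fare_model.joblib` is illustrative:

```python
# Sketch: persist a fitted model to disk, then reload it for later predictions.
import joblib
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
joblib.dump(model, "fare_model.joblib")          # serialize the fitted model

restored = joblib.load("fare_model.joblib")      # later: load without retraining
print(round(restored.predict([[3.0]])[0], 1))    # same predictions as the original
```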

⚠️ Warnings:

warnings: Library for managing warnings, suppressing them in this case.

✨ Model Evaluation:

Explained Variance Score: 📊

Measures the proportion of variance in the target variable that is explained by the model. Good value: Closer to 1, indicating a better fit.

Mean Absolute Error (MAE): 🔍

Average of absolute differences between predicted and actual values, representing model accuracy. Good value: Lower, with 0 being perfect accuracy.

R-squared (R2): 📈

Represents the proportion of variance in the dependent variable that is explained by the independent variables. Good value: Closer to 1, indicating a better fit of the model to the data.

⭐ Summary

Our KNN (K-Nearest Neighbors) model shows the best performance for Uber fare prediction compared to the other models. Through meticulous parameter tuning with GridSearchCV, we optimized the model to deliver more accurate fare estimates. Moving forward, future enhancements could include exploring additional features, trying different models, further fine-tuning, training on more powerful infrastructure, integrating real-time data, and improving the experience with a user-friendly interface. 🚖🔍🚀

About

Apply Different Regression Models to Predict the Fare of Uber in New York
