👉 Google Colab Code: https://colab.research.google.com/drive/1H3pNjBhPNxVNt37EQMbkHRW_-k2zOkN7?usp=sharing
👉 Kaggle Code: https://www.kaggle.com/code/shiblinomani/uber-fare-rate-prediction-in-new-york-city
This project is about Uber Inc., one of the world's largest ride-hailing companies. The goal is to predict the fare of future trips. Uber serves hundreds of thousands of customers daily, so managing this data well is essential for developing new business ideas and getting the best results; in particular, it becomes really important to estimate fare prices accurately. The dataset contains 200,000 trips taken in New York City, USA.
📌 dataset link: https://www.kaggle.com/datasets/shiblinomani/uber-fare-newyorkcity
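Before any modeling, the trips can be loaded and inspected with Pandas. Below is a minimal sketch; the file name and column names are assumptions based on the public Uber fares data and may need adjusting to match this upload:

```python
import pandas as pd

# Load the trip records (file name and column names are assumed here;
# adjust them to match the actual Kaggle dataset).
df = pd.read_csv("uber.csv")

print(df.shape)         # expected roughly (200000, n_columns)
print(df.head())        # inspect fare_amount, pickup/dropoff coordinates, etc.
print(df.isna().sum())  # check for missing values before preprocessing
```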
🤖 Machine Learning: A branch of AI where systems learn from data to make decisions or predictions without explicit programming.
🎯 Supervised Learning: Training a model on labeled data, where inputs are paired with corresponding outputs, to make predictions or classifications.
📈 Regression: Predicting continuous outcomes, like predicting house prices based on features such as size and location.
📊 Example: Predicting stock prices based on historical data.
🏷️ Classification: Assigning categories or labels to inputs based on their features.
🔍 Example: Classifying emails as spam or non-spam based on their content and features.
📊 Pandas: Data manipulation and analysis library.
➕ NumPy: Mathematical computing library for arrays and matrices.
📈 Matplotlib and Seaborn: Visualization libraries for creating static plots.
🗺️ Geopandas and Shapely: Libraries for working with geospatial data and geometries.
📊 Plotly: Interactive visualization library.
📍 Geopy: Library for calculating distances based on latitude and longitude.
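To illustrate the Geopy usage listed above, here is a minimal sketch of computing the geodesic (straight-line) distance between a pickup and drop-off point; the coordinates are made up for the example and are not taken from the dataset:

```python
from geopy.distance import geodesic

# Hypothetical pickup/dropoff coordinates (latitude, longitude) for one trip.
pickup = (40.7614, -73.9776)   # near Midtown Manhattan
dropoff = (40.6413, -73.7781)  # near JFK Airport

# geodesic() returns a Distance object; .km gives the distance in kilometres.
trip_km = geodesic(pickup, dropoff).km
print(f"Trip distance: {trip_km:.2f} km")
```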
📅 DateTime Conversion:
datetime: Library for handling dates and times.
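A minimal sketch of the datetime step, assuming the pickup timestamp lives in a column named pickup_datetime (adjust the name to the actual dataset) and that `df` is the dataframe loaded earlier:

```python
import pandas as pd

# Convert the pickup timestamp (column name assumed) to a datetime dtype,
# then derive simple time-based features that often correlate with fares.
df["pickup_datetime"] = pd.to_datetime(df["pickup_datetime"], errors="coerce")
df["hour"] = df["pickup_datetime"].dt.hour
df["day_of_week"] = df["pickup_datetime"].dt.dayofweek
df["month"] = df["pickup_datetime"].dt.month
```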
🔢 Data Preprocessing:
train_test_split, StandardScaler: Functions for splitting data and scaling features.
➡️ Dataset Operations:
train_test_split, SMOTE, StandardScaler: Tools for splitting, sampling, and normalizing data.
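A minimal sketch of the splitting and scaling step. The feature matrix and fare target here are synthetic stand-ins; in the notebook they would come from the cleaned trip dataframe (distance, hour, passenger count, and so on), and the 80/20 split ratio is an assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in feature matrix and fare target for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
y = 3.0 + 2.5 * X[:, 0] + rng.normal(scale=0.5, size=1000)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits
# so no information from the test set leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```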
📈 Regression Models:
LinearRegression, Lasso, Ridge, KNeighborsRegressor, XGBRegressor, RandomForestRegressor: Various regression algorithms for modeling relationships between variables.
📈 LinearRegression: Fits a straight line to the data, suitable for linear relationships.
🔍 Lasso: Performs feature selection by penalizing coefficients to zero, helpful for reducing overfitting and selecting important features.
🏞️ Ridge: Reduces model complexity and multicollinearity by adding L2 regularization term, preventing overfitting.
🤝 KNeighborsRegressor: Predicts based on the average of the 'k' nearest neighbors, robust for non-linear relationships.
🌳 XGBRegressor: Implements gradient boosting, a boosting ensemble technique, to enhance prediction accuracy.
🌲 RandomForestRegressor: Constructs multiple decision trees and averages predictions, robust against overfitting and noise.
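The sketch below fits several of these regressors on stand-in data and compares them by mean absolute error, mirroring the kind of side-by-side comparison described above. The synthetic data and hyperparameter values are illustrative assumptions, not the notebook's actual setup:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Stand-in data; in the notebook this is the scaled trip feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = 5 + 3 * X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=2000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "LinearRegression": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Ridge": Ridge(alpha=1.0),
    "KNeighborsRegressor": KNeighborsRegressor(n_neighbors=5),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
    # XGBRegressor (from the separate xgboost package) plugs in the same way:
    # "XGBRegressor": xgboost.XGBRegressor(n_estimators=100),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae:.3f}")
```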
❌ Error Metrics:
mean_absolute_error, r2_score, explained_variance_score: Metrics for evaluating model performance.
🔍 Hyperparameter Tuning:
GridSearchCV: Tool for finding the best parameters through exhaustive search.
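A minimal GridSearchCV sketch for the KNN regressor (the model highlighted in the conclusion). The parameter grid and scoring choice are assumptions for illustration, not the exact grid used in the project:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Candidate values are illustrative assumptions, not the notebook's grid.
param_grid = {
    "n_neighbors": [3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
    "p": [1, 2],  # Manhattan vs. Euclidean distance
}

grid = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=5,
)
# X_train, y_train as produced by the earlier train/test split sketch
# (scaled features would be preferable for KNN).
grid.fit(X_train, y_train)
print(grid.best_params_, -grid.best_score_)  # best parameters and their MAE
```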
🔢 Data Evaluation:
StratifiedKFold: Cross-validation technique for evaluating model performance across folds.
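A small cross-validation sketch. Note that StratifiedKFold preserves class proportions and targets classification; for a continuous fare target, plain KFold is the usual counterpart, so that is what this example (an assumption, not the notebook's exact setup) uses:

```python
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# StratifiedKFold is designed for class labels; KFold is the regression analogue.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

# X, y from the earlier sketches.
scores = cross_val_score(
    KNeighborsRegressor(n_neighbors=5), X, y,
    scoring="neg_mean_absolute_error", cv=cv,
)
print("Mean CV MAE:", -scores.mean(), "+/-", scores.std())
```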
💾 Model Saving:
joblib: Library for saving and loading models.
warnings: Library for managing warnings, suppressing them in this case.
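A minimal sketch of persisting and reloading the tuned model with joblib; the file name is an arbitrary choice, and `grid` and `X_test` refer to the earlier sketches:

```python
import joblib

# Persist the tuned model to disk and reload it later for inference.
joblib.dump(grid.best_estimator_, "uber_fare_knn.joblib")

loaded_model = joblib.load("uber_fare_knn.joblib")
print(loaded_model.predict(X_test[:5]))  # predicted fares for five test rows
```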
Explained Variance Score: 📊
Measures the proportion of variance in the target variable that is explained by the model. Good value: Closer to 1, indicating a better fit.
Mean Absolute Error (MAE): 🔍
Average of absolute differences between predicted and actual values, representing model accuracy. Good value: Lower, with 0 being perfect accuracy.
R-squared (R2): 📈
Represents the proportion of variance in the dependent variable that is explained by the independent variables. Good value: Closer to 1, indicating a better fit of the model to the data.
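A minimal sketch of computing these three metrics with scikit-learn; the actual and predicted fares below are made-up stand-ins for the held-out test split and the tuned model's predictions:

```python
from sklearn.metrics import mean_absolute_error, r2_score, explained_variance_score

# Stand-in actual vs. predicted fares for illustration only.
y_true = [7.5, 12.0, 5.3, 21.7, 9.9]
y_pred = [8.1, 11.4, 5.0, 23.0, 9.5]

print("MAE:               ", mean_absolute_error(y_true, y_pred))   # lower is better
print("R-squared:         ", r2_score(y_true, y_pred))              # closer to 1 is better
print("Explained variance:", explained_variance_score(y_true, y_pred))
```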
Our KNN (K-Nearest Neighbors) model performs well for Uber fare prediction compared to the other models. Through meticulous parameter tuning with GridSearchCV, we optimized the model to deliver more accurate fare estimates. Moving forward, future enhancements could include exploring additional features, trying different models, further fine-tuning, training on more powerful infrastructure, integrating real-time data, and improving the user experience with a user-friendly interface. 🚖🔍🚀