This project involves analyzing global e-commerce trends and their impact on traditional retail. The analysis includes data preprocessing, outlier detection, Principal Component Analysis (PCA), Customer Lifetime Value (CLV) calculation, and a What-if analysis to simulate the effect of different pricing strategies.
- Datasets:
  - `1.csv`, `2.csv`, `3.csv`: Raw e-commerce and retail datasets containing information about sales, customer transactions, and more.
  - `tashi.csv`: A combined and preprocessed dataset that merges `1.csv`, `2.csv`, and `3.csv`.
  - `pca_transformed_data.csv`: Dataset after applying PCA for dimensionality reduction.
- Reports:
  - `Report.pdf`: A comprehensive report detailing the analysis, visualizations, and insights.
  - `Pre-Processing_Insights.pdf`: A detailed document explaining the preprocessing steps taken, including outlier detection and handling missing values.
- Notebook:
  - `code.ipynb`: Jupyter notebook containing Python code for data preprocessing, PCA, CLV calculation, and visualizations.
- Data Integration: Datasets `1.csv`, `2.csv`, and `3.csv` are combined into a single dataset, `tashi.csv`.
- Handling Missing Values: Missing values in numerical columns were filled using the mean of each column.
- Outlier Detection and Removal: Outliers were detected using the Z-score method with a threshold of 2.5. These outliers were removed from the dataset.
- Normalization: Numerical columns were normalized using `MinMaxScaler`.
- Label Encoding: Categorical columns were label encoded using `LabelEncoder`.
- Feature Creation:
  - `Total_Sales`: Calculated by multiplying `UnitPrice` and `Quantity`.
  - `Discount_Effectiveness`: Ratio of `discount_amount` to `Total_Sales`.
  - `Sales_per_Customer`: Sum of total sales per customer.
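
The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `add_features` and `clean` and the `CustomerID` column are hypothetical, treating every non-numeric column as categorical is an assumption, and so is the use of the population standard deviation for the Z-scores. The README does not say whether the features are derived before or after scaling, so the two steps are kept in separate functions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive Total_Sales, Discount_Effectiveness and Sales_per_Customer."""
    out = df.copy()
    out["Total_Sales"] = out["UnitPrice"] * out["Quantity"]
    # Replace zero sales with NaN so the ratio does not divide by zero.
    out["Discount_Effectiveness"] = (
        out["discount_amount"] / out["Total_Sales"].replace(0, np.nan)
    )
    # "CustomerID" is a hypothetical column name for the customer key.
    out["Sales_per_Customer"] = (
        out.groupby("CustomerID")["Total_Sales"].transform("sum")
    )
    return out


def clean(df: pd.DataFrame, z_threshold: float = 2.5) -> pd.DataFrame:
    """Mean-impute, drop Z-score outliers, min-max scale, label encode."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    # Assumption: every non-numeric column is categorical.
    cat_cols = out.select_dtypes(exclude="number").columns

    # 1. Fill missing numeric values with the column mean.
    out[num_cols] = out[num_cols].fillna(out[num_cols].mean())

    # 2. Drop rows where any numeric column has |z| above the threshold.
    #    Constant columns have zero spread; their z-scores become NaN and
    #    therefore never flag a row.
    std = out[num_cols].std(ddof=0).replace(0, np.nan)
    z = (out[num_cols] - out[num_cols].mean()) / std
    out = out[~(z.abs() > z_threshold).any(axis=1)].copy()

    # 3. Scale numeric columns to the [0, 1] range.
    out[num_cols] = MinMaxScaler().fit_transform(out[num_cols])

    # 4. Encode each categorical column as integer labels.
    for col in cat_cols:
        out[col] = LabelEncoder().fit_transform(out[col].astype(str))
    return out
```

Calling `clean(add_features(raw))` derives the features from raw prices and quantities first; swapping the order would compute them from scaled values instead, which changes their meaning.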
- Principal Component Analysis (PCA): Applied to the normalized dataset to reduce dimensionality while retaining 80% of the variance.
- Visualizations:
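
To make the 80% variance criterion from the PCA step concrete, here is a small sketch using scikit-learn. It assumes the notebook uses `sklearn.decomposition.PCA`, which the README does not confirm; the helper name `reduce_with_pca` and the synthetic demo data are hypothetical stand-ins for the real normalized dataset.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def reduce_with_pca(normalized: pd.DataFrame, variance: float = 0.80):
    """Project onto the fewest components whose variance reaches `variance`."""
    # A float n_components between 0 and 1 asks scikit-learn to keep just
    # enough components for the cumulative explained variance to reach
    # that fraction.
    pca = PCA(n_components=variance, svd_solver="full")
    components = pca.fit_transform(normalized.to_numpy())
    columns = [f"PC{i + 1}" for i in range(components.shape[1])]
    reduced = pd.DataFrame(components, columns=columns, index=normalized.index)
    return reduced, pca


# Synthetic stand-in data: 5 columns driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
demo = pd.DataFrame(
    latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
)

reduced, model = reduce_with_pca(demo)
print(reduced.shape, model.explained_variance_ratio_.cumsum())
```

The cumulative sum of `explained_variance_ratio_` shows how much variance each additional component contributes, which is a quick way to check that the retained components actually cross the chosen threshold.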
- CLV Calculation: CLV was calculated for each customer based on average purchase value, purchase frequency, and retention rate.
- CLV Visualization:
- Price Change Simulations: The effect of different price changes on CLV was simulated by modifying the `UnitPrice` variable. The results were visualized using line plots and histograms.
- Visualization of Impact: Line plots and heatmaps were created to illustrate the impact of different `UnitPrice` multipliers on CLV and total sales.
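
The CLV calculation and the price-change simulation described above might look something like the sketch below. The README names the inputs (average purchase value, purchase frequency, retention rate) but not the formula, so the formula used here is an assumption, as are the single global `retention_rate` value, the `CustomerID` column name, and the function names. The simulation also assumes that quantities stay fixed when prices change, which ignores any demand response.

```python
import pandas as pd


def customer_clv(df: pd.DataFrame, retention_rate: float = 0.75) -> pd.Series:
    """Assumed formula: avg purchase value * frequency / (1 - retention)."""
    sales = df["UnitPrice"] * df["Quantity"]
    # Per customer: mean sale value and number of purchases.
    per_customer = sales.groupby(df["CustomerID"]).agg(["mean", "count"])
    expected_lifetime = 1.0 / (1.0 - retention_rate)  # in periods (assumed)
    return per_customer["mean"] * per_customer["count"] * expected_lifetime


def simulate_price_changes(df: pd.DataFrame, multipliers) -> pd.DataFrame:
    """Recompute mean CLV and total sales for each UnitPrice multiplier."""
    rows = []
    for m in multipliers:
        # Scale prices only; Quantity is held fixed (assumption).
        scenario = df.assign(UnitPrice=df["UnitPrice"] * m)
        rows.append({
            "multiplier": m,
            "mean_clv": customer_clv(scenario).mean(),
            "total_sales": (scenario["UnitPrice"] * scenario["Quantity"]).sum(),
        })
    return pd.DataFrame(rows)


# Tiny made-up transaction table, for illustration only.
transactions = pd.DataFrame({
    "CustomerID": ["a", "a", "b"],
    "UnitPrice": [10.0, 20.0, 5.0],
    "Quantity": [1, 2, 4],
})
result = simulate_price_changes(transactions, [0.5, 1.0, 2.0])
print(result)
```

Under the fixed-quantity assumption both metrics scale linearly with the multiplier; a more realistic simulation would also adjust `Quantity` with some price elasticity.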
- Open the Jupyter notebook `code.ipynb` in any Jupyter environment (e.g., JupyterLab, Google Colab).
- Run the code cells in sequence to preprocess the data, apply PCA, calculate CLV, and generate the visualizations.
- Ensure that `1.csv`, `2.csv`, and `3.csv` are available in the working directory.
- The notebook generates `tashi.csv` and `pca_transformed_data.csv` as outputs.
- Dimensionality Reduction: PCA effectively reduced the dataset dimensions while preserving most of the data's variance.
- CLV Segmentation: Customer Lifetime Value was calculated and segmented, highlighting differences in customer value across groups.
- Impact of Price Changes: The What-if analysis demonstrated how changes in `UnitPrice` affect CLV, providing insights into potential pricing strategies.
- Scatter Plot of Principal Components: Highlights clusters or separations in the dataset after applying PCA.
- Boxplot of CLV Segments: Showcases the distribution of CLV across different customer segments.
- Violin Plot of CLV Segments: Visualizes the density of CLV values across segments.
- Heatmaps: Used to display the relationship between CLV, total sales, and `UnitPrice` changes.
- Histograms: Demonstrate the frequency of CLV values for various price multipliers in the What-if analysis.
For any questions or suggestions, feel free to contact me at abbasitashfeen7@gmail.com.