Analysis of Online Retail Data

Raw dataset: Online Retail.xlsx

| Purpose | Skills | Dataset | Notebook | Generated New Dataset |
| --- | --- | --- | --- | --- |
| Data preprocessing and understanding the data | Data cleaning, feature engineering, EDA, and AI-assisted category generation | Online Retail.xlsx | EDA-Online-Retail.ipynb | cleaned_Online_Retail.csv |
| Explore UK products | Product analysis and profitability insights | cleaned_Online_Retail.csv | products_return.ipynb | UK_return_rate.csv |
| Explore customer data | Customer retention analysis, e-commerce metrics, RFM analysis, and statistical analysis | cleaned_Online_Retail.csv | Customer_Analysis.ipynb | rmf.csv, cohort_data.csv |
| Explore customer behavior | Churn analysis, price analysis, customer segmentation (with K-Means) | rmf.csv, cohort_data.csv | customer_behaviour_analysis.ipynb | customer_behaviour.csv |

Note: I used both OpenAI and Groq (with the Llama3-8B-8192 model) to generate a new category column from the product text data. OpenAI gives more accurate results, but it is billed per use. Groq offers a limited number of free prompts and requires payment beyond that. The code for generating the category column is included in the EDA-Online-Retail.ipynb notebook. To generate the data yourself, insert your own API key and run the code.

EDA

Business insights are listed inside the notebook (.ipynb) files.

Monthly Total Revenue and Number of Orders by Country:

Screenshot 2024-08-26 7:45:26 PM

Weekday vs Weekend Purchases

Screenshot 2024-08-26 7:46:00 PM

Top 20 Customers by Total Purchases and Total Spend

image

Top 20 Customers by Total Spend and Their Number of Purchases

image

Daily Sales Trends by Country - UK

Screenshot 2024-08-26 7:49:09 PM

Top 20 Best-Selling Products

image

Correlation

image

Top 10 Return Rate Products

Screenshot 2024-08-26 7:59:56 PM
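
A minimal sketch of how a per-product return rate could be computed. It assumes that returns show up as rows with a negative Quantity, and it defines the rate as returned units divided by sold units. Both the definition and the sample rows are illustrative assumptions; products_return.ipynb may define the rate differently. The sketch uses plain Python so it runs without the notebook's dependencies.

```python
from collections import defaultdict

# Hypothetical sample rows: (StockCode, Quantity).
# Assumption: a negative quantity marks a return.
rows = [
    ("85123A", 10),
    ("85123A", -2),
    ("22423", 5),
    ("22423", -1),
    ("22423", 5),
    ("47566", 8),
]


def return_rates(rows):
    """Return {stock_code: returned_units / sold_units} for products with sales."""
    sold = defaultdict(int)
    returned = defaultdict(int)
    for code, qty in rows:
        if qty > 0:
            sold[code] += qty
        else:
            returned[code] += -qty
    # Products that were never sold are skipped to avoid dividing by zero.
    return {code: returned[code] / sold[code] for code in sold}


rates = return_rates(rows)

# Rank products by return rate, highest first, and keep the top 10.
top = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)[:10]
for code, rate in top:
    print(f"{code}: {rate:.1%}")
```

Ranking by rate alone can put very low-volume products at the top, so a minimum sales threshold may be worth adding before taking the top 10.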

Top 10 Performers by Sales Volume

Screenshot 2024-08-26 8:01:20 PM

Monthly Customer Retention

image
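
A monthly retention chart like this is usually built from a cohort table. The sketch below shows one common way to do it: each customer is assigned to the month of their first order, and retention is the share of that cohort that orders again N months later. The sample orders and function names are hypothetical, and Customer_Analysis.ipynb may compute cohort_data.csv differently.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical sample orders: (CustomerID, InvoiceDate).
orders = [
    (1, date(2011, 1, 5)),
    (1, date(2011, 2, 10)),
    (2, date(2011, 1, 20)),
    (2, date(2011, 3, 2)),
    (3, date(2011, 2, 14)),
    (3, date(2011, 3, 1)),
]


def month_index(d):
    """Map a date to a running month number so months can be subtracted."""
    return d.year * 12 + d.month


def cohort_retention(orders):
    """Return {(cohort_year, cohort_month): {months_since_first: share_active}}."""
    # The cohort of a customer is the month of their first order.
    first = {}
    for cust, d in orders:
        m = month_index(d)
        first[cust] = min(first.get(cust, m), m)
    cohort_size = Counter(first.values())

    # Collect the distinct customers active at each offset from their cohort month.
    active = defaultdict(set)
    for cust, d in orders:
        cohort = first[cust]
        active[(cohort, month_index(d) - cohort)].add(cust)

    retention = defaultdict(dict)
    for (cohort, offset), custs in active.items():
        year, month0 = divmod(cohort - 1, 12)
        retention[(year, month0 + 1)][offset] = len(custs) / cohort_size[cohort]
    return dict(retention)


retention = cohort_retention(orders)
for cohort in sorted(retention):
    print(cohort, dict(sorted(retention[cohort].items())))
```

Offset 0 is always 1.0 by construction, since every customer orders in their own cohort month.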

New Customers Acquired Each Day by Country

Screenshot 2024-08-26 8:02:24 PM

RFM Score (weighted)

image
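
One way a weighted RFM score can be formed is sketched below: recency, frequency, and monetary value are each scaled to the range 0 to 1 (with recency inverted, so recent buyers score higher) and then combined with a weighted sum. The weights, the snapshot date, and the sample customers are all hypothetical; the weighting actually used in Customer_Analysis.ipynb is not stated here.

```python
from datetime import date

# Hypothetical per-customer aggregates: last purchase date, order count, total spend.
customers = {
    "A": {"last": date(2011, 12, 1), "orders": 12, "spend": 900.0},
    "B": {"last": date(2011, 9, 1), "orders": 3, "spend": 150.0},
    "C": {"last": date(2011, 6, 1), "orders": 1, "spend": 40.0},
}
SNAPSHOT = date(2011, 12, 10)  # assumed reference date for measuring recency
WEIGHTS = {"r": 0.2, "f": 0.3, "m": 0.5}  # hypothetical weights, not the notebook's


def min_max(values):
    """Scale a {key: number} mapping to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def weighted_rfm(customers, snapshot, weights):
    """Return {customer: weighted score in [0, 1]}, where higher is better."""
    recency_days = {k: (snapshot - c["last"]).days for k, c in customers.items()}
    # Fewer days since the last purchase is better, so invert the scaled recency.
    r = {k: 1.0 - v for k, v in min_max(recency_days).items()}
    f = min_max({k: c["orders"] for k, c in customers.items()})
    m = min_max({k: c["spend"] for k, c in customers.items()})
    return {
        k: weights["r"] * r[k] + weights["f"] * f[k] + weights["m"] * m[k]
        for k in customers
    }


scores = weighted_rfm(customers, SNAPSHOT, WEIGHTS)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Quantile-based scores (for example 1 to 5 per dimension) are a common alternative to min-max scaling and are less sensitive to a few very large spenders.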

Multiple Linear Regression

Screenshot 2024-08-26 8:07:54 PM

Number of Customers Who Churned at Different Time Periods

image

Average Product Ratios by Country (weighted)

image

Elbow Method to find the best number of clusters for K-Means (n_clusters=4)

image
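
The elbow method runs K-Means for several values of k and looks for the point where the within-cluster sum of squares (inertia) stops dropping sharply. The sketch below illustrates the idea with a small pure-Python K-Means on synthetic 2-D points that form four tight groups. The data, the deterministic farthest-point initialisation, and the function names are assumptions made for this example; the notebook most likely relies on a library implementation and on the scaled RFM and price-ratio features instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic farthest-point initialisation (a simplification of k-means++)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point that is farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the final within-cluster sum of squares."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Synthetic data: four tight groups of three points each.
points = [
    (cx + dx, cy + dy)
    for cx, cy in [(0, 0), (0, 10), (10, 0), (10, 10)]
    for dx, dy in [(0, 0), (1, 0), (0, 1)]
]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.2f}")
```

On this synthetic data the inertia falls steeply up to k=4 and only marginally afterwards, which is the "elbow" pattern used to justify a choice such as n_clusters=4.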

Customer clusters based on RFM and product price ratio

Screenshot 2024-08-26 8:11:03 PM