This repository demonstrates how to automate Exploratory Data Analysis (EDA) using the ydata-profiling
library (formerly known as pandas-profiling). It simplifies the process of generating a comprehensive EDA report, saving time and ensuring a thorough analysis.
The tool provides the following capabilities:
- Type Inference: Automatically detects data types (Categorical, Numerical, Date, etc.).
- Warnings: Identifies data challenges like missing values, inaccuracies, skewness, and more.
- Univariate Analysis: Generates descriptive statistics (mean, median, mode, etc.) and visualizations like histograms.
- Multivariate Analysis: Includes correlation analysis, missing data summaries, duplicate rows detection, and pairwise variable interactions.
- Time-Series Analysis: Provides insights such as auto-correlation, seasonality, and ACF/PACF plots.
- Text Analysis: Detects the most common character categories (uppercase, lowercase, separator), scripts (e.g., Latin), and blocks (e.g., ASCII).
- File & Image Analysis: Reviews file sizes, creation dates, dimensions, and EXIF metadata.
- Dataset Comparison: Quickly compares datasets in one line of code.
- Flexible Output Formats: Reports can be exported as:
  - HTML: Easily shareable interactive reports
  - JSON: Suitable for automation systems
  - Jupyter Notebook Widgets
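
The report generation, export, and comparison features listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code from this repository's notebooks: the helper names `export_targets`, `profile_to_files`, and `compare_to_file`, the default `output` folder, and the file stems are hypothetical. `ProfileReport`, `to_file`, and `compare` come from ydata-profiling itself.

```python
from pathlib import Path


def export_targets(output_dir, stem):
    """Return the HTML and JSON paths a report would be written to."""
    out = Path(output_dir)
    return {"html": out / f"{stem}.html", "json": out / f"{stem}.json"}


def profile_to_files(df, title, output_dir="output", stem="report"):
    """Build a ProfileReport for `df` and write it as HTML and JSON."""
    # Imported lazily so the path helper above works without the library installed.
    from ydata_profiling import ProfileReport

    targets = export_targets(output_dir, stem)
    targets["html"].parent.mkdir(parents=True, exist_ok=True)

    profile = ProfileReport(df, title=title)
    # to_file chooses the format from the file extension (.html or .json).
    profile.to_file(targets["html"])
    profile.to_file(targets["json"])
    return targets


def compare_to_file(df_a, df_b, output_file="output/comparison.html"):
    """Profile two DataFrames and write a side-by-side comparison report."""
    from ydata_profiling import ProfileReport

    report_a = ProfileReport(df_a, title="Dataset A")
    report_b = ProfileReport(df_b, title="Dataset B")
    Path(output_file).parent.mkdir(parents=True, exist_ok=True)
    # compare() returns a combined report covering both datasets.
    comparison = report_a.compare(report_b)
    comparison.to_file(output_file)
    return Path(output_file)
```

With pandas and ydata-profiling installed, `profile_to_files(df, title="Sample EDA Report")` would write `output/report.html` and `output/report.json` for any DataFrame `df`. Inside a Jupyter notebook, `ProfileReport(df).to_widgets()` renders the same report as widgets instead of writing a file.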
Repository structure:
- data/: Contains sample datasets used for demonstration.
- notebooks/: Jupyter Notebooks showcasing how to use ydata-profiling.
- output/: Stores generated EDA reports.
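
Given the data/ and output/ folders described above, one way to produce a report per dataset is a small batch script. The sketch below assumes the sample datasets are CSV files, which the README does not state, and the function names `plan_reports` and `profile_all` are hypothetical. It is one possible arrangement, not necessarily how the notebooks in this repository do it.

```python
from pathlib import Path


def plan_reports(data_dir="data", output_dir="output"):
    """Pair each CSV in data_dir with the HTML report path it would produce."""
    out = Path(output_dir)
    csv_files = sorted(Path(data_dir).glob("*.csv"))
    return [(csv_path, out / f"{csv_path.stem}.html") for csv_path in csv_files]


def profile_all(data_dir="data", output_dir="output"):
    """Write one HTML profiling report per CSV file found in data_dir."""
    # Imported lazily so plan_reports can be used without these libraries.
    import pandas as pd
    from ydata_profiling import ProfileReport

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for csv_path, report_path in plan_reports(data_dir, output_dir):
        df = pd.read_csv(csv_path)
        profile = ProfileReport(df, title=f"EDA report: {csv_path.stem}")
        profile.to_file(report_path)
        written.append(report_path)
    return written
```

Running `profile_all()` from the repository root would then leave one `.html` file in output/ for each CSV in data/. Separating the path planning from the profiling step makes it easy to check which reports would be generated before starting a potentially slow run.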
For prerequisites and instructions on running the code, refer to https://github.com/ydataai/ydata-profiling
📊 Sample Output: The output/ folder contains example reports generated with ydata-profiling.
Reports include:
- Data summary (missing values, duplicates, etc.)
- Visualizations (correlations, distributions, etc.)
- Detailed variable analysis
🎥 Credits: Big thanks to https://www.youtube.com/@CodeWithHarry for his excellent tutorial on pandas-profiling (https://www.youtube.com/watch?v=sGQfiyXOvF0&t=1136s), which inspired this project.
🤝 Contributing: Contributions are welcome! If you have suggestions, feel free to open an issue or submit a pull request.
📜 License: This project is licensed under the MIT License.
💬 Feedback: If you find this project helpful or have any questions, feel free to reach out!