❗ This repository has been archived. I recommend using S3 Inventory instead. https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html
Processes S3 Inventory manifests and generates a report of folder sizes and average object sizes.
First, configure S3 to create an inventory for your bucket(s):
https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-inventory.html
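Inventory can be enabled from the console, or programmatically. As a rough sketch (assuming boto3; the bucket names and configuration `Id` below are placeholders, not values from this repository), the configuration might look like:

```python
# Sketch of a daily Parquet inventory configuration. "inventory-bucket",
# "production-bucket" and the Id are hypothetical placeholders.
inventory_config = {
    "Id": "daily-inventory",
    "IsEnabled": True,
    # "All" includes noncurrent versions and delete markers, which the
    # report's DelSize/VerSize columns rely on.
    "IncludedObjectVersions": "All",
    "Schedule": {"Frequency": "Daily"},
    "OptionalFields": ["Size"],
    "Destination": {
        "S3BucketDestination": {
            "Bucket": "arn:aws:s3:::inventory-bucket",
            "Format": "Parquet",
            "Prefix": "production-bucket",
        }
    },
}

# With boto3 installed and credentials configured, this would apply it:
# import boto3
# boto3.client("s3").put_bucket_inventory_configuration(
#     Bucket="production-bucket",
#     Id="daily-inventory",
#     InventoryConfiguration=inventory_config,
# )
print(inventory_config["Destination"]["S3BucketDestination"]["Format"])
```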
Once the inventory is configured, install the requirements and use the CLI. This step is not needed on a platform that already ships with Pandas, such as Airflow. The script uses PyArrow to load the CSV, ORC and Parquet inventory files.

```shell
pip install -r requirements.txt
```
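Each inventory delivery includes a `manifest.json` that lists the data files to load. A minimal sketch of how such a manifest can be resolved into file URIs (the helper name and sample values are illustrative; in practice each URI would then be read with e.g. `pyarrow.parquet.read_table(...).to_pandas()`):

```python
import json

def data_files_from_manifest(manifest_text: str) -> list[str]:
    """Return s3:// URIs of the data files listed in an inventory manifest."""
    manifest = json.loads(manifest_text)
    # destinationBucket is an ARN like arn:aws:s3:::inventory-bucket
    bucket = manifest["destinationBucket"].removeprefix("arn:aws:s3:::")
    return [f"s3://{bucket}/{f['key']}" for f in manifest["files"]]

# Hypothetical manifest snippet shaped like a real manifest.json:
sample = json.dumps({
    "destinationBucket": "arn:aws:s3:::inventory-bucket",
    "fileFormat": "Parquet",
    "files": [
        {"key": "production-bucket/data/part-00000.parquet", "size": 123},
    ],
})
print(data_files_from_manifest(sample))
```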
This tool is a simple CLI script. By default the information is displayed on the screen, but if an output file is set, a CSV file is created either locally or on S3.
```shell
./s3_inventory_report.py \
    -m s3://inventory-bucket/production-bucket/Daily/2022-07-24T00-00Z/ \
    -o s3://archive-bucket/2022-07-24.csv \
    -d 3 \
    -c ./cache/
```
The CSV will have a header with the following fields:
- Folder - Name of the folder in the S3 bucket.
- Count - Number of objects in the folder.
- Size - Total size of all objects in the folder, in bytes.
- DelSize - Total size of only those objects that carry a delete marker.
- VerSize - Total size of only those objects that are noncurrent versions.
- AvgObject - Average object size of the objects in the folder.
- Depth - The depth of the folder.
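The aggregation behind these fields can be sketched with Pandas. This is an illustration, not the archived script itself: the column names `Key`, `Size`, `IsLatest` and `IsDeleteMarker` are assumed to come from the loaded inventory, and `folder_report` is a hypothetical helper.

```python
import pandas as pd

def folder_report(df: pd.DataFrame, depth: int) -> pd.DataFrame:
    """Aggregate an inventory frame into one row per folder, truncating
    folder paths to the given depth."""
    df = df.copy()
    # Folder = the key's directory components, truncated to `depth` levels.
    df["Folder"] = df["Key"].map(lambda k: "/".join(k.split("/")[:-1][:depth]))
    # Size contributed by delete markers / noncurrent versions only.
    df["DelSize"] = df["Size"].where(df["IsDeleteMarker"], 0)
    df["VerSize"] = df["Size"].where(~df["IsLatest"] & ~df["IsDeleteMarker"], 0)
    out = df.groupby("Folder").agg(
        Count=("Key", "size"),
        Size=("Size", "sum"),
        DelSize=("DelSize", "sum"),
        VerSize=("VerSize", "sum"),
        AvgObject=("Size", "mean"),
    ).reset_index()
    out["Depth"] = out["Folder"].map(lambda f: f.count("/") + 1 if f else 0)
    return out

inventory = pd.DataFrame({
    "Key": ["a/b/x.txt", "a/b/y.txt", "a/z.txt"],
    "Size": [10, 30, 20],
    "IsLatest": [True, False, True],
    "IsDeleteMarker": [False, False, False],
})
print(folder_report(inventory, depth=2))
```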