Public 71st Solution Writeup (Private 39th)
In this competition, competitors build ML models to predict the energy production and consumption patterns of prosumers in Estonia. Our solution is mainly composed of lightweight feature engineering with conservative feature selection, target engineering, and a large model pool combined with a simple ensemble.
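As a taste of the final step, the ensemble is a plain average over the model pool. Below is a minimal sketch, assuming scikit-learn-style models with a `.predict` method and equal weights (the actual pool composition and weighting are configured per experiment):

```python
import numpy as np

def ensemble_predict(models, X):
    """Average predictions from a pool of trained models.

    Minimal sketch of the "simple ensemble" step: every model in the
    pool predicts on the same feature matrix and the results are
    averaged with equal weights.
    """
    preds = np.stack([model.predict(X) for model in models], axis=0)
    return preds.mean(axis=0)
```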
First, download the dataset following the instructions on the Data tab,
kaggle competitions download -c predict-energy-behavior-of-prosumers
Then, unzip the dataset and put the raw data under `./data/raw/`.
To support iterative model development, run the following command to generate reusable processed data, including the complete feature set,
python -m data.preparation.gen_data
After the process finishes, a file `base_feats.parquet` will be dumped under `./data/processed/`.
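For quick iteration, the cached feature set can then be loaded directly, e.g. with pandas (a minimal sketch; the training pipeline itself consumes it through the data configuration described below):

```python
import pandas as pd

# Load the cached feature set so each experiment can skip
# re-running the feature engineering step.
base_feats = pd.read_parquet("./data/processed/base_feats.parquet")
print(base_feats.shape)
```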
With the `hydra`-based configuration system, it's easy to modify the configuration and run iterative experiments. Each experiment is mainly controlled via the data and model configurations. After setup, you can train models by running,
# Train production model with raw target
python -m tools.main_ml +model_type="p_raw" 'data.tgt_types=[prod]' data.dp.tgt_col="target"
# Train consumption model with target minus target_lag2d
python -m tools.main_ml +model_type="c_raw" 'data.tgt_types=[cons]' data.dp.tgt_col="target_diff_lag2d" 'data.dp.tgt_aux_cols=[target_lag2d]'
# Train domestic consumption model with target divided by installed_capacity
python -m tools.main_ml +model_type="cc_dcap" 'data.tgt_types=[cons_c]' data.dp.tgt_col="target_div_cap_lag2d" 'data.dp.tgt_aux_cols=[installed_capacity_lag2d]'
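The three commands above train on different target representations, which are inverted back to the raw scale at prediction time. Here is a minimal sketch of the idea; the column names follow the commands above, but the 48-row shift for the 2-day lag assumes hourly rows sorted by datetime within each prediction unit, so treat it as illustrative rather than the repo's exact implementation:

```python
import pandas as pd

def add_engineered_targets(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative target engineering for the model types above."""
    # 2-day lag of the target: the latest value already revealed at
    # prediction time (48 hourly rows, assuming sorted timestamps).
    df["target_lag2d"] = df.groupby("prediction_unit_id")["target"].shift(48)
    # Consumption model: learn the residual against the 2-day-old target.
    df["target_diff_lag2d"] = df["target"] - df["target_lag2d"]
    # Domestic consumption model: learn the target normalized by the
    # (lagged) installed capacity.
    df["target_div_cap_lag2d"] = df["target"] / df["installed_capacity_lag2d"]
    return df

# At inference time the transforms are inverted to recover raw-scale targets:
#   raw = pred_diff + target_lag2d
#   raw = pred_ratio * installed_capacity_lag2d
```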
The output objects (e.g., models, the log file `train_eval.log`, and the feature importances `feat_imps.parquet`) will be dumped under `./output/<%m%d-%H_%M_%S>/`.
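The dumped importances feed the conservative feature selection mentioned above. Here is a sketch of inspecting them, assuming a long-format file with hypothetical `feature` and `importance` columns (the real schema may differ):

```python
import pandas as pd

imps = pd.read_parquet("./output/<exp_id-goes-here>/feat_imps.parquet")
# Rank features by mean importance across folds/models.
ranked = imps.groupby("feature")["importance"].mean().sort_values(ascending=False)
print(ranked.head(20))  # strongest features
print(ranked.tail(20))  # candidates to prune
```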
After models are trained, you can upload the model objects to Kaggle for online inference by following these steps,
- Initialize the Kaggle dataset metadata.
kaggle datasets init -p ./output/<exp_id-goes-here>/
- Fill in the dataset metadata in `./output/<exp_id-goes-here>/dataset-metadata.json`.
- Create the Kaggle dataset and upload.
kaggle datasets create -p ./output/<exp_id-goes-here>/ -r zip # Choose compressed upload
After uploading, you can add the corresponding dataset to the inference notebook for submission.
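Inside the notebook, predictions are served through the competition's `enefit` time-series API. A rough sketch of the loop follows; the exact tuple yielded by `iter_test()` is written from memory and should be verified against the official starter notebook, and `build_features` / `models` stand in for this repo's pipeline objects:

```python
import enefit

env = enefit.make_env()
for (test, revealed_targets, client, historical_weather,
     forecast_weather, electricity_prices, gas_prices,
     sample_prediction) in env.iter_test():
    # build_features / models are placeholders for the repo's pipeline.
    feats = build_features(test, client, historical_weather,
                           forecast_weather, electricity_prices, gas_prices)
    sample_prediction["target"] = ensemble_predict(models, feats)
    env.predict(sample_prediction)
```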
We focus on local cross-validation that follows chronological order, and we check whether the results stay in sync with the public LB.
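Here is a sketch of slicing such chronological folds, with validation windows copied from the table below (the half-open month boundaries are an assumption):

```python
import pandas as pd

# (name, validation_start, validation_end_exclusive); training data is
# everything strictly before the validation window.
FOLDS = [
    ("fold2", "2022-09-01", "2022-12-01"),
    ("fold1", "2022-10-01", "2023-02-01"),
    ("fold0", "2023-03-01", "2023-06-01"),
]

def chrono_splits(df: pd.DataFrame, dt_col: str = "datetime"):
    for name, start, end in FOLDS:
        train = df[df[dt_col] < start]
        valid = df[(df[dt_col] >= start) & (df[dt_col] < end)]
        yield name, train, valid
```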
CV and LB scores are shown as follows (public LB scores are still pending),
| | CV Fold2 (202209 ~ 202211) | CV Fold1 (202210 ~ 202301) | CV Fold0 (202303 ~ 202305) | 3-Fold Avg | Public LB (202306 ~ 202308) | Private LB (202402 ~ 202404) |
| --- | --- | --- | --- | --- | --- | --- |
| MAE | 30.47 | 27.06 | 51.32 | 36.29 | x | 59.20 |
| MAE | 29.71 | 26.32 | 51.00 | 35.68 | x | 58.29 |