- Multi-objective MDP formulation with thermal comfort and energy consumption as the objectives
  - Lagrangian dual reinforcement learning approach
  - Status: fine-tuning remaining
- Single-objective MDP minimizing energy consumption, with thermal comfort enforced as a hard constraint
  - Action-bound approach
  - In progress: inferring changes in the environment to adjust the action mask accordingly
- Single-objective reinforcement learning formulation minimizing electricity cost, with demand response
  - Demand response based on Toronto Hydro time-of-use (ToU) electricity pricing
  - CAPS (Conditioning for Action Policy Smoothness) action smoothing to reduce HVAC actuation load
- Rainbow DQN
- DQN
- SAC
- PPO
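The Lagrangian dual approach to the comfort/energy formulation can be sketched as follows. This is a minimal illustration, assuming comfort is measured as degrees outside a temperature band and constrained to an average budget; `ComfortBand`, `lagrangian_reward`, `update_multiplier`, the band limits, the budget, and the step size are all hypothetical, not the project's values. The policy maximizes the scalarized reward for a fixed multiplier λ, and λ is then adjusted by projected gradient ascent on the constraint violation.

```python
from dataclasses import dataclass


@dataclass
class ComfortBand:
    # Hypothetical comfort band in degrees Celsius (assumption, not project values).
    low: float = 20.0
    high: float = 24.0

    def violation(self, temp_c: float) -> float:
        """Degrees outside the band; 0.0 when the temperature is inside it."""
        return max(0.0, self.low - temp_c) + max(0.0, temp_c - self.high)


def lagrangian_reward(energy_kwh: float, violation: float, lam: float) -> float:
    """Scalarized reward the policy maximizes for a fixed multiplier lam."""
    return -energy_kwh - lam * violation


def update_multiplier(lam: float, mean_violation: float, budget: float, lr: float) -> float:
    """Projected dual ascent: lam grows while the average violation exceeds the budget."""
    return max(0.0, lam + lr * (mean_violation - budget))


if __name__ == "__main__":
    band = ComfortBand()
    lam = 0.0
    # Toy rollout of (energy_kwh, indoor_temp_c) pairs standing in for real episodes.
    episode = [(1.2, 19.0), (0.8, 21.5), (0.5, 25.0)]
    for _ in range(3):
        violations = [band.violation(t) for _, t in episode]
        rewards = [lagrangian_reward(e, v, lam) for (e, _), v in zip(episode, violations)]
        # A policy-gradient or Q-learning update on `rewards` would go here.
        lam = update_multiplier(lam, sum(violations) / len(violations), budget=0.1, lr=0.5)
    print(round(lam, 4))
```

Because λ only rises while comfort is violated more than the budget allows, the energy term dominates once the constraint is met, which is the usual motivation for the dual formulation over a fixed weighting.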
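For the hard-constraint formulation, the action-bound idea can be illustrated as a mask over discrete actions: any action whose predicted next temperature leaves the comfort band is disallowed before the agent chooses. The sketch below assumes discrete heating power levels and a hypothetical one-step thermal model `T' = T + leak * (T_out - T) + gain * power`; `feasible_action_mask`, `masked_greedy`, and the `leak`/`gain_c_per_kw` coefficients are illustrative. These model coefficients are the kind of quantities that would need re-estimating when the environment changes.

```python
import numpy as np


def feasible_action_mask(
    indoor_c: float,
    outdoor_c: float,
    heat_levels_kw: np.ndarray,
    low_c: float = 20.0,
    high_c: float = 24.0,
    leak: float = 0.1,
    gain_c_per_kw: float = 0.5,
) -> np.ndarray:
    """Boolean mask over heating levels whose predicted next temperature stays in band.

    Uses a hypothetical one-step model: T' = T + leak * (T_out - T) + gain * power.
    """
    predicted = indoor_c + leak * (outdoor_c - indoor_c) + gain_c_per_kw * heat_levels_kw
    mask = (predicted >= low_c) & (predicted <= high_c)
    if not mask.any():
        # No level satisfies the bound: allow only the level closest to the band.
        distance = np.maximum(low_c - predicted, 0.0) + np.maximum(predicted - high_c, 0.0)
        mask[np.argmin(distance)] = True
    return mask


def masked_greedy(q_values: np.ndarray, mask: np.ndarray) -> int:
    """Pick the highest-valued action among those the mask allows."""
    return int(np.argmax(np.where(mask, q_values, -np.inf)))


if __name__ == "__main__":
    levels = np.arange(6, dtype=float)  # 0..5 kW
    mask = feasible_action_mask(indoor_c=20.0, outdoor_c=0.0, heat_levels_kw=levels)
    q = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])  # toy Q-values favouring low power
    print(mask.tolist(), masked_greedy(q, mask))
```

Here the unmasked greedy choice would be 0 kW (cheapest), but the mask forces the agent onto the lowest level that keeps the room at or above 20 °C.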
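In the electricity-cost formulation, time-of-use pricing makes the reward the negative cost of each step, so the agent is rewarded for shifting HVAC load out of peak periods (a price-based form of demand response). The sketch below assumes a winter-style Ontario weekday schedule (on-peak 07:00-11:00 and 17:00-19:00, mid-peak 11:00-17:00, off-peak otherwise and on weekends, holidays ignored) and placeholder prices; `PRICES`, `tou_period`, and `cost_reward` are hypothetical, and both the schedule and the rates should be checked against current Toronto Hydro ToU rates, which change seasonally.

```python
from datetime import datetime

# Placeholder ToU prices in $/kWh (hypothetical; substitute the current Toronto Hydro rates).
PRICES = {"off_peak": 0.08, "mid_peak": 0.12, "on_peak": 0.16}


def tou_period(ts: datetime) -> str:
    """Classify a timestamp with an assumed winter-style weekday schedule."""
    if ts.weekday() >= 5:  # Saturday and Sunday are off-peak all day (holidays not handled)
        return "off_peak"
    hour = ts.hour
    if 7 <= hour < 11 or 17 <= hour < 19:
        return "on_peak"
    if 11 <= hour < 17:
        return "mid_peak"
    return "off_peak"


def cost_reward(energy_kwh: float, ts: datetime) -> float:
    """Single-objective reward: negative electricity cost of the step."""
    return -energy_kwh * PRICES[tou_period(ts)]


if __name__ == "__main__":
    monday = datetime(2024, 1, 8)  # a weekday
    for hour in (8, 12, 20):
        ts = monday.replace(hour=hour)
        print(hour, tou_period(ts), cost_reward(2.0, ts))
```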
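CAPS smooths a policy by adding two regularization terms to the policy loss: a temporal term that penalizes the distance between actions at consecutive states, and a spatial term that penalizes the distance between actions at a state and at a slightly perturbed copy of it. The sketch below computes that penalty with NumPy for any batch policy callable; `caps_penalty`, the noise scale `sigma`, and the weights `lambda_t`/`lambda_s` are illustrative assumptions. In actual training these terms would be computed inside the learning framework's autodiff graph so that gradients reach the policy network.

```python
from typing import Callable, Optional

import numpy as np

# A policy maps a batch of states (N, state_dim) to a batch of actions (N, action_dim).
Policy = Callable[[np.ndarray], np.ndarray]


def caps_penalty(
    policy: Policy,
    states: np.ndarray,
    next_states: np.ndarray,
    sigma: float = 0.05,
    lambda_t: float = 1.0,
    lambda_s: float = 1.0,
    rng: Optional[np.random.Generator] = None,
) -> float:
    """CAPS-style smoothness penalty to add to the policy loss.

    temporal: mean Euclidean distance between actions at s_t and s_{t+1}
    spatial:  mean Euclidean distance between actions at s_t and a Gaussian-perturbed s_t
    """
    if rng is None:
        rng = np.random.default_rng(0)
    actions = policy(states)
    temporal = np.linalg.norm(actions - policy(next_states), axis=-1).mean()
    perturbed = states + rng.normal(0.0, sigma, size=states.shape)
    spatial = np.linalg.norm(actions - policy(perturbed), axis=-1).mean()
    return float(lambda_t * temporal + lambda_s * spatial)


if __name__ == "__main__":
    def identity_policy(s: np.ndarray) -> np.ndarray:
        return s  # toy policy whose action equals the state

    states = np.zeros((4, 2))
    next_states = np.ones((4, 2))
    print(caps_penalty(identity_policy, states, next_states, sigma=0.0))
```

A policy that outputs the same action everywhere scores zero, so the weights trade smoothness (fewer HVAC setpoint swings) against the task reward.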