This repository contains experimental code for the bin packing problem using PPO and a ranked reward MCTS method. The bin packing problem is a combinatorial optimization problem in which a set of items of varied sizes must be packed into a container. There are multiple versions of the problem; this repository considers the offline variant, where the container has a fixed size and all items to be packed are visible throughout. The goal is to fit as many items as possible into the container.

Currently a 2D version is implemented (which can be extended). The state space consists of the features of all items {id, size, and placement (if already placed)} together with the set of feasible actions. The actions are constrained: an item cannot be placed where it is unsupported from below, and it cannot extend beyond the container. The available actions are determined by the remaining items and the valid placements in the container. Problem instances are generated with a method similar to the one described in the Ranked Reward paper.

The action space is discrete. At each step, the agent selects an item and its placement in the container. The reward is +1 for fitting all items into the container, -1 when items remain but no actions are feasible, and 0 otherwise. An episode ends either when all items are placed or when no more items can be placed.
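The sketch below illustrates this reward and termination scheme. It uses simplified assumptions: a skyline (per-column height) container representation and an (item, x-position) action encoding. The class and method names are hypothetical and are not the repository's actual environment API.

```python
# Minimal sketch of the episode/reward logic described above.
# The skyline container representation and all names here are illustrative
# assumptions, not the repository's actual implementation.
from typing import Dict, List, Tuple


class BinPacking2DSketch:
    def __init__(self, width: int, height: int, item_sizes: Dict[int, Tuple[int, int]]):
        self.width, self.height = width, height
        self.items = dict(item_sizes)                 # item_id -> (w, h), still unplaced
        self.placements: Dict[int, Tuple[int, int]] = {}
        self.skyline = [0] * width                    # filled height per column

    def feasible_actions(self) -> List[Tuple[int, int]]:
        """(item_id, x) pairs where the item rests on the skyline and fits in the container."""
        actions = []
        for item_id, (w, h) in self.items.items():
            for x in range(self.width - w + 1):
                base = max(self.skyline[x:x + w])     # item must be supported from below
                if base + h <= self.height:           # and must not exceed the container
                    actions.append((item_id, x))
        return actions

    def step(self, item_id: int, x: int) -> Tuple[float, bool]:
        """Place one item (assumed feasible); return (reward, done) under the +1 / -1 / 0 scheme."""
        w, h = self.items.pop(item_id)
        base = max(self.skyline[x:x + w])
        self.placements[item_id] = (x, base)
        self.skyline[x:x + w] = [base + h] * w

        if not self.items:                            # all items placed
            return 1.0, True
        if not self.feasible_actions():               # items left but nothing fits
            return -1.0, True
        return 0.0, False                             # episode continues
```

An episode would then repeatedly query `feasible_actions()`, let the agent pick one, and call `step()` until it reports done.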
The policy and value network architecture is based on the Ranked Reward paper. Since the exact architecture is not described there, some assumptions were made. Currently there are two implementations:
- using a PPO agent
- using the ranked reward method
Both methods use a similar policy and value network architecture, but the ranked reward method additionally uses MCTS.
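As one way such a shared network could look, the sketch below shows a shared-trunk layout in PyTorch with a masked policy head and a scalar value head. The class name, hidden sizes, and flat observation/action encoding are illustrative assumptions, not the architecture used in this repository or the paper.

```python
# Illustrative shared policy/value network. Layer sizes and the shared-trunk
# layout are assumptions; the actual architecture may differ.
import torch
import torch.nn as nn


class PolicyValueNet(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(                    # shared feature extractor
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, num_actions)  # logits over (item, placement) actions
        self.value_head = nn.Linear(hidden, 1)              # scalar state value

    def forward(self, obs: torch.Tensor, action_mask: torch.Tensor):
        # action_mask is a boolean tensor marking feasible actions.
        z = self.trunk(obs)
        logits = self.policy_head(z)
        logits = logits.masked_fill(~action_mask, float("-inf"))  # hide infeasible actions
        return torch.softmax(logits, dim=-1), self.value_head(z).squeeze(-1)
```

The PPO agent would train this network directly, while the ranked reward method would use its outputs as priors and value estimates inside MCTS.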
To set up the environment:

```bash
git clone https://github.com/shiveshkhaitan/ranked_reward_rl
cd ranked_reward_rl
conda env create -f environment.yml
conda activate ranked_reward
```