A long-standing challenge in human-robot interaction is specifying reward functions for reinforcement learning that capture what the user actually wants the robot to do. Preference-based learning offers a solution to this problem: instead of relying on hand-crafted reward signals, the robot learns directly from human preferences.
This paper replicates APReL, a library that implements a variety of active preference-based reward learning techniques. The classic control environment “Mountain Car” is used, with trajectory features that capture the minimum and maximum position of the car along with its velocity.
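As an illustration of how such trajectory features can be computed (a minimal sketch, not the paper's exact code; the feature definitions below are assumptions in the style of the APReL examples):

```python
import numpy as np

def feature_func(traj):
    """Map a Mountain Car trajectory to a feature vector:
    the car's minimum position, maximum position, and mean speed."""
    # traj is assumed to be a list of (state, action) pairs,
    # where each state is [position, velocity].
    states = np.array([pair[0] for pair in traj])
    min_pos = states[:, 0].min()
    max_pos = states[:, 0].max()
    mean_speed = np.abs(states[:, 1]).mean()
    return np.array([min_pos, max_pos, mean_speed])
```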
Preference learning is applied in this environment to drive the car to its goal state at the top of the right hill. During training, a set of trajectories is generated, and the human is asked to compare them over a specified number of queries. The algorithm exposes parameters, such as the number of trajectories and queries and the feature definitions, whose settings shape the results accordingly.
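A sketch of this active query loop, following the pattern of APReL's published example script and assuming the feature function above; the specific parameter values (number of trajectories, number of queries, acquisition function) are illustrative assumptions rather than the paper's exact configuration:

```python
import aprel
import gym

gym_env = gym.make('MountainCarContinuous-v0')
env = aprel.Environment(gym_env, feature_func)

# Generate candidate trajectories that queries will be drawn from.
trajectory_set = aprel.generate_trajectories_randomly(
    env, num_trajectories=10, max_episode_length=300,
    file_name='MountainCarContinuous-v0', seed=0)
features_dim = len(trajectory_set[0].features)

query_optimizer = aprel.QueryOptimizerDiscreteTrajectorySet(trajectory_set)
true_user = aprel.HumanUser(delay=0.5)  # prompts the human for preferences

# Softmax user model with a linear reward over the trajectory features.
params = {'weights': aprel.util_funs.get_random_normalized_vector(features_dim)}
user_model = aprel.SoftmaxUser(params)
belief = aprel.SamplingBasedBelief(user_model, [], params)

# Actively select pairwise comparison queries and update the reward belief.
query = aprel.PreferenceQuery(trajectory_set[:2])
for _ in range(10):
    queries, _ = query_optimizer.optimize('mutual_information', belief, query)
    responses = true_user.respond(queries[0])
    belief.update(aprel.Preference(queries[0], responses[0]))
```

Each iteration selects the query expected to be most informative about the reward weights, asks the human for a preference, and updates the belief over those weights.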
This analysis carries over to similar robotic applications whose objective is to learn control behavior from human preferences. In future work, these techniques can be extended to other learning methods, such as neural networks and gradient-based algorithms.