I'm currently using the Cozmo robot from Anki as an RL agent, trained to maximise rewards for natural language tasks like "go to the green cup".
So far I'm trying the actor-critic example from pytorch-examples, ikostrikov's A3C and Devendra Chaplot's Gated Attention A3C to train Cozmo to look at different objects, e.g. cups and bowls. This is a form of language grounding, where the agent learns to connect the words in an instruction to what it sees and does.
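For anyone unfamiliar with actor-critic methods, here is a minimal sketch of the return and advantage computation that this family of algorithms is built around. It is not taken from any of the three codebases above. The sparse reward (1.0 only on the step where the named object comes into view), the discount factor, the critic values and the function names are all assumptions made for illustration.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns: List[float] = []
    running = 0.0
    # Walk the episode backwards so each return reuses the one after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """Advantage = observed return minus the critic's value estimate."""
    return [g - v for g, v in zip(returns, values)]


# Hypothetical three-step episode for "go to the green cup":
# no reward until the last step, where the cup is in view.
rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.5, 0.5]  # made-up critic predictions

returns = discounted_returns(rewards, gamma=0.5)
print(returns)                      # [0.25, 0.5, 1.0]
print(advantages(returns, values))  # [-0.25, 0.0, 0.5]
```

In a full implementation the actor is pushed towards actions with positive advantage and the critic is regressed towards the returns.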
There are plans to extend this to more difficult tasks and to sim2real transfer.
Videos and a tweet with more detail here: tweet