The main source of information for human players is reading the in-game text prompts. I'm fairly confident the agent could play almost perfectly once it understands those prompts and connects them to a proper world model of the game.
Stacking 3 frames doesn't suffice, although it's a nice starting point. I've used similar approaches to train self-driving robots, and it's quite effective for motion-control tasks.
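For reference, frame stacking is usually just a rolling buffer over observations. Here is a minimal sketch as a gymnasium wrapper; the stack size `k=3` matches the number mentioned above, and everything else (class name, env interface) is illustrative rather than anything from this repo:

```python
import collections
import numpy as np
import gymnasium as gym

class FrameStack(gym.Wrapper):
    """Keeps a rolling buffer of the last `k` observations and
    returns them stacked along a new leading axis."""

    def __init__(self, env, k=3):
        super().__init__(env)
        self.k = k
        self.frames = collections.deque(maxlen=k)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        for _ in range(self.k):  # fill the buffer with the first frame
            self.frames.append(obs)
        return np.stack(self.frames), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.frames.append(obs)
        return np.stack(self.frames), reward, terminated, truncated, info
```

This gives the policy a short motion history (velocity, animation phase), which is why it works for motion control, but it can't carry the long-horizon information that reading text requires.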
Is there any established research on multi-modal state representations for reinforcement learning? It would require mixing video and text perception into the latent state of some kind of RNN, as in Dreamer.
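To make the idea concrete, here is a rough sketch of what such a fusion module could look like in PyTorch. Everything here is an assumption for illustration: the class name, the 3x144x160 Game Boy frame shape, the vocabulary size, and the use of a plain GRU as a stand-in for Dreamer's recurrent world-model state.

```python
import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    """Fuses a screen frame and tokenized on-screen text into one
    feature vector, then rolls it into a recurrent latent state with
    a GRU (a simplified stand-in for Dreamer's recurrent state)."""

    def __init__(self, vocab_size=1000, text_dim=64, latent_dim=256):
        super().__init__()
        # CNN over the game screen (assumed 3x144x160 Game Boy frames)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Embedding + mean-pooling over the text-box tokens
        self.text_embed = nn.Embedding(vocab_size, text_dim)
        # GRU carries the fused observation into a persistent latent
        cnn_out = self._cnn_out_dim((3, 144, 160))
        self.gru = nn.GRUCell(cnn_out + text_dim, latent_dim)

    def _cnn_out_dim(self, shape):
        with torch.no_grad():
            return self.cnn(torch.zeros(1, *shape)).shape[1]

    def forward(self, frame, text_tokens, h):
        vis = self.cnn(frame)                           # (B, cnn_out)
        txt = self.text_embed(text_tokens).mean(dim=1)  # (B, text_dim)
        return self.gru(torch.cat([vis, txt], dim=-1), h)
```

The recurrent latent `h` is what would let the agent remember a dialog box after it closes, which frame stacking alone can't do.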
I know this issue is super open-ended, but the resulting agent would be very powerful. Maybe it's worth exploring, and Pokémon is the right training environment to try it in.