Add Text Prompts To Multi-Modal Observation #186

Open
Bonifatius94 opened this issue Aug 30, 2024 · 0 comments

@Bonifatius94

The main source of information for human players is reading the text prompts. I'm pretty sure the agent would be capable of playing almost perfectly once it understands the text prompts and connects them with a proper world model of the game.

Stacking 3 frames doesn't suffice, although it's a nice starting point. I've used similar approaches to train self-driving robots; they're quite effective for motion control tasks.
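
For reference, this is roughly the frame-stacking baseline I mean. It's just a minimal sketch with stable-baselines3; the environment id `"PokemonRed-v0"` is a placeholder and the actual training stack may differ:

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

def make_env():
    # hypothetical Game Boy screen environment returning image observations
    return gym.make("PokemonRed-v0")

venv = DummyVecEnv([make_env])
venv = VecFrameStack(venv, n_stack=3)  # stack the last 3 frames along the channel axis

model = PPO("CnnPolicy", venv, verbose=1)
model.learn(total_timesteps=1_000_000)
```

This works fine for reactive motion control, but the stacked frames carry no explicit representation of the dialogue text, which is the point of this issue.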

Is there any popular research on multi-modal reinforcement learning state representation? It would require mixing video and text perception into a latent state of some kind of RNN, like in Dreamer (roughly like the sketch below).
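
Something like the following rough PyTorch sketch is what I have in mind for the fusion part. This is not Dreamer itself, only the observation-fusion side of such a model, and all module names and sizes are made up for illustration:

```python
import torch
import torch.nn as nn

class MultiModalEncoder(nn.Module):
    """Fuse screen frames and on-screen text prompts into a recurrent latent."""

    def __init__(self, vocab_size: int, latent_dim: int = 256):
        super().__init__()
        # image branch: small CNN over the (stacked) Game Boy screen
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(latent_dim), nn.ReLU(),
        )
        # text branch: embed the prompt tokens and summarize them with a GRU
        self.embed = nn.Embedding(vocab_size, 64)
        self.text_rnn = nn.GRU(64, latent_dim, batch_first=True)
        # recurrent latent over the fused observation; a Dreamer-style RSSM
        # would add a stochastic part, this is only the deterministic core
        self.fuse_rnn = nn.GRUCell(2 * latent_dim, latent_dim)

    def forward(self, image, tokens, hidden):
        img_feat = self.cnn(image)                      # (B, latent_dim)
        _, txt_feat = self.text_rnn(self.embed(tokens))
        txt_feat = txt_feat.squeeze(0)                  # (B, latent_dim)
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        return self.fuse_rnn(fused, hidden)             # next latent state
```

The policy (or world model) would then operate on the recurrent latent instead of raw stacked frames, so dialogue content can persist in the state even after the text box disappears.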

I know this issue is super open-ended, but the resulting agent would be very powerful. Maybe it's worth exploring, and Pokémon might be the right training environment to try it in.
