🚀 Feature Request

Add a batch dimension to all operations in `ModelEnv.evaluate_action_sequences` so that it can compute rewards for several trajectories in parallel, each starting from a different observation.

This issue depends on #131, so if that one is still open, it would be better to tackle it first.
Motivation
The current implementation of `ModelEnv.evaluate_action_sequences` only supports one initial observation at a time, which means that several environments cannot be evaluated in parallel, wasting the GPU parallelism available for model rollouts.

Pitch

Add batch support (in terms of possible initial states) to all operations involved in `ModelEnv.evaluate_action_sequences`, and aggregate particles correctly, so that the return value is a set of total rewards batched per initial observation.
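For illustration only, here is a minimal sketch of the shape bookkeeping such a batched evaluation could involve. This is not the library's implementation: `model_step`, the argument names, and the tensor shapes are all assumptions, and particle aggregation is shown as a simple mean.

```python
import torch

def evaluate_action_sequences_batched(
    model_step,                      # hypothetical callable: (obs, act) -> (next_obs, reward)
    initial_obs: torch.Tensor,       # (batch_size, obs_dim), one row per initial observation
    action_sequences: torch.Tensor,  # (num_sequences, horizon, act_dim)
    num_particles: int,
) -> torch.Tensor:
    """Return total rewards with shape (batch_size, num_sequences)."""
    batch_size, _ = initial_obs.shape
    num_sequences, horizon, _ = action_sequences.shape

    # Tile observations so every (initial obs, action sequence, particle) triple
    # becomes one row of a single flat batch the model can roll out in parallel.
    obs = initial_obs.repeat_interleave(num_sequences * num_particles, dim=0)
    # obs: (batch_size * num_sequences * num_particles, obs_dim)

    acts = action_sequences.repeat_interleave(num_particles, dim=0)
    acts = acts.repeat(batch_size, 1, 1)  # align rows with the tiled observations

    total_rewards = torch.zeros(obs.shape[0])
    for t in range(horizon):
        obs, reward = model_step(obs, acts[:, t, :])
        total_rewards += reward

    # Aggregate particles and reshape so rewards are batched per initial observation.
    total_rewards = total_rewards.view(batch_size, num_sequences, num_particles)
    return total_rewards.mean(dim=2)  # (batch_size, num_sequences)
```

With `batch_size == 1` this reduces to the current single-observation behavior, so existing callers could keep working by passing a batch of one.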