The rewardbench CLI can be run on any instruction dataset, with rich logging of scores.
This means rewardbench can be used to quickly throw together a rejection sampling pipeline once you have generations.
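As a rough sketch of the rejection sampling step, assuming the scores have already been produced and loaded as a list of records, best-of-n selection keeps the highest-scoring completion for each prompt. The field names `prompt`, `text`, and `score` and the helper `best_of_n` are hypothetical and are not part of rewardbench:

```python
def best_of_n(records):
    """Keep the highest-scoring completion for each prompt.

    `records` is a list of dicts with hypothetical keys
    "prompt", "text", and "score" (assumed layout, not rewardbench's).
    """
    best = {}
    for rec in records:
        prompt = rec["prompt"]
        # Replace the stored record only when a strictly better score appears.
        if prompt not in best or rec["score"] > best[prompt]["score"]:
            best[prompt] = rec
    # Dicts preserve insertion order, so prompts come back in first-seen order.
    return list(best.values())


# Toy data: two prompts, several scored generations each.
records = [
    {"prompt": "p1", "text": "a", "score": 0.2},
    {"prompt": "p1", "text": "b", "score": 0.9},
    {"prompt": "p2", "text": "c", "score": -1.0},
    {"prompt": "p2", "text": "d", "score": -3.5},
]
selected = best_of_n(records)
```

A score threshold could be added on top of this if the goal is to discard prompts where every generation scores poorly.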
Specifically, I think this type of logging is really great for evaluation. It's similar to what wandb does for training: when using the CLI, you pass one argument that saves:
- All the scores, input text, etc. to Hugging Face
- The command used to launch the eval
- The current python env for reproducibility
Examples are in the readme: https://github.com/allenai/reward-bench?tab=readme-ov-file#logging
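To illustrate the kind of metadata listed above, here is a minimal sketch that records a launch command and the installed package versions using only the Python standard library. It is not how rewardbench implements its logging; the helper `collect_run_metadata`, the dictionary keys, and the example command are assumptions made for illustration:

```python
import shlex
import sys
from importlib import metadata


def collect_run_metadata(argv=None):
    """Return the launch command and a snapshot of the Python environment.

    Hypothetical helper; the returned keys are illustrative only.
    """
    argv = sys.argv if argv is None else argv
    # One "name==version" entry per installed distribution with a name.
    packages = sorted(
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")
    )
    return {
        "command": shlex.join(argv),  # shell-quoted, safe to copy and rerun
        "python_version": sys.version.split()[0],
        "packages": packages,
    }


# Example with a made-up command line (not a real invocation).
meta = collect_run_metadata(["my_eval", "--model", "some model"])
```

Storing this dictionary alongside the scores makes it easier to rerun an evaluation later with the same command and package versions.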
What's Changed
- Clean, minor fixes, and release 0.1.2 by @natolambert in #139
- Fix DPO prompts by @natolambert in #142
- New super secret models by @natolambert in #141
- Minor fixes, new dockerfile, new models by @natolambert in #144
- Fix llama3 quantization for DPO models by @natolambert in #145
- Fix small bugs by @natolambert in #148
- Add GRM classes by @YangRui2015 in #151
- New models + dockerfile by @natolambert in #152
- Add Claude 3.5 Sonnet by @natolambert in #153
- fix padding for GRM class by @YangRui2015 in #154
- Add bfloat16 support natively by @natolambert in #155
- Add generative models by @natolambert in #156
- Add InternLM2 RMs by @natolambert in #157
- Bump generative models by @natolambert in #160
- added offsetbias execute prompt and judgement process code by @sanghyuk-choi in #159
- small gen pr by @natolambert in #161
- Bos fix by @natolambert in #166
- Add automatic Beaker Images by @natolambert in #167
- Small bumps by @natolambert in #168
- Add attn_implementation support by @chrisliu298 in #170
- Fixes in run_generative, new models by @natolambert in #171
- fix vllm version by @natolambert in #172
- Delete training by @natolambert in #174
- Mirror change from leaderboard by @natolambert in #175
- Add models by @natolambert in #179
- Add o1 and other model by @natolambert in #181
- Support loading model from wandb by @vwxyzjn in #184
- add_con-j_support_code by @YeZiyi1998 in #183
- Bump requirements and generative improvements by @natolambert in #190
- Support upload metadata to hf by @vwxyzjn in #188
- Bump Cuda version by @natolambert in #191
- Typo and VLLM generalization by @natolambert in #192
- Add better logging and functionality with instructions to CLI by @natolambert in #193
- Tweak ArmorRM implementation, add args to CLI by @natolambert in #194
New Contributors
- @YangRui2015 made their first contribution in #151
- @sanghyuk-choi made their first contribution in #159
- @chrisliu298 made their first contribution in #170
- @vwxyzjn made their first contribution in #184
- @YeZiyi1998 made their first contribution in #183
Full Changelog: v0.1.2...v0.1.3