Release v0.1.3 -- Tons of CLI logging improvements! · allenai/reward-bench

The rewardbench CLI can now be run on any instruction dataset, with fancy logging of scores.
This means rewardbench can be used to quickly throw together a rejection sampling pipeline once given generations.
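
As a rough illustration of the rejection sampling idea, here is a minimal sketch of best-of-n selection over already-scored generations. It does not call rewardbench itself; the row layout (`prompt`, `completion`, `score`) and the function name `best_of_n` are assumptions made for this example, not the format the CLI writes.

```python
from collections import defaultdict


def best_of_n(rows, keep=1):
    """Keep the `keep` highest-scoring completions for each prompt.

    `rows` is a list of dicts with hypothetical keys
    "prompt", "completion", and "score".
    """
    by_prompt = defaultdict(list)
    for row in rows:
        by_prompt[row["prompt"]].append(row)

    selected = []
    for candidates in by_prompt.values():
        # Highest reward-model score first.
        ranked = sorted(candidates, key=lambda r: r["score"], reverse=True)
        selected.extend(ranked[:keep])
    return selected


if __name__ == "__main__":
    # Toy scores standing in for reward-model outputs.
    scored = [
        {"prompt": "What is 2+2?", "completion": "4", "score": 0.9},
        {"prompt": "What is 2+2?", "completion": "5", "score": 0.1},
        {"prompt": "Name a color.", "completion": "Blue", "score": 0.7},
        {"prompt": "Name a color.", "completion": "Seven", "score": 0.2},
    ]
    for row in best_of_n(scored):
        print(row["prompt"], "->", row["completion"])
```

Setting `keep` above 1 retains several completions per prompt, which may be useful if the filtered set is meant for further training rather than a single final answer.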

Specifically, I think this type of logging is really great for evaluation. It's similar to what wandb does for training: when using the CLI, you pass one arg that will save:

All the scores, input text, etc. to HuggingFace
The command used to launch the eval
The current python env for reproducibility

Examples are in the readme: https://github.com/allenai/reward-bench?tab=readme-ov-file#logging
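
To give a sense of what capturing the launch command and Python environment can look like, here is a minimal standard-library sketch. It is not the rewardbench implementation; the function name `collect_run_metadata` and the dictionary keys are assumptions for illustration, and the upload step to HuggingFace is left out.

```python
import json
import platform
import shlex
import sys
from importlib import metadata


def collect_run_metadata(argv=None):
    """Gather the launch command and installed packages for reproducibility.

    Hypothetical helper: the returned keys are illustrative only.
    """
    argv = sys.argv if argv is None else argv

    # Pinned "name==version" strings, similar in spirit to `pip freeze`.
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    )

    return {
        "command": shlex.join(argv),
        "python_version": platform.python_version(),
        "packages": packages,
    }


if __name__ == "__main__":
    # The result is plain JSON, so it can be stored next to the scores.
    print(json.dumps(collect_run_metadata(), indent=2))
```

Storing this record alongside the scores makes it easier to rerun an evaluation later with the same command and package versions.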

What's Changed

Clean, minor fixes, and release 0.1.2 by @natolambert in #139
Fix DPO prompts by @natolambert in #142
New super secret models by @natolambert in #141
Minor fixes, new dockerfile, new models by @natolambert in #144
Fix llama3 quantization for DPO models by @natolambert in #145
Fix small bugs by @natolambert in #148
Add GRM classes by @YangRui2015 in #151
New models + dockerfile by @natolambert in #152
Add Claude 3.5 Sonnet by @natolambert in #153
fix padding for GRM class by @YangRui2015 in #154
Add bfloat16 support natively by @natolambert in #155
Add generative models by @natolambert in #156
Add InternLM2 RMs by @natolambert in #157
Bump generative models by @natolambert in #160
added offsetbias execute prompt and judgement process code by @sanghyuk-choi in #159
small gen pr by @natolambert in #161
Bos fix by @natolambert in #166
Add automatic Beaker Images by @natolambert in #167
Small bumps by @natolambert in #168
Add attn_implementation support by @chrisliu298 in #170
Fixes in run_generative, new models by @natolambert in #171
fix vllm version by @natolambert in #172
Delete training by @natolambert in #174
Mirror change from leaderboard by @natolambert in #175
Add models by @natolambert in #179
Add o1 and other model by @natolambert in #181
Support loading model from wandb by @vwxyzjn in #184
add_con-j_support_code by @YeZiyi1998 in #183
Bump requirements and generative improvements by @natolambert in #190
Support upload metadata to hf by @vwxyzjn in #188
Bump Cuda version by @natolambert in #191
Typo and VLLM generalization by @natolambert in #192
Add better logging and functionality with instructions to CLI by @natolambert in #193
Tweak ArmorRM implementation, add args to CLI by @natolambert in #194

New Contributors

@YangRui2015 made their first contribution in #151
@sanghyuk-choi made their first contribution in #159
@chrisliu298 made their first contribution in #170
@vwxyzjn made their first contribution in #184
@YeZiyi1998 made their first contribution in #183

Full Changelog: v0.1.2...v0.1.3
