ProgressGym: Dataset & Experimental Framework for Progress Alignment

Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale. We introduce progress alignment as a technical solution to mitigate this imminent risk.


To empower research in progress alignment, we introduce ProgressGym, an experimental framework that codifies real-world progress alignment challenges into concrete benchmarks. Specifically, we introduce three core challenges: tracking evolving values (PG-Follow), preemptively anticipating moral progress (PG-Predict), and regulating the feedback loop between human and AI value shifts (PG-Coevolve).
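As a rough sketch of how these challenges might be driven programmatically, the snippet below loops over the three benchmarks with one of the bundled baselines. The `run_benchmark` entry point and its arguments are illustrative assumptions, not the framework's verbatim API; see the sections below for the actual interfaces.

```python
# Illustrative sketch only: `run_benchmark` and its arguments are assumed
# names, not ProgressGym's verbatim API. Each challenge pairs a judge (the
# benchmark task) with an examinee (the algorithm under evaluation).
from benchmark import run_benchmark  # assumed entry point

for challenge in ["PG-Follow", "PG-Predict", "PG-Coevolve"]:
    # Evaluate a bundled baseline (here, Lifelong DPO) on each challenge.
    result = run_benchmark(judge=challenge, examinee="LifelongDPO")
    print(f"{challenge}: {result}")
```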

Aside from example alignment algorithms (Lifelong RLHF, Lifelong DPO, Extrapolative RLHF, Extrapolative DPO), we provide APIs for user-implemented examinees as well as judges, so that you can evaluate your own algorithm on your custom progress alignment tasks.

Check out the Implementing Your Own Experiment section for basic usage.
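To give a flavor of what a user-implemented examinee/judge pair might look like, here is a minimal sketch. The base-class names, import path, and method hooks below (`ExamineeBase`, `JudgeBase`, `reset`, `run`, `eval_snapshot`) are assumptions made for illustration; the Implementing Your Own Experiment section documents the real extension points.

```python
# Minimal sketch of user-implemented examinee and judge classes. All class,
# method, and import names are assumptions for illustration; consult the
# "Implementing Your Own Experiment" section for the actual extension points.
from benchmark.framework import ExamineeBase, JudgeBase  # assumed import path


class MyExaminee(ExamineeBase):
    """Your alignment algorithm, evaluated by a judge."""

    def reset(self, **config):
        # Initialize the model and any lifelong-learning state.
        super().reset(**config)

    def run(self, judge: JudgeBase):
        # Interact with the judge over successive timesteps: answer its
        # queries and update the underlying model as values evolve.
        super().run(judge)


class MyJudge(JudgeBase):
    """Your custom progress alignment task, scoring an examinee."""

    def reset(self, **config):
        # Load the historical value data that defines this task.
        super().reset(**config)

    def eval_snapshot(self, examinee: ExamineeBase):
        # Score the examinee's current model against this timestep's
        # ground-truth values before advancing the timeline.
        super().eval_snapshot(examinee)
```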

Contents:
