This repository contains the official implementation of When to Trust LLMs: Aligning Confidence with Response Quality (ACL 2024) by Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, and Bolin Ding.
CONQORD is a framework for improving the reliability of large language models (LLMs) by aligning their expressed confidence with the quality of their responses.
LLMs frequently produce incorrect or nonsensical outputs, and existing approaches typically rely on the model's stated confidence to signal reliability, even though that confidence is often poorly calibrated. CONQORD instead fine-tunes the model with reinforcement learning driven by a dual-component reward function that combines a quality reward with an order-preserving alignment reward, encouraging the model to express higher confidence for more accurate responses. This synchronizes confidence with accuracy, so the aligned confidence indicates when to trust the LLM and can act as a trigger for retrieving external knowledge.
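The order-preserving idea can be illustrated with a small sketch. The exact reward formulation is given in the paper; the pairwise form below is an illustrative simplification in which, for every pair of sampled responses, the alignment term is positive when the higher-quality response also carries the higher verbalized confidence, and negative when the ordering is inverted:

```python
def alignment_reward(conf, quality):
    """Illustrative order-preserving term (simplified, not the paper's
    exact formula): for each pair of responses, reward agreement between
    the confidence ordering and the quality ordering."""
    total, n = 0.0, 0
    for i in range(len(conf)):
        for j in range(i + 1, len(conf)):
            # positive if conf and quality rank the pair the same way
            total += (conf[i] - conf[j]) * (quality[i] - quality[j])
            n += 1
    return total / max(n, 1)

def total_reward(conf, quality, alpha=0.5):
    """Dual-component reward: mean quality plus a weighted
    order-preserving alignment term (alpha is an illustrative weight)."""
    quality_term = sum(quality) / len(quality)
    return quality_term + alpha * alignment_reward(conf, quality)
```

For example, confidences `[0.9, 0.2]` paired with qualities `[1.0, 0.0]` yield a positive alignment term, while the inverted confidences `[0.2, 0.9]` are penalized.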
python>=3.9
torch>=1.13.0+cu117
tensorboardX>=2.6.2.2
transformers>=4.28.1
deepspeed>=0.14.1
huggingface_hub>=0.22.1
numpy>=1.24.3
scipy>=1.12.0
pandas>=2.1.4
The training dataset used in the paper is released at https://huggingface.co/datasets/hooope/CONQORD_dataset/tree/main.
The evaluation datasets are TruthfulQA and Natural Questions.
Please run "CONQORD/step1_supervised_finetuning_LM/run_step1.sh"
Step 1.1: Download the dataset from https://huggingface.co/datasets/hooope/CONQORD_datasets/conqord_step1_data and save it to ../datasets/conqord_step1_data/
Step 1.2: Create log, checkpoint, tensorboard folders
Step 1.3: Run main.py in step1
export CUDA_VISIBLE_DEVICES=0,1,2,3
nohup deepspeed --master_port 13001 main.py \
--data_path ../datasets/Dahoas/rm-static_conf_both/ \
--data_split 10,0,0 \
--model_name_or_path ../model_pth/llama2_hf_7b/ \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--data_output_path ../datasets/datatmp/ \
--max_seq_len 512 \
--learning_rate 1e-4 \
--weight_decay 0. \
--num_train_epochs 3 \
--gradient_accumulation_steps 64 \
--lr_scheduler_type cosine \
--num_warmup_steps 5 \
--seed 1234 \
--gradient_checkpointing \
--zero_stage 3 \
--deepspeed \
--lora_dim 128 \
--lora_module_name "layers." \
--output_dir checkpoint/step1 \
--print_loss \
--enable_tensorboard \
--tensorboard_path tensorboard/step1 \
> log/step1.log 2>&1 &
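As a sanity check on the flags above: with 4 visible GPUs, a per-device batch of 32, and 64 gradient-accumulation steps, each optimizer update sees a large effective batch (assuming DeepSpeed's usual data-parallel semantics):

```python
# Values taken from the step-1 command flags above.
num_gpus = 4                      # CUDA_VISIBLE_DEVICES=0,1,2,3
per_device_train_batch_size = 32  # --per_device_train_batch_size
gradient_accumulation_steps = 64  # --gradient_accumulation_steps

# Samples contributing to one optimizer update.
effective_batch = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch)  # 8192
```

Lower `--gradient_accumulation_steps` proportionally if you want more frequent optimizer updates at the same per-device batch size.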
Please run "CONQORD/step2_reward_model/run_step2.sh"
Step 2.1: Download the dataset from https://huggingface.co/datasets/hooope/CONQORD_datasets/conqord_step2_data and save it to ../datasets/conqord_step2_data/
Step 2.2: Run main.py in step2
export CUDA_VISIBLE_DEVICES=0,1,2,3
nohup deepspeed --master_port 23001 main.py \
--data_path ../datasets/Anthropic/hh-rlhf/helpful-base \
--data_split 0,10,0 \
--model_name_or_path ../model_pth/llama2_hf_7b/ \
--data_output_path ../datasets/datatmp/ \
--per_device_train_batch_size 32 \
--per_device_eval_batch_size 32 \
--max_seq_len 512 \
--learning_rate 5e-5 \
--weight_decay 0.1 \
--num_padding_at_beginning 0 \
--num_train_epochs 10 \
--gradient_accumulation_steps 16 \
--lr_scheduler_type cosine \
--num_warmup_steps 0 \
--seed 1234 \
--gradient_checkpointing \
--zero_stage 3 \
--deepspeed \
--offload \
--lora_dim 128 \
--lora_module_name "layers." \
--output_dir checkpoint/step2 \
--enable_tensorboard \
--tensorboard_path tensorboard/step2 \
> log/step2.log 2>&1 &
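Step 2 trains the reward model on preference pairs, in the style of DeepSpeed-Chat. Assuming the standard pairwise ranking objective is used (an assumption; check main.py for the exact loss), the per-pair loss is -log σ(r_chosen − r_rejected), sketched here in pure Python:

```python
import math

def pairwise_ranking_loss(r_chosen, r_rejected):
    """Standard Bradley-Terry-style reward-model loss for one pair:
    -log(sigmoid(r_chosen - r_rejected)). The loss shrinks as the
    reward model scores the chosen response further above the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the two responses score equally, the loss is log 2; a wide positive margin drives it toward zero.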
Please run "CONQORD/step3_RL_finetune_LLM/run_step3.sh"
Step 3.1: Prepare the actor model from step 1 and the critic model from step 2.
Step 3.2: Run main.py in step3
export CUDA_VISIBLE_DEVICES=0,1,2,3
nohup deepspeed --master_port 33001 main.py \
--data_path ../datasets/Anthropic/hh-rlhf/helpful-base_conf_half_sharp \
--data_split 0,0,10 \
--actor_model_name_or_path ../step1_supervised_finetuning_LM/checkpoint/step1/final \
--tokenizer_model_name_or_path ../model_pth/llama2_all_hf/llama2_hf_7b/ \
--critic_model_name_or_path ../step2_reward_model_finetuning/checkpoint/step2/ep3/ \
--num_padding_at_beginning 1 \
--per_device_generation_batch_size 1 \
--per_device_training_batch_size 1 \
--generation_batches 1 \
--ppo_epochs 1 \
--max_answer_seq_len 256 \
--max_prompt_seq_len 256 \
--actor_learning_rate 1e-6 \
--critic_learning_rate 1e-6 \
--actor_weight_decay 0.1 \
--critic_weight_decay 0.1 \
--num_train_epochs 1 \
--lr_scheduler_type constant \
--gradient_accumulation_steps 32 \
--actor_gradient_checkpointing \
--critic_gradient_checkpointing \
--offload_reference_model \
--disable_actor_dropout \
--num_warmup_steps 0 \
--deepspeed --seed 1234 \
--actor_zero_stage 3 \
--critic_zero_stage 3 \
--enable_hybrid_engine \
--actor_lora_dim 64 \
--critic_lora_dim 64 \
--critic_lora_module_name "layers." \
--actor_lora_module_name "layers." \
--output_dir checkpoint/step3 \
--enable_tensorboard \
--tensorboard_path tensorboard/step3 \
> log/step3.log 2>&1 &
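Step 3 fine-tunes the actor with PPO (see `--ppo_epochs` and the actor/critic flags above). As background, the clipped surrogate objective that PPO maximizes for a single action can be sketched as follows (an illustrative textbook form, not the repo's exact code):

```python
import math

def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one action:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    where ratio = pi_new(a|s) / pi_old(a|s)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

The clipping caps how much a single update can exploit a positive advantage, which keeps the fine-tuned policy close to the step-1 reference model.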
Please run "CONQORD/step4_evaluation/run_step4.sh"
Step 4.1: CONQORD Inference
nohup python -u test_conqord.py --data_name truthful_qa \
--mode llama2_7b \
--suffix conqord_llama2_nq \
--path ../step3_rlhf_finetuning/checkpoint/step3_RL_finetune_LLM/ep1/step30/actor \
--gpu 2 > ./log/truthful_qa/conqord_llama2.log 2>&1 &
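Inference produces both an answer and a verbalized confidence that the evaluation step consumes. Assuming the model emits something like `... Confidence: 0.9` (the exact output template is defined by the repo's prompts; this parser is an illustrative sketch), the confidence can be extracted as:

```python
import re

def parse_confidence(text, default=None):
    """Extract a trailing verbalized confidence such as 'Confidence: 0.9'.
    Returns (answer_text, confidence) with confidence as a float in [0, 1],
    or (text, default) if no confidence statement is found."""
    m = re.search(r"confidence\s*[:=]\s*([01](?:\.\d+)?)", text, re.IGNORECASE)
    if not m:
        return text.strip(), default
    conf = float(m.group(1))
    answer = text[:m.start()].strip().rstrip(".,;")
    return answer, conf
```

For example, `parse_confidence("Paris. Confidence: 0.9")` yields the answer `"Paris"` with confidence `0.9`.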
Step 4.2: Evaluating performance for CONQORD
nohup python -u gpt_evaluation.py --data_name truthful_qa --suffix conqord_llama2 --mode llama2_7b --gpu -1 > ./log/truthful_qa/conqord_llama2.log 2>&1 &
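Evaluating confidence-quality alignment commonly involves expected calibration error (ECE) over the parsed confidences and GPT-judged correctness labels. A minimal sketch (the equal-width binning scheme here is an assumption; gpt_evaluation.py may compute its metrics differently):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: |accuracy - mean confidence| per equal-width confidence bin,
    averaged with each bin weighted by its share of the samples."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into last bin
        bins[idx].append((c, y))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

A perfectly aligned model (high confidence exactly when correct) scores 0; a model that is fully confident while always wrong scores 1.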
If you have questions, please open an issue or send an email to shuchangtao5@gmail.com
If you find our code, models, or the paper useful, please cite the paper:
@article{Tao2024,
author = {Shuchang Tao and
Liuyi Yao and
Hanxing Ding and
Yuexiang Xie and
Qi Cao and
Fei Sun and
Jinyang Gao and
Huawei Shen and
Bolin Ding},
title = {When to Trust LLMs: Aligning Confidence with Response Quality},
journal = {CoRR},
volume = {abs/2404.17287},
year = {2024},
url = {https://doi.org/10.48550/arXiv.2404.17287},
doi = {10.48550/ARXIV.2404.17287},
eprinttype = {arXiv},
eprint = {2404.17287},
timestamp = {Mon, 03 Jun 2024 20:48:03 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2404-17287.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}