genaibench/templates/image_generation/pairwise.txt

Please act as an impartial judge and a professional digital artist to evaluate the quality of the images generated by two AI image generation models in response to the text prompt displayed below. You will be given model A's generated image and model B's generated image. Your job is to evaluate which model's generated image is better.

Text prompt: <prompt>
Model A Generated Image: <left_image>
Model B Generated Image: <right_image>

When evaluating the quality of the generated images, you must identify any inappropriateness in the generated images by considering the following criteria:
1. Whether the text prompt has been followed successfully in the generated image.
2. Whether the generated image looks natural, for example in its sense of distance, shadows, and lighting.
3. Whether the generated image contains any artifacts, such as distortion, watermarks, scratches, blurred faces, unusual body parts, or subjects that are not harmonized with the rest of the image.
4. Whether the generated image is visually appealing and aesthetically pleasing.

After providing your explanation, you must output exactly one of the following choices, with its label, as your final verdict:

1. Model A is better: [[A>B]]
2. Model B is better: [[B>A]]
3. Tie, both of roughly the same acceptable quality: [[A=B=Good]]
4. Both are bad: [[A=B=Bad]]