Results of inference #43

Open
Rts-Wu opened this issue Oct 24, 2024 · 1 comment

Rts-Wu commented Oct 24, 2024

These are my settings:
if __name__ == "__main__":
    pretrained_model_path = '/root/autodl-tmp/StoryGen/checkpoint_StorySalon/'
    logdir = "./inference_StorySalon/"
    num_inference_steps = 40
    guidance_scale = 7
    image_guidance_scale = 3.5
    num_sample_per_prompt = 10
    mixed_precision = "fp16"
    stage = 'multi-image-condition' # ["multi-image-condition", "auto-regressive", "no"]

    prompt = "The cat is running after the mouse"
    prev_p = ["The cat", "The mouse"]
    ref_image = ["./Tom.png", ",/Jerry.png"]

But the results look wrong: the generated images merge the cat and the mouse into a single new character, like these:

image

I also tried the COCO checkpoint (using inference.py, changing only pretrained_model_path = '/root/autodl-tmp/StoryGen/checkpoint_COCO/'):

image
Could you give some advice?
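One thing that may be worth ruling out before looking at the model itself: the second entry of ref_image in the settings above is written as ",/Jerry.png" (comma instead of dot), which may just be a typo in the post. A minimal sketch of a sanity check on the inputs follows. The helper name `check_reference_inputs` is hypothetical and not part of StoryGen; it only assumes that prev_p and ref_image are expected to be parallel lists of equal length.

```python
from pathlib import Path


def check_reference_inputs(prev_p, ref_image):
    """Return a list of human-readable problems with the reference inputs.

    Hypothetical helper (not part of StoryGen). Assumes each previous
    prompt is paired with exactly one reference image path.
    """
    problems = []
    if len(prev_p) != len(ref_image):
        problems.append(
            f"{len(prev_p)} previous prompts but {len(ref_image)} reference images"
        )
    for path in ref_image:
        # is_file() is False for missing paths and for directories
        if not Path(path).is_file():
            problems.append(f"reference image not found: {path!r}")
    return problems


if __name__ == "__main__":
    prev_p = ["The cat", "The mouse"]
    ref_image = ["./Tom.png", ",/Jerry.png"]
    for problem in check_reference_inputs(prev_p, ref_image):
        print(problem)
```

If this prints nothing for the real inputs, the paths and list lengths are at least consistent, and the merged-character result would have to come from somewhere else.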

Rts-Wu commented Oct 24, 2024

Additionally, I have some other questions:
1. When I use the auto-regressive stage with the image and prompt from the paper, the results are not as good as expected.
As I understand it, since coherence between images is maintained through the ref-image, it should be possible to generate images from the current description alone. Why is the previous context still needed? What role does it play?

2. I observed that the character consistency of the first frame generated from the ref-image is not as good as described in the paper, but it is still acceptable. I tried the example from the paper, "White cat", with prev_p set to "A White Cat".
image

However, character consistency in the second frame is noticeably worse, possibly because of the first frame's context.

image
image

Can I keep prev_p as "A White Cat"? What drawbacks might this approach have?

3. During inference, the following warning appeared: "The config attributes {'center_input_sample': False} were passed to UNet2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file." Could this be a reason for the poor results? I used the checkpoints you provided directly, but the results are still not good enough, and I have no idea why.
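For context on the message in point 3: as far as I understand, it indicates that config.json contains a key that the installed UNet2DConditionModel constructor does not accept, and the message itself says such keys are ignored. A rough sketch of how one might list those keys is below. The function name `unexpected_config_keys` is hypothetical, the comparison against the constructor signature is an approximation of the library's own check rather than a copy of it, and `StandInModel` is a made-up stand-in so the example runs without the real model class.

```python
import inspect
import json
from pathlib import Path


def unexpected_config_keys(config_path, model_cls):
    """List config keys that model_cls.__init__ does not accept.

    Hypothetical helper: approximates the check behind the warning by
    comparing config.json keys with the constructor's parameter names.
    Keys starting with "_" (metadata such as "_class_name") are skipped.
    """
    config = json.loads(Path(config_path).read_text())
    accepted = set(inspect.signature(model_cls.__init__).parameters) - {"self"}
    return sorted(
        key for key in config
        if not key.startswith("_") and key not in accepted
    )


class StandInModel:
    """Made-up stand-in for a model class, used only for illustration."""

    def __init__(self, sample_size=64, in_channels=4):
        self.sample_size = sample_size
        self.in_channels = in_channels
```

With the real class, the call would presumably look like `unexpected_config_keys("<unet folder>/config.json", UNet2DConditionModel)`, where the folder is wherever the checkpoint stores the UNet config. If `center_input_sample` is the only key reported, the warning by itself seems an unlikely cause of the poor results, though that is a guess and not something confirmed by the maintainers.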

Rts-Wu changed the title from "About "multi-image-condition"" to "Results of inference" on Oct 24, 2024