
Nvidia 4.1 inference code is giving segmentation fault for RTX 4090 (4.0 code works fine) #1847

Open
arjunsuresh opened this issue Sep 12, 2024 · 6 comments

@arjunsuresh (Contributor)

Trying to run the Nvidia v4.1 implementation for Stable Diffusion XL on an RTX 4090.

(mlperf) arjun@mlperf-inference-arjun-x86-64-24944:/work$ make generate_engines RUN_ARGS="--benchmarks=stable-diffusion-xl --scenarios=Offline"
make: *** [Makefile:37: generate_engines] Segmentation fault (core dumped)
make download_model BENCHMARKS="stable-diffusion-xl"

ran successfully and produced the int8 model. Below are the custom configs used for 2x RTX 4090.

class SPR(OfflineGPUBaseConfig):
    system = KnownSystem.spr

    # Applicable fields for this benchmark are listed below. Not all of these are necessary, and some may be defined in the BaseConfig already and inherited.
    # Please see NVIDIA's submission config files for example values and which fields to keep.
    # Required fields (Must be set or inherited to run):
    gpu_batch_size = {'clip1': 32 * 2, 'clip2': 32 * 2, 'unet': 32 * 2, 'vae': 1}
    offline_expected_qps: float = 1.0
    precision: str = 'int8'
@Oseltamivir (Contributor)

Hi,

May I ask if it is possible to test this with the fp16/fp32 models instead of int8?

Specifically, change precision: str = 'fp16' in the custom config, and change SDXLVAEBuilder in code/stable-diffusion-xl/tensorrt/builder.py to the following:

class SDXLVAEBuilder(SDXLBaseBuilder,
                     ArgDiscarder):
    """SDXL VAE builder class."""

    def __init__(self,
                 *args,
                 component_name: str,
                 batch_size: int,
                 model_path: PathLike,
                 **kwargs):
        # Build the VAE component from the fp32 ONNX export.
        vae_precision = 'fp32'
        vae_path = model_path + "onnx_models/vae/model.onnx"
        # Build a weakly typed network so the fp32 precision above takes effect.
        strongly_typed = False
        super().__init__(*args,
                         model=VAE(name=component_name, max_batch_size=batch_size, precision=vae_precision, device='cuda'),
                         model_path=vae_path,
                         batch_size=batch_size,
                         strongly_typed=strongly_typed,
                         use_native_instance_norm=True,
                         **kwargs)
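For reference, the precision change in the custom config above is a single field (sketch; all other fields stay the same):

class SPR(OfflineGPUBaseConfig):
    system = KnownSystem.spr

    gpu_batch_size = {'clip1': 32 * 2, 'clip2': 32 * 2, 'unet': 32 * 2, 'vae': 1}
    offline_expected_qps: float = 1.0
    precision: str = 'fp16'  # was 'int8'; the VAE itself is pinned to fp32 by the builder change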

then rerun:

make generate_engines RUN_ARGS="--benchmarks=stable-diffusion-xl --scenarios=Offline"

I was unable to build the int8 models on an RTX 3090, but the fp16/fp32 models built and ran successfully.

@arjunsuresh (Contributor, Author)

That's great, but the same code didn't work for me - still getting a segmentation fault with fp16/fp32.

@Oseltamivir (Contributor)

That's strange... personally I think the error is due to the model not being quantized/exported properly, or the data not being preprocessed properly.

If you still have the command history, is it possible for you to send all the make commands here? Thanks

@arjunsuresh (Contributor, Author)

@Oseltamivir For me, the line below is the culprit. Commenting it out makes it work for me, though I haven't checked everything. I'll also try on different systems.

https://github.com/mlcommons/inference_results_v4.1/blob/main/closed/NVIDIA/code/actionhandler/calibrate.py#L24
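For anyone else narrowing this down, a minimal sketch for confirming which import crashes, using the standard-library faulthandler module (the nvmitten import below is only an assumption based on the mitten discussion that follows, not the confirmed content of line 24):

import faulthandler
faulthandler.enable()  # dump the Python traceback on SIGSEGV instead of failing silently

# Assumption: the problematic line pulls in the mitten package. If this import
# segfaults, faulthandler prints the stack at the crash point.
# Shell equivalent: PYTHONFAULTHANDLER=1 python3 -c "import nvmitten"
import nvmitten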

@Oseltamivir (Contributor)

In that case it might be an issue caused by mitten.

But the maintainers don't seem to respond to issues posted in the mitten repo. I opened an issue there 3 months ago but got no reply. I ended up having to email Yiheng to ask about their implementation of mitten.

@arjunsuresh (Contributor, Author)

Yes, it is. We'll come back to it, as we are currently using the Nvidia v4.0 code to collect the inference results via GitHub Actions.
