
Nvidia 4.1 inference code is giving segmentation fault for RTX 4090 (4.0 code works fine) #1847

Open
arjunsuresh opened this issue Sep 12, 2024 · 6 comments

@arjunsuresh (Contributor)

Trying to run the Nvidia v4.1 implementation for Stable Diffusion XL on an RTX 4090.

(mlperf) arjun@mlperf-inference-arjun-x86-64-24944:/work$ make generate_engines RUN_ARGS="--benchmarks=stable-diffusion-xl --scenarios=Offline"
make: *** [Makefile:37: generate_engines] Segmentation fault (core dumped)
make download_model BENCHMARKS="stable-diffusion-xl"

ran successfully and produced the int8 model. Below are the custom configs used for 2x RTX 4090.

class SPR(OfflineGPUBaseConfig):
    system = KnownSystem.spr

    # Applicable fields for this benchmark are listed below. Not all of these are necessary, and some may be defined in the BaseConfig already and inherited.
    # Please see NVIDIA's submission config files for example values and which fields to keep.
    # Required fields (Must be set or inherited to run):
    gpu_batch_size = {'clip1': 32 * 2, 'clip2': 32 * 2, 'unet': 32 * 2, 'vae': 1}
    offline_expected_qps: float = 1.0
    precision: str = 'int8'
@Oseltamivir (Contributor)

Hi,

May I ask if it is possible to test this with the fp16/fp32 models instead of int8?

Specifically, change precision: str = 'fp16' in the custom config, and change SDXLVAEBuilder in code/stable-diffusion-xl/tensorrt/builder.py to the following:

class SDXLVAEBuilder(SDXLBaseBuilder,
                     ArgDiscarder):
    """SDXL VAE builder class."""

    def __init__(self,
                 *args,
                 component_name: str,
                 batch_size: int,
                 model_path: PathLike,
                 **kwargs):
        # Build the VAE component from the fp32 ONNX export.
        vae_precision = 'fp32'
        vae_path = model_path + "onnx_models/vae/model.onnx"
        # Build a weakly typed network so the fp32 precision above takes effect.
        strongly_typed = False
        super().__init__(*args,
                         model=VAE(name=component_name, max_batch_size=batch_size, precision=vae_precision, device='cuda'),
                         model_path=vae_path,
                         batch_size=batch_size,
                         strongly_typed=strongly_typed,
                         use_native_instance_norm=True,
                         **kwargs)
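For reference, the precision change in the custom config above is a single field (sketch; all other fields stay the same):

class SPR(OfflineGPUBaseConfig):
    system = KnownSystem.spr

    gpu_batch_size = {'clip1': 32 * 2, 'clip2': 32 * 2, 'unet': 32 * 2, 'vae': 1}
    offline_expected_qps: float = 1.0
    precision: str = 'fp16'  # was 'int8'; the VAE itself is pinned to fp32 by the builder change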

then rerun:

make generate_engines RUN_ARGS="--benchmarks=stable-diffusion-xl --scenarios=Offline"

I was unable to build the int8 models on an RTX 3090, but the fp16/fp32 models built and ran successfully.

@arjunsuresh (Contributor, Author)

That's great, but the same code didn't work for me - still getting a segmentation fault with fp16/fp32.

@Oseltamivir (Contributor)

That's strange... personally I think the error is due to the model not being quantized/exported properly, or the data not being preprocessed properly.

If you still have the command history, is it possible for you to send all the make commands here? Thanks

@arjunsuresh (Contributor, Author)

@Oseltamivir For me, the line below is the culprit. Commenting it out makes it work for me, though I haven't checked everything. I'll also try on different systems.

https://github.com/mlcommons/inference_results_v4.1/blob/main/closed/NVIDIA/code/actionhandler/calibrate.py#L24
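For anyone else narrowing this down, a minimal sketch for confirming which import crashes, using the standard-library faulthandler module (the nvmitten import below is only an assumption based on the mitten discussion that follows, not the confirmed content of line 24):

import faulthandler
faulthandler.enable()  # dump the Python traceback on SIGSEGV instead of failing silently

# Assumption: the problematic line pulls in the mitten package. If this import
# segfaults, faulthandler prints the stack at the crash point.
# Shell equivalent: PYTHONFAULTHANDLER=1 python3 -c "import nvmitten"
import nvmitten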

@Oseltamivir (Contributor)

In that case it might be an issue caused by mitten.

But the maintainers don't seem to respond to issues posted in the mitten repo. I opened an issue there 3 months ago but got no reply. I ended up having to email Yiheng to ask about their implementation of mitten.

@arjunsuresh (Contributor, Author)

Yes, it is. We'll come back to it, as we are currently using the Nvidia v4.0 code to collect the inference results via GitHub Actions.
