Turing support #4

Dampfinchen · 2024-01-20T10:57:20Z

Why is Ampere or Ada (RTX 3000 and RTX 4000 series) required to support this?

Turing (RTX 2000 series) has INT4 tensor cores.

efrantar · 2024-01-21T19:05:30Z

Hi, Marlin does not use any INT4 tensor cores; 4-bit weights are decompressed on the fly and the actual computation is then carried out in FP16. The reason Turing is not supported is that Marlin relies heavily on the cp.async instruction, which was introduced with compute capability 8.0. It allows explicitly fetching global memory in the background while doing other work at the same time, which is crucial to reach peak performance in an FP16xINT4 setting. While you could probably reuse quite a bit of Marlin's work when writing a Turing kernel, some significant changes will likely be necessary.
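To make the two points above concrete, here is a minimal CPU-side sketch in plain Python. It is not Marlin's code: the sequential nibble layout, the zero point of 8, and all function names (`pack_int4`, `unpack_int4`, `dequantize`, `supports_cp_async`) are assumptions chosen for illustration, and a real kernel would use its own packing order and do this work on the GPU. It only shows what "decompress 4-bit weights, then compute in FP16" means, plus the compute capability 8.0 threshold mentioned for cp.async.

```python
import struct


def pack_int4(values):
    """Pack eight unsigned 4-bit values (0..15) into one 32-bit word.

    Assumed layout: value i occupies bits [4*i, 4*i + 3] (lowest nibble first).
    """
    if len(values) != 8 or any(not 0 <= v <= 15 for v in values):
        raise ValueError("expected eight values in the range 0..15")
    word = 0
    for i, v in enumerate(values):
        word |= v << (4 * i)
    return word


def unpack_int4(word):
    """Extract the eight 4-bit values from a 32-bit word (inverse of pack_int4)."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dequantize(word, scale, zero_point=8):
    """Decompress one packed word to FP16 weights: (q - zero_point) * scale.

    The zero point of 8 is an assumption for this sketch (symmetric 4-bit range).
    """
    return [to_fp16((q - zero_point) * scale) for q in unpack_int4(word)]


def supports_cp_async(major, minor=0):
    """True if the given compute capability is at least 8.0 (Ampere or newer)."""
    return (major, minor) >= (8, 0)


if __name__ == "__main__":
    word = pack_int4([0, 1, 2, 3, 4, 5, 6, 7])
    print(hex(word))
    print(dequantize(word, scale=0.5))
    print("Turing (7.5):", supports_cp_async(7, 5))
    print("Ampere (8.0):", supports_cp_async(8, 0))
```

The multiply-accumulate that follows dequantization would then run on FP16 values, which is why INT4 tensor cores do not come into play; the missing piece on Turing (compute capability 7.5) is the asynchronous copy instruction, not the arithmetic.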
