Currently, Marlin supports only a limited set of quantization options (4-bit with groupsize 128). These were selected for a good accuracy/speed trade-off, and as a result the kernel runs very close to peak efficiency in many cases, including larger batch sizes.
That said, Marlin can definitely be a good starting point for developing highly efficient kernels for other bitwidths or quantization schemes.
What about 3-bit or 2-bit?
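For readers unfamiliar with the scheme being discussed, the following is a minimal NumPy sketch of what "4-bit with groupsize 128" means: weights are split into groups of 128 values, and each group is quantized to 4-bit integers with its own scale and zero point. This illustrates the numerics only; it is not Marlin's actual packing or kernel code, and all function names here are made up for illustration.

```python
import numpy as np

def quantize_groupwise(w, bits=4, groupsize=128):
    # Asymmetric per-group quantization: each group of `groupsize`
    # consecutive weights gets its own scale and minimum (zero point).
    qmax = 2 ** bits - 1
    w = w.reshape(-1, groupsize)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / qmax
    q = np.clip(np.round((w - wmin) / scale), 0, qmax).astype(np.uint8)
    return q, scale, wmin

def dequantize_groupwise(q, scale, wmin):
    # Reconstruct approximate float weights from 4-bit codes.
    return q.astype(np.float32) * scale + wmin

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, wmin = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, wmin).reshape(-1)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step per group
```

Supporting 3-bit or 2-bit in the same style would mean changing `bits` here, but in a real kernel it also changes the bit-packing layout and the unpacking done in the inner loop, which is where most of the porting effort would go.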