Currently, Marlin supports only a limited set of quantization options (4-bit with groupsize 128). These were selected for a good accuracy/speed trade-off, and as a result the kernel runs very close to peak efficiency in many cases, including larger batch sizes.
That said, Marlin can definitely be a good starting point for developing highly efficient kernels for other bitwidths or quantization schemes.
What about 3-bit or 2-bit?
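For readers unfamiliar with the scheme being discussed, the following is a minimal NumPy sketch of what "4-bit with groupsize 128" means: weights are split into groups of 128 values, and each group is quantized to 4-bit integers with its own scale and zero point. This illustrates the numerics only; it is not Marlin's actual packing or kernel code, and all function names here are made up for illustration.

```python
import numpy as np

def quantize_groupwise(w, bits=4, groupsize=128):
    # Asymmetric per-group quantization: each group of `groupsize`
    # consecutive weights gets its own scale and minimum (zero point).
    qmax = 2 ** bits - 1
    w = w.reshape(-1, groupsize)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / qmax
    q = np.clip(np.round((w - wmin) / scale), 0, qmax).astype(np.uint8)
    return q, scale, wmin

def dequantize_groupwise(q, scale, wmin):
    # Reconstruct approximate float weights from 4-bit codes.
    return q.astype(np.float32) * scale + wmin

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scale, wmin = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, scale, wmin).reshape(-1)
err = np.abs(w - w_hat).max()  # bounded by half a quantization step per group
```

Supporting 3-bit or 2-bit in the same style would mean changing `bits` here, but in a real kernel it also changes the bit-packing layout and the unpacking done in the inner loop, which is where most of the porting effort would go.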