Pinned
- flash-attention (Public, forked from Dao-AILab/flash-attention)
  Fast and memory-efficient exact attention
- optimum-quanto (Public, forked from huggingface/optimum-quanto)
  A PyTorch quantization backend for Optimum
  Python
- marlin-scaled-zero-point (Public, forked from IST-DASLab/marlin)
  Modified version of Marlin (https://github.com/IST-DASLab/marlin) with scaled zero point as input
  Python
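As an aside on the marlin-scaled-zero-point fork: in asymmetric quantization the usual dequantization is x = scale * (q - zero_point), and a "scaled" zero point generally means the zero point is passed in already multiplied by the scale, so the kernel only needs a fused multiply-add. The sketch below illustrates that equivalence under this assumption; it is not the fork's actual kernel interface, and the helper names are made up for illustration.

```python
import torch

def dequantize_plain(q, scale, zero_point):
    # Standard asymmetric dequantization: x = scale * (q - zero_point)
    return scale * (q.float() - zero_point)

def dequantize_scaled_zero(q, scale, scaled_zero):
    # Variant where the zero point arrives pre-multiplied by the scale,
    # so dequantization reduces to x = scale * q - scaled_zero
    return scale * q.float() - scaled_zero

# Both forms recover the same values when scaled_zero == scale * zero_point.
q = torch.randint(0, 16, (4,))   # e.g. 4-bit quantized weights
scale, zero_point = 0.05, 8.0
assert torch.allclose(
    dequantize_plain(q, scale, zero_point),
    dequantize_scaled_zero(q, scale, scale * zero_point),
)
```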