Releases: alibaba/TinyNeuralNetwork
Announcing easyquant for speeding up LLM inference via quantization
With quantization, LLM inference can run efficiently at lower resource usage. Please install the package below and try out the examples here. We look forward to your feedback.