v0.1.3
Cleanup of previous fixes and a lowered batch size to prevent memory issues on Inference Endpoints with some models.
What's Changed
- A few more Inference Endpoints fixes by @tengomucho in #69
- feat(cache): use optimized StaticCache class for XLA by @tengomucho in #70
- Lower TGI IE batch size by @tengomucho in #71
Full Changelog: v0.1.2...v0.1.3