v0.1.3
Cleanup of previous fixes and a lowered batch size to prevent memory issues on Inference Endpoints with some models.
What's Changed
- A few more Inference Endpoints fixes by @tengomucho in #69
- feat(cache): use optimized StaticCache class for XLA by @tengomucho in #70
- Lower TGI IE batch size by @tengomucho in #71
Full Changelog: v0.1.2...v0.1.3