# v0.1.1

First TPU release, making the TPU Text Generation Inference (TGI) and Inference Endpoints container images available.

## What's Changed
- Basic TGI server on XLA by @tengomucho in #1
- Enable CI/CD by @tengomucho in #2
- Fix TGI Dockerfile by @shub-kris in #3
- Add static KV cache and test on Gemma-2B by @tengomucho in #4
- Small optimizations by @tengomucho in #5
- Enable compilation by @tengomucho in #6
- Revert "fix: attention mask should be 1 or 0" by @tengomucho in #8
- feat: use dynamic batching when generating by @tengomucho in #9
- Repo layout by @tengomucho in #10
- Add PyPI release workflow by @regisss in #11
- Xla parallel proxy by @tengomucho in #12
- Add documentation to the repository by @mfuntowicz in #13
- Adopt naming convention of transformers API by @mfuntowicz in #14
- Fix main doc build workflow by @regisss in #15
- Improve readme by @mfuntowicz in #16
- Fix layout in README by @mfuntowicz in #17
- Fix rule and instructions for TGI by @mfuntowicz in #18
- Fix typo in index.mdx by @mfuntowicz in #19
- Added some links to Cloud TPU documentation by @mikegre-google in #20
- Parallel sharding by @tengomucho in #21
- Bump version to 0.1.0.dev1 by @mfuntowicz in #24
- Bump version to 0.1.0.dev2 by @mfuntowicz in #25
- Fix TGI missing import by @mfuntowicz in #27
- Forward arguments from TGI launcher to the model by @mfuntowicz in #28
- Fix optimum-tpu pip install instructions by @mfuntowicz in #29
- Fix tests with do_sample=True by @tengomucho in #30
- Sharding in tgi by @tengomucho in #31
- Fix missing '=' to assign environment variables in the default case w… by @mfuntowicz in #33
- Include two different stages for building TGI image: by @mfuntowicz in #34
- Llama support by @tengomucho in #32
- chore(ci): added workflow for nightly tests by @tengomucho in #35
- fix(build): setup.py removed from build_dist dependencies by @tengomucho in #36
- Try again to fix nightly builds by @tengomucho in #37
- Basic Llama2 Tuning by @tengomucho in #39
- Bug doc builder by @pagezyhf in #40
- Fix typo ; Update llama_tuning.md by @furkanakkurt1335 in #42
- Update to Pytorch 2.3.0 and transformers v4.40.2 by @tengomucho in #41
- Fine tuning with FSDP v2 by @tengomucho in #44
- Minor fix for misspelled stage in TGI dockerfile. by @thealmightygrant in #46
- Align to Transformers 4.41.1 by @tengomucho in #45
- chore(training): Allow training on torch xla > 2.3.0, add warning by @tengomucho in #48
- fix(build): add missing setuptools_scm section by @tengomucho in #49
- fix(logging): correct logging usage by @tengomucho in #50
- fix(tests): fix decode sample expected outputs again by @tengomucho in #52
- fix(doc): update server and port when serving TGI by @tengomucho in #53
- fix(ci): correct secrets leak workflow check by @tengomucho in #55
- Add Mistral support 💨 by @tengomucho in #54
- Mistral nits by @tengomucho in #57
- chore: bump to version v0.1.0a1 by @tengomucho in #60
- feat(TGI): add release docker image build and push to registry workflow by @tengomucho in #62
- chore: bump to version v0.1.1 by @tengomucho in #63
## New Contributors
- @tengomucho made their first contribution in #1
- @shub-kris made their first contribution in #3
- @regisss made their first contribution in #11
- @mfuntowicz made their first contribution in #13
- @mikegre-google made their first contribution in #20
- @pagezyhf made their first contribution in #40
- @furkanakkurt1335 made their first contribution in #42
- @thealmightygrant made their first contribution in #46
**Full Changelog**: https://github.com/huggingface/optimum-tpu/commits/v0.1.1