Earlier this year I was impressed by the offloading performance of FlexGen, and I wonder how it would compare with what llama.cpp currently provides for Llama and Llama 2 models in a CPU-offloading scenario.
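For reference, a rough sketch of how such a comparison might be set up; model paths, prompt, and layer counts are placeholders, and FlexGen's public CLI currently targets OPT models, so a direct Llama run isn't possible yet:

```sh
# FlexGen: keep all weights on CPU, KV cache and activations on GPU.
# --percent takes six numbers: weight-GPU% weight-CPU% cache-GPU% cache-CPU% act-GPU% act-CPU%.
python3 -m flexgen.flex_opt --model facebook/opt-30b --percent 0 100 100 0 100 0

# llama.cpp: offload the first 20 layers to the GPU and keep the rest on CPU
# (requires a GPU-enabled build; the model file and layer count are placeholders).
./main -m ./models/llama-2-13b.q4_0.bin -ngl 20 -n 128 -p "Hello"
```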
We are pushing a refactoring of the current implementation to support most HF models. We will release it soon under a fork of this repo and will keep you informed.
Any chance Llama support could be added to FlexGen, @Ying1123 @keroro824?