We're sharing a roadmap of ideas for improving and speeding up Gemma. If you'd like to join in and help us get there faster, please reach out so we can coordinate :)
Threading
(jan-wassenberg) detect: total #logical, per-logical: package, chiplet, core, smt
[x] Dot product
Add _mm*_dpbf16_ps to HWY_AVX3_SPR and HWY_AVX3_ZEN4 targets, plus define HWY_NATIVE_DOT_BF16 in set_macros-inl.h
Add NEON_* target that uses vbfdot for ReorderWidenMulAccumulate
If !defined(HWY_NATIVE_DOT_BF16) || !HWY_NATIVE_DOT_BF16, decompress bf16->f32 to temp array before MatVec (idea by Samuel, thank you!), in #166 "Factor out deinterleaving of bf16 vectors for MatVecs."
Matmul
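To illustrate the Dot product fallback item (decompress bf16->f32 to a temp array before MatVec), here is a minimal scalar sketch. It assumes one plausible reading of the idea: when no native bf16 dot instruction is available, the bf16 vector is widened once into a temporary f32 array and reused for every matrix row, rather than being widened again per row. The names BF16ToF32, F32ToBF16Truncate and MatVecBF16 are hypothetical and are not part of gemma.cpp or Highway. The real code would use Highway vector ops, not scalar loops.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helpers for illustration only; not the gemma.cpp/Highway API.

// bf16 holds the upper 16 bits of an IEEE-754 binary32 value, so widening
// is a 16-bit left shift of the raw bits.
inline float BF16ToF32(uint16_t bits) {
  const uint32_t widened = static_cast<uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &widened, sizeof(f));
  return f;
}

// Truncating f32 -> bf16 conversion, used here only to build inputs.
inline uint16_t F32ToBF16Truncate(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

// Computes out = mat * vec for a row-major bf16 matrix (rows x cols) and a
// bf16 vector (cols elements).
std::vector<float> MatVecBF16(const std::vector<uint16_t>& mat,
                              const std::vector<uint16_t>& vec,
                              size_t rows, size_t cols) {
  // Assumed fallback path: decompress the vector once into a temp f32 array.
  std::vector<float> vec_f32(cols);
  for (size_t c = 0; c < cols; ++c) {
    vec_f32[c] = BF16ToF32(vec[c]);
  }

  std::vector<float> out(rows, 0.0f);
  for (size_t r = 0; r < rows; ++r) {
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c) {
      // Matrix elements are widened on the fly; the vector is already f32.
      sum += BF16ToF32(mat[r * cols + c]) * vec_f32[c];
    }
    out[r] = sum;
  }
  return out;
}
```

The point of the temp array is that the vector conversion cost is paid once instead of once per row. Whether that is a net win depends on the target and on cache behavior, which this sketch does not measure.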
Compression
Optimizations
HWY_NATIVE_DOT_BF16
Usability
File format
New models
[x] General infra