Near-term roadmap #164

jan-wassenberg · 2024-04-26T05:50:13Z

We're sharing a roadmap of ideas for improving and speeding up Gemma. If you'd like to join in and help us get there faster, please reach out so we can coordinate :)

Threading

(jan-wassenberg) detect: total #logical, per-logical: package, chiplet, core, smt
(jan-wassenberg) detect: CPU name, L2D/L3 size
(Z.A.) CCX-aware pinning - ready, awaiting Highway 1.2 release
(jan-wassenberg) more efficient ThreadPool
command line arg to disable pinning
detect NUMA

[x] Dot product

Add _mm*_dpbf16_ps to HWY_AVX3_SPR and HWY_AVX3_ZEN4 targets, plus define HWY_NATIVE_DOT_BF16 in set_macros-inl.h
Faster SFP decode via table lookup
Add new NEON_* target that uses vbfdot for ReorderWidenMulAccumulate
If !defined(HWY_NATIVE_DOT_BF16) || !HWY_NATIVE_DOT_BF16, decompress bf16->f32 to temp array before MatVec (idea by Samuel, thank you!) - in "Factor out deinterleaving of bf16 vectors for MatVecs" (#166)
Apply even/odd trick to SFP
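
To illustrate the "decompress bf16->f32 to temp array before MatVec" item above, here is a minimal scalar sketch. It is not the gemma.cpp or Highway implementation: the names `F32FromBF16Bits`, `DecompressBF16` and `MatVecBF16` are hypothetical, and it assumes the temp array holds the vector operand, so the conversion cost is paid once rather than once per matrix row. It relies only on the fact that a bf16 value is the upper 16 bits of an IEEE-754 binary32 float.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bf16 stores the upper 16 bits of a binary32 float, so widening is a shift.
inline float F32FromBF16Bits(uint16_t bits) {
  const uint32_t widened = static_cast<uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &widened, sizeof(out));  // bit copy, avoids aliasing UB
  return out;
}

// Hypothetical helper: expands `num` bf16 values into an f32 temp array.
inline void DecompressBF16(const uint16_t* in, size_t num, float* out) {
  for (size_t i = 0; i < num; ++i) out[i] = F32FromBF16Bits(in[i]);
}

// Hypothetical MatVec: row-major bf16 matrix times bf16 vector, f32 result.
// The vector is decompressed once up front and reused for every row.
inline std::vector<float> MatVecBF16(const std::vector<uint16_t>& mat,
                                     const std::vector<uint16_t>& vec,
                                     size_t rows) {
  const size_t cols = vec.size();
  assert(mat.size() == rows * cols);

  std::vector<float> vec_f32(cols);  // the temp array
  DecompressBF16(vec.data(), cols, vec_f32.data());

  std::vector<float> out(rows, 0.0f);
  for (size_t r = 0; r < rows; ++r) {
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c) {
      sum += F32FromBF16Bits(mat[r * cols + c]) * vec_f32[c];
    }
    out[r] = sum;
  }
  return out;
}
```

A vectorized version would presumably replace both loops with SIMD code; the sketch only shows where the one-time conversion sits relative to the per-row dot products.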

Matmul

Compression

(pculliton, A.R.) Eval infrastructure
(A.R.) Arbiter model for eval
(Ray) add metadata to tensors, remove RawWeights
add TOC to BlobStore
decide whether NUQ is launchable

Optimizations

Replace attention MatVec with matmul - requires reshaping a matrix
Convert f32 activations to bf16 beforehand if HWY_NATIVE_DOT_BF16
Integrate wraparound support into matmul
Fuse softmax and sampling
Vectorize RoPE
Faster/more accurate hwy/contrib/math functions by updating the polynomials
Vectorize RMSNorm
(A.R.?, ...) Smaller KVCache: bf16, possibly reorder for better locality
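
For the "Vectorize RMSNorm" item, a scalar reference is handy for checking a SIMD version against. The sketch below uses the textbook formula out[i] = x[i] / sqrt(mean(x^2) + eps) * weight[i]. The name `RMSNormReference`, the `eps` default and the plain `weight` scaling are assumptions; the exact scaling convention used by Gemma is not stated in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar RMSNorm reference (hypothetical name and signature).
inline std::vector<float> RMSNormReference(const std::vector<float>& x,
                                           const std::vector<float>& weight,
                                           float eps = 1e-6f) {
  assert(!x.empty() && x.size() == weight.size());

  // Accumulate the sum of squares in double to limit rounding error.
  double sum_sq = 0.0;
  for (float v : x) sum_sq += static_cast<double>(v) * v;

  const float inv_rms =
      static_cast<float>(1.0 / std::sqrt(sum_sq / x.size() + eps));

  std::vector<float> out(x.size());
  for (size_t i = 0; i < x.size(); ++i) {
    out[i] = x[i] * inv_rms * weight[i];
  }
  return out;
}
```

Both loops are independent across elements apart from the reduction, which is what makes the routine a reasonable candidate for vectorization.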

Usability

warn if unknown arguments given. std::map of known arg names?
multiple .cc files to speed up builds
Actionable error codes as return values: kLoadFailed, kSeqTooShort
move eval/test files to tests/
Ctrl+C signal handler to ensure profiler results are printed without requiring %q input
add --prompt flag to run.cc
random prompt generation for debug_prompt.cc
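
The "warn if unknown arguments given" item suggests a std::map of known arg names. One possible shape, as a sketch only: `FindUnknownArgs` is a hypothetical helper, and the assumed convention is that flags start with `--` and may carry a value either as `--name=value` or as the following token. It does not reflect how gemma.cpp currently parses arguments.

```cpp
#include <map>
#include <string>
#include <vector>

// Returns the names of all `--flags` in `args` that are not keys of `known`.
// `known` maps a flag name to its help text (hypothetical layout).
inline std::vector<std::string> FindUnknownArgs(
    const std::map<std::string, std::string>& known,
    const std::vector<std::string>& args) {
  std::vector<std::string> unknown;
  for (const std::string& arg : args) {
    // Tokens without a leading "--" are treated as values and skipped.
    if (arg.rfind("--", 0) != 0) continue;
    // Strip the "--" prefix and any "=value" suffix. If there is no '=',
    // the count wraps to a large value and substr clamps it to the end.
    const std::string name = arg.substr(2, arg.find('=') - 2);
    if (known.find(name) == known.end()) unknown.push_back(name);
  }
  return unknown;
}
```

A caller could then print one warning per returned name, which would catch misspelled flags that are silently ignored today.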

File format

store ModelInfo in weights BlobStore
store tensor info in BlobStore
store tokenizer in BlobStore

New models

(Daniel) Support PaliGemma
Split Model into ModelFamily and ModelSize

[x] General infra

(pculliton) Python wrapper
(pculliton, ...) Improved CI: run on Kaggle infra
AuxOut to hold timing info instead of printing in GenerateImpl.
Sampling struct holds rng and temperature, to reduce length of args
(P. C.) use new HWY_EXPORT_T to simplify dispatch - ready, awaiting Highway 1.2 release
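
The "Sampling struct holds rng and temperature" item could be sketched as follows. Only the struct name and its two members come from the list above; the `std::mt19937` type, the `SampleToken` function and the rule that a temperature of 0 or below means greedy argmax are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Bundles sampling state so callers pass one argument instead of two.
struct Sampling {
  std::mt19937 rng;
  float temperature = 1.0f;
};

// Hypothetical sampler: picks a token index from unnormalized logits.
inline size_t SampleToken(const std::vector<float>& logits,
                          Sampling& sampling) {
  assert(!logits.empty());
  const auto max_it = std::max_element(logits.begin(), logits.end());

  // Assumed convention: temperature <= 0 selects the most likely token.
  if (sampling.temperature <= 0.0f) {
    return static_cast<size_t>(max_it - logits.begin());
  }

  // Softmax weights with the max subtracted for numerical stability.
  // std::discrete_distribution normalizes the weights itself.
  std::vector<double> weights(logits.size());
  for (size_t i = 0; i < logits.size(); ++i) {
    weights[i] = std::exp((static_cast<double>(logits[i]) - *max_it) /
                          sampling.temperature);
  }
  std::discrete_distribution<int> dist(weights.begin(), weights.end());
  return static_cast<size_t>(dist(sampling.rng));
}
```

Passing `Sampling&` keeps the generator state with the caller, so repeated calls advance the same random stream.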

jan-wassenberg · 2024-05-03T13:45:38Z

Making good progress :)

MathiasSchindler · 2024-05-14T18:43:15Z

Is PaliGemma part of the scope of gemma.cpp?

jan-wassenberg · 2024-05-15T07:46:08Z

Let's discuss in #185 :)

tilakrayal added the "announcement" label (Announcing the RoadMap) · Apr 29, 2024

KumarGitesh2024 assigned jan-wassenberg · Jun 21, 2024
