Near-term roadmap #164

Open
31 of 52 tasks
jan-wassenberg opened this issue Apr 26, 2024 · 3 comments
@jan-wassenberg
Member

jan-wassenberg commented Apr 26, 2024

We're sharing a roadmap of ideas for improving and speeding up Gemma. If you'd like to join in and help us get there faster, please reach out so we can coordinate :)

Threading

  • (jan-wassenberg) detect: total #logical, per-logical: package, chiplet, core, smt
  • (jan-wassenberg) detect: CPU name, L2D/L3 size
  • (Z.A.) CCX-aware pinning - ready, awaiting Highway 1.2 release
  • (jan-wassenberg) more efficient ThreadPool
  • command line arg to disable pinning
  • detect NUMA

[x] Dot product

  • Add _mm*_dpbf16_ps to HWY_AVX3_SPR and HWY_AVX3_ZEN4 targets, plus define HWY_NATIVE_DOT_BF16 in set_macros-inl.h
  • Faster SFP decode via table lookup
  • Add new NEON_* target that uses vbfdot for ReorderWidenMulAccumulate
  • If !defined(HWY_NATIVE_DOT_BF16) || !HWY_NATIVE_DOT_BF16, decompress bf16->f32 to temp array before MatVec (idea by Samuel, thank you!) - see #166, "Factor out deinterleaving of bf16 vectors for MatVecs"
  • Apply even/odd trick to SFP
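
For anyone picking up the bf16->f32 item, here is a minimal scalar sketch of the idea, not the code in gemma.cpp or Highway. It relies on the fact that bf16 is the upper 16 bits of an IEEE-754 binary32, so widening is a 16-bit shift. The names `Bf16ToF32`, `DecompressRow` and `MatVecBf16` are hypothetical, and a real version would use SIMD and reuse the temp buffer across calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: bf16 stores the upper 16 bits of a float, so
// widening is a 16-bit left shift followed by a bit-cast.
inline float Bf16ToF32(uint16_t bits) {
  const uint32_t widened = static_cast<uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &widened, sizeof(f));
  return f;
}

// Decompresses `num` bf16 values into a caller-provided f32 buffer.
inline void DecompressRow(const uint16_t* in, size_t num, float* out) {
  for (size_t i = 0; i < num; ++i) out[i] = Bf16ToF32(in[i]);
}

// MatVec with row-major bf16 weights (rows x cols) and an f32 vector.
// Each row is decompressed to a temp f32 array before the dot product.
inline std::vector<float> MatVecBf16(const std::vector<uint16_t>& mat,
                                     size_t rows, size_t cols,
                                     const std::vector<float>& vec) {
  assert(mat.size() == rows * cols);
  assert(vec.size() == cols);
  std::vector<float> temp(cols);  // reused for every row
  std::vector<float> out(rows, 0.0f);
  for (size_t r = 0; r < rows; ++r) {
    DecompressRow(mat.data() + r * cols, cols, temp.data());
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c) sum += temp[c] * vec[c];
    out[r] = sum;
  }
  return out;
}
```

The point of the temp array is that the inner dot product then runs on plain f32, which is the cheaper path when the target has no native bf16 dot instruction.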

Matmul

  • (pculliton) implement basic matmul and test. Not using BLAS because we want to fuse matmul and decompression.
  • (pculliton) 4x4 unrolled and vectorized matmul
  • (szabadka, B.B.) Update Prefill to use matmul (activation @ weights) instead of MatVec. Almost there.
  • Fused decompression inside matmul
  • Support offsets within the matrix, required by some call sites
  • (jan-wassenberg) Decompress weights to bf16 when native
  • (jan-wassenberg) Cache-aware tiling/packing
  • (jan-wassenberg) NUMA aware
  • (jan-wassenberg) 64-bit precision
  • (B.B.) Larger batch size
  • (A.V.) Avoid allocations for decompression
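
To make the "4x4 unrolled" item concrete, here is a rough scalar sketch of the tiling pattern only. It is not the gemma.cpp implementation: it is not vectorized, has no fused decompression, and assumes M and N are multiples of 4. `MatMul4x4` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C (M x N) = A (M x K) * B (K x N), all row-major f32.
// Each 4x4 tile of C is accumulated in local variables so that every
// loaded element of A is reused 4 times and every element of B 4 times.
inline std::vector<float> MatMul4x4(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    size_t M, size_t K, size_t N) {
  assert(M % 4 == 0 && N % 4 == 0);  // assumption of this sketch
  assert(a.size() == M * K && b.size() == K * N);
  std::vector<float> c(M * N, 0.0f);
  for (size_t i = 0; i < M; i += 4) {
    for (size_t j = 0; j < N; j += 4) {
      float acc[4][4] = {};  // tile accumulators
      for (size_t k = 0; k < K; ++k) {
        for (size_t ii = 0; ii < 4; ++ii) {
          const float a_val = a[(i + ii) * K + k];
          for (size_t jj = 0; jj < 4; ++jj) {
            acc[ii][jj] += a_val * b[k * N + j + jj];
          }
        }
      }
      // Write the finished tile back to C.
      for (size_t ii = 0; ii < 4; ++ii) {
        for (size_t jj = 0; jj < 4; ++jj) {
          c[(i + ii) * N + j + jj] = acc[ii][jj];
        }
      }
    }
  }
  return c;
}
```

In a vectorized version the inner `jj` loop would presumably become one vector FMA per row of the tile, and the load of B is where decompression could be fused.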

Compression

  • (pculliton, A.R.) Eval infrastructure
  • (A.R.) Arbiter model for eval
  • (Ray) add metadata to tensors, remove RawWeights
  • add TOC to BlobStore
  • decide whether NUQ is launchable

Optimizations

  • Replace attention MatVec with matmul - requires reshaping a matrix
  • Convert f32 activations to bf16 beforehand if HWY_NATIVE_DOT_BF16
  • Integrate wraparound support into matmul
  • Fuse softmax and sampling
  • Vectorize RoPE
  • Faster/more accurate hwy/contrib/math functions by updating the polynomials
  • Vectorize RMSNorm
  • (A.R.?, ...) Smaller KVCache: bf16, possibly reorder for better locality
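
For the RMSNorm item, a scalar reference is handy as a baseline when testing a vectorized version. The sketch below uses the textbook definition, y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i]. The exact weight scaling and epsilon used in gemma.cpp may differ, and `RMSNormRef` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference: normalizes x by its root mean square, then scales
// elementwise by w. The sum of squares is accumulated in double to
// reduce rounding error in the reference result.
inline std::vector<float> RMSNormRef(const std::vector<float>& x,
                                     const std::vector<float>& w,
                                     float eps) {
  assert(!x.empty() && x.size() == w.size());
  double sum_sq = 0.0;
  for (float v : x) sum_sq += static_cast<double>(v) * v;
  const double mean_sq = sum_sq / static_cast<double>(x.size());
  const float inv_rms = static_cast<float>(1.0 / std::sqrt(mean_sq + eps));
  std::vector<float> y(x.size());
  for (size_t i = 0; i < x.size(); ++i) y[i] = x[i] * inv_rms * w[i];
  return y;
}
```

A vectorized version has the same two passes: a reduction for the sum of squares, then an elementwise multiply, so its output can be compared against this one within a small tolerance.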

Usability

  • warn if unknown arguments given. std::map of known arg names?
  • multiple .cc files to speed up builds
  • Actionable error codes as return values: kLoadFailed, kSeqTooShort
  • move eval/test files to tests/
  • Ctrl+C signal handler to ensure profiler results are printed without requiring %q input
  • add --prompt flag to run.cc
  • random prompt generation for debug_prompt.cc
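
One possible shape for the unknown-argument warning, as a sketch only. It uses a `std::set` of known names rather than the `std::map` floated above, since only membership is needed. `FindUnknownArgs` is a hypothetical helper, and it assumes flags look like `--name value`.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the name of every "--flag" token that is not in `known`.
// Tokens that do not start with "--" are treated as values and skipped.
inline std::vector<std::string> FindUnknownArgs(
    const std::vector<std::string>& args,
    const std::set<std::string>& known) {
  std::vector<std::string> unknown;
  for (const std::string& arg : args) {
    if (arg.rfind("--", 0) != 0) continue;  // not a flag
    const std::string name = arg.substr(2);
    if (known.count(name) == 0) unknown.push_back(name);
  }
  return unknown;
}
```

The caller would print one warning per returned name, which catches typos such as `--wieghts` that would otherwise be ignored silently.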

File format

  • store ModelInfo in weights BlobStore
  • store tensor info in BlobStore
  • store tokenizer in BlobStore

New models

  • (Daniel) Support PaliGemma
  • Split Model into ModelFamily and ModelSize

[x] General infra

  • (pculliton) Python wrapper
  • (pculliton, ...) Improved CI: run on Kaggle infra
  • AuxOut to hold timing info instead of printing in GenerateImpl.
  • Sampling struct holds rng and temperature, to reduce length of args
  • (P. C.) use new HWY_EXPORT_T to simplify dispatch - ready, awaiting Highway 1.2 release
tilakrayal added the "announcement" (Announcing the RoadMap) label on Apr 29, 2024
@jan-wassenberg
Member Author

Making good progress :)

@MathiasSchindler

Is PaliGemma part of the scope of gemma.cpp?

@jan-wassenberg
Member Author

Let's discuss in #185 :)
