- Does an existing implementation already do what you need? (credit Ariel Rokem, @arokem)
- Start with a reference implementation: slow, but clear
- Test the reference implementation
- At any point, stop if performance is "good enough" (credit Boris Gorelik, @boris_gorelik)
- Benchmark the reference implementation for bottlenecks
- Focus on the "low-hanging fruit" first
- Test each iteration against the reference implementation
- Step back: a different algorithm may yield better performance
- Do not sacrifice clarity for speed...
- ... although, if you must, do it as the last step.
- Start with NumPy vectorization
- Do not forget about numexpr
- Use Numba
- Switch to Cython for more control or if you need to distribute a smaller self-contained package
- To use multiple CPU cores, release the GIL and use thread-based parallelism (joblib)
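
The first steps of this checklist (reference implementation, tests, benchmark, algorithmic step back) can be sketched in plain Python. This is only an illustration: the moving-sum problem and the names `moving_sum_reference`, `moving_sum_fast` and `check_against_reference` are hypothetical and chosen for the example, not taken from any particular project.

```python
import random
import timeit


def moving_sum_reference(values, window):
    """Reference implementation: slow, but clear. Cost is O(n * window)."""
    return [sum(values[i:i + window]) for i in range(len(values) - window + 1)]


def moving_sum_fast(values, window):
    """Candidate optimization using a different algorithm. Cost is O(n).

    Slides the window by adding the entering value and
    subtracting the leaving one instead of re-summing each window.
    """
    if window > len(values):
        return []
    current = sum(values[:window])
    result = [current]
    for i in range(window, len(values)):
        current += values[i] - values[i - window]
        result.append(current)
    return result


def check_against_reference(candidate, n_trials=20, seed=0):
    """Compare a candidate with the reference on random integer inputs."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        values = [rng.randint(-100, 100) for _ in range(rng.randint(1, 50))]
        window = rng.randint(1, len(values))
        assert candidate(values, window) == moving_sum_reference(values, window)
    return True


if __name__ == "__main__":
    # Test each iteration against the reference before timing it.
    check_against_reference(moving_sum_fast)
    data = list(range(2000))
    for func in (moving_sum_reference, moving_sum_fast):
        seconds = timeit.timeit(lambda: func(data, 100), number=5)
        print(f"{func.__name__}: {seconds:.4f} s")
```

Integer inputs are used so that the comparison can be exact; with floating-point data a tolerance-based comparison (for example `math.isclose`) would be the safer choice.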
Both are excellent solutions. Which one to choose?
- Cython: excellent performance 🏃🏃🏃🏃
- Numba: excellent performance (sometimes a bit faster than Cython!) 🏃🏃🏃
🏆 Winner: Numba, but almost a tie (application dependent)
- Cython: fair
    - You need to manually declare types and add imports (more work than with Numba) 😐
    - Cython does not accept UTF-8 identifiers (α, β, γ, 😢)
    - You need duplicated code for a pure-Python reference 😢
- Numba: easy
    - Just decorate with @numba.jit 😃
    - A pure-Python reference can be the same function undecorated 😃
    - Default arguments need to be passed explicitly 😢
🏆 Winner: Numba
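
As a minimal sketch of the Numba points above: the same function serves as the pure-Python reference and, once wrapped, as the compiled version. The function `sum_of_squares` is a hypothetical example. The sketch uses `numba.njit`, which is shorthand for `numba.jit(nopython=True)`, and assumes a fallback that simply returns the function unchanged when Numba is not installed, so the code still runs (without any speed-up).

```python
try:
    from numba import njit  # optional dependency
except ImportError:
    def njit(func):
        """Fallback so the sketch still runs without Numba (no speed-up)."""
        return func


def sum_of_squares(n):
    """Pure-Python reference: a plain loop, easy to read and to test."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


# Same function, compiled. The undecorated version stays available
# as the reference to test against.
sum_of_squares_fast = njit(sum_of_squares)

assert sum_of_squares_fast(1000) == sum_of_squares(1000)
```

Wrapping the function by calling `njit(...)` instead of using the decorator syntax keeps both versions importable under separate names, which makes it easy to test one against the other.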
- Cython: good insight into how the code is optimized:
    - HTML annotations: see "python-hits" in the C code to guide optimization
    - line_profiler: see which lines take the most time
- Numba: optimization is mostly a black box. Either it works, or it doesn't. If things go wrong (the code fails to compile or performs poorly), understanding why is mostly guesswork.
🏆 Winner: Cython
- Cython: you can create small pre-compiled packages with no run-time dependencies (wheels or conda packages)
- Numba: has a run-time dependency on LLVM. LLVM is included by default in Anaconda for all platforms, but it is still a "big" dependency.
🏆 Winner: Cython
- Cython: easy on Linux and macOS, where the compiler is included in Anaconda. A pain on Windows: you need to install MS Visual Studio 2015 (multi-GB) and let Cython find it. Alternatively, on Windows 10 you can try the "native" Linux subsystem (see "Using WSL and MobaXterm to Create a Linux Dev Environment on Windows").
- Numba: easy, the LLVM dependency is easy to install on Linux, macOS and Windows
🏆 Winner: Numba