Speed up memory allocation and improve GC performance, especially for applications that make heavy use of dynamic memory.
NOTE: requires Go 1.18+ (the API uses generics).
Suitable scenarios:
- Large amounts of memory that never need to be released (global configs, read-only assets such as navmeshes).
- Massive numbers of temporary objects with a deterministic lifetime (e.g. protobuf objects sent over the network).
Linear allocator:
- Much faster memory allocation. Internally, an allocation is just a pointer advance.
- Greatly reduces GC marking pressure, since internally Lac is just a few byte arrays.
- General-purpose. Lac can allocate objects of various types.
- Much simpler and faster memory reclamation. There is no need to release each object manually; just reset the allocation cursor.
- Much cheaper. Lac instances reuse memory chunks among each other through a chunk pool.
- Memory efficient. Memory is more compact and CPU cache-friendly.
- Allows objects allocated by the built-in allocator to be attached to Lac-allocated objects.
- Supports concurrency.
- Provides protobuf2-style APIs.
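The first few points (allocation as a pointer advance, reclamation by resetting a cursor, chunk reuse through a pool) can be illustrated with a minimal byte-level bump allocator. This is a sketch, not Lac's implementation: the `Bump` type, `chunkSize`, and the use of `sync.Pool` as the chunk pool are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// chunkSize is an illustrative value, not Lac's real chunk size.
const chunkSize = 4096

type chunk = [chunkSize]byte

// chunkPool lets allocators hand chunks back and forth instead of
// requesting fresh memory from the runtime each time.
var chunkPool = sync.Pool{
	New: func() any { return new(chunk) },
}

// Bump is a minimal linear (bump-pointer) allocator over pooled chunks.
type Bump struct {
	chunks []*chunk // chunks currently owned by this allocator
	off    int      // allocation cursor inside the last chunk
}

// Alloc returns n bytes aligned to align (a power of two), relative to the
// chunk start, assuming the chunk itself is suitably aligned.
// Requests larger than chunkSize are rejected in this sketch.
func (b *Bump) Alloc(n, align int) []byte {
	if n > chunkSize {
		panic("bump: request larger than chunkSize")
	}
	start := (b.off + align - 1) &^ (align - 1) // round the cursor up
	if len(b.chunks) == 0 || start+n > chunkSize {
		b.chunks = append(b.chunks, chunkPool.Get().(*chunk))
		start = 0
	}
	b.off = start + n // the allocation itself is just this cursor advance
	cur := b.chunks[len(b.chunks)-1]
	return cur[start:b.off:b.off]
}

// Reset reclaims everything at once: chunks return to the pool and the
// cursor rewinds. No object is released individually, and nothing is zeroed.
func (b *Bump) Reset() {
	for i, c := range b.chunks {
		chunkPool.Put(c)
		b.chunks[i] = nil
	}
	b.chunks = b.chunks[:0]
	b.off = 0
}

func main() {
	var b Bump
	x := b.Alloc(3, 1)
	y := b.Alloc(8, 8)
	fmt.Println(len(x), len(y), len(b.chunks)) // 3 8 1
	b.Reset()
	fmt.Println(len(b.chunks)) // 0
}
```

Chunks taken from the pool keep their previous contents, so a real allocator has to decide when zeroing is actually required; this sketch never zeroes.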
Limitations:
- Never store pointers to objects allocated by the built-in allocator directly in Lac-allocated objects. (A debug mode can check for such external pointers.)
- Never store or use pointers to Lac-allocated objects after the allocator is released. (In debug mode, the allocator traverses the objects and obfuscates their pointers so that any attempt to use them panics.)
- Maps cannot use Lac memory; they fall back to the built-in allocator.
Compared with the Go 1.20 arena:
- Faster (see the benchmark results below).
- Supports concurrency.
- Slice append can use Lac memory as well.
- Supports a debug mode.
- Provides protobuf2-style APIs.
- Completely pointer-free (no pointer bitmap initialization, no GC marking, etc.).
- Avoids zeroing memory whenever possible.
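Slice append can draw on the allocator by taking the grown backing array from allocator memory instead of the Go heap. The sketch below shows the idea with a hypothetical, type-safe `Arena[T]` built from ordinary `[]T` chunks; Lac itself works on byte chunks, and its real `Append` may differ. Here `Append` takes the allocator as its first argument, as in the usage example below.

```go
package main

import "fmt"

// Arena is a hypothetical typed linear allocator: it hands out pieces of
// large []T chunks instead of asking the runtime for each backing array.
type Arena[T any] struct {
	chunk []T // current chunk; len(chunk) acts as the allocation cursor
	size  int // capacity of each new chunk
}

// alloc carves n contiguous elements out of the current chunk, starting a
// fresh chunk when the current one cannot fit them.
func (a *Arena[T]) alloc(n int) []T {
	if cap(a.chunk)-len(a.chunk) < n {
		c := a.size
		if c < n {
			c = n
		}
		a.chunk = make([]T, 0, c)
	}
	start := len(a.chunk)
	a.chunk = a.chunk[:start+n]
	// Cap the result so that appending to it can never overwrite a neighbour.
	return a.chunk[start : start+n : start+n]
}

// Append behaves like the built-in append, except that when s has to grow,
// the new backing array is carved from the arena rather than the Go heap.
func Append[T any](a *Arena[T], s []T, elems ...T) []T {
	need := len(s) + len(elems)
	if need <= cap(s) {
		return append(s, elems...) // spare capacity: no allocation at all
	}
	newCap := 2 * cap(s)
	if newCap < need {
		newCap = need
	}
	buf := a.alloc(newCap)
	n := copy(buf, s)
	n += copy(buf[n:], elems)
	return buf[:n]
}

func main() {
	a := &Arena[int]{size: 64}
	var s []int
	for i := 0; i < 5; i++ {
		s = Append(a, s, i)
	}
	fmt.Println(s, len(s), cap(s)) // [0 1 2 3 4] 5 8
}
```

The trade-off is typical of linear allocators: an outgrown backing array is never reused by the arena; its space is only recovered when the arena itself is discarded.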
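Usage example:
// lac is the linear allocator package (its import path is not shown here).
// acPool is an allocator pool assumed to be created elsewhere.
type PbItem struct {
	Id     *int
	Price  *int
	Class  *int
	Name   *string
	Active *bool
}

type PbData struct {
	Age   *int
	Items []*PbItem
	InUse *PbItem
}

func main() {
	// Take an allocator from the pool; Release reclaims everything allocated from it at once.
	ac := acPool.Get()
	defer ac.Release()

	// Allocate the top-level message and set a proto2-style optional field.
	d := lac.New[PbData](ac)
	d.Age = ac.Int(11)

	n := 3
	for i := 0; i < n; i++ {
		// The item and all of its scalar fields are allocated from ac.
		item := lac.New[PbItem](ac)
		item.Id = ac.Int(i + 1)
		item.Active = ac.Bool(true)
		item.Price = ac.Int(100 + i)
		item.Class = ac.Int(3 + i)
		item.Name = ac.String("name")

		// Grow the slice using allocator memory instead of the Go heap.
		d.Items = lac.Append(ac, d.Items, item)
	}
}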
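In proto2-style Go structs, optional scalar fields are pointers, so every field that is set needs its own small allocation; helpers such as `ac.Int`, `ac.Bool`, and `ac.String` in the example above presumably return such pointers from allocator memory. A minimal type-safe sketch of helpers like these follows. `Scalars`, `slab`, and `slabSize` are hypothetical names, and unlike Lac this version keeps values in typed Go slices, so it is not pointer-free.

```go
package main

import "fmt"

// slabSize is an illustrative chunk length, not a value taken from Lac.
const slabSize = 128

// slab hands out pointers into fixed-capacity []T chunks. A full chunk is
// never grown with append (which would copy it); a fresh chunk is started
// instead, and earlier pointers keep referring to the old one.
type slab[T any] struct {
	cur []T
}

func (s *slab[T]) ptr(v T) *T {
	if len(s.cur) == cap(s.cur) {
		s.cur = make([]T, 0, slabSize)
	}
	s.cur = append(s.cur, v) // cannot reallocate: capacity was checked above
	return &s.cur[len(s.cur)-1]
}

// Scalars is a hypothetical stand-in for proto2-style scalar helpers.
type Scalars struct {
	ints  slab[int]
	bools slab[bool]
	strs  slab[string]
}

func (a *Scalars) Int(v int) *int          { return a.ints.ptr(v) }
func (a *Scalars) Bool(v bool) *bool       { return a.bools.ptr(v) }
func (a *Scalars) String(v string) *string { return a.strs.ptr(v) }

type PbItem struct {
	Id     *int
	Name   *string
	Active *bool
}

func main() {
	var ac Scalars
	items := make([]PbItem, 0, 3)
	for i := 0; i < 3; i++ {
		items = append(items, PbItem{
			Id:     ac.Int(i + 1),
			Name:   ac.String("name"),
			Active: ac.Bool(true),
		})
	}
	fmt.Println(*items[2].Id, *items[2].Name, *items[2].Active) // 3 name true
}
```

Many small values share one chunk allocation instead of each field costing a separate heap allocation.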
Results from benchmark tests:
- go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: oops/lib/linear_ac/lac
cpu: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz
BenchmarkNew-6 65127 118814 ns/op 167 B/op 0 allocs/op
BenchmarkNewFrom-6 64368 117673 ns/op 217 B/op 1 allocs/op
Benchmark_RawMalloc-6 9584208 124.6 ns/op 88 B/op 5 allocs/op
Benchmark_LacMalloc-6 17156577 68.20 ns/op 0 B/op 0 allocs/op
Benchmark_LacMallocMt-6 17331070 69.52 ns/op 0 B/op 0 allocs/op
Benchmark_RawMallocLarge2-6 39134 35891 ns/op 27496 B/op 1656 allocs/op
Benchmark_LacMallocLarge2-6 115216 14820 ns/op 9135 B/op 0 allocs/op
Benchmark_LacMallocLarge2Mt-6 62988 19065 ns/op 0 B/op 0 allocs/op
- go test -bench . -tags=goexperiment.arenas -benchmem
(Includes a simple comparison of allocation performance with the Go 1.20 arena.)
goos: linux
goarch: amd64
pkg: oops/lib/linear_ac/lac
cpu: Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz
BenchmarkNew-6 65473 118687 ns/op 166 B/op 0 allocs/op
BenchmarkNewFrom-6 65342 118379 ns/op 215 B/op 1 allocs/op
Benchmark_RawMalloc-6 9832786 122.7 ns/op 88 B/op 5 allocs/op
Benchmark_LacMalloc-6 16667865 69.94 ns/op 0 B/op 0 allocs/op
Benchmark_LacMallocMt-6 17386522 70.15 ns/op 0 B/op 0 allocs/op
Benchmark_RawMallocLarge2-6 40267 35076 ns/op 27496 B/op 1656 allocs/op
Benchmark_LacMallocLarge2-6 109783 14897 ns/op 8239 B/op 0 allocs/op
Benchmark_LacMallocLarge2Mt-6 63415 18895 ns/op 0 B/op 0 allocs/op
Benchmark_RawMallocSmall-6 9126696 128.5 ns/op 88 B/op 5 allocs/op
Benchmark_LacMallocSmall-6 30242865 43.19 ns/op 0 B/op 0 allocs/op
Benchmark_ArenaMallocSmall-6 12157006 160.3 ns/op 88 B/op 0 allocs/op
Benchmark_RawMallocLarge-6 43189 34958 ns/op 27496 B/op 1656 allocs/op
Benchmark_LacMallocLarge-6 93844 12481 ns/op 27267 B/op 0 allocs/op
Benchmark_ArenaMallocLarge-6 35060 31891 ns/op 26637 B/op 45 allocs/op
- go test -bench . -benchmem
goos: windows
goarch: amd64
pkg: linear_ac/lac
cpu: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
BenchmarkNew-8 46989 115594 ns/op 131 B/op 0 allocs/op
BenchmarkNewFrom-8 50430 127946 ns/op 190 B/op 1 allocs/op
Benchmark_RawMalloc-8 7590996 163.1 ns/op 88 B/op 5 allocs/op
Benchmark_LacMalloc-8 15614894 81.11 ns/op 0 B/op 0 allocs/op
Benchmark_LacMallocMt-8 12081428 90.54 ns/op 0 B/op 0 allocs/op
Benchmark_RawMallocLarge2-8 32990 40105 ns/op 27496 B/op 1656 allocs/op
Benchmark_LacMallocLarge2-8 79690 18879 ns/op 1028 B/op 0 allocs/op
Benchmark_LacMallocLarge2Mt-8 44228 26353 ns/op 0 B/op 0 allocs/op
- go test -bench . -tags='goexperiment.arenas' -benchmem
goos: windows
goarch: amd64
pkg: linear_ac/lac
cpu: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
BenchmarkNew-8 46694 114681 ns/op 132 B/op 0 allocs/op
BenchmarkNewFrom-8 49684 122523 ns/op 192 B/op 1 allocs/op
Benchmark_RawMalloc-8 6185844 183.2 ns/op 88 B/op 5 allocs/op
Benchmark_LacMalloc-8 14673064 104.0 ns/op 0 B/op 0 allocs/op
Benchmark_LacMallocMt-8 10891394 99.24 ns/op 0 B/op 0 allocs/op
Benchmark_RawMallocLarge2-8 25015 47155 ns/op 27496 B/op 1656 allocs/op
Benchmark_LacMallocLarge2-8 75883 18138 ns/op 6 B/op 0 allocs/op
Benchmark_LacMallocLarge2Mt-8 43879 26313 ns/op 0 B/op 0 allocs/op
Benchmark_RawMallocSmall-8 7762598 158.6 ns/op 88 B/op 5 allocs/op
Benchmark_LacMallocSmall-8 21674264 60.13 ns/op 0 B/op 0 allocs/op
Benchmark_ArenaMallocSmall-8 6486002 179.4 ns/op 87 B/op 0 allocs/op
Benchmark_RawMallocLarge-8 24450 46036 ns/op 27496 B/op 1656 allocs/op
Benchmark_LacMallocLarge-8 62058 27365 ns/op 27265 B/op 0 allocs/op
Benchmark_ArenaMallocLarge-8 10000 169460 ns/op 26978 B/op 45 allocs/op