Lightweight Efficient WASM-first Dynamic Allocator — a simple and compact JS-side allocator for hand-written WASM.
yarn add @anireact/lewd
Sometimes you just need a tiny hand-written WASM module to optimize a certain function or data structure, but don’t want to bloat the module with a general-purpose allocator. Even worse, if you have multiple WASM modules and each of them requires an allocator to work, you’re forced to link an allocator into each one and bloat the resulting bundle.
Well, we have WASM-GC, but at the moment it’s not supported across all major environments and doesn’t support arbitrary load/store operations on heap objects.
That’s where LEWD Alloc comes in. It can be shared across multiple modules, reduces memory overhead, doesn’t bloat the modules and saves the precious 4 KiB of binary size allowed for sync compilation, supports garbage collection so you can forget about memory leaks, and more.
But it is not a general-purpose allocator, so read the docs carefully, especially the Limitations and Getting started sections.
- Approx. 1.5 KiB minified and gzipped, ≈3 KiB minified without gzip.
- Bundler/minifier annotations to simplify tree-shaking and DCE.
- Best-fit strategy attempts to reduce memory fragmentation.
- Efficient bookkeeping with zero overhead (on the WASM side).
- GC integration to automatically deallocate unused memory.
- Scratch allocations for immediately dropped temporary buffers.
- 16-byte alignment to safely allocate all WASM types, including `v128`.
- Memory shrink to release unused pages back to the host/OS.
- Resize notifications to update data views on memory resize.
- This is not a general-purpose allocator.
- Holding allocated pointers in WASM-side data structures should be done with extreme care; otherwise, they’ll be garbage-collected while the associated memory is still in use.
- No allocations from the WASM side.
- No `malloc()`-like strong allocations.
- No `free()`-like explicit deallocation.
- No `realloc()` and `calloc()`.
- No `(data)` segments.
- No range protection (to prevent `(data)` and/or stack corruption).
- No async instantiation.
- No shared memories.
- No exported memories.
- No multiple allocator instances.
Some of these limitations can be worked around, others only in a limited number of scenarios, and the rest are fundamental to the design and goals of this allocator and can’t be worked around at all.
Let’s consider a simple PRNG written in plain WASM that we want to wrap in a JS library. The algorithm has a 128-bit state and provides a number of methods to initialize the state and to generate random numbers. Additionally, our algorithm requires some precomputed constants or tables of, say, 1 KiB in size. We compute them at startup and store them at a certain address.
The WASM module imports a memory and a single global and exports a number of functions:
;; The memory of at least 1 page:
(memory (import "env" "memory") 1)
;; Precomputed data address:
(global $tab (import "env" "tab") i32)
;; Precompute the constants, tables or
;; whatever of the total size 1 KiB
;; and save the result at `$tab`:
(func $init (export "init"))
;; Seed a PRNG instance at `$self`:
(func $seed (export "seed") (param $self i32) (param $seed i32))
;; Generate a 32-bit integer using a PRNG at `$self`:
(func $rand (export "rand") (param $self i32) (result i32))
;; Fill a 32-bit integer array at `$out` of
;; the byte length `$len` using a PRNG at `$self`:
(func $fill (export "fill") (param $self i32) (param $out i32) (param $len i32))
The corresponding C++ code looks something like this:
// Precomputed data address:
uint32_t* tab = NULL;
// Precompute the constants, tables or
// whatever of the total size 1 KiB and
// save the result in `*tab`:
void init();
class PRNG {
public:
// Instance state:
uint32_t a, b, c, d;
// Seed a PRNG instance:
void seed(uint32_t seed);
// Generate a 32-bit integer:
uint32_t rand();
// Fill an output array `out` of the byte length `len`:
void fill(uint32_t* out, size_t len);
};
First, we import everything we’ll need later and re-export the `trim()` function:
import { alloc, trim, bind, memory, buffer, on } from '@anireact/lewd';
export { trim };
Then we define our WASM module object. And don’t forget about the `#__PURE__` annotations to make tree-shaking/DCE easier:
const PRNG_MOD = /*#__PURE__*/ new WebAssembly.Module(
/*#__PURE__*/ new Uint8Array([
// Compiled WASM module bytes
]),
);
Sure, we could get the module object in a different way, for example using the async/streaming WebAssembly API, but that makes tree-shaking/DCE virtually impossible, especially for Terser, even with `#__PURE__` annotations.
Theoretically, bundlers or optimizers like the Google Closure Compiler can be
slightly smarter, but I don’t trust them either.
Alternatively, we could use an `import` statement to load a WASM module bundled with a special loader, but we should be careful to use a loader that loads a module, not its instance exports.
Of course, the synchronous WebAssembly API limits the module size to ≈4 KiB on the UI thread, but if your module is larger than that, it’s probably already much larger than a traditional WASM-side allocator, so there’s no problem linking one in, and you don’t really need LEWD Alloc.
Before initializing the module, we should allocate some memory for the precomputed tables it requires:
const tab = /*#__PURE__*/ alloc(1024, PRNG_MOD);
Here we allocate 1 KiB of memory with the lifetime token `PRNG_MOD`, which is guaranteed to have the same lifetime as the whole library/application.
Now it’s time to initialize the module and configure its respawn callback. This code has two parts, which will be discussed later:
let impl = /*#__PURE__*/ bind(
// Immediately invoked spawn callback:
() => {
// See below
},
// Respawn callback:
() => {
// See below
},
);
Again, the `bind()` function call is annotated as `#__PURE__` to simplify tree-shaking/DCE. The function itself has the `#__NO_SIDE_EFFECTS__` annotation, which tells the bundler to mark all its calls as `#__PURE__`, but this annotation has no reliable support across bundlers/minifiers, so we still have to add the `#__PURE__` annotation to each `bind()` call.
The `bind()` function spawns a WASM module and binds it to the `WebAssembly.Memory` object, respawning it when the allocator migrates to a new memory. The function takes two callbacks: the spawn callback and the respawn callback. The spawn callback is invoked immediately on the `bind()` call, and its result is returned as is:
// Spawn an instance, provide its imports,
// and keep the exports of the new instance:
let impl = new WebAssembly.Instance(PRNG_MOD, {
    env: { memory, tab },
}).exports;
// Invoke the initialization procedure:
impl.init();
return impl;
First, we create a `WebAssembly.Instance` object, provide its imports, and keep its exports. In our case, the imports are the memory object `memory`, imported as `env.memory`, and the precomputed tables location, imported as `env.tab`.
Then we call the `init()` procedure to precompute the tables and store the result at the `tab` address.
Finally, we just return the exports of the local WASM instance, which are immediately assigned to the top-level `impl` variable.
Oh, and technically it’s OK to use an async function here, but this makes tree-shaking/DCE virtually impossible and should be avoided in libraries.
The respawn callback is similar to the spawn callback, but instead of full initialization it should capture the current state of the WASM instance, spawn a new one, and restore the captured state.
In our case, we have no mutable state other than the memory itself, so the code is quite trivial:
// -> Capture code goes here <-
// Respawn the instance and keep its exports:
impl = new WebAssembly.Instance(PRNG_MOD, {
    env: { memory, tab },
}).exports;
// -> Restore code goes here <-
The state to be captured and restored includes mutable globals (all of which should be exported), tables and other memories (all of which should be imported), and probably certain other objects. Of course, specific objects can be excluded from capture and restore, if their values are consumed synchronously when they’re updated and then no longer used.
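For illustration, if the module did export a mutable global (say, a hypothetical `counter`), the respawn callback would capture its value before respawning and write it back into the new instance afterwards. A minimal sketch under that assumption:

```ts
// Respawn callback sketch for a module with a hypothetical
// exported mutable global `counter`:
() => {
    // Capture the mutable state of the old instance:
    const counter = impl.counter.value;
    // Respawn the instance and keep its exports:
    impl = new WebAssembly.Instance(PRNG_MOD, {
        env: { memory, tab },
    }).exports;
    // Restore the captured state into the new instance:
    impl.counter.value = counter;
};
```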
If the module has tables or other memories, all of them should be imported, just like the main memory, and provided as usual on instantiation. If the module has `(data)` segments mapped into other memories, or `(elem)` segments mapped into the tables, and the spaces these segments are mapped into are overwritten at runtime, they should be captured and restored as well.
Oh, and we don’t support modules with `(data)` segments mapped into the main memory. This can be worked around in a number of scenarios, but in general, if you have such modules, you should prefer other allocators, or wait for a (distant) future version of this allocator.
Another important note: unlike the spawn callback, no async code is allowed here; otherwise, the memory can be corrupted.
The boilerplate is finished, and now we can write a wrapper class for our PRNG:
export class MyPRNG {
// PRNG instance address:
#addr: number;
// PRNG instance memory view:
#view: Int32Array;
// The constructor and initializer:
constructor(seed: number);
// Method bindings:
rand(): number;
fill(out: Int32Array): void;
// State accessor:
state: number[];
}
The class has the `#addr` field to hold the address of the PRNG and the `#view` field for the memory view into the allocated space.
The constructor should initialize both the wrapper instance and the WASM-side instance:
class MyPRNG {
constructor(seed: number) {
// Type-hint the seed:
seed = seed | 0;
// Allocate:
this.#addr = alloc(16, this) | 0;
// Create the memory view:
this.#view = new Int32Array(buffer, this.#addr >>> 0, 4);
// Register the memory resize handler:
on('resize', this, () => {
this.#view = new Int32Array(buffer, this.#addr >>> 0, 4);
});
// Call the WASM-side constructor:
impl.seed(this.#addr | 0, seed | 0);
}
}
First, we type-hint the argument to improve the performance in some engines. The constructor can be pretty hot in some scenarios, so we should not ignore optimization opportunities here.
Then we allocate the memory with the `alloc(16, this)` call. The allocation has the lifetime token `this`, which triggers the deallocation of the associated memory when a particular wrapper object is garbage-collected by the host. In other words, we don’t have to worry about memory leaks, because the allocator does everything for us automatically. And just like the `seed` argument, the call is type-hinted.
Now we create an initial view into the allocated memory and register a handler to update it on memory resize with the `on()` function call. Just like the pointer, the handler is automatically unregistered when `this` is garbage-collected by the host. The important thing here is to cast the pointer to an unsigned integer with the `>>> 0` hint in the `Int32Array` calls, because addresses higher than 2 GiB are interpreted as negative values, but the `Int32Array` constructor expects an unsigned integer.
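To illustrate the cast (the address here is made up):

```ts
// A made-up address above 2 GiB, as it might come back from `alloc()`:
const addr = 0x8000_0010 | 0; // -2147483632 as a signed 32-bit integer
const byteOffset = addr >>> 0; // 2147483664 — the unsigned address `Int32Array` expects
```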
Finally, we invoke the WASM-side constructor to actually initialize the PRNG instance. It doesn’t have a return value, so the call isn’t type-hinted, but we type-hint its arguments instead.
Now let’s write the method wrappers:
class MyPRNG {
rand = () => {
return impl.rand(this.#addr) | 0;
};
fill = (out: Int32Array) => {
// Allocate a scratch for the output buffer:
let scratch = alloc(out.byteLength | 0) | 0;
// Call the buffer fill method:
impl.fill(this.#addr | 0, scratch | 0, out.byteLength | 0);
// Copy the scratch into the JS-side buffer:
out.set(new Int32Array(buffer, scratch >>> 0, out.length));
// Release unused memory back to the host/OS:
trim();
};
}
There’s nothing interesting in the `rand()` method, but we have a number of things to discuss about the `fill()` method.
The first thing we do here is a scratch allocation for the output data. Scratches have zero lifetime and can be thought of as immediately deallocated pointers. As long as there are no allocation calls while the scratch memory is still in use, this is safe, and we save some CPU cycles by skipping the underlying bookkeeping. For scratch allocations, we use the same `alloc()` call, but don’t provide a lifetime token.
Then we just call the `fill()` method, and when it fills the scratch, we copy the data into the output array.
Finally, we release the unused memory pages back to the host/OS with the `trim()` call.
Oh, to keep the example simple, we didn’t handle some edge cases like OOM and filling the output by parts, but in real-world code you shouldn’t ignore them.
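For example, filling the output by parts could look something like the sketch below; the 64 KiB chunk size is an arbitrary choice for this sketch, not something the library prescribes:

```ts
class MyPRNG {
    fill = (out: Int32Array) => {
        // Fixed chunk size (an arbitrary choice for this sketch):
        const CHUNK = 65536;
        for (let offset = 0; offset < out.byteLength; offset += CHUNK) {
            // Bytes remaining in the current chunk:
            const len = Math.min(CHUNK, out.byteLength - offset) | 0;
            // Scratch for the current chunk only:
            const scratch = alloc(len) | 0;
            impl.fill(this.#addr | 0, scratch | 0, len | 0);
            // Copy the chunk into the JS-side buffer:
            out.set(new Int32Array(buffer, scratch >>> 0, len >> 2), offset >> 2);
        }
        // Release unused memory back to the host/OS:
        trim();
    };
}
```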
Technically, we could stop here, but to make the wrapper complete, we’ll also write an accessor for its internal state:
class MyPRNG {
get state() {
return [...this.#view];
}
set state(state: number[]) {
this.#view.set([state[0]!, state[1]!, state[2]!, state[3]!]);
}
}
Nothing interesting here; we just wrap the `#view` we’ve configured in the constructor.
function alloc(size: i32): i32;
function alloc(size: i32, token: WeakKey): i32;
- `size`: `i32` — The number of bytes to allocate.
- `token`: `WeakKey` — The lifetime token.
Allocate `size` bytes and get the allocated address.
If the `token` is specified, it indicates the lifetime of the allocation, so the pointer is automatically deallocated when the host garbage-collects the `token`.
If no `token` is specified, the function returns a scratch pointer. No memory is actually allocated; just an unused address that fits `size` bytes is returned. This can save some CPU cycles by skipping the underlying bookkeeping, but should be used carefully to avoid any allocations while the scratch memory is still in use.
Exceptions:
- On OOM, or if `size` is zero, throws a `RangeError`.
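A minimal usage sketch of both forms (the sizes and the token object here are arbitrary):

```ts
import { alloc } from '@anireact/lewd';

// Token-bound allocation: deallocated automatically once `token`
// is garbage-collected by the host:
const token = {};
const ptr = alloc(256, token) | 0;

// Scratch allocation: no bookkeeping, so don't call `alloc()` again
// while the scratch memory is still in use:
const scratch = alloc(256) | 0;
```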
function grow(diff: i32): bool;
- `diff`: `i32` — The number of 64 KiB pages to grow the memory by.
Grow the memory by `diff` 64 KiB pages and return `0` on success or `1` on OOM.

If `diff` is zero or negative, does nothing.
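A usage sketch, assuming `grow()` is imported from the package like the other API functions:

```ts
import { grow } from '@anireact/lewd';

// Pre-grow the memory by 16 pages (1 MiB) before a large batch of work;
// `grow()` returns 0 on success and 1 on OOM:
if (grow(16)) {
    // Handle OOM, e.g. fall back to a smaller batch.
}
```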
function trim(): void;
Release unused memory pages at the end of memory back to the host/OS.

Internally, it creates a trimmed copy of the memory, copies the data into it, and then triggers the memory shrink handlers defined as the respawn callbacks of the `bind()` function. Can cause OOM if the host is short on RAM.
function bind<t>(initial: () => t, respawn: () => unknown): t;
- `initial`: `fn` — The immediately invoked spawn callback. See the Spawn callback section for details.
- `respawn`: `fn` — The handler to respawn the WASM instance on memory shrink. Should also capture and restore mutable globals, tables, etc. See the Respawn callback section for details.
Register the memory shrink handler `respawn`, immediately invoke the spawn callback, and get its result.
NB:
- If a WASM module has mutable globals, tables, etc., don’t forget to capture and restore them on respawn.
- If a module has a global initialization procedure, don’t call it on respawn; instead, capture and restore the initialized state (other than memory).
`memory` — the active `WebAssembly.Memory` object.
`buffer` — the active `ArrayBuffer` object.
function on(type: 'resize', token: WeakKey, fn: (_: Resize) => any): void;
function on(type: 'grow', token: WeakKey, fn: (_: Grow) => any): void;
function on(type: 'trim', token: WeakKey, fn: (_: Trim) => any): void;
function on(type: 'oom', token: WeakKey, fn: (_: OOM) => any): void;
- `type`: `string` — The event type.
- `token`: `WeakKey` — The handler’s lifetime token.
- `fn`: `fn` — The handler function.
Register an event handler. The handler is automatically unregistered when the host garbage-collects its `token`. Each handler can be registered once per token.
function off(type: 'resize', token: WeakKey, fn?: (_: Resize) => any): void;
function off(type: 'grow', token: WeakKey, fn?: (_: Grow) => any): void;
function off(type: 'trim', token: WeakKey, fn?: (_: Trim) => any): void;
function off(type: 'oom', token: WeakKey, fn?: (_: OOM) => any): void;
- `type`: `string` — The event type.
- `token`: `WeakKey` — The handler’s lifetime token.
- `fn`: `fn` — The handler function.
Explicitly unregister an event handler. If no handler is specified, all handlers of the given `token` are removed.
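A short sketch of registering a handler and then removing it explicitly (the token object and handler here are arbitrary):

```ts
import { on, off } from '@anireact/lewd';

const token = {};
const onTrim = () => {
    // React to a memory shrink here.
};

// Register the handler; it would also be unregistered automatically
// once `token` is garbage-collected by the host:
on('trim', token, onTrim);

// Remove this particular handler:
off('trim', token, onTrim);

// Or remove all 'trim' handlers registered with this token:
off('trim', token);
```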
The `'resize'` event object type:
- `type`: `'resize'` — The event type.
- `info`: `Resize.Info` — Resize details.
The `'grow'` event object type:
- `type`: `'grow'` — The event type.
- `info`: `Resize.Info` — Resize details.
The `'trim'` event object type:
- `type`: `'trim'` — The event type.
- `info`: `Resize.Info` — Resize details.
The `'oom'` event object type:
- `type`: `'oom'` — The event type.
- `info`: `OOM.Info` — OOM details.
The resize details object:
- `oldSize`: `f64` — Old memory size in bytes.
- `newSize`: `f64` — New memory size in bytes.
- `memory`: `WebAssembly.Memory` — Active memory object.
- `buffer`: `ArrayBuffer` — Active buffer object.
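A sketch of a handler that reads the details object (the logging is just for illustration):

```ts
import { on } from '@anireact/lewd';

const token = {};

on('resize', token, event => {
    const { oldSize, newSize } = event.info;
    console.log(`Memory resized: ${oldSize} -> ${newSize} bytes`);
    // Recreate any long-lived views over `event.info.buffer` here.
});
```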
The OOM details object:
- `currentCapacity`: `i32` — Current capacity in 64 KiB pages.
- `requestCapacity`: `i32` — Requested capacity in 64 KiB pages.
- `maximumCapacity`: `i32` — Maximum capacity in 64 KiB pages.
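And a similar sketch for the `'oom'` event:

```ts
import { on } from '@anireact/lewd';

const token = {};

on('oom', token, event => {
    const { currentCapacity, requestCapacity, maximumCapacity } = event.info;
    console.warn(
        `OOM: ${currentCapacity} pages allocated, ` +
            `${requestCapacity} requested, ${maximumCapacity} maximum`,
    );
});
```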
- `totalBytes(): f64` — Get the memory capacity in bytes.
- `usedBytes(): f64` — Get the number of allocated bytes.
- `runBytes(): f64` — Get the used span size in bytes, including free gaps between allocated chunks.
- `freeBytes(): f64` — Get the number of free bytes.
- `headBytes(): f64` — Get the number of free bytes at the beginning of memory. Calculated independently of `tailBytes`.
- `tailBytes(): f64` — Get the number of free bytes at the end of memory. Calculated independently of `headBytes`.
- `largestBytes(): f64` — Get the largest free chunk size in bytes.
- `totalAtoms(): i32` — Get the memory capacity in 16-byte atoms.
- `usedAtoms(): i32` — Get the number of allocated 16-byte atoms.
- `runAtoms(): i32` — Get the used span size in 16-byte atoms, including free gaps between allocated chunks.
- `freeAtoms(): i32` — Get the number of free 16-byte atoms.
- `headAtoms(): i32` — Get the number of free 16-byte atoms at the beginning of memory. Calculated independently of `tailAtoms`.
- `tailAtoms(): i32` — Get the number of free 16-byte atoms at the end of memory. Calculated independently of `headAtoms`.
- `largestAtoms(): i32` — Get the largest free chunk size in 16-byte atoms.
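A sketch of logging the allocator statistics, assuming these functions are imported from the package like the rest of the API:

```ts
import { totalBytes, usedBytes, freeBytes, largestBytes } from '@anireact/lewd';

// A rough picture of the current heap state:
console.log(`Capacity:           ${totalBytes()} B`);
console.log(`Allocated:          ${usedBytes()} B`);
console.log(`Free:               ${freeBytes()} B`);
console.log(`Largest free chunk: ${largestBytes()} B`);
```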
- `malloc()`, `realloc()`, `calloc()`, `free()` — C-like API.
- Span protection to prevent `(data)` and/or stack corruption (and effectively support `__heap_base` or whatever your compiler uses instead).
- Restricted working span to allow this allocator to be safely used alongside another one.
- Safe handling of `(data)` segments in multiple modules.
- Arbitrary alignment.
- Best effort next-fit strategy.
- Pointer kind conversion (strong to weak and vice versa).
- Debugging API.
- Configurable limits.
- Relocatable pointers.
- Shared memory support.
- Exported memory support.
If you have any issues, feel free to fork this module or contact me on Telegram: https://t.me/miyaokamarina.
MIT © 2024 Yuri Zemskov