The library uses a lot of stack #13

faern · 2022-06-01T07:56:42Z

We have noticed that for the larger key sizes that the library supports, we need to run the functions in a thread with increased stack size. Even if we pass in heap allocated buffers for the key out parameters. My guess is that the library stores the public key on the stack somewhere internally. We have not looked at the code. But it would be great if the caller could somehow tell the library to use the heap more, so it can run in threads with default stack sizes.

This is somewhat related to #12, where we also need a way for the caller to control how the library handles memory buffers.

prokls · 2022-06-03T07:35:27Z

Ok, this is possible. Public keys of Classic McEliece are huge and various operations are applied. Fundamentally I preferred to use the stack for allocation over the heap. If there is some unnecessary copy operation somewhere, I will gladly fix it. It is difficult to fix it without knowing the location though.

dkales · 2022-08-24T12:40:27Z

Probably at some point it would be nice to look at temporary buffers that are used by keygen,encaps,decaps and see if there are some that are huge and can be reduced.

faern · 2022-08-25T09:49:42Z

I tried to run https://crates.io/crates/cargo-call-stack on this crate. But it failed. But using something like that to see which functions require a lot of stack could be very helpful in finding out where the lowest hanging fruits are.

dkales · 2022-08-25T10:46:49Z

I actually tried the same project yesterday, got a bit further by using the musl target, but there are a lot of unhandled llvm intrinsics atm. When I get the time ill try a patched version that just assumes all llvm intrinsics lower to machine code.

meisterluk · 2024-10-14T21:17:14Z

FTR: My account @prokls is inactive since I left university quite some time ago. I am going to use my private account @meisterluk now to contribute.

I did a code review for the current HEAD and inserted the highest numbers for the configuration with the largest keys. I was able to find the following (larger) stack allocations:

file	function / line	source code	bytes
`benes.rs`	`apply_benes` line 140	`let mut bs = [0u64; 64];`	512
		`let mut cond = [0u64; 64];`	512
`benes.rs`	`apply_benes` line 257	`let mut r_int_v = [[0u64; 64]; 2];`	1024
		`let mut r_int_h = [[0u64; 64]; 2];`	1024
		`let mut b_int_v = [0u64; 64];`	512
		`let mut b_int_h = [0u64; 64];`	512
`benes.rs`	`support_gen` line 348	`let mut l: [[u8; 1024]; 13]`	13312
`controlbits.rs`	`controlbitsfrompermutation` line 293	`let mut temp: [i32, 16384]`	49152
		`let mut pi_as_i32: [i32; 4096]`	16384
`decrypt.rs`	`decrypt` line 18	`let mut l = [0u16; 8192];`	16384
		`let mut images = [0u16; 8192];`	16384
`encrypt.rs`	`syndrome` line 189	`let mut row = [0u8; 1024];`	1024
`pk_gen.rs`	`pk_gen` line 210	`let mut buf = [0u64; 8192];`	65536
		`let mut mat = [[0u8; 1024]; 1664];`	1703936
		`let mut g = [0u16; 129];`	258
		`let mut l = [0u16; 8192];`	16384
		`let mut inv = [0u16; 8192];`	16384
`sk_gen.rs`	`genpoly_gen` line 9	`let mut mat = [[0u16; 128]; 129];`	33024

Additional attention must be spent on:

struct PublicKey<'a>(KeyBuffer<'a, 1357824>); and struct SecretKey<'a>(KeyBufferMut<'a, 14120>); must be heap-allocated
On my machine, the maximum stack size is 8388608 (ulimit -s returns the kilobyte count on my Arch Linux kernel 6.10)

Then the following conclusions can be made:

PublicKey and SecretKey are handled properly. KeyBuffer* instances provide a proper enum variants to represent borrowed or owned keys. And util::alloc_boxed_array::<SIZE>() provide a utility to enforce heap allocation which is usually done.
mat in pk_gen.rs has the largest stack allocation size with 1703936 bytes (i.e. 20% of the allowed maximum stack size on my machine). The second placed buf with 65536 bytes in the same stack amounts to 0.8% of the maximum stack size. Evidently, mat is the elephant in the room.
crypto_kem_keypair in operations.rs might call pk_gen several times, but several stack frames with mat are never allocated simultaneously.

I changed let mut mat = [[0u8; SYS_N / 8]; PK_NROWS] to let mut mat = Box::new([[0u8; SYS_N / 8]; PK_NROWS]) and looked at runtime result from cargo test. The results are completely inconclusive and I have to run the actual benchmarks.

faern · 2024-10-16T21:33:05Z

This is apparently not as bad as it used to be. Maybe this is even a non-issue? I have not analyzed this in a long time. I think we have lowered the amount of stack we assign the special CME thread at some point. But we still run it in a separate thread, so we don't have to rely on default stack sizes. Yours seems to be 8 MiB, but we can't rely on that being the case everywhere. The code has to run fine on mobile etc for us as well.

This is our current workaround for the stack problem. But the comment seems to indicate that we no longer know if this actually is a problem 🙃
https://github.com/mullvad/mullvadvpn-app/blob/8a06592ba05a2e7cc56261532fdfda2688fef204/talpid-tunnel-config-client/src/classic_mceliece.rs#L4-L8

EDIT: We of course run it in a separate thread in order to make it not block the async runtime. This is a way to offload the computation.

When using the library in the variant with the largest public key (namely mceliece8192128), the stack allocation amounts to 1,703,936 bytes. This is 20% of the maximum admissible stack size of my system. This high stack consumption is a problem on several systems. Thus, I propose to allocate this huge array on the heap. Context: issue #13 #13

meisterluk mentioned this issue Oct 27, 2024

Reduce stack usage on alloc #48

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The library uses a lot of stack #13

The library uses a lot of stack #13

faern commented Jun 1, 2022

prokls commented Jun 3, 2022

dkales commented Aug 24, 2022 •

edited

Loading

faern commented Aug 25, 2022

dkales commented Aug 25, 2022

meisterluk commented Oct 14, 2024

faern commented Oct 16, 2024 •

edited

Loading

The library uses a lot of stack #13

The library uses a lot of stack #13

Comments

faern commented Jun 1, 2022

prokls commented Jun 3, 2022

dkales commented Aug 24, 2022 • edited Loading

faern commented Aug 25, 2022

dkales commented Aug 25, 2022

meisterluk commented Oct 14, 2024

faern commented Oct 16, 2024 • edited Loading

dkales commented Aug 24, 2022 •

edited

Loading

faern commented Oct 16, 2024 •

edited

Loading