The Reed-Muller encoder inside every HQC implementation leaks the full 128-bit encapsulation message with 96.9% success from a single power trace on ARM Cortex-M4 (using 5,000 total profiling + evaluation traces). Not because the code is badly written — because the algorithm forces it to test message bits. PermNet-RM removes that test entirely by reformulating encoding as a fixed-topology butterfly network where message bits never control any operation, with a shared-output Boolean-masked d=1 variant that closes the residual 32-bit ARM leakage under a d=1 probing model.
No cryptography background needed to understand this
NIST selected HQC for standardisation in March 2025. It will protect encrypted communications against future quantum computers. At its core, HQC wraps every message in a Reed-Muller error-correcting code before transmitting it. That encoding step is where the problem lives.
Every HQC implementation uses the same encoder. It works by testing each secret bit one at a time, and XOR-ing a generator row into the output if the bit is 1. This means the chip draws different amounts of power depending on each bit. Jeon et al. (ePrint 2026/071) plugged a power meter into an STM32F303 (ARM Cortex-M4) and recovered the full 128-bit encapsulation message with 96.9% success from a single decapsulation trace, using 5,000 total profiling + evaluation traces.
Previous defences — masking the decoder, shuffling operation order, rewriting the loop as an FFT — all failed because the root problem is mathematical: the algorithm inherently needs to know which bits are set. Any code that expresses that will leak. The only fix is to stop the algorithm from testing bits at all.
The engineering observation: Reed-Muller encoding is algebraically identical to a GF(2) zeta transform applied to a fixed indicator vector (a classical result that traces to Yates, 1937). This transform has a butterfly decomposition where every stage is a fixed XOR-and-shift — the wiring never changes. Message bits enter as the initial register state and are never read again. Zero branching. Zero message-dependent operations. Zero timing spread across all 256 inputs on x86-64. A per-stage compiler barrier prevents GCC from folding isolated-bit propagation back into a neg-equivalent BIT0MASK instruction.
The standard encoder asks "is bit i set?" for every generator row. Any way you write that check will leak. PermNet-RM never asks the question. Each message bit is placed at a fixed position in a 128-bit register, and then a predetermined sequence of XOR operations propagates it to all the right output positions — like a sorting network, but for bits. The encoder is provably constant-time at the binary level across all six GCC optimisation levels tested. It is a direct drop-in for reed_muller_encode() with zero changes to calling code.
PermNet-RM costs 1.33 additional cycles per encode compared to the reference encoder. Across all 16 encoder invocations per HQC-128 encapsulation, that is 21 nanoseconds — unmeasurable against the 200–500 µs total encapsulation time.
Every one of the 256 possible input bytes takes exactly 6 cycles to encode. The BIT0MASK and Branched encoders exhibit 1-cycle spread — confirming data-dependent timing at -O3 on this measurement setup.
-(uint64_t)((m >> i) & 1) whose Hamming weight is 0 or 64 — directly encoding the message bit in power consumption.gf_carryless_mul, find_peaks, Karatsuba base cases, key re-sampling). YODO does not attack the RM encoder directly; it motivates constant-time work across the HQC codebase.__asm__ volatile ("" : "+r"(x))) additionally prevents GCC from folding the butterfly on an isolated 1-bit register into a single neg / negs instruction that would syntactically reintroduce the BIT0MASK idiom.(cw_share0, cw_share1) and performs no unmask XOR inside the encoder. ELMO: 11.1× peak, 11.7× mean, 31× on bit 6 versus BIT0MASK. Leaking-cycle fraction drops from 68% (BIT0MASK) to 1.06%.reed_muller_encode() (unmasked variant). Covers HQC-128 (RM(1,7)) and HQC-192/256 (RM(1,8)). Masked shared-output variant changes the ABI to return two codeword halves; downstream HQC consumer holds both until reconstruction is safe.Seven butterfly stages. No conditionals. No loops. All masks and shifts are compile-time constants. Exhaustively verified against the generator-matrix encoder across all 256 inputs.