ERIM: Secure, Efficient In-process Isolation with Protection Keys
The 2019 USENIX Internet Defense Prize went to the Software Systems group at Max Planck for their work on ERIM: Secure, Efficient In-process Isolation with Protection Keys. The idea is to combine the Memory Protection Keys (MPK) hardware with binary inspection and rewriting to isolate individual components of a process from the rest of it. This gives finer-grained isolation than separate processes, at potentially lower overhead than IPC.
The "new" MPK hardware debuted four years ago (2015). It lets applications divide their address space into sixteen regions by tagging each page with a 4-bit protection key, stored in four previously unused page-table bits. A 32-bit per-thread register (PKRU on x86) selects which of those regions are currently accessible and which are writable.
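The per-thread register spends two bits on each of the sixteen keys: an access-disable (AD) bit and a write-disable (WD) bit. The C sketch below models that layout with plain integer arithmetic so it runs without MPK hardware; the helper names are hypothetical, and real code would read and write the register with the RDPKRU/WRPKRU instructions or a library wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PKRU layout: 16 keys x 2 bits = 32 bits.
 * bit 2k   = AD (access disable) for key k
 * bit 2k+1 = WD (write disable)  for key k */
#define PKRU_AD(k) (UINT32_C(1) << (2 * (k)))
#define PKRU_WD(k) (UINT32_C(1) << (2 * (k) + 1))

/* A read of a page tagged with `key` is allowed only if AD is clear. */
static bool pkru_can_read(uint32_t pkru, unsigned key) {
    assert(key < 16);
    return (pkru & PKRU_AD(key)) == 0;
}

/* A write needs both AD and WD clear. */
static bool pkru_can_write(uint32_t pkru, unsigned key) {
    assert(key < 16);
    return (pkru & (PKRU_AD(key) | PKRU_WD(key))) == 0;
}

/* Returns a new PKRU value with only this key's two bits replaced. */
static uint32_t pkru_set_key(uint32_t pkru, unsigned key,
                             bool disable_access, bool disable_write) {
    assert(key < 16);
    pkru &= ~(PKRU_AD(key) | PKRU_WD(key));
    if (disable_access) pkru |= PKRU_AD(key);
    if (disable_write)  pkru |= PKRU_WD(key);
    return pkru;
}
```

For example, `pkru_set_key(0, 3, false, true)` sets only bit 7, leaving region 3 readable but not writable while every other region stays fully accessible. Because the whole policy for a thread fits in one 32-bit value, switching it is a single register write rather than a page-table update.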
The motivation for this hardware is performance. Changing the MPK permission bits does not require TLB flushes, so it has far lower overhead than adjusting page-table mappings. System calls like mprotect() are slow when changing the state of gigabytes or terabytes of memory, whereas an application using MPK can mark an entire NVRAM device or memory-mapped database as read-only with a single instruction (WRPKRU).
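On Linux, glibc 2.27 and later expose this through `pkey_alloc()`, `pkey_mprotect()`, `pkey_set()` and `pkey_get()`. The sketch below assumes that glibc environment; `pkey_readonly_demo` is a hypothetical helper. It tags a mapping with a fresh key once, then flips the whole mapping between writable and read-only by changing only the calling thread's key rights, without touching the page tables again. On machines without MPK support, `pkey_alloc()` fails and the helper returns 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper. Returns 1 on success, 0 if MPK is unavailable,
 * -1 on any other error. */
static int pkey_readonly_demo(size_t len) {
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return -1;

    int pkey = pkey_alloc(0, 0);        /* initial rights: no restrictions */
    if (pkey < 0) {                     /* no CPU or kernel support */
        munmap(buf, len);
        return 0;
    }

    int rc = -1;
    /* One-time page-table update: tag the pages with the new key. */
    if (pkey_mprotect(buf, len, PROT_READ | PROT_WRITE, pkey) == 0) {
        buf[0] = 42;                    /* key allows writes */
        /* Per-thread switch to read-only; no mprotect(), no TLB flush. */
        if (pkey_set(pkey, PKEY_DISABLE_WRITE) == 0) {
            assert(pkey_get(pkey) == PKEY_DISABLE_WRITE);
            int seen = buf[0];          /* reads still work; a write would fault */
            if (pkey_set(pkey, 0) == 0) {   /* re-enable writes */
                buf[0] = (char)(seen + 1);
                rc = (buf[0] == 43) ? 1 : -1;
            }
        }
    }
    munmap(buf, len);
    pkey_free(pkey);
    return rc;
}
```

Writing to `buf` while the key has `PKEY_DISABLE_WRITE` set would raise SIGSEGV, which is exactly the read-only behavior described above, obtained without an mprotect() call over the whole range.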
The threat MPK guards against is other components of the same process that have arbitrary read access to memory; it does not protect against a malicious root user or hypervisor. Nor does it protect against side-channel or hardware-fault attacks: Meltdown, Rowhammer and similar attacks are explicitly out of scope. It is not clear whether it protects against hostile hardware with DMA access, since the paper does not examine how DMA interacts with MPK. Additionally, MPK has no way to attest that the process was started with its memory protected, so it is not useful in an untrusted cloud environment. Finally, MPK on its own cannot stop an attacker with arbitrary code execution in the application: that code can simply rewrite the register to enable all the regions and read any protected contents.
While MPK does not address most actively malicious threats (it is not an enclave like SGX or SEV), the hardware adds little latency and switching the register is fast, so it is a useful way to prevent accidental leaks of data. Calls into the code that operates on data in the protected regions go through call gates that enable access to that data, which adds a small amount of overhead. The authors also use binary inspection to scan applications and libraries for uses of the WRPKRU and XRSTOR instructions, ensuring that they are always used in pairs and with safe operations so that ROP or other control-flow attacks can't exploit them.
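A call gate enables the protected key, runs the trusted function, and disables the key again before returning to untrusted code. The sketch below is a simulation, not ERIM's implementation: it stands in for PKRU with an ordinary global variable so the control flow runs anywhere, and `TRUSTED_KEY`, `sim_read_secret` and `gate_secret_len` are hypothetical names. A real gate would execute WRPKRU (for example via `pkey_set()`) and needs extra hardening against control flow that jumps into the middle of the gate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TRUSTED_KEY 1u
#define AD(k) (UINT32_C(1) << (2 * (k)))
#define WD(k) (UINT32_C(1) << (2 * (k) + 1))

/* Untrusted code runs with the trusted key fully disabled. */
#define PKRU_UNTRUSTED (AD(TRUSTED_KEY) | WD(TRUSTED_KEY))
#define PKRU_TRUSTED   UINT32_C(0)

/* Simulated per-thread PKRU (real code: RDPKRU/WRPKRU). */
static uint32_t sim_pkru = PKRU_UNTRUSTED;

/* Stands in for data on pages tagged with TRUSTED_KEY. */
static const char secret[] = "s3cr3t";

/* Simulated load: returns -1 where the hardware would fault. */
static int sim_read_secret(size_t i) {
    if (sim_pkru & AD(TRUSTED_KEY)) return -1;
    return (unsigned char)secret[i];
}

/* Trusted component: must only run with the key enabled. */
static size_t trusted_secret_len(void) {
    assert(sim_pkru == PKRU_TRUSTED);
    return strlen(secret);
}

/* Call gate: switch domains, call, switch back. Only the result
 * crosses back into the untrusted domain, never the secret itself. */
static size_t gate_secret_len(void) {
    sim_pkru = PKRU_TRUSTED;        /* first register write: open */
    size_t n = trusted_secret_len();
    sim_pkru = PKRU_UNTRUSTED;      /* second register write: close */
    return n;
}
```

Untrusted code calling `sim_read_secret(0)` directly gets -1, while `gate_secret_len()` returns 6 and leaves the register back in the untrusted state. The two register writes per call are the small overhead mentioned above.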
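The scan matters because x86 instructions have variable length, so a WRPKRU byte sequence (`0F 01 EF`) can hide inside another instruction's immediate and be reached by a jump into the middle of it. The simplified sketch below, which is not ERIM's actual scanner, checks every byte offset for WRPKRU and for the memory form of XRSTOR (`0F AE /5`); the register form with the same opcode bytes is LFENCE and is skipped. Prefixed forms such as XRSTOR64 are still caught because the opcode bytes follow the prefix. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_NONE = 0, OP_WRPKRU = 1, OP_XRSTOR = 2 };

/* Classifies the 3-byte sequence starting at offset i. */
static int classify_at(const uint8_t *code, size_t len, size_t i) {
    if (i + 2 >= len) return OP_NONE;           /* need 3 bytes */
    if (code[i] != 0x0F) return OP_NONE;
    if (code[i + 1] == 0x01 && code[i + 2] == 0xEF) return OP_WRPKRU;
    if (code[i + 1] == 0xAE) {
        uint8_t modrm = code[i + 2];
        unsigned mod = modrm >> 6, reg = (modrm >> 3) & 7;
        if (reg == 5 && mod != 3) return OP_XRSTOR;  /* mod == 3 is LFENCE */
    }
    return OP_NONE;
}

/* Scans every offset, not just instruction boundaries. Stores up to
 * `max` offsets in `out` and returns the total number found. */
static size_t find_pkru_writers(const uint8_t *code, size_t len,
                                size_t *out, size_t max) {
    assert(code != NULL || len == 0);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (classify_at(code, len, i) != OP_NONE) {
            if (n < max) out[n] = i;
            n++;
        }
    }
    return n;
}
```

For instance, the five bytes `B8 0F 01 EF 00` decode as `mov eax, 0xEF010F`, which looks harmless, yet the scanner flags offset 1 because a jump there would execute WRPKRU. A rewriter then has to change such code so the sequence no longer appears.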