Sleep Attack: Intel Bootguard vulnerability waking from S3

CVE-2020-8705 allows a local attacker with physical access to the SPI flash chip on a sleeping machine to perform a Time-of-check/Time-of-use (TOCTOU) attack against Intel's Bootguard hardware root of trust and gain control of the firmware when resuming from S3 sleep. It was patched by Intel as part of INTEL-SA-00391 with the usual terse description:

CVEID: CVE-2020-8705
Description: Insecure default initialization of resource in Intel(R) Boot Guard in Intel(R) CSME versions before 11.8.80, 11.12.80, 11.22.80, 12.0.70, 13.0.40, 13.30.10, 14.0.45 and 14.5.25, Intel(R) TXE versions before 3.1.80 and 4.0.30, Intel(R) SPS versions before E5_04.01.04.400, E3_04.01.04.200, SoC-X_04.00.04.200 and SoC-A_04.00.04.300 may allow an unauthenticated user to potentially enable escalation of privileges via physical access.
CVSS Base Score: 7.1 High
CVSS Vector: CVSS:3.1/AV:P/AC:H/PR:N/UI:N/S:C/C:H/I:H/A:H

Overview

When modern Intel x86 computers go to sleep, they enter what is called "S3". This mode maintains power to DRAM, but shuts off the CPU entirely. All CPU state is lost and must be restored upon waking from S3. The cold boot and resume from sleep start with the same code path in the SPI flash (described in more detail below) and diverge once the firmware recognizes that the system is resuming from sleep and does not need to initialize the DRAM controller.

During a normal boot the firmware validates that the contents of the SPI flash are signed by the OEM, but this step is skipped on many platforms during resume from S3. This allows a local attacker with physical access to substitute their unsigned code in place of the official firmware and take control of the system very early in the resume process, without disturbing the contents of the main memory.

Since the attacker has code execution before any protections are in place, they are able to walk through all of memory to look for secrets or disable any OS-level protections like screen lock processes. This risk is especially dangerous since the LUKS and BitLocker disk encryption keys are typically stored in RAM while the computer is asleep, allowing the attacker to gain access to the decrypted disks without requiring any user passwords.

Additionally, System Management Mode (SMM) memory (SMRAM) is not normally modified during the resume path. Normally all of the SMM modules are installed during the cold boot and then the SMRAM is locked so that it can not be modified by the OS. On a resume, the SMRAM is unlocked, so the firmware normally leaves all of the current SMM modules in place and locks the SMRAM as-is. Since the attacker has code execution while the SMRAM is still unlocked, they can gain temporary persistence by writing malicious code into SMRAM and then locking it, which would provide a nearly undetectable backdoor until the next hard reboot.

Attack scenarios

Since CVE-2020-8705 requires physical access, it is harder for an attacker to use than a remote exploit. However, there are a few realistic attack scenarios where it could be used.

One example is when clearing customs at an airport. Most travellers close their laptop during descent and allow it to enter S3 sleep. If the device is taken by the adversarial agency upon landing, the disk encryption keys are still in memory. The adversary can remove the bottom cover and attach an in-system flash emulator like the spispy to the flash chip. They can wake the machine and provide it with their firmware via the spispy. This firmware can scan memory to locate the OS lock screen process and disable it, and then allow the system to resume normally. Now they have access to the unlocked device and its secrets, with no need to compel the owner to provide a password.

The adversary can also install their own SMM "Ring -2" rootkit at this point, which will remain resident until the next hard reboot. This could provide them with code execution on the system when it has moved to a trusted network, potentially allowing horizontal movement.

Another example is a hardware implant that emulates the SPI flash. The iCE40up5k used in one of the variants of the spispy fits easily inside or underneath an SOIC-8 package, allowing a persistent attack against the resume path. Since the FPGA can easily distinguish between the access pattern of a cold boot, where the firmware is validated, and that of a resume from sleep, the device can provide a clean version of the firmware with the correct signature when it is being validated or read by a tool like flashrom, and only provide the modified version during a resume from sleep. This sort of implant would be very difficult to detect via software, and if done well, would not look out of place on the mainboard.

Previous attacks

CVE-2020-8705 is not the first time there have been security vulnerabilities in the wake-from-sleep boot path, nor the first TOCTOU against firmware. Since resuming from S3 sleep is essentially a cold reboot for the CPU, there are many internal states that need to be restored before it can resume running from the contents of memory. Much of the platform security depends on one-time writable lock registers, and there have been multiple vulnerabilities that were caused by race conditions in setting them, or in some cases by vendors forgetting to re-enable the protection when resuming from sleep.

S3 sleep attacks

One example of a prior sleep attack is the Snorlax (VU#577140) vulnerability found by Cornwell, Butterworth, Kovah, and Kallenberg, in which the BIOS Lock Enable (BLE) bit in the BIOS_CNTL register is reset to its default value after a system reset. If the BIOS does not set it again after waking from sleep, then any code running in ring 0 would be able to overwrite the system firmware.

Vilaça discovered and disclosed the similar Prince Harming vulnerability. He found that Apple did not set the Flash Lock Down (FLOCKDN) and the Protected Range Registers (PRR) when resuming from sleep on some platforms, which left the SPI flash writable to any ring 0 code or process that could access physical memory.
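The lock bits involved in Snorlax and Prince Harming can be audited after a resume with a few mask tests. The following is a minimal sketch, not a replacement for a real audit tool: the `spi_lock_state` snapshot structure and the `FINDING_*` flags are hypothetical names, the caller is assumed to have already read the raw register values, and the bit positions are the ones I believe are documented in the public PCH datasheets, so check them against the datasheet for the specific chipset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions assumed from public Intel PCH datasheets; verify per chipset. */
#define BIOS_CNTL_BIOSWE (1u << 0)   /* BIOS Write Enable */
#define BIOS_CNTL_BLE    (1u << 1)   /* BIOS Lock Enable */
#define HSFS_FLOCKDN     (1u << 15)  /* Flash Configuration Lock-Down */
#define PRR_WPE          (1u << 31)  /* Protected Range write-protect enable */

/* Hypothetical finding flags returned by the audit. */
#define FINDING_BLE_CLEAR     (1u << 0)
#define FINDING_BIOSWE_SET    (1u << 1)
#define FINDING_FLOCKDN_CLEAR (1u << 2)
#define FINDING_NO_PRR_WP     (1u << 3)

/* Hypothetical snapshot of the raw registers, filled in by the caller. */
struct spi_lock_state {
    uint8_t  bios_cntl;
    uint16_t hsfs;
    uint32_t prr[5];   /* PR0..PR4 */
};

/* Returns a bitmask of FINDING_* flags; 0 means every check passed. */
unsigned spi_lock_findings(const struct spi_lock_state *s)
{
    assert(s != NULL);
    unsigned findings = 0;

    if (!(s->bios_cntl & BIOS_CNTL_BLE))
        findings |= FINDING_BLE_CLEAR;       /* the Snorlax condition */
    if (s->bios_cntl & BIOS_CNTL_BIOSWE)
        findings |= FINDING_BIOSWE_SET;
    if (!(s->hsfs & HSFS_FLOCKDN))
        findings |= FINDING_FLOCKDN_CLEAR;   /* the Prince Harming condition */

    int any_protected = 0;
    for (size_t i = 0; i < 5; i++)
        if (s->prr[i] & PRR_WPE)
            any_protected = 1;
    if (!any_protected)
        findings |= FINDING_NO_PRR_WP;

    return findings;
}
```

Running the same check after a cold boot and again after a resume makes a missing re-lock stand out: the first snapshot returns 0 and the second does not. Some platforms rely on SMM-based write protection rather than the protected range registers, so `FINDING_NO_PRR_WP` on its own is a hint rather than proof of a problem.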

Wojtczuk and Kallenberg found Darth Venamis (VU#976132), which allowed an attacker to modify the S3 resume script on some platforms to gain code execution in the resume path prior to the FLOCKDN, BIOS_CNTL or other platform bits being reset, giving the attacker write access to the firmware flash during the resume.

Kovah and I ported the Darth Venamis attack to the MacBooks as part of Thunderstrike 2, showing that these vulnerabilities were portable across operating systems and firmware implementations. Darth Venamis and Thunderstrike 2 were largely fixed by storing the S3 resume script in SMRAM, the "S3 Lockbox", so that it is not accessible to normal processes after the firmware has finished.

Other researchers have found that some OEMs are skipping validation during S3 resume in the later steps as well. Alexander Ermolov found that Dell performed Bootguard validation in the DXE phase, which was trivially bypassable by removing the BootGuardDxe module. He also observed in his S3 microcode downgrade talk that the "ACM does not verify BIOS when waking from S3 (performance optimizations) except each 12 boot", which is the same root cause as CVE-2020-8705.

Matrosov hinted that "S3 rootkits coming :)", and disclosed some vendors that were not properly protecting the S3 resume path on their platform, allowing Bootguard to be bypassed.

Sleep attacks on other devices

Other hardware has suffered from this sort of sleep attack as well. Han's "Bad Dream" TPM attacks (CVE-2018-6622) provide malicious data to the TPM on resume, allowing an attacker to subvert the measured boot on TPM sealed data. Since the TPM does not maintain state during sleep, it trusted that the CPU was providing the correct blob for the PCRs upon resume.

TOCTOU on firmware

In 2019, Bosch and I discovered that there were TOCTOU attacks possible against Bootguard (on normal boots) due to cache flushes causing some instructions to be re-read from the SPI flash after the ACM had validated the signature on the PEI region. This vulnerability was assigned CVE-2019-11098 and was only recently patched due to the difficulty in fixing the entire class of XIP bugs.

Bootguard background

Modern Intel x86 CPUs no longer boot by a simple jump to the legacy reset vector at 0xF000:FFF0 in real mode. Unfortunately the details are not public for some reason, so security researchers have had to reverse engineer the behaviour from observing the devices. This is a summary of my observations as well as detailed descriptions by Matrosov and Ermolov.

The CPU's on-die boot ROM first reads the Firmware Interface Table (FIT) at 0xFFFFFFC0 in the memory mapped flash and installs any appropriate microcode updates listed in the FIT.
The CPU then locates the Intel signed Authenticated Code Module (ACM) entry in the FIT and copies the ACM into cache. Intel's signature on the ACM is validated by the CPU before jumping to the ACM's entry point.
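The FIT lookup described above can be reproduced offline against a flash dump. This is a minimal sketch under some assumptions: the image is taken to be mapped so that its last byte sits just below 4 GiB, the 16-byte entry layout (8-byte address, 3-byte size, reserved byte, 2-byte version, type byte, checksum byte) and the type values follow my reading of Intel's public FIT documentation, and `fit_find_entry` is a hypothetical helper name. It does not verify the optional checksum or any signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIT_POINTER_ADDR 0xFFFFFFC0u  /* physical address of the FIT pointer */
#define FIT_ENTRY_SIZE   16
#define FIT_TYPE_UCODE   0x01         /* microcode update */
#define FIT_TYPE_ACM     0x02         /* startup ACM */

/* Read an n-byte little-endian value. */
static uint64_t rd_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Translate a physical address to an offset in an image that ends at 4 GiB. */
static int phys_to_offset(uint64_t phys, size_t image_size, size_t *off)
{
    uint64_t base = 0x100000000ull - image_size;
    if (phys < base || phys >= 0x100000000ull)
        return -1;
    *off = (size_t)(phys - base);
    return 0;
}

/* Returns the physical address stored in the first FIT entry of the given
 * type, or 0 if the FIT is missing, malformed, or has no such entry. */
uint64_t fit_find_entry(const uint8_t *image, size_t image_size, uint8_t type)
{
    assert(image != NULL);
    size_t ptr_off, fit_off;

    if (image_size < 0x40 || image_size > 0xFFFFFFFFu)
        return 0;
    if (phys_to_offset(FIT_POINTER_ADDR, image_size, &ptr_off) != 0)
        return 0;

    /* Only the low 32 bits are read; the table itself lives below 4 GiB. */
    uint64_t fit_phys = rd_le(image + ptr_off, 4);
    if (phys_to_offset(fit_phys, image_size, &fit_off) != 0)
        return 0;
    if (image_size - fit_off < FIT_ENTRY_SIZE)
        return 0;

    const uint8_t *fit = image + fit_off;
    if (memcmp(fit, "_FIT_   ", 8) != 0)   /* header entry signature */
        return 0;

    /* The header's size field holds the entry count, header included. */
    uint32_t count = (uint32_t)rd_le(fit + 8, 3);
    if (count > (image_size - fit_off) / FIT_ENTRY_SIZE)
        return 0;

    for (uint32_t i = 1; i < count; i++) {
        const uint8_t *e = fit + (size_t)i * FIT_ENTRY_SIZE;
        if ((e[14] & 0x7F) == type)        /* top bit is the checksum-valid flag */
            return rd_le(e, 8);
    }
    return 0;
}
```

Calling `fit_find_entry(image, size, FIT_TYPE_ACM)` on a dump gives the address of the ACM that the CPU will load, which is a useful starting point when correlating the addresses seen on the SPI bus with the boot stages.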

The ACM receives Bootguard configuration from the Intel Management Engine (ME) (which has its own chain-of-trust and hardware boot ROM). This configuration includes the hash of the OEM public key and the configuration registers, and is stored in one-time-programmable (OTP) fuses (also known as Field Programmable Fuses (FPF)). These fuse values are "blown" by the OEM at the end of manufacturing and are permanent -- they physically can not be changed after the configuration is programmed.

On a cold boot, the ACM locates the Bootguard Key Manifest (KEYM) and copies it into cache, then validates the OEM signature on it, ensuring that it was signed with the public key that matches the hash from the OTP fuses. The key in the key manifest is used to sign the Bootguard Policy Manifest (ACBP), which is also copied into cache and the signature validated.

One section of the ACBP is the Initial Bootblock Element (IBBS), which has the list of Initial Boot Block (IBB) segments that are protected by Bootguard. These typically include the UEFI PEI and sometimes the DXE firmware volumes, although not the NVRAM regions. The ACM copies all of the segments of the IBB from the flash into cache and computes the hash of it, which is then compared to the hash stored in the signed IBBS.
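The IBB check amounts to hashing a list of flash segments in order and comparing the result to a signed digest. The sketch below shows only that shape. The `ibb_segment` structure and both function names are hypothetical, and FNV-1a is used purely as a stand-in so that the example has no dependencies; it is not a cryptographic hash and is not what Bootguard uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one IBB segment within a flash image. */
struct ibb_segment {
    uint32_t offset;
    uint32_t size;
};

/* FNV-1a 64-bit update. Stand-in only: NOT cryptographically secure. */
static uint64_t fnv1a_update(uint64_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ull;
    }
    return h;
}

/* Hash every segment in list order. Returns 0 on success, or -1 if a
 * segment falls outside the flash image. */
int ibb_digest(const uint8_t *flash, size_t flash_size,
               const struct ibb_segment *segs, size_t nsegs,
               uint64_t *digest)
{
    assert(flash != NULL && segs != NULL && digest != NULL);
    uint64_t h = 0xcbf29ce484222325ull;   /* FNV offset basis */

    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].offset > flash_size ||
            segs[i].size > flash_size - segs[i].offset)
            return -1;
        h = fnv1a_update(h, flash + segs[i].offset, segs[i].size);
    }
    *digest = h;
    return 0;
}

/* Returns 1 if the segments hash to the expected digest, 0 otherwise. */
int ibb_matches(const uint8_t *flash, size_t flash_size,
                const struct ibb_segment *segs, size_t nsegs,
                uint64_t expected)
{
    uint64_t d;
    if (ibb_digest(flash, flash_size, segs, nsegs, &d) != 0)
        return 0;
    return d == expected;
}
```

Two properties are worth noticing. Bytes outside the listed segments, such as the NVRAM regions, do not affect the result at all, and the comparison only says something about the bytes that were read at the moment of hashing, not about what the flash returns on a later read.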

Since this chain of signatures and hashes goes all the way to an on-die boot ROM with a permanent public key, Bootguard is said to be a hardware root of trust. Bootguard allows the CPU manufacturer to verify the ACM, which verifies the OEM's firmware and transfers the chain to the OEM's IBB. The OEM is then responsible for verifying the rest of its firmware, which then verifies the bootloader signature, which then verifies the OS kernel signature, and so on. This is a significant improvement over OS level security, which assumes that the bootloader is valid, or UEFI SecureBoot, which validates the bootloader, but assumes that the firmware is unmodified.

If any of the KEYM or ACBP signature checks fail, or if the IBB hash comparison fails, then the ACM takes action depending on the Bootguard OTP configuration. The available Bootguard configurations have not been publicly disclosed by Intel, although the FIT tool has several profiles that can be selected related to the security configuration and some researchers have tried to reverse engineer it:

typedef struct BG_PROFILE
{
    unsigned long Force_Boot_Guard_ACM : 1;
    unsigned long Protect_BIOS_Environment : 1;
    unsigned long CPU_Debugging : 1;
    unsigned long BSP_Initialization : 1;
    unsigned long Measured_Boot : 1;
    unsigned long Verified_Boot : 1;
    unsigned long Key_Manifest_ID : 4;
    // Enforcement policy values:
    // 00b - do nothing
    // 01b - shutdown timeout
    // 11b - immediate shutdown
    unsigned long Enforcement_Policy : 2;
    unsigned long : 20; // reserved
} BG_PROFILE;

Many OEMs seem to select the "immediate shutdown" option, although some choose "do nothing" and push validation elsewhere, which can lead to its own set of vulnerabilities.
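Since the layout of C bit-fields is implementation-defined, decoding a raw 32-bit profile value with explicit shifts and masks is more portable than overlaying the structure on it. The sketch below assumes the fields are packed starting from the least significant bit in the order listed in the reverse-engineered structure; that ordering is an assumption, and the `bg_profile` structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the reverse-engineered profile word. The bit positions
 * assume LSB-first packing in the order of the structure above. */
struct bg_profile {
    bool force_acm;
    bool protect_bios_env;
    bool cpu_debugging;
    bool bsp_init;
    bool measured_boot;
    bool verified_boot;
    unsigned key_manifest_id;   /* 4 bits */
    unsigned enforcement;       /* 2 bits */
};

void bg_profile_decode(uint32_t raw, struct bg_profile *out)
{
    assert(out != NULL);
    out->force_acm        = (raw >> 0) & 1u;
    out->protect_bios_env = (raw >> 1) & 1u;
    out->cpu_debugging    = (raw >> 2) & 1u;
    out->bsp_init         = (raw >> 3) & 1u;
    out->measured_boot    = (raw >> 4) & 1u;
    out->verified_boot    = (raw >> 5) & 1u;
    out->key_manifest_id  = (raw >> 6) & 0xFu;
    out->enforcement      = (raw >> 10) & 0x3u;
}

/* Names for the enforcement values listed in the structure comments.
 * The value 10b is not described there, so it is reported as unknown. */
const char *bg_enforcement_name(unsigned policy)
{
    switch (policy & 0x3u) {
    case 0x0: return "do nothing";
    case 0x1: return "shutdown timeout";
    case 0x3: return "immediate shutdown";
    default:  return "unknown";
    }
}
```

For example, decoding a raw value and passing its `enforcement` field to `bg_enforcement_name` shows at a glance whether a platform would halt on a failed check or continue booting.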

During a S3 resume, the CPU starts up in the same way. The ACM is copied from flash into cache and validated by the CPU hardware, then the ACM checks to see if the system has started from a cold boot or if it has awakened from sleeping. If it is a resume, the OTP configuration can select to either quickly jump into the resume path or to re-validate the IBB.

Root cause

Figure: Bootguard Profile in the FIT tool

The root cause of CVE-2020-8705 is that most OEMs select the Bootguard configuration in the Intel ME's OTP fuses to validate the hashes of the firmware flash only on a cold boot. Since this is the configured behavior, Intel initially dismissed the vulnerability as "functioning as designed" and closed the issue:

After review we have determined that Boot Guard is functioning as designed. Configuring Boot Guard to check every S3 resume is an available option to OEMs.

The OEMs selected the fast-resume option in the OTP fuses so that the wake-from-sleep is as fast as possible to meet Microsoft's logo requirements:

The S3 resume logo test uses the PwrTest tool to cycle the system through four ACPI S3 sleep transitions. The total resume time of each sleep transition is measured. The total resume time is validated to be less than 2000 milliseconds for each of the last three sleep transitions. Total resume time is measured as the sum of the BIOS Initialization and Driver Initialization phase durations.
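The pass criterion in that description is simple arithmetic, sketched here to make the time budget concrete. The `resume_sample` structure and function name are hypothetical; the four transitions, the "last three" rule, and the 2000 ms limit come from the quoted text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define S3_TRANSITIONS  4
#define S3_LIMIT_MS     2000u

/* Hypothetical record of one measured S3 transition. */
struct resume_sample {
    uint32_t bios_init_ms;     /* BIOS Initialization phase */
    uint32_t driver_init_ms;   /* Driver Initialization phase */
};

/* Returns true if each of the last three of the four transitions has a
 * total resume time below the limit. The first transition is ignored. */
bool s3_logo_test_passes(const struct resume_sample samples[S3_TRANSITIONS])
{
    assert(samples != NULL);
    for (size_t i = 1; i < S3_TRANSITIONS; i++) {
        uint64_t total = (uint64_t)samples[i].bios_init_ms +
                         samples[i].driver_init_ms;
        if (total >= S3_LIMIT_MS)
            return false;
    }
    return true;
}
```

Any time the firmware spends re-hashing the IBB on resume counts against the BIOS Initialization share of that budget, which is the pressure that pushes OEMs toward the fast-resume option.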

The assumption is that the flash was checked during the initial boot, so that it is not necessary to check it during resume since it is a "ROM". This is a classic Time-of-check/Time-of-use (TOCTOU) error, since the SPI flash is not read-only and can be altered after the initial boot. When the OEMs made the choice for faster resume there were no easy tools for TOCTOU against SPI flash, so it might not have seemed as important. Unfortunately the choices they made were burned into the OTP fuses, so it is not possible for them to retroactively patch the systems without desoldering and replacing the PCH.

Additionally, Bootguard does recheck the flash every X resumes (12?), however the access pattern is different from a normal resume which makes it possible for a flash emulator to distinguish between the two resume paths.
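The observed behavior can be summarized as a small decision function. This is only a model of what I saw on the bus, not Intel's logic: the enum names and the function are hypothetical, and both the recheck interval and the way resumes are counted are unknown, so they are left as parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum boot_kind { BOOT_COLD, BOOT_S3_RESUME };

/* Hypothetical names for the two fused S3 behaviors. */
enum s3_policy { S3_FAST_RESUME, S3_ALWAYS_VALIDATE };

/* Model of whether the IBB hash is rechecked on this boot.
 * resume_count:     resumes since the last cold boot (assumed counter)
 * recheck_interval: periodic recheck period, 0 to disable (value unknown) */
bool acm_validates_ibb(enum boot_kind kind, enum s3_policy fused_policy,
                       uint32_t resume_count, uint32_t recheck_interval)
{
    assert(kind == BOOT_COLD || kind == BOOT_S3_RESUME);

    if (kind == BOOT_COLD)
        return true;                       /* always checked on cold boot */
    if (fused_policy == S3_ALWAYS_VALIDATE)
        return true;                       /* OEM chose the slow, safe path */
    if (recheck_interval != 0 && resume_count % recheck_interval == 0)
        return true;                       /* occasional recheck */
    return false;                          /* fast resume: flash is trusted */
}
```

Every path that returns `false` is a resume on which the flash contents are executed without being checked, and because the rechecking resumes read the flash differently, the periodic recheck does not close that window.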

There is a secondary vulnerability in that even if the firmware flash were validated, the Memory Type Range Registers (MTRR) that control the caches were not being correctly setup in the ACM, allowing an Execute-in-Place (XIP) attack against the resume process if the attacker had a TOCTOU capable flash emulator.

Mitigation

After initially dismissing it as not a bug, Intel reopened the original issue, possibly after pressure from OEMs. The biggest problem for the OEMs is that this Bootguard configuration is stored in OTP fuses in the Platform Controller Hub (PCH), which can only be written once and which are permanently programmed when the device exits manufacturing mode. OEMs that wanted to retroactively secure the millions of devices already in the field are not able to do so, since the OTP can not be changed.

As a result, Intel has built a new CSME firmware as part of INTEL-SA-00391. They don't disclose how they have fixed CVE-2020-8705 (along with a dozen others), although my assumption is that it does not send the actual fuses to the ACM, but instead tells it to always check the IBB in the flash. This patch allows the OEMs to update their devices without requiring hardware modifications, so it should improve the security of sleeping systems.

This workaround also points to a weakness in the OTP fuses: access to them is through the software running on the ME, so the x86 cannot trust that it is receiving the actual one-time-programmed values. A code execution attack against the ME, such as Mark Ermolov's DMA attack against the CSME or INTEL-SA-00086, allows an arbitrary bypass of the Bootguard configuration.

Proof of concept


Figures: spispy monitoring an Intel NUC; TOCTOU on a Lenovo X1 laptop

The CVE-2020-8705 vulnerability was detected and the proof of concept was demonstrated using the spispy, an open source hardware SPI flash emulator built around an ECP5 FPGA and 32 MB of DRAM that pretends to be a SPI flash.

The spispy can operate in a passive monitoring mode, which logs the addresses being accessed by the CPU, and these logs can be analyzed to find repeated reads. The spispy can also operate in a TOCTOU mode that allows it to replace the data being read at certain addresses based on the access patterns. It also has a full emulation mode that replaces all of the flash, which is useful for developing or testing new firmware since the DRAM can be updated much faster than the SPI NOR flash chips.
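Finding repeated reads in a passive log is a matter of bucketing the addresses and looking for buckets that show up more than once. The sketch below assumes the log has already been reduced to a plain array of read addresses, which is not the spispy's actual log format, and it uses a 64-byte bucket to roughly match a cache line; both choices, and the function name, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SHIFT 6   /* 64-byte buckets, an assumed granularity */

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Writes the base address of every bucket that was read more than once
 * into out (ascending order, at most out_cap entries) and returns how
 * many were written. Returns 0 if the log is empty or allocation fails. */
size_t find_repeated_blocks(const uint32_t *addrs, size_t n,
                            uint32_t *out, size_t out_cap)
{
    assert(addrs != NULL && out != NULL);
    if (n == 0)
        return 0;

    uint32_t *blocks = malloc(n * sizeof *blocks);
    if (blocks == NULL)
        return 0;

    for (size_t i = 0; i < n; i++)
        blocks[i] = addrs[i] >> BLOCK_SHIFT;
    qsort(blocks, n, sizeof *blocks, cmp_u32);

    /* Walk the runs of equal bucket numbers in the sorted array. */
    size_t found = 0;
    for (size_t i = 0; i < n; ) {
        size_t j = i + 1;
        while (j < n && blocks[j] == blocks[i])
            j++;
        if (j - i > 1 && found < out_cap)
            out[found++] = blocks[i] << BLOCK_SHIFT;
        i = j;
    }

    free(blocks);
    return found;
}
```

A bucket that is read once while being hashed and then read again later is a candidate for closer inspection, since the second read is the one whose contents are actually used.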

The vulnerability was detected in several systems as part of continuing research into firmware TOCTOU attacks, similar to the work with Peter Bosch on CVE-2019-11098. Tested systems included an Intel NUC 8 and a Lenovo Thinkpad X1 and T490, although it is likely to affect nearly all commodity systems with Intel CPUs.

Timeline

2020-01-24: Bootguard S3 resume vulnerability reported to Intel
2020-01-25: Intel responds that they are looking into it
2020-01-27: XIP in resume path reported to Intel
2020-01-27: Cache flush in resume path reported to Intel
2020-02-05: Intel responds that the XIP and cache issues are AMI's responsibility
2020-02-18: Intel closes Bootguard issue with "functioning as designed"
2020-03-03: Issue disclosed to OEMs since Intel did not consider it a bug
2020-04-07: Intel re-opens Bootguard issue (influenced by OEMs?)
2020-06-01: Intel sends updated ME firmware with fix in place for testing
2020-xx-xx: Intel agrees on November publication date
2020-11-10: Public disclosure and publication

External links

Hackers can use just-fixed Intel bugs to install malicious firmware on PCs (Dan Goodin, Ars Technica)
Serious Intel Boot Guard Exploit Leaves Unpatched PCs Vulnerable To Firmware Attacks (Nathan Ord, Hot Hardware)
Intel załatał kilka poważnych usterek. Pozwalały instalować złośliwe oprogramowanie (Jakub Krawczynski, dobreprogramy)
INTEL-SA-00391
Lenovo LEN-39432
Dell November 2020 Platform Update
