Understanding TPM Sniffing Attacks

The Dolos Group published a detailed walk-through of how they extracted the TPM-protected BitLocker keys from a "stolen" laptop as part of a penetration test:

To recap, we took a locked down FDE laptop, sniffed the BitLocker decryption key coming out of the TPM, backdoored a virtualized image, and used its VPN auto-connect feature to attack the internal corporate network. [...] A pre-equipped attacker can perform this entire attack chain in less than 30 minutes with no soldering, simple and relatively cheap hardware, and publicly available tools. A process that places it squarely into Evil-Maid territory.

The laptop they were analyzing appeared to be very well configured: UEFI SecureBoot enabled, a firmware password set, early IOMMU protection, USB and network interfaces locked down, and hardware-backed full disk encryption. These configurations are similar to the recommendations in NIST SP 800-147 and the NSA UEFI SecureBoot customization guide, as well as the safeboot.dev installation instructions.

But the system they were examining had a fatal flaw in its configuration:

The laptop booted directly to Windows without any user intervention

Because of this, the Dolos Group realized that the hardware-protected disk encryption key was unsealed by the TPM and sent to the x86 CPU during the boot process. By attaching a common logic analyzer to the SPI bus, they observed that the key was sent in the clear, and were then able to decrypt an image they had made of the laptop's drive. From there they used secrets stored on the disk to connect to the VPN and move laterally through the customer's infrastructure.

Matthew Garrett posted a Twitter thread that identified several fixes:

Require a user password in addition to the TPM sealed key
Use TPM parameter encryption to protect the secrets between the TPM and the x86
Don't trust a machine just because it is on the VPN
Store more keys on the TPM, like the VPN keys, so that a virtual copy can't use them

I would add a few additional changes:

The user password should be an authorization value on the TPM-sealed secret, so that the TPM hardware can stop dictionary attacks.
Prevent phishing of the user authorization with tpm2-totp.
Use cpHash and rpHash authorization to ensure that a TPM interposer like the TPM Genie isn't modifying commands or responses.
Case tamper switches should prevent a local attacker from easily making hardware changes.
The Management Engine fTPM is slightly harder to tap than an SPI- or I2C-attached discrete TPM.
Remote attestation should be used to verify the integrity of the system before allowing it to associate to the VPN.
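To make the cpHash and rpHash suggestion more concrete, here is a minimal Python sketch of those hashes as I read them in the TPM 2.0 Part 1 specification: cpHash covers the command code, the Names of the handles and the command parameters, while rpHash covers the response code, the command code and the response parameters. The session HMAC is keyed with the session key and the object's authorization value, so an interposer like the TPM Genie can't swap a handle or rewrite a parameter without the HMAC check failing. The function names and placeholder public-area bytes are hypothetical, and the optional decrypt/encrypt nonces are left out of the HMAC for brevity.

```python
import hashlib
import hmac
import struct

# Constants from the TPM 2.0 specification, Part 2.
TPM_ALG_SHA256 = 0x000B
TPM_CC_UNSEAL = 0x0000015E
TPM_RC_SUCCESS = 0x00000000


def object_name(public_area: bytes) -> bytes:
    """Name of a loaded object: nameAlg || H(marshalled TPMT_PUBLIC)."""
    return struct.pack(">H", TPM_ALG_SHA256) + hashlib.sha256(public_area).digest()


def cp_hash(command_code: int, names: list[bytes], params: bytes) -> bytes:
    """cpHash = H(commandCode || Name1 || ... || command parameters)."""
    h = hashlib.sha256(struct.pack(">I", command_code))
    for name in names:
        h.update(name)
    h.update(params)
    return h.digest()


def rp_hash(response_code: int, command_code: int, params: bytes) -> bytes:
    """rpHash = H(responseCode || commandCode || response parameters)."""
    return hashlib.sha256(struct.pack(">II", response_code, command_code) + params).digest()


def session_hmac(session_key: bytes, auth_value: bytes, p_hash: bytes,
                 nonce_newer: bytes, nonce_older: bytes, attrs: int) -> bytes:
    """HMAC over a cpHash or rpHash, keyed with sessionKey || authValue.

    Simplified sketch: the optional decrypt/encrypt nonces are omitted.
    """
    msg = p_hash + nonce_newer + nonce_older + bytes([attrs])
    return hmac.new(session_key + auth_value, msg, hashlib.sha256).digest()


if __name__ == "__main__":
    # Placeholder bytes, not real marshalled TPMT_PUBLIC structures.
    sealed = object_name(b"hypothetical public area of the sealed disk key")
    swapped = object_name(b"hypothetical public area of an attacker-chosen object")
    # TPM2_Unseal takes a single object handle and no command parameters.
    good = cp_hash(TPM_CC_UNSEAL, [sealed], b"")
    forged = cp_hash(TPM_CC_UNSEAL, [swapped], b"")
    print(good != forged)  # swapping the handle changes cpHash, so the HMAC no longer verifies
```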

Most of these features are already present in the safeboot.dev tree and can be used today to make your Linux system slightly more secure. Read on for more details about each of these suggested changes and how they affect the threat model.

User authorization values

The primary failure in this case is succinctly summarized by @XMPPwocky's tweet:

Frog put the keys in a TPM.
"There", he said.
"Now the disk is encrypted when the machine is off."
"But we can turn the machine on and sniff the keys", said Toad.
"That is true", said Frog.

As Toad pointed out, when the machine boots, the TPM automatically provides the encryption key to the x86. Even if an attacker lacked the hardware to sniff the key on the bus, or faced parameter encryption or tamper switches as described later, they could remove the TPM from the board, send it the same PCR extension values directly, and ask it to unseal the key.
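Toad's point follows from how PCRs are computed: an extend only hashes the old value together with the new digest, so anyone who knows the ordered list of digests can reproduce the final value on any TPM. A minimal Python sketch, assuming a SHA-256 bank and a PCR that starts at all zeros; the component names are hypothetical stand-ins for what an attacker would pull out of a firmware dump.

```python
import hashlib

PCR_BANK_SIZE = 32  # SHA-256 bank


def pcr_extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM2_PCR_Extend semantics: PCR_new = SHA-256(PCR_old || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def replay(digests: list[bytes], initial: bytes = bytes(PCR_BANK_SIZE)) -> bytes:
    """Recompute a PCR from its reset value and an ordered list of event digests."""
    pcr = initial
    for d in digests:
        pcr = pcr_extend(pcr, d)
    return pcr


if __name__ == "__main__":
    # Hypothetical boot components, as recovered from a firmware dump.
    components = [b"firmware volume", b"secure boot config", b"bootloader"]
    digests = [hashlib.sha256(c).digest() for c in components]

    legit = replay(digests)  # what the laptop's own boot produces
    bench = replay(digests)  # same digests sent to a desoldered TPM on a bench
    print(legit == bench)    # identical PCR, so a PCR-only seal opens
```

Nothing in the computation depends on a secret, which is why a PCR policy alone can't tell the laptop's own boot apart from a bench replay.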

One approach would be to require a user password in addition to the TPM-sealed secret. If this password is used by the disk encryption software, it is subject to offline dictionary attacks, so a better approach, one that a) doesn't require a very long passphrase and b) is rate-limited in hardware, is to use a TPM authorization value with dictionary-attack lockout (dalockout) enabled. If the attacker attempts to brute-force the PIN, the TPM will enter lockout mode and refuse to unseal the secret. There is also some protection against attacks that cut power to the TPM immediately after a PIN trial to avoid the lockout: if the TPM receives a startup command without having been shut down cleanly, the dalockout counter is also incremented.
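The hardware rate limiting can be illustrated with a toy model. This is not the TPM's implementation: the class and field names are hypothetical, and on a real TPM the parameters (maximum tries, recovery time, lockout recovery) are set through TPM2_DictionaryAttackParameters, for example with tpm2 dictionarylockout. The model counts failed PIN attempts, forgives one failure per recovery interval, refuses every attempt while locked out, and treats a startup that wasn't preceded by a clean shutdown as a failure.

```python
import hmac
from dataclasses import dataclass


@dataclass
class DaLockoutModel:
    """Toy model of TPM dictionary-attack protection (not the real TPM logic)."""
    pin: bytes
    secret: bytes
    max_tries: int = 5        # hypothetical default
    recovery_time: int = 600  # seconds per forgiven failure (hypothetical default)
    failed_tries: int = 0
    last_decay: float = 0.0
    clean_shutdown: bool = True

    def _decay(self, now: float) -> None:
        """Forgive one failure for each full recovery_time interval that has elapsed."""
        if self.failed_tries == 0:
            self.last_decay = now
            return
        if self.recovery_time <= 0:
            return  # no automatic recovery in this model
        steps = int((now - self.last_decay) // self.recovery_time)
        if steps > 0:
            self.failed_tries = max(0, self.failed_tries - steps)
            self.last_decay += steps * self.recovery_time

    def locked_out(self, now: float) -> bool:
        self._decay(now)
        return self.failed_tries >= self.max_tries

    def startup(self, now: float) -> None:
        """Power-on: a startup not preceded by a clean shutdown counts as a failure."""
        self._decay(now)
        if not self.clean_shutdown:
            self.failed_tries += 1
        self.clean_shutdown = False

    def shutdown(self) -> None:
        """Orderly shutdown issued by the OS before power-off."""
        self.clean_shutdown = True

    def unseal(self, pin: bytes, now: float) -> bytes:
        if self.locked_out(now):
            raise PermissionError("in lockout: PIN attempts are refused")
        if not hmac.compare_digest(pin, self.pin):
            self.failed_tries += 1
            raise PermissionError("wrong PIN")
        return self.secret
```

With a handful of allowed tries and a long recovery interval, even a 4-digit PIN takes an impractically long time to brute-force, and cutting power between guesses doesn't reset the count.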

TOTP local attestation

A very savvy attacker could disable UEFI SecureBoot, replace the booting kernel and initrd with one that just asks for the PIN and exfiltrates it once the user provides it, then re-enable all of the original SecureBoot configuration and pretend to crash. The unsuspecting user might just say "oh, that's weird" and enter their PIN a second time. To protect against this, the firmware can perform a local attestation with a TPM-assisted Time-based One-Time Password (TOTP): before entering their PIN, the user verifies that the computer displays the same 6-digit code as their authenticator app. The TPM will only perform the HMAC if the firmware is unmodified, and it does not reveal the HMAC key, so an adversary cannot generate fake codes for arbitrary times in the future.
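Both sides of this local attestation compute an ordinary TOTP code. The sketch below is plain RFC 6238 with the common defaults (HMAC-SHA1, 30-second steps, 6 digits), which I'm assuming here rather than quoting from tpm2-totp. In tpm2-totp the key stays inside the TPM and is only usable with the expected firmware state; this Python version simply holds a hypothetical key in memory to show that the machine and the phone arrive at the same number.

```python
import hashlib
import hmac
import struct
import time


def totp(key: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP (RFC 4226) over the number of elapsed time steps."""
    counter = unix_time // step
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # RFC 4226 dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    shared_key = b"hypothetical key enrolled in the app"
    now = int(time.time())
    machine_code = totp(shared_key, now)  # stands in for the TPM-backed computation
    app_code = totp(shared_key, now)      # what the authenticator app shows
    print("codes match" if hmac.compare_digest(machine_code, app_code) else "do NOT enter the PIN")
```

A kernel that was swapped in to phish the PIN changes the measured state, so the TPM won't compute the code and the screen can't show a matching value.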

The downside to requiring user authorization is that the system can no longer reboot unattended. For a laptop this might be acceptable, especially when combined with a local attestation like tpm2-totp, but servers need remote attestation to ensure that a stolen system is not being booted by an attacker.

Tamper Switches

Another way to make it more difficult for an attacker to sniff the TPM communication during boot is to enable tamper switches on the hardware. Lenovo's firmware can be configured to require an administrator password before booting at all once a tamper switch has been triggered. This is not perfect: older models stored the tamper state in the CMOS NVRAM, which is trivial to clear, while newer models store it in the Embedded Controller (EC), an external microcontroller that manages the laptop's hardware. While the simpler EC can be somewhat more robust than the x86, Alex Matrosov demonstrated that the EC is not a security boundary and that it creates another attack surface that needs to be protected.

My Sleep attack also demonstrated a problem with the tamper switches on Lenovo systems, combined with a Bootguard bypass: the switches and firmware signatures are only checked at cold boot, so an adversary could gain code execution in Ring 0 and SMM during a resume from S3 sleep. This attack would not be detected until the next cold boot, which for many laptops might be months away.

The tamper switches do make it more challenging for an adversary who wants to surreptitiously duplicate the system, since bypassing them might require more extensive hardware modifications or damage the case, and a tripped tamper switch would alert the user that the system might have been compromised.

However, an adversary who plans to remove the TPM for external analysis does not care about the tamper switches. The TPM is not connected to the tamper switches directly, so it has no way to zeroize its contents when they are triggered.

fTPM vs dTPM

Dolos was able to sniff the TPM traffic because the system used an external SPI-attached TPM, and they could have done the same with an I2C- or LPC-attached one, since those protocols are fairly simple and slow.
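To show how little stands between a logic analyzer capture and the TPM's responses, here is a minimal Python decoder for TPM-over-SPI transfers, based on my reading of the TCG PC Client Platform TPM Profile: each transfer starts with a 4-byte header (a read/write bit, the transfer size minus one, and a 24-bit register address), and command and response bytes pass through the locality-0 data FIFO at 0xD40024. It assumes the capture has already been split per chip-select assertion and that the TPM inserted no wait states, which a real decoder must handle. It is not Dolos's tooling. Without parameter encryption, the reassembled FIFO reads are the raw TPM responses, including the output of TPM2_Unseal.

```python
from dataclasses import dataclass

# Locality-0 data FIFO in the PC Client Platform TPM Profile register map.
TPM_DATA_FIFO_0 = 0xD40024


@dataclass
class SpiTransfer:
    is_read: bool
    address: int
    data: bytes


def decode_transfer(mosi: bytes, miso: bytes) -> SpiTransfer:
    """Decode one chip-select-framed TPM SPI transfer, assuming no wait states."""
    if len(mosi) != len(miso) or len(mosi) < 5:
        raise ValueError("expected equal-length MOSI/MISO with a 4-byte header and data")
    header = mosi[0]
    is_read = bool(header & 0x80)  # bit 7: 1 = read from the TPM, 0 = write
    size = (header & 0x3F) + 1     # bits 5..0: transfer size minus one
    address = int.from_bytes(mosi[1:4], "big")
    data = (miso if is_read else mosi)[4:4 + size]  # TPM drives MISO on reads
    if len(data) != size:
        raise ValueError("transfer is shorter than its header claims")
    return SpiTransfer(is_read, address, data)


def fifo_bytes(captures: list[tuple[bytes, bytes]], reads: bool = True) -> bytes:
    """Concatenate FIFO reads (TPM responses) or FIFO writes (host commands)."""
    out = bytearray()
    for mosi, miso in captures:
        t = decode_transfer(mosi, miso)
        if t.address == TPM_DATA_FIFO_0 and t.is_read == reads:
            out += t.data
    return bytes(out)
```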

An Intel Management Engine (ME) fTPM in the Platform Controller Hub (PCH) is a little more complex to tap, since the HECI bus is faster and wider. Directly probing or tapping the TPM becomes even harder once the PCH is built into the CPU package and the SoC is soldered to the board. Starting with Broadwell and mobile Skylake, the PCH containing the Management Engine is packaged with the CPU, and the HECI bus they use to communicate is entirely inside the package. This makes tapping the bus significantly more difficult and almost entirely removes the ability to extract the TPM for direct probing.

However, the Management Engine is its own attack surface, and trusting it to keep secrets depends on your threat model. Some security experts recommend turning it off with me_cleaner, and Ermolov and Goryachy have demonstrated several physical vulnerabilities that provided code execution inside the ME, which is essentially game over for platform security. In addition to directly compromising the fTPM sealed secrets, the ME can disable the Bootguard protections to bypass firmware signature checks, and can read host memory to locate secrets after they have been unsealed from a discrete TPM.

TPM Parameter Encryption

Figure: session key establishment protocol

One of Garrett's suggestions for discrete TPM devices is to enable encrypted sessions, so that the parameters sent to the TPM and the reply containing the unsealed secret are protected. Unfortunately, adding this to the closed-source BitLocker is not possible for anyone outside of Microsoft (although there might be a workaround using kexec), and even for open source projects the tpm2-tools utilities don't make it easy, since not all commands support sessions. Additionally, finding documentation on exactly what is protected requires digging into the tpm2-tss source code or parsing the lengthy TCG specs.

Essentially, the system generates a random salt, encrypts it with the public part of a TPM primary key, and sends it to the TPM when the encrypted session is started; the system and the TPM then both feed the salt and the session nonces into a key derivation function to arrive at the same session key. The session is started with:

tpm2 startauthsession --hmac-session --session session.ctx --key-context primary.ctx
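The key derivation described above can be sketched in Python. It follows my reading of TPM 2.0 Part 1: KDFa is an SP 800-108 counter-mode KDF built on HMAC, the session key is KDFa(bind.authValue || salt, "ATH", nonceTPM, nonceCaller), and the XOR obfuscation mode masks a parameter with KDFa(sessionKey || authValue, "XOR", nonceNewer, nonceOlder). I picked the XOR mode only because it needs nothing beyond the standard library; real deployments commonly use AES-CFB instead. Encrypting the salt to the primary key is omitted, and every value here is hypothetical.

```python
import hashlib
import hmac
import os
import struct


def kdfa(key: bytes, label: bytes, context_u: bytes, context_v: bytes, bits: int) -> bytes:
    """KDFa with SHA-256: SP 800-108 counter-mode KDF using HMAC."""
    if bits % 8:
        raise ValueError("this sketch only handles whole-byte outputs")
    label = label if label.endswith(b"\x00") else label + b"\x00"  # NUL-terminated label
    out = bytearray()
    counter = 1
    while len(out) < bits // 8:
        msg = struct.pack(">I", counter) + label + context_u + context_v + struct.pack(">I", bits)
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[: bits // 8])


def session_key(salt: bytes, nonce_tpm: bytes, nonce_caller: bytes, bind_auth: bytes = b"") -> bytes:
    """sessionKey = KDFa(bind.authValue || salt, "ATH", nonceTPM, nonceCaller, 256)."""
    return kdfa(bind_auth + salt, b"ATH", nonce_tpm, nonce_caller, 256)


def xor_obfuscate(session_value: bytes, nonce_newer: bytes, nonce_older: bytes, data: bytes) -> bytes:
    """XOR parameter obfuscation; applying it twice with the same inputs restores the data."""
    mask = kdfa(session_value, b"XOR", nonce_newer, nonce_older, len(data) * 8)
    return bytes(d ^ m for d, m in zip(data, mask))


if __name__ == "__main__":
    salt = os.urandom(32)  # really encrypted to the primary key's public part (not shown)
    nonce_caller = os.urandom(16)
    nonce_tpm = os.urandom(16)

    host_key = session_key(salt, nonce_tpm, nonce_caller)
    tpm_key = session_key(salt, nonce_tpm, nonce_caller)  # TPM decrypts the salt, derives the same key

    pin = b"1234"                      # authValue of the sealed object (hypothetical)
    secret = b"hypothetical disk key"
    nonce_tpm_resp = os.urandom(16)    # the TPM returns a fresh nonce with each response
    # For a response, nonceNewer is the TPM's nonce and nonceOlder is the caller's.
    on_the_wire = xor_obfuscate(tpm_key + pin, nonce_tpm_resp, nonce_caller, secret)
    print(on_the_wire.hex())           # all a bus sniffer would see
    print(xor_obfuscate(host_key + pin, nonce_tpm_resp, nonce_caller, on_the_wire))
```

The sniffer sees the nonces but not the salt, so it can't derive the mask; the remaining question, addressed next, is whether the key the salt was encrypted to really belongs to the TPM.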

However, this creates a chicken-and-egg problem for establishing trust in the primary key: how does the system know that the primary key came from the real TPM? The easy, yet not quite right, answer is that the system can encrypt a challenge with the Endorsement Key's public part (tpm2 makecredential), and then use tpm2 activatecredential to verify that the primary key was generated by that TPM.

This of course leads to the follow-up question: how does the software trust that the Endorsement Key came from the TPM in this system? The EK is signed by the TPM manufacturer, so the firmware could validate this on each boot, and if the Bootguard root of trust holds, then the EK might be trustworthy.

But... this only provides protection if a user authorization is also included. Without one, an adversary who plans to remove the TPM and send it the PCR extensions directly to unseal the secret has no need to care about these keys. The PCR values are not secret, since the adversary can predict them from a firmware dump, so unattended booting must not be allowed, because there is nowhere to store secrets on the x86.

Remote Attestation

Since there is nowhere local to store secrets on the x86, the natural approach is to store them elsewhere and perform a remote attestation to retrieve them. The TPM can sign a quote that covers measurements of the firmware contents, the firmware configuration, the bootloader, etc., and the remote attestation server can decide whether this machine is trustworthy enough to be allowed to boot. Stolen machines can be de-authorized, suspicious activity like numerous reboots can be logged, and unattended reboots can be permitted. This works well for servers on a consistent network, and can be made to work for laptops using more full-featured firmware like Heads and bootloaders like safeboot.
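A verifier's decision logic might look roughly like the sketch below. It is hypothetical (the class names, per-device secrets and boot-count threshold are mine, not safeboot's) and it assumes the quote's signature, the attestation key and the EK chain have already been checked. It replays the event log, confirms that the replayed PCRs hash to the digest the TPM signed, compares them against known-good values, rejects revoked devices and stale nonces, and logs devices whose boot count exceeds a threshold before releasing a secret.

```python
import hashlib
import hmac
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


def replay_pcr(digests: list[bytes]) -> bytes:
    """Recompute a SHA-256 PCR from zero by replaying event-log digests."""
    pcr = bytes(32)
    for d in digests:
        pcr = hashlib.sha256(pcr + d).digest()
    return pcr


@dataclass
class QuoteClaims:
    """Fields from a TPM quote whose signature was already verified (not shown)."""
    device_id: str
    nonce: bytes       # the verifier's challenge, echoed inside the signed quote
    pcr_digest: bytes  # SHA-256 over the selected PCR values, in index order


@dataclass
class AttestationServer:
    golden_pcrs: dict[int, bytes]  # expected PCR values for approved firmware
    secrets: dict[str, bytes]      # per-device secrets, e.g. a disk key
    revoked: set[str] = field(default_factory=set)
    boots: Counter = field(default_factory=Counter)
    max_boots_logged: int = 10     # hypothetical threshold
    log: list[str] = field(default_factory=list)

    def release(self, claims: QuoteClaims, expected_nonce: bytes,
                event_log: dict[int, list[bytes]]) -> Optional[bytes]:
        dev = claims.device_id
        if dev in self.revoked:
            self.log.append(f"{dev}: revoked device attempted to boot")
            return None
        if not hmac.compare_digest(claims.nonce, expected_nonce):
            return None  # stale or replayed quote
        pcrs = {i: replay_pcr(event_log.get(i, [])) for i in sorted(self.golden_pcrs)}
        digest = hashlib.sha256(b"".join(pcrs[i] for i in sorted(pcrs))).digest()
        if not hmac.compare_digest(digest, claims.pcr_digest):
            return None  # event log does not match what the TPM signed
        if any(pcrs[i] != self.golden_pcrs[i] for i in pcrs):
            self.log.append(f"{dev}: unapproved firmware or boot configuration")
            return None
        self.boots[dev] += 1
        if self.boots[dev] > self.max_boots_logged:
            self.log.append(f"{dev}: unusually many boots ({self.boots[dev]})")
        return self.secrets.get(dev)
```

Because the secret only ever leaves the server after these checks, the laptop never has to hold it at rest, which is what allows unattended reboots without a local PIN.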

Windows 10 kexec'ed from Linux

This technique can even kexec into Windows 10, allowing the BitLocker keys to be remotely provisioned to a server or laptop through the safeboot loader, without the BitLocker key ever touching the disk and without requiring any changes to the Windows bootloader.

tl;dr

Your threat model is not my threat model

If physical attacks are in your threat model, using the TPM can help prevent many of them or make them significantly more costly for an adversary. There are no perfect solutions, only solutions that address specific risks and you must decide if the remaining risks are acceptable to your organization.

2021 Security