Understanding TPM Sniffing Attacks
The Dolos Group published a detailed walk-through of how they extracted the TPM-protected BitLocker keys from a "stolen" laptop as part of a penetration test:
To recap, we took a locked down FDE laptop, sniffed the BitLocker decryption key coming out of the TPM, backdoored a virtualized image, and used its VPN auto-connect feature to attack the internal corporate network. [...] A pre-equipped attacker can perform this entire attack chain in less than 30 minutes with no soldering, simple and relatively cheap hardware, and publicly available tools. A process that places it squarely into Evil-Maid territory.
The laptop system that they were analyzing in this evaluation seemed to be very well configured: UEFI SecureBoot enabled, firmware password enabled, early IOMMU protection, USB and network interfaces locked down, hardware-backed full disk encryption. These configurations are similar to the recommendations in NIST 800-147 and the NSA UEFI SecureBoot customization guide, as well as the safeboot.dev installation instructions.
But the system they were examining had a fatal flaw in its configuration:
The laptop booted directly to Windows without any user intervention
Because of this, the Dolos group realized that the hardware-protected disk encryption key was unsealed by the TPM and sent to the x86 CPU during the boot process. By using a common logic analyzer on the SPI bus, they observed that it was sent in the clear, and were then able to decrypt an image they had made of the laptop's drive. From there they were able to use secrets stored on the disk to connect to the VPN and to move laterally through the customer's infrastructure.
Matthew Garrett posted a Twitter thread that identified several fixes:
- Require a user password in addition to the TPM sealed key
- Use TPM parameter encryption to protect the secrets between the TPM and the x86
- Don't trust a machine just because it is on the VPN
- Store more keys on the TPM, like the VPN keys, so that a virtual copy can't use them
I would add a few additional changes:
- The user password should be for authorization of the TPM sealed secret so that dictionary attacks can be stopped by the TPM hardware.
- Prevent phishing attacks on the user authorization with tpm2-totp.
- Use cpHash and rpHash authorization to ensure that a TPM interposer like the TPM Genie isn't modifying commands.
- Case tamper switches should prevent a local attacker from easily making hardware changes.
- Using the Management Engine fTPM is slightly harder to tap than a SPI or i2c attached discrete TPM.
- Remote attestation should be used to verify the integrity of the system before allowing it to associate to the VPN.
Most of these features are already present in safeboot.dev tree and can be used today to make your Linux system slightly more secure. Read on for more details about each of these suggested changes and how they affect the threat model.
User authorization values
The primary failure in this case is succinctly summarized by @XMPPwocky's tweet:
Frog put the keys in a TPM.
"There", he said.
"Now the disk is encrypted when the machine is off."
"But we can turn the machine on and sniff the keys", said Toad.
"That is true", said Frog.
As Toad pointed out, when the machine boots, the TPM automatically provides the encryption key to the x86. Even if an attacker didn't have the hardware to sniff the keys on the bus, or they were faced with parameter encryption or tamper switches as described later, they could remove the TPM from the board and directly provide it with the PCR extension values and ask the TPM to unseal the key.
One approach would be to add a user password in addition to the TPM sealed secret. If this password is used directly by the disk encryption software, it would be subject to dictionary attacks, so a better approach that a) wouldn't require an extended passphrase, and b) would be rate limited in hardware, is to use a TPM authorization value with the dictionary attack (DA) lockout enabled. If the attacker attempts to brute-force the PIN, the TPM will shut down and refuse to unseal the secret.

There is also some support for detecting attacks that attempt to power down the TPM suddenly after a PIN trial to avoid the dictionary lockout -- if the TPM startup command is received without the TPM having been shut down cleanly, the DA lockout counter is also incremented.
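As a rough sketch of what this looks like with the tpm2-tools command line (the file names, PCR selection, and lockout thresholds below are illustrative choices, not part of the Dolos write-up, and flag spellings vary a bit between tpm2-tools releases), the disk key can be sealed under a policy that requires both the expected PCRs and a PIN, with the dictionary attack lockout configured to rate limit guesses:

```
# Configure the dictionary-attack lockout: allow 8 failed tries,
# then force an hour of delay before further attempts.
tpm2_dictionarylockout --setup-parameters \
    --max-tries=8 --recovery-time=3600 --lockout-recovery-time=3600

# Build a trial policy that requires both the expected PCR values
# and the object's authorization value (the PIN).
tpm2_createprimary -C o -c primary.ctx
tpm2_startauthsession -S trial.ctx
tpm2_policypcr -S trial.ctx -l sha256:0,2,4,7
tpm2_policyauthvalue -S trial.ctx -L pin_pcr.policy
tpm2_flushcontext trial.ctx

# Seal the disk encryption key (disk.key is a placeholder) under
# that policy, with the PIN as the authorization value.  Unsealing
# now requires matching PCRs *and* the PIN, and PIN guesses are
# rate limited by the TPM itself.
tpm2_create -C primary.ctx -i disk.key -L pin_pcr.policy \
    -p "$PIN" -u seal.pub -r seal.priv
```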
TOTP local attestation
A very savvy attacker could disable UEFI SecureBoot, replace the booting kernel and initrd with one that just asks for the PIN and exfiltrates it once the user provides it, then re-enables all of the original SecureBoot configuration and pretends to crash. The unsuspecting user might just say "oh, that's weird" and put their PIN in a second time. To protect against this, the firmware can perform a local attestation with a TPM assisted Time-based One-Time Password -- before entering their PIN, the user can verify that the computer has produced the same 6-digit code as their authenticator app. The TPM will only perform the HMAC if the firmware is unmodified and it does not reveal the HMAC key, so it is not possible for an adversary to generate fake codes for arbitrary times in the future.
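On Linux this is packaged as tpm2-totp; the enrollment and boot-time flow is roughly the following (subcommand names are from my reading of the tpm2-totp project and may differ in your version; options are elided):

```
# Enroll once: generate a secret, seal it against the firmware PCRs,
# and display it as a QR code to scan into an authenticator app.
tpm2-totp generate

# At every boot, before the user types their PIN: the TPM will only
# compute the 6-digit code if the measured firmware still matches
# the PCRs the secret was sealed against.
tpm2-totp calculate
```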
The downside to requiring user authorization is that the system is no longer capable of unattended reboots. For a laptop this might be acceptable, especially when combined with a local attestation like tpm2-totp, although for servers a Remote Attestation is necessary to ensure that the stolen system is not being booted by an attacker.
Tamper Switches
Another way to make it more difficult for an attacker to sniff the TPM communication while booting is to enable tamper switches on the hardware. Lenovo's firmware can be configured to require an administrator password to boot at all after a tamper switch has been engaged. This is not perfect; older models stored the tamper state in the CMOS NVRAM, which is trivial to clear, while newer models store it in the Embedded Controller (EC), an external microcontroller that manages the hardware in the laptop. While the simpler EC can be somewhat more robust than the x86, Alex Matrosov demonstrated that the EC is not a security boundary, and that it creates another attack surface that needs to be protected.
My Sleep attack also demonstrated a problem with the tamper switches on Lenovo systems that was combined with a Bootguard bypass: the switches and firmware signatures are only checked at cold-boot, so an adversary could gain code execution in Ring 0 and SMM during a resume from S3 sleep. This attack would not be detected until the next cold reboot, which for many laptops might be many months.
The tamper switches do make it more challenging for an adversary that wants to surreptitiously duplicate the system since bypassing them might require more extensive hardware modifications or damage the case, and a tamper switch tripping would alert the user that the system might have been compromised.
However, an adversary who is planning on removing the TPM for external analysis does not care about the tamper switches. The TPM is not connected to the tamper switches directly, so it does not have any way to zeroize its contents when they are triggered.
fTPM vs dTPM
Dolos was able to sniff the TPM traffic since the system used an external SPI attached TPM, and they would have been able to do so for i2c or LPC attached ones as well since those protocols are fairly simple and slow.
An Intel Management Engine (ME) fTPM in the Platform Controller Hub (PCH) is a little more complex, since the HECI bus is faster and wider. Directly probing or tapping the TPM becomes even harder once the PCH is built into the CPU package and the SoC is soldered to the board. Starting with Broadwell and mobile Skylake, the Management Engine in the PCH is packaged with the CPU and the HECI bus they use for communication is entirely inside the package. This makes tapping the bus significantly more difficult and also almost entirely removes the ability to extract the TPM for direct probing.
However, the Management Engine is its own attack surface, and trusting it to keep secrets depends on your threat model. Some security experts recommend turning it off with me_cleaner, and Ermolov and Goryachy have demonstrated several physical vulnerabilities that provided code execution inside the ME, which is essentially game-over for platform security. In addition to directly compromising the fTPM sealed secrets, the ME can disable the Bootguard protections to bypass firmware signature checks, as well as read host memory to locate secrets after they have been unsealed from a discrete TPM.
TPM Parameter Encryption
One of Garrett's suggestions for discrete TPM devices is to enable encrypted sessions to ensure that the parameters sent to the TPM and the reply containing the unsealed secrets are protected. Unfortunately adding this to the closed source BitLocker is not possible for anyone outside of Microsoft (although there might be a workaround using kexec), and even for open source projects the tpm2-tools utilities do not make it the easiest thing to do, since not all commands support sessions. Additionally, finding documentation on exactly what is protected requires digging into the tpm2-tss source code or trying to parse the lengthy TCG specs.
Essentially the system generates a random salt, encrypts it with the public part of a TPM primary key, and sends it to the TPM when an encrypted session is started; a key derivation function then turns that salt into the session key used to protect the parameters:
tpm2 startauthsession --key-context primary.ctx
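Building on the sealing sketch above, an unseal with parameter encryption might look roughly like this with tpm2-tools (again, file names and the PCR selection are illustrative, and the `session:<file>+<password>` authorization syntax and tpm2_sessionconfig attribute flags should be checked against your tpm2-tools release):

```
# Load the sealed object created earlier.
tpm2_load -C primary.ctx -u seal.pub -r seal.priv -c seal.ctx

# Start a *salted* policy session: the salt is encrypted to the
# primary key's public part, so the derived session key never
# crosses the bus in the clear.
tpm2_startauthsession --policy-session --key-context primary.ctx -S session.ctx

# Ask for response-parameter encryption so the unsealed secret
# comes back encrypted instead of in plaintext on the SPI bus.
tpm2_sessionconfig session.ctx --enable-encrypt

# Satisfy the policy (PCRs plus the authorization value), then
# unseal using the session and the PIN.
tpm2_policypcr -S session.ctx -l sha256:0,2,4,7
tpm2_policyauthvalue -S session.ctx
tpm2_unseal -c seal.ctx -p "session:session.ctx+$PIN"
```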
However, there is now a chicken-and-egg problem for establishing trust in the primary key: how does the system know that the primary key came from the real TPM? The easy, yet not quite right, answer is that the system can encrypt a challenge with the Endorsement Key's public part, and then use tpm2 activatecredential to verify that the primary key was generated by that TPM.
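A sketch of that credential activation with tpm2-tools (file names are placeholders, and a real provisioning flow would drive this from software rather than the shell):

```
# The Endorsement Key, whose certificate chains back to the TPM vendor.
tpm2_createek -c ek.ctx -G rsa -u ek.pub

# Get the cryptographic Name of the primary key we want to verify.
tpm2_readpublic -c primary.ctx -n primary.name

# Wrap a random challenge against the EK public key and that Name.
head -c 16 /dev/urandom > challenge.bin
tpm2_makecredential -e ek.pub -s challenge.bin \
    -n "$(xxd -p -c 256 primary.name)" -o cred.blob

# Only a TPM that holds both this EK and an object with that Name
# can recover the challenge.  (A standard EK also requires an
# endorsement-hierarchy policy session to authorize its use, which
# is elided here for brevity.)
tpm2_activatecredential -c primary.ctx -C ek.ctx -i cred.blob -o recovered.bin
cmp challenge.bin recovered.bin
```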
This of course then leads to the follow-up question: how does the software trust that the Endorsement Key came from the TPM in the system? The EK is signed by the TPM OEM, so the firmware could validate this on each boot, and if the Bootguard root of trust holds, then the EK might be trustworthy.
But... this only provides protection if a user authorization is also included. Without that, an adversary who plans to remove the TPM and send it the PCR extensions directly to unseal the secret has no need to care about these keys. The PCR values are not secret, since the adversary can predict them from a firmware dump, so unattended booting must not be allowed since there is nowhere to store secrets on the x86.
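To see why the PCR values themselves carry no secrecy, note that anyone with the measurement log (or a firmware dump to replay) can reproduce them; for example, on a Linux system:

```
# Read the current PCR values directly from the TPM.
tpm2_pcrread sha256:0,2,4,7

# Recompute the same values from the firmware's event log -- no
# secret is needed, only the measurements themselves.
tpm2_eventlog /sys/kernel/security/tpm0/binary_bios_measurements
```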
Remote Attestation
Since there is nowhere local to store secrets on the x86, the natural approach is to store them elsewhere and perform a remote attestation to retrieve them. The TPM can sign a quote that includes the firmware contents, the firmware configuration, the bootloader, etc., and the remote attestation server can decide if this is a trustworthy machine to allow to boot. Stolen machines can be de-authorized, suspicious activities like numerous reboots can be logged, and unattended reboots can be permitted. This works well for servers that are on a consistent network, and can be made to work for laptops using more full-featured firmware like Heads and bootloaders like safeboot.
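At its core the exchange is a signed quote over the PCRs. A simplified tpm2-tools version (the attestation key enrollment, nonce transport, and policy decision are all hand-waved here, and belong to the attestation server in a real deployment):

```
# Client: create an attestation key under the EK and sign a quote
# over the measured-boot PCRs, using a hex nonce from the server
# to prevent replay.
tpm2_createek -c ek.ctx -G rsa -u ek.pub
tpm2_createak -C ek.ctx -c ak.ctx -u ak.pub -n ak.name
tpm2_quote -c ak.ctx -l sha256:0,1,2,3,4,5,6,7 -q "$NONCE_HEX" \
    -m quote.msg -s quote.sig -o quote.pcrs

# Server: verify the signature and PCR digest against the enrolled
# AK public key and the expected nonce, then decide whether to
# release the disk key or VPN credentials.
tpm2_checkquote -u ak.pub -m quote.msg -s quote.sig \
    -f quote.pcrs -q "$NONCE_HEX"
```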
This technique can even kexec into Windows 10, allowing the BitLocker keys to be remotely provisioned to a server or laptop through the safeboot loader, without the BitLocker key even touching the disk and without requiring any changes to the Windows bootloader.
tl;dr
If physical attacks are in your threat model, using the TPM can help prevent many of them or make them significantly more costly for an adversary. There are no perfect solutions, only solutions that address specific risks, and you must decide whether the remaining risks are acceptable to your organization.