In this article I will show you how to start your current operating system
inside a virtual machine. That is: launching the operating system (with all
your settings, files, and everything), inside a virtual machine, while you’re
using it.
This article was written for Ubuntu, but it can be easily adapted to other
distributions, and with appropriate care it can be adapted to non-Linux kernels
and operating systems as well.
Motivation
Before we start, why would a sane person want to do this in the first place?
Well, here’s why I did it:
- To test changes that affect Secure Boot without a reboot.
Recently I was doing some experiments with Secure Boot and the Trusted
Platform Module (TPM) on a new laptop, and I got frustrated by how time
consuming it was to test changes to the boot chain. Every time I modified a
file involved during boot, I would need to reboot, then log in, then
re-open my terminal windows and files to make more modifications… Plus,
whenever I screwed up, I would need to manually recover my system, which
would be even more time consuming.
I thought that I could speed up my experiments by using a virtual machine
instead.
- To predict the future TPM state (in particular, the values of PCRs 4, 5, 8, and 9) after a change, without a reboot.
I wanted to predict the values of my TPM PCR banks after making changes to
the bootloader, kernel, and initrd. Writing a script to calculate the PCR
values automatically is in principle not that hard (and I actually did it
before, in a different context), but I wanted a robust, generic solution
that would work on most systems and in most situations, and emulation was
the natural choice.
- And, of course, just for the fun of it!
To be honest, I’m not a big fan of Secure Boot. The reason why I’ve been
working on it is simply that it’s the standard nowadays and so I have to stick
with it. Also, there are no real alternatives out there to achieve the same
goals. I’ll write an article about Secure Boot in the future to explain the
reasons why I don’t like it, and how to make it work better, but that’s another
story…
Procedure
The procedure that I’m going to describe has 3 main steps:
- create a copy of your drive
- emulate a TPM device using swtpm
- emulate the system with QEMU
I’ve tested this procedure on Ubuntu 23.04 (Lunar) and 23.10 (Mantic), but it
should work on any Linux distribution with minimal adjustments. The general
approach can be used for any operating system, as long as appropriate
replacements for QEMU and swtpm exist.
Prerequisites
Before we can start, we need to install:
- QEMU: a virtual machine emulator
- swtpm: a TPM emulator
- OVMF: a UEFI firmware implementation
On a recent version of Ubuntu, these can be installed with:
sudo apt install qemu-system-x86 ovmf swtpm
Note that OVMF only supports the x86_64 architecture, so we can only emulate
that. If you run a different architecture, you’ll need to find another UEFI
implementation that is not OVMF (but I’m not aware of any freely available
ones).
Create a copy of your drive
We can decide to either:
- Choice #1: run only the components involved early at boot (shim, bootloader, kernel, initrd). This is useful if you, like me, only need to test those components and how they affect Secure Boot and the TPM, and don’t really care about the rest (the init process, login manager, …).
- Choice #2: run the entire operating system. This can give you a fully usable operating system running inside the virtual machine, but may also result in some instability inside the guest (because we’re giving it a filesystem that is in use), and may also lead to some data loss if we’re not careful and make typos. Use with care!
Choice #1: Early boot components only
If we’re interested in the early boot components only, then we need to make a copy of the following from our drive: the GPT partition table, the EFI partition, and the /boot partition (if we have one). Usually all these 3 pieces are at the “start” of the drive, but this is not always the case.
To figure out where the partitions are located, run:
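# replace '/dev/nvme0n1' with your main drive (which may be '/dev/sda' instead)
sudo parted /dev/nvme0n1 print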
On my system, this is the output:
Model: WD_BLACK SN750 2TB (nvme)
Disk /dev/nvme0n1: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 525MB 524MB fat32 boot, esp
2 525MB 1599MB 1074MB ext4
3 1599MB 2000GB 1999GB lvm
In my case, the partition number 1 is the EFI partition, and the partition number 2 is the /boot partition. If you’re not sure what partitions to look for, run mount | grep -e /boot -e /efi. Note that, on some distributions (most notably the ones that use systemd-boot), a /boot partition may not exist, so you can leave that out in that case.
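If you prefer a single overview of partitions, filesystems and mount points, lsblk (part of util-linux, so normally already installed) can also be used; this is just an optional alternative to the commands above:
# replace '/dev/nvme0n1' with your main drive
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT /dev/nvme0n1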
Anyway, in my case, I need to copy the first 1599 MB of my drive, because that’s where the data I’m interested in ends: those first 1599 MB contain the GPT partition table (which is always at the start of the drive), the EFI partition, and the /boot partition.
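Note that parted rounds the sizes it prints. If you want the exact end offset of the last partition you need, rather than a rounded figure, you can optionally ask parted to print byte values (copying slightly more than necessary, as I do here, is harmless):
# replace '/dev/nvme0n1' with your main drive
sudo parted /dev/nvme0n1 unit B print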
Now that we have identified how many bytes to copy, we can copy them to a file named drive.img with dd (maybe after running sync to make sure that all changes have been committed):
# replace '/dev/nvme0n1' with your main drive (which may be '/dev/sda' instead),
# and 'count' with the number of MBs to copy
sync && sudo -g disk dd if=/dev/nvme0n1 of=drive.img bs=1M count=1599 conv=sparse
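As an optional sanity check, the resulting image can be inspected like a regular disk; the partition table we copied should be listed:
fdisk -l drive.img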
Choice #2: Entire system
If we want to run our entire system in a virtual machine, then I would
recommend creating a QEMU copy-on-write (COW) file:
# replace '/dev/nvme0n1' with your main drive (which may be '/dev/sda' instead)
sudo -g disk qemu-img create -f qcow2 -b /dev/nvme0n1 -F raw drive.qcow2
This will create a new copy-on-write image using /dev/nvme0n1 as its “backing storage”. Be very careful when running this command: you don’t want to mess up the order of the arguments, or you might end up writing to your storage device (leading to data loss)!
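To double-check that the image points at /dev/nvme0n1 as its backing file, you can optionally inspect it:
sudo -g disk qemu-img info drive.qcow2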
The advantage of using a copy-on-write file, as opposed to copying the whole
drive, is that this is much faster. Also, if we had to copy the entire drive,
we might not even have enough space for it (even when using sparse files).
The big drawback of using a copy-on-write file is that, because our main drive
likely contains filesystems that are mounted read-write, any modification to
the filesystems on the host may be perceived as data corruption on the guest,
and that in turn may cause all sorts of bad consequences inside the guest,
including kernel panics.
Another drawback is that, with this solution, later we will need to give QEMU
permission to read our drive, and if we’re not careful enough with the
commands we type (e.g. we swap the order of some arguments, or make some
typos), we may potentially end up writing to the drive instead.
Emulate a TPM device using swtpm
There are various ways to run the swtpm emulator. Here I will use the “vTPM
proxy” way, which is not the easiest, but has the advantage that the emulated
device will look like a real TPM device not only to the guest, but also to the
host, so that we can inspect its PCR banks (among other things) from the host
using familiar tools like tpm2_pcrread
.
First, enable the tpm_vtpm_proxy module (which is not enabled by default on Ubuntu):
sudo modprobe tpm_vtpm_proxy
If that worked, we should have a /dev/vtpmx device. We can verify its presence with:
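ls -l /dev/vtpmx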
swtpm in “vTPM proxy” mode will interact with /dev/vtpmx, but in order to do so it needs the sys_admin capability. On Ubuntu, swtpm ships with this capability explicitly disabled by AppArmor, but we can enable it with:
sudo sh -c "echo ' capability sys_admin,' > /etc/apparmor.d/local/usr.bin.swtpm"
sudo systemctl reload apparmor
Now that /dev/vtpmx is present, and swtpm can talk to it, we can run swtpm in “vTPM proxy” mode:
sudo mkdir /tmp/swtpm-state
sudo swtpm chardev --tpmstate dir=/tmp/swtpm-state --vtpm-proxy --tpm2
Upon start, swtpm should create a new /dev/tpmN device and print its name on the terminal. On my system, I already have a real TPM on /dev/tpm0, and therefore swtpm allocates /dev/tpm1.
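If you missed the message printed by swtpm, listing the TPM devices should also reveal which one was just added:
ls -l /dev/tpm*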
The emulated TPM device will need to be readable and writable by QEMU, but by default it is accessible only by root, so either we run QEMU as root (not recommended), or we relax the permissions on the device:
# replace '/dev/tpm1' with the device created by swtpm
sudo chmod a+rw /dev/tpm1
Make sure not to accidentally change the permissions of your real TPM device!
Emulate the system with QEMU
Inside the QEMU emulator, we will run the OVMF UEFI firmware. On Ubuntu, the
firmware comes in 2 flavors:
- with Secure Boot enabled (/usr/share/OVMF/OVMF_CODE_4M.ms.fd), and
- with Secure Boot disabled (in /usr/share/OVMF/OVMF_CODE_4M.fd)
(There are actually even more flavors; see this AskUbuntu question for the details.)
In the commands that follow I’m going to use the Secure Boot flavor, but if you need to disable Secure Boot in your guest, just replace .ms.fd with .fd in all the commands below.
To use OVMF, first we need to copy the EFI variables to a file that can be read
& written by QEMU:
cp /usr/share/OVMF/OVMF_VARS_4M.ms.fd /tmp/
This file (/tmp/OVMF_VARS_4M.ms.fd) will be the equivalent of the EFI flash storage, and it’s where OVMF will read and store its configuration, which is why we need to make a copy of it (to avoid modifications to the original file).
Now we’re ready to run QEMU:
- If you copied only the early boot files (choice #1):
# replace '/dev/tpm1' with the device created by swtpm
qemu-system-x86_64 \
-accel kvm \
-machine q35,smm=on \
-cpu host \
-smp cores=4,threads=1 \
-m 4096 \
-vga virtio \
-bios /usr/share/ovmf/OVMF.fd \
-drive if=pflash,unit=0,format=raw,file=/usr/share/OVMF/OVMF_CODE_4M.ms.fd,readonly=on \
-drive if=pflash,unit=1,format=raw,file=/tmp/OVMF_VARS_4M.ms.fd \
-drive if=virtio,format=raw,file=drive.img \
-tpmdev passthrough,id=tpm0,path=/dev/tpm1,cancel-path=/dev/null \
-device tpm-tis,tpmdev=tpm0
- If you have a copy-on-write file for the entire system (choice #2):
# replace '/dev/tpm1' with the device created by swtpm
sudo -g disk qemu-system-x86_64 \
-accel kvm \
-machine q35,smm=on \
-cpu host \
-smp cores=4,threads=1 \
-m 4096 \
-vga virtio \
-bios /usr/share/ovmf/OVMF.fd \
-drive if=pflash,unit=0,format=raw,file=/usr/share/OVMF/OVMF_CODE_4M.ms.fd,readonly=on \
-drive if=pflash,unit=1,format=raw,file=/tmp/OVMF_VARS_4M.ms.fd \
-drive if=virtio,format=qcow2,file=drive.qcow2 \
-tpmdev passthrough,id=tpm0,path=/dev/tpm1,cancel-path=/dev/null \
-device tpm-tis,tpmdev=tpm0
Note that this last command makes QEMU run as the disk group: on Ubuntu, this group has the permission to read and write all storage devices, so be careful when running this command, or you risk losing your files forever! If you want to add more safety, you may consider using an ACL to give the user running QEMU read-only permission to your backing storage.
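As a rough sketch, the ACL approach could look like this (assuming the acl package is installed; QEMU opens the qcow2 backing file read-only, so read access is enough, and the ACL only lasts until the device node is recreated, e.g. at reboot):
# replace '/dev/nvme0n1' with your main drive; run this as the user that will run QEMU
sudo setfacl -m u:$USER:r /dev/nvme0n1
getfacl /dev/nvme0n1  # optional: verify the ACL
With the ACL in place, the sudo -g disk prefix should no longer be necessary.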
In either case, after launching QEMU, our operating system should boot…
while running inside itself!
In some circumstances though it may happen that the wrong operating system is
booted, or that you end up at the EFI setup screen. This can happen if your
system is not configured to boot from the “first” EFI entry listed in the EFI
partition. Because the boot order is not recorded anywhere on the storage
device (it’s recorded in the EFI flash memory), of course OVMF won’t know which
operating system you intended to boot, and will just attempt to launch the
first one it finds. You can use the EFI setup screen provided by OVMF to change
the boot order in the way you like. After that, changes will be saved into the
/tmp/OVMF_VARS_4M.ms.fd file on the host: you should keep a copy of that file
so that, next time you launch QEMU, you’ll boot directly into your operating
system.
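For example (the destination path here is just an arbitrary choice):
cp /tmp/OVMF_VARS_4M.ms.fd ~/OVMF_VARS_4M.ms.fd.bak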
Reading PCR banks after boot
Once our operating system has launched inside QEMU, and after the boot process
is complete, the PCR banks will be filled and recorded by swtpm.
If we choose to copy only the early boot files (choice #1), then of course our operating system won’t be
fully booted: it’ll likely hang waiting for the root filesystem to appear,
and may eventually drop to the initrd shell. None of that really matters if all
we want is to see the PCR values stored by the bootloader.
Before we can extract those PCR values, we first need to stop QEMU (Ctrl-C is fine), and then we can read them with tpm2_pcrread:
# replace '/dev/tpm1' with the device created by swtpm
tpm2_pcrread -T device:/dev/tpm1
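If you are only interested in the PCRs mentioned earlier, tpm2_pcrread also accepts a PCR selection (this assumes the SHA-256 bank is the one in use):
# replace '/dev/tpm1' with the device created by swtpm
tpm2_pcrread -T device:/dev/tpm1 sha256:4,5,8,9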
Using the method described in this article, PCRs 4, 5, 8, and 9 inside the
emulated TPM should match the PCRs in our real TPM. And here comes an
interesting application of this method: if we upgrade our bootloader or kernel,
and we want to know the future PCR values that our system will have after
reboot, we can simply follow this procedure and obtain those PCR values without
shutting down our system! This can be especially useful if we use TPM sealing:
we can reseal our secrets and make them unsealable at the next reboot without
trouble.
Restarting the virtual machine
If we want to restart the guest inside the virtual machine and obtain a consistent TPM state, we should start from a “clean” state every time, which means:
- restart swtpm
- recreate the drive.img or drive.qcow2 file
- launch QEMU again
If we don’t restart swtpm, the virtual TPM state (and in particular the PCR
banks) won’t be cleared, and new PCR measurements will simply be added on top
of the existing state. If we don’t recreate the drive file, it’s possible that
some modifications to the filesystems will have an impact on the future PCR
measurements.
We don’t necessarily need to recreate the /tmp/OVMF_VARS_4M.ms.fd file every time. In fact, if you need to modify any EFI setting to make your system bootable, you might want to preserve it so that you don’t need to change EFI settings at every boot.
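As a rough sketch, for choice #1 and the paths used in this article, a clean restart could look like this (after which QEMU can be launched again with the same command as before):
# stop the running swtpm instance (this alone clears the volatile PCR banks)
sudo pkill swtpm
# optional: wipe the TPM state directory to start from a factory-fresh TPM
sudo rm -rf /tmp/swtpm-state && sudo mkdir /tmp/swtpm-state
# recreate the drive image (same command as before)
sync && sudo -g disk dd if=/dev/nvme0n1 of=drive.img bs=1M count=1599 conv=sparse
# start swtpm again; remember to relax the permissions on the new /dev/tpmN device
sudo swtpm chardev --tpmstate dir=/tmp/swtpm-state --vtpm-proxy --tpm2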
Automating the entire process
I’m (very slowly) working on turning this entire procedure into a script, so
that everything can be automated. Once I find some time I’ll finish the script
and publish it, so if you liked this article, stay tuned, and let me know if
you have any comment/suggestion/improvement/critique!