
Commit f9a705a

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "For x86, there is a new alternative and (in the future) more scalable
  implementation of extended page tables that does not need a reverse
  map from guest physical addresses to host physical addresses.

  For now it is disabled by default because it is still lacking a few
  of the existing MMU's bells and whistles. However it is a very solid
  piece of work and it is already available for people to hammer on it.

  Other updates:

  ARM:
   - New page table code for both hypervisor and guest stage-2
   - Introduction of a new EL2-private host context
   - Allow EL2 to have its own private per-CPU variables
   - Support of PMU event filtering
   - Complete rework of the Spectre mitigation

  PPC:
   - Fix for running nested guests with in-kernel IRQ chip
   - Fix race condition causing occasional host hard lockup
   - Minor cleanups and bugfixes

  x86:
   - allow trapping unknown MSRs to userspace
   - allow userspace to force #GP on specific MSRs
   - INVPCID support on AMD
   - nested AMD cleanup, on demand allocation of nested SVM state
   - hide PV MSRs and hypercalls for features not enabled in CPUID
   - new test for MSR_IA32_TSC writes from host and guest
   - cleanups: MMU, CPUID, shared MSRs
   - LAPIC latency optimizations and bugfixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
  kvm: x86/mmu: NX largepage recovery for TDP MMU
  kvm: x86/mmu: Don't clear write flooding count for direct roots
  kvm: x86/mmu: Support MMIO in the TDP MMU
  kvm: x86/mmu: Support write protection for nesting in tdp MMU
  kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
  kvm: x86/mmu: Support dirty logging for the TDP MMU
  kvm: x86/mmu: Support changed pte notifier in tdp MMU
  kvm: x86/mmu: Add access tracking for tdp_mmu
  kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
  kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
  kvm: x86/mmu: Add TDP MMU PF handler
  kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
  kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
  KVM: Cache as_id in kvm_memory_slot
  kvm: x86/mmu: Add functions to handle changed TDP SPTEs
  kvm: x86/mmu: Allocate and free TDP MMU roots
  kvm: x86/mmu: Init / Uninit the TDP MMU
  kvm: x86/mmu: Introduce tdp_iter
  KVM: mmu: extract spte.h and spte.c
  KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
  ...
2 parents 9313f80 + 29cf0f5 commit f9a705a

119 files changed

Lines changed: 8444 additions & 4976 deletions


Documentation/virt/kvm/api.rst

Lines changed: 207 additions & 9 deletions
@@ -4498,11 +4498,14 @@ Currently, the following list of CPUID leaves are returned:
 - HYPERV_CPUID_ENLIGHTMENT_INFO
 - HYPERV_CPUID_IMPLEMENT_LIMITS
 - HYPERV_CPUID_NESTED_FEATURES
+- HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS
+- HYPERV_CPUID_SYNDBG_INTERFACE
+- HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES
 
 HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS was
 enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
 
-Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
+Userspace invokes KVM_GET_SUPPORTED_HV_CPUID by passing a kvm_cpuid2 structure
 with the 'nent' field indicating the number of entries in the variable-size
 array 'entries'. If the number of entries is too low to describe all Hyper-V
 feature leaves, an error (E2BIG) is returned. If the number is more or equal
@@ -4704,6 +4707,106 @@ KVM_PV_VM_VERIFY
 Verify the integrity of the unpacked image. Only if this succeeds,
 KVM is allowed to start protected VCPUs.
 
+4.126 KVM_X86_SET_MSR_FILTER
+----------------------------
+
+:Capability: KVM_X86_SET_MSR_FILTER
+:Architectures: x86
+:Type: vm ioctl
+:Parameters: struct kvm_msr_filter
+:Returns: 0 on success, < 0 on error
+
+::
+
+  struct kvm_msr_filter_range {
+  #define KVM_MSR_FILTER_READ (1 << 0)
+  #define KVM_MSR_FILTER_WRITE (1 << 1)
+      __u32 flags;
+      __u32 nmsrs; /* number of msrs in bitmap */
+      __u32 base; /* MSR index the bitmap starts at */
+      __u8 *bitmap; /* a 1 bit allows the operations in flags, 0 denies */
+  };
+
+  #define KVM_MSR_FILTER_MAX_RANGES 16
+  struct kvm_msr_filter {
+  #define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0)
+  #define KVM_MSR_FILTER_DEFAULT_DENY (1 << 0)
+      __u32 flags;
+      struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES];
+  };
+
+flags values for ``struct kvm_msr_filter_range``:
+
+``KVM_MSR_FILTER_READ``
+
+  Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap
+  indicates that a read should immediately fail, while a 1 indicates that
+  a read for a particular MSR should be handled regardless of the default
+  filter action.
+
+``KVM_MSR_FILTER_WRITE``
+
+  Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap
+  indicates that a write should immediately fail, while a 1 indicates that
+  a write for a particular MSR should be handled regardless of the default
+  filter action.
+
+``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE``
+
+  Filter both read and write accesses to MSRs using the given bitmap. A 0
+  in the bitmap indicates that both reads and writes should immediately fail,
+  while a 1 indicates that reads and writes for a particular MSR are not
+  filtered by this range.
+
+flags values for ``struct kvm_msr_filter``:
+
+``KVM_MSR_FILTER_DEFAULT_ALLOW``
+
+  If no filter range matches an MSR index that is getting accessed, KVM will
+  fall back to allowing access to the MSR.
+
+``KVM_MSR_FILTER_DEFAULT_DENY``
+
+  If no filter range matches an MSR index that is getting accessed, KVM will
+  fall back to rejecting access to the MSR. In this mode, all MSRs that should
+  be processed by KVM need to explicitly be marked as allowed in the bitmaps.
+
+This ioctl allows user space to define up to 16 bitmaps of MSR ranges to
+specify whether a certain MSR access should be explicitly filtered for or not.
+
+If this ioctl has never been invoked, MSR accesses are not guarded and the
+default KVM in-kernel emulation behavior is fully preserved.
+
+Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR
+filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
+an error.
+
+As soon as the filtering is in place, every MSR access is processed through
+the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff);
+x2APIC MSRs are always allowed, independent of the ``default_allow`` setting,
+and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base
+register.
+
+If a bit is within one of the defined ranges, read and write accesses are
+guarded by the bitmap's value for the MSR index if the kind of access
+is included in the ``struct kvm_msr_filter_range`` flags. If no range
+cover this particular access, the behavior is determined by the flags
+field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW``
+and ``KVM_MSR_FILTER_DEFAULT_DENY``.
+
+Each bitmap range specifies a range of MSRs to potentially allow access on.
+The range goes from MSR index [base .. base+nmsrs]. The flags field
+indicates whether reads, writes or both reads and writes are filtered
+by setting a 1 bit in the bitmap for the corresponding MSR index.
+
+If an MSR access is not permitted through the filtering, it generates a
+#GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that
+allows user space to deflect and potentially handle various MSR accesses
+into user space.
+
+If a vCPU is in running state while this ioctl is invoked, the vCPU may
+experience inconsistent filtering behavior on MSR accesses.
+
 
 5. The kvm_run structure
 ========================
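The range and default-action rules of KVM_X86_SET_MSR_FILTER can be modeled in a few lines of user-space C. This is a sketch of the documented semantics, not KVM's implementation: the `sketch_*` structs and `SKETCH_*` constants are hypothetical local mirrors of the layout above (real code should use `<linux/kvm.h>`), each range is assumed to cover the `nmsrs` indices starting at `base`, and the sketch assumes the first matching range decides the outcome.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirrors of the documented flag values. */
#define SKETCH_FILTER_READ   (1u << 0)
#define SKETCH_FILTER_WRITE  (1u << 1)
#define SKETCH_DEFAULT_DENY  (1u << 0)
#define SKETCH_MAX_RANGES    16

struct sketch_filter_range {
    uint32_t flags;          /* SKETCH_FILTER_READ and/or SKETCH_FILTER_WRITE */
    uint32_t nmsrs;          /* number of MSRs covered by the bitmap */
    uint32_t base;           /* first MSR index covered */
    const uint8_t *bitmap;   /* bit i set => MSR base + i is allowed */
};

struct sketch_filter {
    uint32_t flags;          /* 0 = default allow, SKETCH_DEFAULT_DENY = default deny */
    struct sketch_filter_range ranges[SKETCH_MAX_RANGES];
};

/* Returns true if the access would be handled by KVM, false if it would be
 * rejected (#GP in the guest, or an exit to user space when
 * KVM_CAP_X86_USER_SPACE_MSR is enabled for filtered accesses). */
static bool sketch_msr_allowed(const struct sketch_filter *f,
                               uint32_t index, uint32_t access)
{
    assert(f != NULL);

    /* x2APIC MSRs bypass the filter entirely. */
    if (index >= 0x800 && index <= 0x8ff)
        return true;

    for (size_t i = 0; i < SKETCH_MAX_RANGES; i++) {
        const struct sketch_filter_range *r = &f->ranges[i];

        if (r->nmsrs == 0 || !(r->flags & access))
            continue;  /* unused range, or it does not filter this access kind */
        if (index < r->base || index - r->base >= r->nmsrs)
            continue;  /* index outside the nmsrs entries starting at base */

        uint32_t bit = index - r->base;
        return (r->bitmap[bit / 8] >> (bit % 8)) & 1;  /* first match decides */
    }

    /* No range matched: fall back to the filter-wide default. */
    return !(f->flags & SKETCH_DEFAULT_DENY);
}
```

For example, with default deny and one read-only range of two MSRs whose bitmap byte is 0x01, reads of `base` are allowed, reads of `base + 1` are rejected, and writes to `base` fall through to the default deny. The sketch does not reject the invalid combination of DEFAULT_DENY with no ranges, which the text above says the ioctl refuses.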
@@ -4869,14 +4972,13 @@ to the byte array.
 
 .. note::
 
-      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
-      KVM_EXIT_EPR the corresponding
-
-operations are complete (and guest state is consistent) only after userspace
-has re-entered the kernel with KVM_RUN. The kernel side will first finish
-incomplete operations and then check for pending signals. Userspace
-can re-enter the guest with an unmasked signal pending to complete
-pending operations.
+      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR,
+      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
+      operations are complete (and guest state is consistent) only after userspace
+      has re-entered the kernel with KVM_RUN. The kernel side will first finish
+      incomplete operations and then check for pending signals. Userspace
+      can re-enter the guest with an unmasked signal pending to complete
+      pending operations.
 
 ::
 
@@ -5163,6 +5265,44 @@ Note that KVM does not skip the faulting instruction as it does for
 KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
 if it decides to decode and emulate the instruction.
 
+::
+
+        /* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
+        struct {
+            __u8 error; /* user -> kernel */
+            __u8 pad[7];
+            __u32 reason; /* kernel -> user */
+            __u32 index; /* kernel -> user */
+            __u64 data; /* kernel <-> user */
+        } msr;
+
+Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is
+enabled, MSR accesses to registers that would invoke a #GP by KVM kernel code
+will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and KVM_EXIT_X86_WRMSR
+exit for writes.
+
+The "reason" field specifies why the MSR trap occurred. User space will only
+receive MSR exit traps when a particular reason was requested during through
+ENABLE_CAP. Currently valid exit reasons are:
+
+  KVM_MSR_EXIT_REASON_UNKNOWN - access to MSR that is unknown to KVM
+  KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits
+  KVM_MSR_EXIT_REASON_FILTER - access blocked by KVM_X86_SET_MSR_FILTER
+
+For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest
+wants to read. To respond to this request with a successful read, user space
+writes the respective data into the "data" field and must continue guest
+execution to ensure the read data is transferred into guest register state.
+
+If the RDMSR request was unsuccessful, user space indicates that with a "1" in
+the "error" field. This will inject a #GP into the guest when the VCPU is
+executed again.
+
+For KVM_EXIT_X86_WRMSR, the "index" field tells user space which MSR the guest
+wants to write. Once finished processing the event, user space must continue
+vCPU execution. If the MSR write was unsuccessful, user space also sets the
+"error" field to "1".
+
 ::
 
         /* Fix the size of the union. */
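On the user space side, the KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR protocol comes down to reading `index`, filling in `data` for reads, and setting `error` to 1 when the access should fail. The following is a minimal sketch under stated assumptions: `struct sketch_msr_exit` is a local copy of the `msr` layout in the hunk above rather than the real `struct kvm_run`, and `SKETCH_EMULATED_MSR` is a hypothetical MSR index chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the kvm_run `msr` member documented above (sketch only). */
struct sketch_msr_exit {
    uint8_t  error;   /* user -> kernel: 1 makes KVM inject #GP */
    uint8_t  pad[7];
    uint32_t reason;  /* kernel -> user: KVM_MSR_EXIT_REASON_* bit */
    uint32_t index;   /* kernel -> user: MSR the guest accessed */
    uint64_t data;    /* kernel <-> user: value read or written */
};

/* Hypothetical model-specific MSR that this VMM emulates itself. */
#define SKETCH_EMULATED_MSR 0x12345678u

static uint64_t sketch_emulated_value;  /* backing store for that MSR */

/* Fill in the reply to a KVM_EXIT_X86_RDMSR (is_write == false) or
 * KVM_EXIT_X86_WRMSR (is_write == true) exit. The caller re-enters
 * the guest with KVM_RUN afterwards so KVM can complete the access. */
static void sketch_handle_msr_exit(struct sketch_msr_exit *msr, bool is_write)
{
    assert(msr != NULL);

    if (msr->index != SKETCH_EMULATED_MSR) {
        msr->error = 1;  /* not handled here either: guest sees #GP */
        return;
    }

    msr->error = 0;
    if (is_write)
        sketch_emulated_value = msr->data;  /* accept the guest's value */
    else
        msr->data = sketch_emulated_value;  /* return the stored value */
}
```

After filling in the reply, a VMM would call KVM_RUN again; per the note on KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR earlier in api.rst, the access only completes, and guest state is only consistent, once the kernel has been re-entered.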
@@ -5852,6 +5992,28 @@ controlled by the kvm module parameter halt_poll_ns. This capability allows
 the maximum halt time to specified on a per-VM basis, effectively overriding
 the module parameter for the target VM.
 
+7.21 KVM_CAP_X86_USER_SPACE_MSR
+-------------------------------
+
+:Architectures: x86
+:Target: VM
+:Parameters: args[0] contains the mask of KVM_MSR_EXIT_REASON_* events to report
+:Returns: 0 on success; -1 on error
+
+This capability enables trapping of #GP invoking RDMSR and WRMSR instructions
+into user space.
+
+When a guest requests to read or write an MSR, KVM may not implement all MSRs
+that are relevant to a respective system. It also does not differentiate by
+CPU type.
+
+To allow more fine grained control over MSR handling, user space may enable
+this capability. With it enabled, MSR accesses that match the mask specified in
+args[0] and trigger a #GP event inside the guest by KVM will instead trigger
+KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
+can then handle to implement model specific MSR handling and/or user notifications
+to inform a user that an MSR was not handled.
+
 8. Other capabilities.
 ======================
 
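The `args[0]` mask of KVM_CAP_X86_USER_SPACE_MSR acts as a per-reason opt-in: only accesses that KVM would answer with #GP, and whose reason bit is set in the mask, become exits. A minimal sketch of that rule, assuming hypothetical `SKETCH_REASON_*` stand-ins for the KVM_MSR_EXIT_REASON_* constants (take the real values from `<linux/kvm.h>`, not from this sketch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the KVM_MSR_EXIT_REASON_* bits. */
#define SKETCH_REASON_INVAL   (1u << 0)
#define SKETCH_REASON_UNKNOWN (1u << 1)
#define SKETCH_REASON_FILTER  (1u << 2)

/* Mask a VMM might pass in args[0]: exit on unknown and filtered MSRs,
 * but leave invalid MSRs / reserved bits to KVM's own #GP injection. */
static const uint64_t sketch_cap_mask = SKETCH_REASON_UNKNOWN | SKETCH_REASON_FILTER;

/* Models the documented rule: a #GP-causing access turns into a
 * KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR exit only if its reason
 * bit was requested when the capability was enabled. */
static bool sketch_exits_to_user(uint64_t mask, uint32_t reason)
{
    assert(reason != 0);  /* every exit carries exactly one reason bit */
    return (mask & reason) != 0;
}
```

In a real VMM the mask would go into `args[0]` of a `struct kvm_enable_cap` whose `cap` field is KVM_CAP_X86_USER_SPACE_MSR, passed to the KVM_ENABLE_CAP ioctl on the VM file descriptor.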
@@ -6193,3 +6355,39 @@ distribution...)
 
 If this capability is available, then the CPNC and CPVC can be synchronized
 between KVM and userspace via the sync regs mechanism (KVM_SYNC_DIAG318).
+
+8.26 KVM_CAP_X86_USER_SPACE_MSR
+-------------------------------
+
+:Architectures: x86
+
+This capability indicates that KVM supports deflection of MSR reads and
+writes to user space. It can be enabled on a VM level. If enabled, MSR
+accesses that would usually trigger a #GP by KVM into the guest will
+instead get bounced to user space through the KVM_EXIT_X86_RDMSR and
+KVM_EXIT_X86_WRMSR exit notifications.
+
+8.25 KVM_X86_SET_MSR_FILTER
+---------------------------
+
+:Architectures: x86
+
+This capability indicates that KVM supports that accesses to user defined MSRs
+may be rejected. With this capability exposed, KVM exports new VM ioctl
+KVM_X86_SET_MSR_FILTER which user space can call to specify bitmaps of MSR
+ranges that KVM should reject access to.
+
+In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
+trap and emulate MSRs that are outside of the scope of KVM as well as
+limit the attack surface on KVM's MSR emulation code.
+
+
+8.26 KVM_CAP_ENFORCE_PV_CPUID
+-----------------------------
+
+Architectures: x86
+
+When enabled, KVM will disable paravirtual features provided to the
+guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
+(0x40000001). Otherwise, a guest may use the paravirtual features
+regardless of what has actually been exposed through the CPUID leaf.
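KVM_CAP_ENFORCE_PV_CPUID ties each paravirtual MSR to its feature bit in CPUID leaf 0x40000001. The sketch below models that gate for a handful of PV MSRs. The MSR-to-bit pairs come from the KVM_CPUID_FEATURES table in the cpuid.rst part of this commit; the function, table and constant names are hypothetical, and modeling a disabled feature as a rejected MSR access is an assumption based on the "hide PV MSRs and hypercalls for features not enabled in CPUID" item in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit numbers from the KVM_CPUID_FEATURES table in cpuid.rst. */
#define SKETCH_FEAT_ASYNC_PF      4
#define SKETCH_FEAT_STEAL_TIME    5
#define SKETCH_FEAT_PV_EOI        6
#define SKETCH_FEAT_POLL_CONTROL 12

/* Hypothetical lookup table: PV MSR index -> feature bit that advertises it. */
static const struct {
    uint32_t index;
    unsigned feature_bit;
} sketch_pv_msrs[] = {
    { 0x4b564d02, SKETCH_FEAT_ASYNC_PF },
    { 0x4b564d03, SKETCH_FEAT_STEAL_TIME },
    { 0x4b564d04, SKETCH_FEAT_PV_EOI },
    { 0x4b564d05, SKETCH_FEAT_POLL_CONTROL },
};

/* Returns true if a guest access to `index` would be honored, given whether
 * enforcement is on and the EAX value of leaf 0x40000001 exposed to the
 * guest. MSRs not in the table are out of scope and reported as unaffected. */
static bool sketch_pv_msr_usable(bool enforce, uint32_t cpuid_eax, uint32_t index)
{
    if (!enforce)
        return true;  /* CPUID bits are not consulted at all */

    for (size_t i = 0; i < sizeof(sketch_pv_msrs) / sizeof(sketch_pv_msrs[0]); i++) {
        if (sketch_pv_msrs[i].index == index) {
            assert(sketch_pv_msrs[i].feature_bit < 32);
            return (cpuid_eax >> sketch_pv_msrs[i].feature_bit) & 1;
        }
    }
    return true;
}
```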

Documentation/virt/kvm/cpuid.rst

Lines changed: 44 additions & 44 deletions
@@ -38,64 +38,64 @@ returns::
 
 where ``flag`` is defined as below:
 
-================================= =========== ================================
-flag                              value       meaning
-================================= =========== ================================
-KVM_FEATURE_CLOCKSOURCE           0           kvmclock available at msrs
-                                              0x11 and 0x12
+================================== =========== ================================
+flag                               value       meaning
+================================== =========== ================================
+KVM_FEATURE_CLOCKSOURCE            0           kvmclock available at msrs
+                                               0x11 and 0x12
 
-KVM_FEATURE_NOP_IO_DELAY          1           not necessary to perform delays
-                                              on PIO operations
+KVM_FEATURE_NOP_IO_DELAY           1           not necessary to perform delays
+                                               on PIO operations
 
-KVM_FEATURE_MMU_OP                2           deprecated
+KVM_FEATURE_MMU_OP                 2           deprecated
 
-KVM_FEATURE_CLOCKSOURCE2          3           kvmclock available at msrs
-                                              0x4b564d00 and 0x4b564d01
+KVM_FEATURE_CLOCKSOURCE2           3           kvmclock available at msrs
+                                               0x4b564d00 and 0x4b564d01
 
-KVM_FEATURE_ASYNC_PF              4           async pf can be enabled by
-                                              writing to msr 0x4b564d02
+KVM_FEATURE_ASYNC_PF               4           async pf can be enabled by
+                                               writing to msr 0x4b564d02
 
-KVM_FEATURE_STEAL_TIME            5           steal time can be enabled by
-                                              writing to msr 0x4b564d03
+KVM_FEATURE_STEAL_TIME             5           steal time can be enabled by
+                                               writing to msr 0x4b564d03
 
-KVM_FEATURE_PV_EOI                6           paravirtualized end of interrupt
-                                              handler can be enabled by
-                                              writing to msr 0x4b564d04
+KVM_FEATURE_PV_EOI                 6           paravirtualized end of interrupt
+                                               handler can be enabled by
+                                               writing to msr 0x4b564d04
 
-KVM_FEATURE_PV_UNHAULT            7           guest checks this feature bit
-                                              before enabling paravirtualized
-                                              spinlock support
+KVM_FEATURE_PV_UNHALT              7           guest checks this feature bit
+                                               before enabling paravirtualized
+                                               spinlock support
 
-KVM_FEATURE_PV_TLB_FLUSH          9           guest checks this feature bit
-                                              before enabling paravirtualized
-                                              tlb flush
+KVM_FEATURE_PV_TLB_FLUSH           9           guest checks this feature bit
+                                               before enabling paravirtualized
+                                               tlb flush
 
-KVM_FEATURE_ASYNC_PF_VMEXIT       10          paravirtualized async PF VM EXIT
-                                              can be enabled by setting bit 2
-                                              when writing to msr 0x4b564d02
+KVM_FEATURE_ASYNC_PF_VMEXIT        10          paravirtualized async PF VM EXIT
+                                               can be enabled by setting bit 2
+                                               when writing to msr 0x4b564d02
 
-KVM_FEATURE_PV_SEND_IPI           11          guest checks this feature bit
-                                              before enabling paravirtualized
-                                              sebd IPIs
+KVM_FEATURE_PV_SEND_IPI            11          guest checks this feature bit
+                                               before enabling paravirtualized
+                                               send IPIs
 
-KVM_FEATURE_POLL_CONTROL          12          host-side polling on HLT can
-                                              be disabled by writing
-                                              to msr 0x4b564d05.
+KVM_FEATURE_POLL_CONTROL           12          host-side polling on HLT can
+                                               be disabled by writing
+                                               to msr 0x4b564d05.
 
-KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
-                                              before using paravirtualized
-                                              sched yield.
+KVM_FEATURE_PV_SCHED_YIELD         13          guest checks this feature bit
+                                               before using paravirtualized
+                                               sched yield.
 
-KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
-                                              before using the second async
-                                              pf control msr 0x4b564d06 and
-                                              async pf acknowledgment msr
-                                              0x4b564d07.
+KVM_FEATURE_ASYNC_PF_INT           14          guest checks this feature bit
+                                               before using the second async
+                                               pf control msr 0x4b564d06 and
+                                               async pf acknowledgment msr
+                                               0x4b564d07.
 
-KVM_FEATURE_CLOCSOURCE_STABLE_BIT 24          host will warn if no guest-side
-                                              per-cpu warps are expeced in
-                                              kvmclock
-================================= =========== ================================
+KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
+                                               per-cpu warps are expected in
+                                               kvmclock
+================================== =========== ================================
 
 ::
 
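The KVM_CPUID_FEATURES table maps bit positions in EAX of CPUID leaf 0x40000001 to feature names. A small decoder sketch using the corrected names from the new side of this diff; `decode_kvm_features` is a hypothetical helper, and obtaining the EAX value (for example by executing CPUID inside a guest) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions as listed in the KVM_CPUID_FEATURES table above. */
static const struct {
    unsigned bit;
    const char *name;
} kvm_features[] = {
    {  0, "KVM_FEATURE_CLOCKSOURCE" },
    {  1, "KVM_FEATURE_NOP_IO_DELAY" },
    {  2, "KVM_FEATURE_MMU_OP" },
    {  3, "KVM_FEATURE_CLOCKSOURCE2" },
    {  4, "KVM_FEATURE_ASYNC_PF" },
    {  5, "KVM_FEATURE_STEAL_TIME" },
    {  6, "KVM_FEATURE_PV_EOI" },
    {  7, "KVM_FEATURE_PV_UNHALT" },
    {  9, "KVM_FEATURE_PV_TLB_FLUSH" },
    { 10, "KVM_FEATURE_ASYNC_PF_VMEXIT" },
    { 11, "KVM_FEATURE_PV_SEND_IPI" },
    { 12, "KVM_FEATURE_POLL_CONTROL" },
    { 13, "KVM_FEATURE_PV_SCHED_YIELD" },
    { 14, "KVM_FEATURE_ASYNC_PF_INT" },
    { 24, "KVM_FEATURE_CLOCKSOURCE_STABLE_BIT" },
};

/* Print each listed feature whose bit is set in `eax` (to `out`, if non-NULL)
 * and return how many were found. Bits absent from the table are ignored. */
static int decode_kvm_features(uint32_t eax, FILE *out)
{
    int found = 0;

    for (size_t i = 0; i < sizeof(kvm_features) / sizeof(kvm_features[0]); i++) {
        if (eax & (UINT32_C(1) << kvm_features[i].bit)) {
            if (out)
                fprintf(out, "%s\n", kvm_features[i].name);
            found++;
        }
    }
    return found;
}
```

Calling `decode_kvm_features(eax, stdout)` lists one name per line; bit 8, which the table leaves unassigned, is silently skipped.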