@@ -4498,11 +4498,14 @@ Currently, the following list of CPUID leaves are returned:
44984498 - HYPERV_CPUID_ENLIGHTMENT_INFO
44994499 - HYPERV_CPUID_IMPLEMENT_LIMITS
45004500 - HYPERV_CPUID_NESTED_FEATURES
4501+ - HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS
4502+ - HYPERV_CPUID_SYNDBG_INTERFACE
4503+ - HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES
45014504
45024505HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS was
45034506enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
45044507
4505- Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
4508+ Userspace invokes KVM_GET_SUPPORTED_HV_CPUID by passing a kvm_cpuid2 structure
45064509with the 'nent' field indicating the number of entries in the variable-size
45074510array 'entries'. If the number of entries is too low to describe all Hyper-V
45084511feature leaves, an error (E2BIG) is returned. If the number is more or equal
@@ -4704,6 +4707,106 @@ KVM_PV_VM_VERIFY
47044707 Verify the integrity of the unpacked image. Only if this succeeds,
47054708 KVM is allowed to start protected VCPUs.
47064709
4710+ 4.126 KVM_X86_SET_MSR_FILTER
4711+ ----------------------------
4712+
4713+ :Capability: KVM_X86_SET_MSR_FILTER
4714+ :Architectures: x86
4715+ :Type: vm ioctl
4716+ :Parameters: struct kvm_msr_filter
4717+ :Returns: 0 on success, < 0 on error
4718+
4719+ ::
4720+
4721+   struct kvm_msr_filter_range {
4722+   #define KVM_MSR_FILTER_READ  (1 << 0)
4723+   #define KVM_MSR_FILTER_WRITE (1 << 1)
4724+       __u32 flags;
4725+       __u32 nmsrs; /* number of msrs in bitmap */
4726+       __u32 base;  /* MSR index the bitmap starts at */
4727+       __u8 *bitmap; /* a 1 bit allows the operations in flags, 0 denies */
4728+   };
4729+
4730+   #define KVM_MSR_FILTER_MAX_RANGES 16
4731+   struct kvm_msr_filter {
4732+   #define KVM_MSR_FILTER_DEFAULT_ALLOW (0 << 0)
4733+   #define KVM_MSR_FILTER_DEFAULT_DENY  (1 << 0)
4734+       __u32 flags;
4735+       struct kvm_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES];
4736+   };
4737+
4738+ flags values for ``struct kvm_msr_filter_range``:
4739+
4740+ ``KVM_MSR_FILTER_READ``
4741+
4742+ Filter read accesses to MSRs using the given bitmap. A 0 in the bitmap
4743+ indicates that a read should immediately fail, while a 1 indicates that
4744+ a read for a particular MSR should be handled regardless of the default
4745+ filter action.
4746+
4747+ ``KVM_MSR_FILTER_WRITE``
4748+
4749+ Filter write accesses to MSRs using the given bitmap. A 0 in the bitmap
4750+ indicates that a write should immediately fail, while a 1 indicates that
4751+ a write for a particular MSR should be handled regardless of the default
4752+ filter action.
4753+
4754+ ``KVM_MSR_FILTER_READ | KVM_MSR_FILTER_WRITE``
4755+
4756+ Filter both read and write accesses to MSRs using the given bitmap. A 0
4757+ in the bitmap indicates that both reads and writes should immediately fail,
4758+ while a 1 indicates that reads and writes for a particular MSR are not
4759+ filtered by this range.
4760+
4761+ flags values for ``struct kvm_msr_filter``:
4762+
4763+ ``KVM_MSR_FILTER_DEFAULT_ALLOW``
4764+
4765+ If no filter range matches an MSR index that is getting accessed, KVM will
4766+ fall back to allowing access to the MSR.
4767+
4768+ ``KVM_MSR_FILTER_DEFAULT_DENY``
4769+
4770+ If no filter range matches an MSR index that is getting accessed, KVM will
4771+ fall back to rejecting access to the MSR. In this mode, all MSRs that should
4772+ be processed by KVM need to explicitly be marked as allowed in the bitmaps.
4773+
4774+ This ioctl allows user space to define up to 16 bitmaps of MSR ranges to
4775+ specify whether a given MSR access should be allowed or denied.
4776+
4777+ If this ioctl has never been invoked, MSR accesses are not guarded and the
4778+ default KVM in-kernel emulation behavior is fully preserved.
4779+
4780+ Calling this ioctl with an empty set of ranges (all nmsrs == 0) disables MSR
4781+ filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
4782+ an error.
4783+
4784+ As soon as the filtering is in place, every MSR access is processed through
4785+ the filtering except for accesses to the x2APIC MSRs (from 0x800 to 0x8ff);
4786+ x2APIC MSRs are always allowed, independent of the default filter action,
4787+ and their behavior depends on the ``X2APIC_ENABLE`` bit of the APIC base
4788+ register.
4789+
4790+ If an MSR index is within one of the defined ranges, read and write accesses are
4791+ guarded by the bitmap's value for the MSR index if the kind of access
4792+ is included in the ``struct kvm_msr_filter_range`` flags. If no range
4793+ covers this particular access, the behavior is determined by the flags
4794+ field in the kvm_msr_filter struct: ``KVM_MSR_FILTER_DEFAULT_ALLOW``
4795+ and ``KVM_MSR_FILTER_DEFAULT_DENY``.
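
The lookup described above can be sketched in user space as follows. This is only an illustration of the documented rules, not KVM's implementation: the structures are local mirrors of the definitions quoted in this section, `example_msr_access_allowed` is a hypothetical name, and both the "first matching range wins" order and the byte-wise bit layout of the bitmap are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the definitions quoted in this section. */
#define KVM_MSR_FILTER_READ          (1u << 0)
#define KVM_MSR_FILTER_WRITE         (1u << 1)
#define KVM_MSR_FILTER_DEFAULT_ALLOW (0u << 0)
#define KVM_MSR_FILTER_DEFAULT_DENY  (1u << 0)
#define KVM_MSR_FILTER_MAX_RANGES    16

/* Local mirrors of the structures, so the sketch builds without <linux/kvm.h>. */
struct example_msr_filter_range {
    uint32_t flags;
    uint32_t nmsrs;
    uint32_t base;
    const uint8_t *bitmap;
};

struct example_msr_filter {
    uint32_t flags;
    struct example_msr_filter_range ranges[KVM_MSR_FILTER_MAX_RANGES];
};

/* Returns 1 if the access would be allowed, 0 if it would be denied. */
static int example_msr_access_allowed(const struct example_msr_filter *f,
                                      uint32_t index, uint32_t type)
{
    assert(type == KVM_MSR_FILTER_READ || type == KVM_MSR_FILTER_WRITE);

    /* x2APIC MSRs (0x800 to 0x8ff) are never filtered. */
    if (index >= 0x800 && index <= 0x8ff)
        return 1;

    for (size_t i = 0; i < KVM_MSR_FILTER_MAX_RANGES; i++) {
        const struct example_msr_filter_range *r = &f->ranges[i];

        /* Skip unused ranges and ranges that do not guard this access type. */
        if (r->nmsrs == 0 || !(r->flags & type))
            continue;
        /* Assumption: the bitmap holds exactly nmsrs bits starting at base. */
        if (index < r->base || index - r->base >= r->nmsrs)
            continue;

        /* Assumption: bit N lives in byte N / 8 at bit position N % 8. */
        uint32_t bit = index - r->base;
        return (r->bitmap[bit / 8] >> (bit % 8)) & 1;
    }

    /* No range covers the access: fall back to the default action. */
    return !(f->flags & KVM_MSR_FILTER_DEFAULT_DENY);
}
```

With a single write-only range, a read of an MSR inside that range is not covered by the range and therefore follows the default action.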
4796+
4797+ Each bitmap range specifies a range of MSRs to potentially allow access to.
4798+ The range covers the nmsrs MSR indices starting at base, that is
4799+ [base .. base+nmsrs-1]. The flags field indicates whether the bitmap applies
4800+ to reads, writes or both; a 1 bit allows access to the corresponding MSR index.
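
As an illustration of how user space might fill such a bitmap, here is a minimal sketch. The helper names are hypothetical and not part of the KVM API, and the layout (bit N of the range stored in byte N / 8 at bit position N % 8) is an assumption about the bitmap format on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a bitmap that holds one bit per MSR in the range. */
static size_t example_msr_bitmap_bytes(uint32_t nmsrs)
{
    return ((size_t)nmsrs + 7) / 8;
}

/* Set the bit for MSR `index` in a bitmap whose first bit describes `base`. */
static void example_msr_bitmap_allow(uint8_t *bitmap, uint32_t base,
                                     uint32_t nmsrs, uint32_t index)
{
    /* Assumption: valid offsets are 0 .. nmsrs - 1. */
    assert(index >= base && index - base < nmsrs);

    uint32_t bit = index - base;
    bitmap[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/* Return 1 if the bit for MSR `index` is set, 0 otherwise. */
static int example_msr_bitmap_test(const uint8_t *bitmap, uint32_t base,
                                   uint32_t nmsrs, uint32_t index)
{
    assert(index >= base && index - base < nmsrs);

    uint32_t bit = index - base;
    return (bitmap[bit / 8] >> (bit % 8)) & 1;
}
```

A caller would zero a buffer of `example_msr_bitmap_bytes(nmsrs)` bytes, mark the MSRs it wants to allow, and store the buffer's address in the `bitmap` member of the range.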
4801+
4802+ If an MSR access is not permitted through the filtering, it generates a
4803+ #GP inside the guest. When combined with KVM_CAP_X86_USER_SPACE_MSR, that
4804+ allows user space to deflect and potentially handle various MSR accesses
4805+ into user space.
4806+
4807+ If a vCPU is in running state while this ioctl is invoked, the vCPU may
4808+ experience inconsistent filtering behavior on MSR accesses.
4809+
47074810
470848115. The kvm_run structure
47094812========================
@@ -4869,14 +4972,13 @@ to the byte array.
48694972
48704973.. note::
48714974
4872- For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
4873- KVM_EXIT_EPR the corresponding
4874-
4875- operations are complete (and guest state is consistent) only after userspace
4876- has re-entered the kernel with KVM_RUN. The kernel side will first finish
4877- incomplete operations and then check for pending signals. Userspace
4878- can re-enter the guest with an unmasked signal pending to complete
4879- pending operations.
4975+ For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR,
4976+ KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
4977+ operations are complete (and guest state is consistent) only after userspace
4978+ has re-entered the kernel with KVM_RUN. The kernel side will first finish
4979+ incomplete operations and then check for pending signals. Userspace
4980+ can re-enter the guest with an unmasked signal pending to complete
4981+ pending operations.
48804982
48814983::
48824984
@@ -5163,6 +5265,44 @@ Note that KVM does not skip the faulting instruction as it does for
51635265KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state
51645266if it decides to decode and emulate the instruction.
51655267
5268+ ::
5269+
5270+   /* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
5271+   struct {
5272+       __u8 error; /* user -> kernel */
5273+       __u8 pad[7];
5274+       __u32 reason; /* kernel -> user */
5275+       __u32 index; /* kernel -> user */
5276+       __u64 data; /* kernel <-> user */
5277+   } msr;
5278+
5279+ Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is
5280+ enabled, MSR accesses for which KVM kernel code would inject a #GP into the
5281+ guest will instead trigger a KVM_EXIT_X86_RDMSR exit for reads and a
5282+ KVM_EXIT_X86_WRMSR exit for writes.
5283+
5284+ The "reason" field specifies why the MSR trap occurred. User space will only
5285+ receive MSR exit traps when a particular reason was requested through
5286+ ENABLE_CAP. Currently valid exit reasons are:
5287+
5288+ KVM_MSR_EXIT_REASON_UNKNOWN - access to MSR that is unknown to KVM
5289+ KVM_MSR_EXIT_REASON_INVAL - access to invalid MSRs or reserved bits
5290+ KVM_MSR_EXIT_REASON_FILTER - access blocked by KVM_X86_SET_MSR_FILTER
5291+
5292+ For KVM_EXIT_X86_RDMSR, the "index" field tells user space which MSR the guest
5293+ wants to read. To respond to this request with a successful read, user space
5294+ writes the respective data into the "data" field and must continue guest
5295+ execution to ensure the read data is transferred into guest register state.
5296+
5297+ If the RDMSR request was unsuccessful, user space indicates that with a "1" in
5298+ the "error" field. This will inject a #GP into the guest when the VCPU is
5299+ executed again.
5300+
5301+ For KVM_EXIT_X86_WRMSR, the "index" field tells user space which MSR the guest
5302+ wants to write. Once finished processing the event, user space must continue
5303+ vCPU execution. If the MSR write was unsuccessful, user space also sets the
5304+ "error" field to "1".
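
The protocol above might be handled in a VMM roughly as sketched below. The structure is a local mirror of the "msr" member shown earlier, and the handler names, the MSR index, and the backing variable are hypothetical placeholders for illustration; a real VMM would operate on the mapped kvm_run structure and re-enter the guest with KVM_RUN afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the "msr" member of struct kvm_run shown above. */
struct example_msr_exit {
    uint8_t  error;   /* user -> kernel */
    uint8_t  pad[7];
    uint32_t reason;  /* kernel -> user */
    uint32_t index;   /* kernel -> user */
    uint64_t data;    /* kernel <-> user */
};

/* Hypothetical MSR emulated in user space; the index is a placeholder. */
#define EXAMPLE_MSR_INDEX 0x12345678u
static uint64_t example_msr_value;

/* KVM_EXIT_X86_RDMSR: supply the data, or set error to request a #GP. */
static void example_handle_rdmsr(struct example_msr_exit *msr)
{
    assert(msr != NULL);

    if (msr->index == EXAMPLE_MSR_INDEX) {
        msr->data = example_msr_value;
        msr->error = 0;
    } else {
        msr->error = 1;
    }
    /* The caller must re-enter the guest for the result to take effect. */
}

/* KVM_EXIT_X86_WRMSR: consume the data, or set error to request a #GP. */
static void example_handle_wrmsr(struct example_msr_exit *msr)
{
    assert(msr != NULL);

    if (msr->index == EXAMPLE_MSR_INDEX) {
        example_msr_value = msr->data;
        msr->error = 0;
    } else {
        msr->error = 1;
    }
}
```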
5305+
51665306::
51675307
51685308 /* Fix the size of the union. */
@@ -5852,6 +5992,28 @@ controlled by the kvm module parameter halt_poll_ns. This capability allows
58525992the maximum halt time to be specified on a per-VM basis, effectively overriding
58535993the module parameter for the target VM.
58545994
5995+ 7.21 KVM_CAP_X86_USER_SPACE_MSR
5996+ -------------------------------
5997+
5998+ :Architectures: x86
5999+ :Target: VM
6000+ :Parameters: args[0] contains the mask of KVM_MSR_EXIT_REASON_* events to report
6001+ :Returns: 0 on success; -1 on error
6002+
6003+ This capability enables trapping into user space of RDMSR and WRMSR
6004+ instructions that would otherwise raise a #GP.
6005+
6006+ When a guest requests to read or write an MSR, KVM may not implement all MSRs
6007+ that are relevant to a respective system. It also does not differentiate by
6008+ CPU type.
6009+
6010+ To allow more fine-grained control over MSR handling, user space may enable
6011+ this capability. With it enabled, MSR accesses that match the mask specified in
6012+ args[0] and would cause KVM to inject a #GP into the guest will instead trigger
6013+ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
6014+ can then handle to implement model specific MSR handling and/or user notifications
6015+ to inform a user that an MSR was not handled.
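
To make the role of the args[0] mask concrete, here is a small sketch of the matching rule. The bit values assigned to the exit reasons are assumptions made for this example (the authoritative values are in `<linux/kvm.h>`), and `example_msr_exit_reported` is a hypothetical helper, not a KVM function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit values for the KVM_MSR_EXIT_REASON_* flags; check <linux/kvm.h>. */
#define EXAMPLE_MSR_EXIT_REASON_INVAL   (1u << 0)
#define EXAMPLE_MSR_EXIT_REASON_UNKNOWN (1u << 1)
#define EXAMPLE_MSR_EXIT_REASON_FILTER  (1u << 2)

/*
 * Returns 1 if a failing MSR access with the given single reason would be
 * reported to user space under the mask passed in args[0], 0 if the guest
 * would receive the #GP instead.
 */
static int example_msr_exit_reported(uint64_t enabled_mask, uint32_t reason)
{
    /* Exactly one reason bit is expected per access. */
    assert(reason != 0 && (reason & (reason - 1)) == 0);

    return (enabled_mask & reason) != 0;
}
```

A VMM that only wants to see filtered accesses and unknown MSRs would pass the bitwise OR of those two reasons in args[0] and leave the invalid-access reason to KVM.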
6016+
585560178. Other capabilities.
58566018======================
58576019
@@ -6193,3 +6355,39 @@ distribution...)
61936355
61946356If this capability is available, then the CPNC and CPVC can be synchronized
61956357between KVM and userspace via the sync regs mechanism (KVM_SYNC_DIAG318).
6358+
6359+ 8.26 KVM_CAP_X86_USER_SPACE_MSR
6360+ -------------------------------
6361+
6362+ :Architectures: x86
6363+
6364+ This capability indicates that KVM supports deflection of MSR reads and
6365+ writes to user space. It can be enabled on a VM level. If enabled, MSR
6366+ accesses that would usually cause KVM to inject a #GP into the guest will
6367+ instead get bounced to user space through the KVM_EXIT_X86_RDMSR and
6368+ KVM_EXIT_X86_WRMSR exit notifications.
6369+
6370+ 8.25 KVM_X86_SET_MSR_FILTER
6371+ ---------------------------
6372+
6373+ :Architectures: x86
6374+
6375+ This capability indicates that KVM supports rejecting accesses to user-defined
6376+ MSRs. With this capability exposed, KVM exports the new VM ioctl
6377+ KVM_X86_SET_MSR_FILTER which user space can call to specify bitmaps of MSR
6378+ ranges that KVM should reject access to.
6379+
6380+ In combination with KVM_CAP_X86_USER_SPACE_MSR, this allows user space to
6381+ trap and emulate MSRs that are outside of the scope of KVM as well as
6382+ limit the attack surface on KVM's MSR emulation code.
6383+
6384+
6385+ 8.26 KVM_CAP_ENFORCE_PV_CPUID
6386+ -----------------------------
6387+
6388+ :Architectures: x86
6389+
6390+ When enabled, KVM will disable paravirtual features provided to the
6391+ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
6392+ (0x40000001). Otherwise, a guest may use the paravirtual features
6393+ regardless of what has actually been exposed through the CPUID leaf.