KVM ioctl API

The Definitive KVM (Kernel-based Virtual Machine) API Documentation
===================================================================

 

1. General description
----------------------

 

The kvm API is a set of ioctls that are issued to control various aspects
of a virtual machine.  The ioctls belong to three classes:

 - System ioctls: These query and set global attributes which affect the
   whole kvm subsystem.  In addition a system ioctl is used to create
   virtual machines.

 - VM ioctls: These query and set attributes that affect an entire virtual
   machine, for example memory layout.  In addition a VM ioctl is used to
   create virtual cpus (vcpus).

   Only run VM ioctls from the same process (address space) that was used
   to create the VM.

 - vcpu ioctls: These query and set attributes that control the operation
   of a single virtual cpu.

   Only run vcpu ioctls from the same thread that was used to create the
   vcpu.

 

 

2. File descriptors
-------------------

 

The kvm API is centered around file descriptors.  An initial
open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
can be used to issue system ioctls.  A KVM_CREATE_VM ioctl on this
handle will create a VM file descriptor which can be used to issue VM
ioctls.  A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
and return a file descriptor pointing to it.  Finally, ioctls on a vcpu
fd can be used to control the vcpu, including the important task of
actually running guest code.

In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain sockets.  These
kinds of tricks are explicitly not supported by kvm.  While they will
not cause harm to the host, their actual behavior is not guaranteed by
the API.  The only supported use is one virtual machine per process,
and one vcpu per thread.

 

 

3. Extensions
-------------

 

As of Linux 2.6.22, the KVM ABI has been stabilized: no backward
incompatible changes are allowed.  However, there is an extension
facility that allows backward-compatible extensions to the API to be
queried and used.

The extension mechanism is not based on the Linux version number.
Instead, kvm defines extension identifiers and a facility to query
whether a particular extension identifier is available.  If it is, a
set of ioctls is available for application use.

 

 

4. API description
------------------

 

This section describes ioctls that can be used to control kvm guests.
For each ioctl, the following information is provided along with a
description:

  Capability: which KVM extension provides this ioctl.  Can be 'basic',
      which means that it will be provided by any kernel that supports
      API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
      means availability needs to be checked with KVM_CHECK_EXTENSION
      (see section 4.4), or 'none' which means that while not all kernels
      support this ioctl, there's no capability bit to check its
      availability: for kernels that don't support the ioctl,
      the ioctl returns -ENOTTY.

  Architectures: which instruction set architectures provide this ioctl.
      x86 includes both i386 and x86_64.

  Type: system, vm, or vcpu.

  Parameters: what parameters are accepted by the ioctl.

  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.

 

 

4.1 KVM_GET_API_VERSION

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: the constant KVM_API_VERSION (=12)

This identifies the API version as the stable kvm API.  It is not
expected that this number will change.  However, Linux 2.6.20 and
2.6.21 report earlier versions; these are not documented and not
supported.  Applications should refuse to run if KVM_GET_API_VERSION
returns a value other than 12.  If this check passes, all ioctls
described as 'basic' will be available.
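
The version check described above could be sketched as follows.  This is
only an illustration: the helper name kvm_open_checked and the choice of
passing the device path as a parameter are assumptions made for this
example, not part of the kvm API.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the kvm device node at `path` (normally
 * "/dev/kvm") and check that it reports the stable API version.
 * Returns the system fd on success, or -1 otherwise.
 */
int kvm_open_checked(const char *path)
{
        int fd = open(path, O_RDWR);

        if (fd < 0)
                return -1;

        /* KVM_GET_API_VERSION takes no parameter, so pass 0. */
        if (ioctl(fd, KVM_GET_API_VERSION, 0) != KVM_API_VERSION) {
                close(fd);      /* not the stable API, or not kvm at all */
                return -1;
        }
        return fd;
}
```

A caller would typically invoke kvm_open_checked("/dev/kvm") once at
startup and refuse to continue if it returns -1.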

 

 

4.2 KVM_CREATE_VM

 

Capability: basic
Architectures: all
Type: system ioctl
Parameters: machine type identifier (KVM_VM_*)
Returns: a VM fd that can be used to control the new virtual machine.

The new VM has no virtual cpus and no memory.  An mmap() of a VM fd
will access the virtual machine's physical address space; offset zero
corresponds to guest physical address zero.  Use of mmap() on a VM fd
is discouraged if userspace memory allocation (KVM_CAP_USER_MEMORY) is
available.
You most certainly want to use 0 as machine type.

In order to create user controlled virtual machines on S390, check
KVM_CAP_S390_UCONTROL and use the flag KVM_VM_S390_UCONTROL as
privileged user (CAP_SYS_ADMIN).

 

 

4.3 KVM_GET_MSR_INDEX_LIST

Capability: basic
Architectures: x86
Type: system
Parameters: struct kvm_msr_list (in/out)
Returns: 0 on success; -1 on error
Errors:
  E2BIG:     the msr index list is too big to fit in the array specified by
             the user.

struct kvm_msr_list {
        __u32 nmsrs; /* number of msrs in entries */
        __u32 indices[0];
};

This ioctl returns the guest msrs that are supported.  The list varies
by kvm version and host processor, but does not change otherwise.  The
user fills in the size of the indices array in nmsrs, and in return
kvm adjusts nmsrs to reflect the actual number of msrs and fills in
the indices array with their numbers.

Note: if kvm indicates support for MCE (KVM_CAP_MCE), then the MCE bank MSRs
are not returned in the MSR list, as different vcpus can have a different
number of banks, as set via the KVM_X86_SETUP_MCE ioctl.

 

 

4.4 KVM_CHECK_EXTENSION

Capability: basic, KVM_CAP_CHECK_EXTENSION_VM for vm ioctl
Architectures: all
Type: system ioctl, vm ioctl
Parameters: extension identifier (KVM_CAP_*)
Returns: 0 if unsupported; 1 (or some other positive integer) if supported

The API allows the application to query about extensions to the core
kvm API.  Userspace passes an extension identifier (an integer) and
receives an integer that describes the extension availability.
Generally 0 means no and 1 means yes, but some extensions may report
additional information in the integer return value.

Based on their initialization different VMs may have different capabilities.
It is thus encouraged to use the vm ioctl to query for capabilities (available
with KVM_CAP_CHECK_EXTENSION_VM on the vm fd).
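
A minimal sketch of the query pattern described above, preferring the vm
fd when the kernel advertises KVM_CAP_CHECK_EXTENSION_VM.  The two helper
names are hypothetical, and folding errors into "unsupported" is a design
choice of this example rather than something the API mandates.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: query one extension on a system fd or a vm fd.
 * Returns 0 if the extension is unsupported or the ioctl fails,
 * otherwise the positive value reported by the kernel.
 */
int kvm_check_extension(int fd, long cap)
{
        int ret = ioctl(fd, KVM_CHECK_EXTENSION, cap);

        return ret > 0 ? ret : 0;
}

/* Use the vm fd for the query when per-VM queries are available. */
int kvm_vm_check_extension(int kvm_fd, int vm_fd, long cap)
{
        if (kvm_check_extension(kvm_fd, KVM_CAP_CHECK_EXTENSION_VM))
                return kvm_check_extension(vm_fd, cap);
        return kvm_check_extension(kvm_fd, cap);
}
```

Because some extensions report additional information in the return
value, callers that need that information should use the returned integer
rather than treating it as a boolean.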

 

4.5 KVM_GET_VCPU_MMAP_SIZE

Capability: basic
Architectures: all
Type: system ioctl
Parameters: none
Returns: size of vcpu mmap area, in bytes

The KVM_RUN ioctl (see section 4.10) communicates with userspace via a
shared memory region.  This ioctl returns the size of that region.  See the
KVM_RUN documentation for details.

 

 

4.6 KVM_SET_MEMORY_REGION

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: struct kvm_memory_region (in)
Returns: 0 on success, -1 on error

This ioctl is obsolete and has been removed.

 

 

4.7 KVM_CREATE_VCPU

 

Capability: basic
Architectures: all
Type: vm ioctl
Parameters: vcpu id (apic id on x86)
Returns: vcpu fd on success, -1 on error

This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
in the range [0, max_vcpus).

The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
the KVM_CHECK_EXTENSION ioctl() at run-time.
The maximum possible value for max_vcpus can be retrieved using the
KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.

If KVM_CAP_NR_VCPUS does not exist, you should assume that max_vcpus is 4.
If KVM_CAP_MAX_VCPUS does not exist, you should assume that max_vcpus is
the same as the value returned from KVM_CAP_NR_VCPUS.
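
The fallback rules above can be written down as a small pure function.
This sketch assumes the caller has already issued KVM_CHECK_EXTENSION for
both capabilities and passes in the two return values (0 or negative when
the capability does not exist); the struct and function names are
hypothetical.

```c
/* Hypothetical container for the two limits discussed above. */
struct vcpu_limits {
        int recommended;        /* from KVM_CAP_NR_VCPUS, or 4 */
        int maximum;            /* from KVM_CAP_MAX_VCPUS, or recommended */
};

/*
 * nr_vcpus_ret and max_vcpus_ret are the KVM_CHECK_EXTENSION results
 * for KVM_CAP_NR_VCPUS and KVM_CAP_MAX_VCPUS respectively.
 */
struct vcpu_limits resolve_vcpu_limits(int nr_vcpus_ret, int max_vcpus_ret)
{
        struct vcpu_limits l;

        /* Missing KVM_CAP_NR_VCPUS: assume 4. */
        l.recommended = nr_vcpus_ret > 0 ? nr_vcpus_ret : 4;
        /* Missing KVM_CAP_MAX_VCPUS: same as the recommended value. */
        l.maximum = max_vcpus_ret > 0 ? max_vcpus_ret : l.recommended;
        return l;
}
```

Valid vcpu ids for KVM_CREATE_VCPU are then 0 up to, but not including,
the maximum field.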

 

On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
threads in one or more virtual CPU cores.  (This is because the
hardware requires all the hardware threads in a CPU core to be in the
same partition.)  The KVM_CAP_PPC_SMT capability indicates the number
of vcpus per virtual core (vcore).  The vcore id is obtained by
dividing the vcpu id by the number of vcpus per vcore.  The vcpus in a
given vcore will always be in the same physical core as each other
(though that might be a different physical core from time to time).
Userspace can control the threading (SMT) mode of the guest by its
allocation of vcpu ids.  For example, if userspace wants
single-threaded guest vcpus, it should make all vcpu ids be a multiple
of the number of vcpus per vcore.
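
The vcpu id arithmetic described above, as a short sketch.  Both function
names are hypothetical; threads_per_vcore stands for the value reported
by KVM_CAP_PPC_SMT and is assumed to be nonzero.

```c
/* vcore id of a vcpu: vcpu id divided by the number of vcpus per vcore. */
unsigned int vcore_id(unsigned int vcpu_id, unsigned int threads_per_vcore)
{
        return vcpu_id / threads_per_vcore;
}

/*
 * vcpu id to use for the n-th vcpu of a single-threaded guest: a
 * multiple of the number of vcpus per vcore, so that every vcpu lands
 * in its own vcore.
 */
unsigned int single_thread_vcpu_id(unsigned int n,
                                   unsigned int threads_per_vcore)
{
        return n * threads_per_vcore;
}
```

For example, with 4 vcpus per vcore, vcpu ids 0, 4, 8 and 12 fall into
vcores 0, 1, 2 and 3.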

 

For virtual cpus that have been created with S390 user controlled virtual
machines, the resulting vcpu fd can be memory mapped at page offset
KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
cpu's hardware control block.

 

 

4.8 KVM_GET_DIRTY_LOG (vm ioctl)

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_dirty_log (in/out)
Returns: 0 on success, -1 on error

/* for KVM_GET_DIRTY_LOG */
struct kvm_dirty_log {
        __u32 slot;
        __u32 padding1;
        union {
                void __user *dirty_bitmap; /* one bit per page */
                __u64 padding2;
        };
};

Given a memory slot, return a bitmap containing any pages dirtied
since the last call to this ioctl.  Bit 0 is the first page in the
memory slot.  Ensure the entire structure is cleared to avoid padding
issues.
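
Working with the returned bitmap might look like the sketch below.  The
helper names are hypothetical, and two details are assumptions of this
example rather than statements from the text above: the buffer size is
rounded up to a multiple of 64 bits to be on the safe side, and bits are
read in little-endian order within each byte (bit 0 of byte 0 is page 0).

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes to allocate for a one-bit-per-page bitmap (assumed rounding). */
size_t dirty_bitmap_bytes(uint64_t npages)
{
        return (size_t)((npages + 63) / 64 * 8);
}

/* Nonzero if `page` (0 = first page in the slot) is marked dirty. */
int page_is_dirty(const uint8_t *bitmap, uint64_t page)
{
        return (bitmap[page / 8] >> (page % 8)) & 1;
}

/* Number of pages marked dirty among the first npages of the slot. */
uint64_t count_dirty_pages(const uint8_t *bitmap, uint64_t npages)
{
        uint64_t page, n = 0;

        for (page = 0; page < npages; page++)
                n += (uint64_t)page_is_dirty(bitmap, page);
        return n;
}
```

The buffer would be zeroed, its address stored in dirty_bitmap, and the
slot number stored in slot before issuing the ioctl on the vm fd.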

 

 

4.9 KVM_SET_MEMORY_ALIAS

Capability: basic
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_memory_alias (in)
Returns: 0 (success), -1 (error)

This ioctl is obsolete and has been removed.

 

 

4.10 KVM_RUN

 

Capability: basic
Architectures: all
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error
Errors:
  EINTR:     an unmasked signal is pending

This ioctl is used to run a guest virtual cpu.  While there are no
explicit parameters, there is an implicit parameter block that can be
obtained by mmap()ing the vcpu fd at offset 0, with the size given by
KVM_GET_VCPU_MMAP_SIZE.  The parameter block is formatted as a 'struct
kvm_run' (see below).
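
Obtaining the implicit parameter block could be sketched as follows.  The
helper name map_kvm_run is hypothetical; the protection and sharing flags
passed to mmap() are the usual choice for a region shared with the kernel
and should be read as an assumption of this example.

```c
#include <linux/kvm.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map the kvm_run block of a vcpu.  kvm_fd is the
 * system fd (KVM_GET_VCPU_MMAP_SIZE is a system ioctl), vcpu_fd the
 * vcpu fd.  Returns NULL on failure; on success *size_out (if not
 * NULL) receives the mapping size for a later munmap().
 */
struct kvm_run *map_kvm_run(int kvm_fd, int vcpu_fd, size_t *size_out)
{
        int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
        void *p;

        if (size <= 0)
                return NULL;

        /* The block lives at offset 0 of the vcpu fd. */
        p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED,
                 vcpu_fd, 0);
        if (p == MAP_FAILED)
                return NULL;

        if (size_out)
                *size_out = (size_t)size;
        return p;
}
```

After each KVM_RUN returns, the application inspects the mapped block to
find out why the guest stopped.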

 

 

4.11 KVM_GET_REGS

 

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (out)
Returns: 0 on success, -1 on error

Reads the general purpose registers from the vcpu.

/* x86 */
struct kvm_regs {
        /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
        __u64 rax, rbx, rcx, rdx;
        __u64 rsi, rdi, rsp, rbp;
        __u64 r8,  r9,  r10, r11;
        __u64 r12, r13, r14, r15;
        __u64 rip, rflags;
};

/* mips */
struct kvm_regs {
        /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
        __u64 gpr[32];
        __u64 hi;
        __u64 lo;
        __u64 pc;
};

 

 

4.12 KVM_SET_REGS

 

Capability: basic
Architectures: all except ARM, arm64
Type: vcpu ioctl
Parameters: struct kvm_regs (in)
Returns: 0 on success, -1 on error

Writes the general purpose registers into the vcpu.

See KVM_GET_REGS for the data structure.

 

 

4.13 KVM_GET_SREGS

 

Capability: basic
Architectures: x86, ppc
Type: vcpu ioctl
Parameters: struct kvm_sregs (out)
Returns: 0 on success, -1 on error

Reads special registers from the vcpu.

/* x86 */
struct kvm_sregs {
        struct kvm_segment cs, ds, es, fs, gs, ss;
        struct kvm_segment tr, ldt;
        struct kvm_dtable gdt, idt;
        __u64 cr0, cr2, cr3, cr4, cr8;
        __u64 efer;
        __u64 apic_base;
        __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64];
};

/* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */

interrupt_bitmap is a bitmap of pending external interrupts.  At most
one bit may be set.  This interrupt has been acknowledged by the APIC
but not yet injected into the cpu core.

 

 

4.14 KVM_SET_SREGS

 

Capability: basic
Architectures: x86, ppc
Type: vcpu ioctl
Parameters: struct kvm_sregs (in)
Returns: 0 on success, -1 on error

Writes special registers into the vcpu.  See KVM_GET_SREGS for the
data structures.

 

 

4.15 KVM_TRANSLATE

 

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_translation (in/out)
Returns: 0 on success, -1 on error

Translates a virtual address according to the vcpu's current address
translation mode.

struct kvm_translation {
        /* in */
        __u64 linear_address;

        /* out */
        __u64 physical_address;
        __u8  valid;
        __u8  writeable;
        __u8  usermode;
        __u8  pad[5];
};

 

 

4.16 KVM_INTERRUPT

 

Capability: basic
Architectures: x86, ppc, mips
Type: vcpu ioctl
Parameters: struct kvm_interrupt (in)
Returns: 0 on success, -1 on error

Queues a hardware interrupt vector to be injected.  This is only
useful if in-kernel local APIC or equivalent is not used.

/* for KVM_INTERRUPT */
struct kvm_interrupt {
        /* in */
        __u32 irq;
};

X86:

Note 'irq' is an interrupt vector, not an interrupt pin or line.

PPC:

Queues an external interrupt to be injected.  This ioctl is overloaded
with 3 different irq values:

a) KVM_INTERRUPT_SET

  This injects an edge type external interrupt into the guest once it's ready
  to receive interrupts.  When injected, the interrupt is done.

b) KVM_INTERRUPT_UNSET

  This unsets any pending interrupt.

  Only available with KVM_CAP_PPC_UNSET_IRQ.

c) KVM_INTERRUPT_SET_LEVEL

  This injects a level type external interrupt into the guest context.  The
  interrupt stays pending until a specific ioctl with KVM_INTERRUPT_UNSET
  is triggered.

  Only available with KVM_CAP_PPC_IRQ_LEVEL.

Note that any value for 'irq' other than the ones stated above is invalid
and incurs unexpected behavior.

MIPS:

Queues an external interrupt to be injected into the virtual CPU.  A negative
interrupt number dequeues the interrupt.

 

 

4.17 KVM_DEBUG_GUEST

 

Capability: basic
Architectures: none
Type: vcpu ioctl
Parameters: none
Returns: -1 on error

Support for this has been removed.  Use KVM_SET_GUEST_DEBUG instead.

 

 

4.18 KVM_GET_MSRS

 

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_msrs (in/out)
Returns: 0 on success, -1 on error

Reads model-specific registers from the vcpu.  Supported msr indices can
be obtained using KVM_GET_MSR_INDEX_LIST.

struct kvm_msrs {
        __u32 nmsrs; /* number of msrs in entries */
        __u32 pad;

        struct kvm_msr_entry entries[0];
};

struct kvm_msr_entry {
        __u32 index;
        __u32 reserved;
        __u64 data;
};

Application code should set the 'nmsrs' member (which indicates the
size of the entries array) and the 'index' member of each array entry.
kvm will fill in the 'data' member.

 

 

4.19 KVM_SET_MSRS

 

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_msrs (in)
Returns: 0 on success, -1 on error

Writes model-specific registers to the vcpu.  See KVM_GET_MSRS for the
data structures.

Application code should set the 'nmsrs' member (which indicates the
size of the entries array), and the 'index' and 'data' members of each
array entry.

 

 

4.20 KVM_SET_CPUID

 

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_cpuid (in)
Returns: 0 on success, -1 on error

Defines the vcpu responses to the cpuid instruction.  Applications
should use the KVM_SET_CPUID2 ioctl if available.

struct kvm_cpuid_entry {
        __u32 function;
        __u32 eax;
        __u32 ebx;
        __u32 ecx;
        __u32 edx;
        __u32 padding;
};

/* for KVM_SET_CPUID */
struct kvm_cpuid {
        __u32 nent;
        __u32 padding;
        struct kvm_cpuid_entry entries[0];
};

 

 

4.21 KVM_SET_SIGNAL_MASK

Capability: basic
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_signal_mask (in)
Returns: 0 on success, -1 on error

Defines which signals are blocked during execution of KVM_RUN.  This
signal mask temporarily overrides the thread's signal mask.  Any
unblocked signal received (except SIGKILL and SIGSTOP, which retain
their traditional behaviour) will cause KVM_RUN to return with -EINTR.

Note the signal will only be delivered if not blocked by the original
signal mask.

/* for KVM_SET_SIGNAL_MASK */
struct kvm_signal_mask {
        __u32 len;
        __u8  sigset[0];
};

 

 

4.22 KVM_GET_FPU

 

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_fpu (out)
Returns: 0 on success, -1 on error

Reads the floating point state from the vcpu.

/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
        __u8  fpr[8][16];
        __u16 fcw;
        __u16 fsw;
        __u8  ftwx;  /* in fxsave format */
        __u8  pad1;
        __u16 last_opcode;
        __u64 last_ip;
        __u64 last_dp;
        __u8  xmm[16][16];
        __u32 mxcsr;
        __u32 pad2;
};

 

 

4.23 KVM_SET_FPU

 

Capability: basic
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_fpu (in)
Returns: 0 on success, -1 on error

Writes the floating point state to the vcpu.

/* for KVM_GET_FPU and KVM_SET_FPU */
struct kvm_fpu {
        __u8  fpr[8][16];
        __u16 fcw;
        __u16 fsw;
        __u8  ftwx;  /* in fxsave format */
        __u8  pad1;
        __u16 last_opcode;
        __u64 last_ip;
        __u64 last_dp;
        __u8  xmm[16][16];
        __u32 mxcsr;
        __u32 pad2;
};

 

 

4.24 KVM_CREATE_IRQCHIP

Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
Architectures: x86, ARM, arm64, s390
Type: vm ioctl
Parameters: none
Returns: 0 on success, -1 on error

Creates an interrupt controller model in the kernel.
On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up
future vcpus to have a local APIC.  IRQ routing for GSIs 0-15 is set to both
PIC and IOAPIC; GSI 16-23 only go to the IOAPIC.
On ARM/arm64, a GICv2 is created.  Any other GIC versions require the usage of
KVM_CREATE_DEVICE, which also supports creating a GICv2.  Using
KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2.
On s390, a dummy irq routing table is created.

Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.

 

 

4.25 KVM_IRQ_LINE

 

Capability: KVM_CAP_IRQCHIP
Architectures: x86, arm, arm64
Type: vm ioctl
Parameters: struct kvm_irq_level
Returns: 0 on success, -1 on error

Sets the level of a GSI input to the interrupt controller model in the kernel.
On some architectures it is required that an interrupt controller model has
been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
interrupts require the level to be set to 1 and then back to 0.

On real hardware, interrupt pins can be active-low or active-high.  This
does not matter for the level field of struct kvm_irq_level: 1 always
means active (asserted), 0 means inactive (deasserted).

x86 allows the operating system to program the interrupt polarity
(active-low/active-high) for level-triggered interrupts, and KVM used
to consider the polarity.  However, due to bitrot in the handling of
active-low interrupts, the above convention is now valid on x86 too.
This is signaled by KVM_CAP_X86_IOAPIC_POLARITY_IGNORED.  Userspace
should not present interrupts to the guest as active-low unless this
capability is present (or unless it is not using the in-kernel irqchip,
of course).

ARM/arm64 can signal an interrupt either at the CPU level, or at the
in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
use PPIs designated for specific cpus.  The irq field is interpreted
like this:

  bits:  | 31 ... 24 | 23  ... 16 | 15    ...    0 |
  field: | irq_type  | vcpu_index |     irq_id     |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
               (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)

(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)

In both cases, level is used to assert/deassert the line.

struct kvm_irq_level {
        union {
                __u32 irq;     /* GSI */
                __s32 status;  /* not used for KVM_IRQ_LEVEL */
        };
        __u32 level;           /* 0 or 1 */
};
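
The ARM/arm64 layout of the irq field shown above can be packed with a
few shifts.  This is a sketch only: the enum constants and the function
name are hypothetical names chosen for this example, with values taken
from the irq_type list above.

```c
#include <stdint.h>

/* Hypothetical names for the irq_type values listed above. */
enum {
        ARM_IRQ_TYPE_CPU = 0,   /* out-of-kernel GIC: irq_id 0 IRQ, 1 FIQ */
        ARM_IRQ_TYPE_SPI = 1,   /* in-kernel GIC: SPI, irq_id 32..1019 */
        ARM_IRQ_TYPE_PPI = 2    /* in-kernel GIC: PPI, irq_id 16..31 */
};

/*
 * Pack irq_type (bits 31..24), vcpu_index (bits 23..16) and irq_id
 * (bits 15..0) into the irq field of struct kvm_irq_level.
 */
uint32_t arm_irq_encode(uint32_t irq_type, uint32_t vcpu_index,
                        uint32_t irq_id)
{
        return ((irq_type & 0xffu) << 24) |
               ((vcpu_index & 0xffu) << 16) |
               (irq_id & 0xffffu);
}
```

The result is stored in the irq member, and level is set to 1 or 0 to
assert or deassert the line.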

 

 

4.26 KVM_GET_IRQCHIP

 

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_irqchip (in/out)
Returns: 0 on success, -1 on error

Reads the state of a kernel interrupt controller created with
KVM_CREATE_IRQCHIP into a buffer provided by the caller.

struct kvm_irqchip {
        __u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
        __u32 pad;
        union {
                char dummy[512];  /* reserving space */
                struct kvm_pic_state pic;
                struct kvm_ioapic_state ioapic;
        } chip;
};

 

 

4.27 KVM_SET_IRQCHIP

 

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_irqchip (in)
Returns: 0 on success, -1 on error

Sets the state of a kernel interrupt controller created with
KVM_CREATE_IRQCHIP from a buffer provided by the caller.

struct kvm_irqchip {
        __u32 chip_id;  /* 0 = PIC1, 1 = PIC2, 2 = IOAPIC */
        __u32 pad;
        union {
                char dummy[512];  /* reserving space */
                struct kvm_pic_state pic;
                struct kvm_ioapic_state ioapic;
        } chip;
};

 

 

4.28 KVM_XEN_HVM_CONFIG

Capability: KVM_CAP_XEN_HVM
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_xen_hvm_config (in)
Returns: 0 on success, -1 on error

Sets the MSR that the Xen HVM guest uses to initialize its hypercall
page, and provides the starting address and size of the hypercall
blobs in userspace.  When the guest writes the MSR, kvm copies one
page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
memory.

struct kvm_xen_hvm_config {
        __u32 flags;
        __u32 msr;
        __u64 blob_addr_32;
        __u64 blob_addr_64;
        __u8  blob_size_32;
        __u8  blob_size_64;
        __u8  pad2[30];
};

 

 

4.29 KVM_GET_CLOCK

 

Capability: KVM_CAP_ADJUST_CLOCK
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_clock_data (out)
Returns: 0 on success, -1 on error

Gets the current timestamp of kvmclock as seen by the current guest.  In
conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios
such as migration.

struct kvm_clock_data {
        __u64 clock;  /* kvmclock current value */
        __u32 flags;
        __u32 pad[9];
};

 

 

4.30 KVM_SET_CLOCK

 

Capability: KVM_CAP_ADJUST_CLOCK
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_clock_data (in)
Returns: 0 on success, -1 on error

Sets the current timestamp of kvmclock to the value specified in its parameter.
In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in
scenarios such as migration.

struct kvm_clock_data {
        __u64 clock;  /* kvmclock current value */
        __u32 flags;
        __u32 pad[9];
};

 

 

4.31 KVM_GET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (out)
Returns: 0 on success, -1 on error

Gets currently pending exceptions, interrupts, and NMIs as well as related
states of the vcpu.

struct kvm_vcpu_events {
        struct {
                __u8 injected;
                __u8 nr;
                __u8 has_error_code;
                __u8 pad;
                __u32 error_code;
        } exception;
        struct {
                __u8 injected;
                __u8 nr;
                __u8 soft;
                __u8 shadow;
        } interrupt;
        struct {
                __u8 injected;
                __u8 pending;
                __u8 masked;
                __u8 pad;
        } nmi;
        __u32 sipi_vector;
        __u32 flags;
};

KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
interrupt.shadow contains a valid state.  Otherwise, this field is undefined.

 

 

4.32 KVM_SET_VCPU_EVENTS

Capability: KVM_CAP_VCPU_EVENTS
Extended by: KVM_CAP_INTR_SHADOW
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_vcpu_events (in)
Returns: 0 on success, -1 on error

Set pending exceptions, interrupts, and NMIs as well as related states of the
vcpu.

See KVM_GET_VCPU_EVENTS for the data structure.

Fields that may be modified asynchronously by running VCPUs can be excluded
from the update.  These fields are nmi.pending and sipi_vector.  Keep the
corresponding bits in the flags field cleared to suppress overwriting the
current in-kernel state.  The bits are:

KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector

If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
the flags field to signal that interrupt.shadow contains a valid state and
shall be written into the VCPU.

 

 

4.33 KVM_GET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (out)
Returns: 0 on success, -1 on error

Reads debug registers from the vcpu.

struct kvm_debugregs {
        __u64 db[4];
        __u64 dr6;
        __u64 dr7;
        __u64 flags;
        __u64 reserved[9];
};

 

 

4.34 KVM_SET_DEBUGREGS

Capability: KVM_CAP_DEBUGREGS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_debugregs (in)
Returns: 0 on success, -1 on error

Writes debug registers into the vcpu.

See KVM_GET_DEBUGREGS for the data structure.  The flags field is not
used yet and must be cleared on entry.

 

 

4.35 KVM_SET_USER_MEMORY_REGION

Capability: KVM_CAP_USER_MEMORY
Architectures: all
Type: vm ioctl
Parameters: struct kvm_userspace_memory_region (in)
Returns: 0 on success, -1 on error

struct kvm_userspace_memory_region {
        __u32 slot;
        __u32 flags;
        __u64 guest_phys_addr;
        __u64 memory_size; /* bytes */
        __u64 userspace_addr; /* start of the userspace allocated memory */
};

/* for kvm_memory_region::flags */
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READONLY        (1UL << 1)

This ioctl allows the user to create or modify a guest physical memory
slot.  When changing an existing slot, it may be moved in the guest
physical memory space, or its flags may be modified.  It may not be
resized.  Slots may not overlap in guest physical address space.

Memory for the region is taken starting at the address denoted by the
field userspace_addr, which must point at user addressable memory for
the entire memory slot size.  Any object may back this memory, including
anonymous memory, ordinary files, and hugetlbfs.

It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
be identical.  This allows large pages in the guest to be backed by large
pages in the host.

The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
writes to memory within the slot.  See the KVM_GET_DIRTY_LOG ioctl to know
how to use it.  The latter can be set, if the KVM_CAP_READONLY_MEM capability
allows it, to make a new slot read-only.  In this case, writes to this memory
will be posted to userspace as KVM_EXIT_MMIO exits.

When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
the memory region are automatically reflected into the guest.  For example, an
mmap() that affects the region will be made visible immediately.  Another
example is madvise(MADV_DROP).

It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
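
Registering a slot, together with a check for the alignment
recommendation given in this section, could be sketched like this.  The
names LARGE_PAGE_MASK, large_page_friendly and kvm_set_region are
hypothetical and exist only for this example.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Mask covering the lower 21 bits mentioned in the text. */
#define LARGE_PAGE_MASK ((1ULL << 21) - 1)

/* Nonzero if both addresses agree in their lower 21 bits. */
int large_page_friendly(uint64_t guest_phys_addr, uint64_t userspace_addr)
{
        return ((guest_phys_addr ^ userspace_addr) & LARGE_PAGE_MASK) == 0;
}

/*
 * Hypothetical wrapper: create or modify a memory slot on a vm fd.
 * Returns the ioctl result (0 on success, -1 on error).
 */
int kvm_set_region(int vm_fd, uint32_t slot, uint32_t flags,
                   uint64_t guest_phys_addr, uint64_t memory_size,
                   void *host_mem)
{
        struct kvm_userspace_memory_region r;

        memset(&r, 0, sizeof(r));
        r.slot = slot;
        r.flags = flags;        /* e.g. KVM_MEM_LOG_DIRTY_PAGES */
        r.guest_phys_addr = guest_phys_addr;
        r.memory_size = memory_size;
        r.userspace_addr = (uint64_t)(uintptr_t)host_mem;

        return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &r);
}
```

An application that cares about large page backing would call
large_page_friendly() on the chosen addresses before registering the
slot and adjust its allocation if the check fails.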

 

 

4.36 KVM_SET_TSS_ADDR

 

Capability: KVM_CAP_SET_TSS_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long tss_address (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a three-page region in the guest
physical address space.  The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address.  The guest may malfunction if it accesses this memory
region.

This ioctl is required on Intel-based hosts.  This is needed on Intel hardware
because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).

 

 

4.37 KVM_ENABLE_CAP

 

Capability: KVM_CAP_ENABLE_CAP, KVM_CAP_ENABLE_CAP_VM
Architectures: ppc, s390
Type: vcpu ioctl, vm ioctl (with KVM_CAP_ENABLE_CAP_VM)
Parameters: struct kvm_enable_cap (in)
Returns: 0 on success; -1 on error

Not all extensions are enabled by default.  Using this ioctl the application
can enable an extension, making it available to the guest.

On systems that do not support this ioctl, it always fails.  On systems that
do support it, it only works for extensions that are supported for enablement.

To check if a capability can be enabled, the KVM_CHECK_EXTENSION ioctl should
be used.

struct kvm_enable_cap {
       /* in */
       __u32 cap;

The capability that is supposed to get enabled.

       __u32 flags;

A bitfield indicating future enhancements.  Has to be 0 for now.

       __u64 args[4];

Arguments for enabling a feature.  If a feature needs initial values to
function properly, this is the place to put them.

       __u8  pad[64];
};

The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
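
Filling in the structure could be sketched as below.  The wrapper name
kvm_enable_cap is hypothetical; clearing the whole structure first is
this example's way of keeping flags at 0 and the padding zeroed.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical wrapper: enable capability `cap` on a vcpu fd or, with
 * KVM_CAP_ENABLE_CAP_VM, on a vm fd.  `args` may be NULL when the
 * capability needs no initial values.  Returns the ioctl result.
 */
int kvm_enable_cap(int fd, uint32_t cap, const uint64_t args[4])
{
        struct kvm_enable_cap ec;

        memset(&ec, 0, sizeof(ec));     /* flags = 0, pad cleared */
        ec.cap = cap;
        if (args)
                memcpy(ec.args, args, sizeof(ec.args));

        return ioctl(fd, KVM_ENABLE_CAP, &ec);
}
```

Callers would normally confirm with KVM_CHECK_EXTENSION that the
capability can be enabled before calling the wrapper.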

 

4.38 KVM_GET_MP_STATE

 

Capability: KVM_CAP_MP_STATE
Architectures: x86, s390, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (out)
Returns: 0 on success; -1 on error

struct kvm_mp_state {
        __u32 mp_state;
};

Returns the vcpu's current "multiprocessing state" (though also valid on
uniprocessor guests).

Possible values are:

 - KVM_MP_STATE_RUNNABLE:        the vcpu is currently running [x86,arm/arm64]
 - KVM_MP_STATE_UNINITIALIZED:   the vcpu is an application processor (AP)
                                 which has not yet received an INIT signal [x86]
 - KVM_MP_STATE_INIT_RECEIVED:   the vcpu has received an INIT signal, and is
                                 now ready for a SIPI [x86]
 - KVM_MP_STATE_HALTED:          the vcpu has executed a HLT instruction and
                                 is waiting for an interrupt [x86]
 - KVM_MP_STATE_SIPI_RECEIVED:   the vcpu has just received a SIPI (vector
                                 accessible via KVM_GET_VCPU_EVENTS) [x86]
 - KVM_MP_STATE_STOPPED:         the vcpu is stopped [s390,arm/arm64]
 - KVM_MP_STATE_CHECK_STOP:      the vcpu is in a special error state [s390]
 - KVM_MP_STATE_OPERATING:       the vcpu is operating (running or halted)
                                 [s390]
 - KVM_MP_STATE_LOAD:            the vcpu is in a special load/startup state
                                 [s390]

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.

For arm/arm64:

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.

 

4.39 KVM_SET_MP_STATE

 

Capability: KVM_CAP_MP_STATE
Architectures: x86, s390, arm, arm64
Type: vcpu ioctl
Parameters: struct kvm_mp_state (in)
Returns: 0 on success; -1 on error

Sets the vcpu's current "multiprocessing state"; see KVM_GET_MP_STATE for
arguments.

On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP.  Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.

For arm/arm64:

The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.

 

4.40 KVM_SET_IDENTITY_MAP_ADDR

Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
Architectures: x86
Type: vm ioctl
Parameters: unsigned long identity (in)
Returns: 0 on success, -1 on error

This ioctl defines the physical address of a one-page region in the guest
physical address space.  The region must be within the first 4GB of the
guest physical address space and must not conflict with any memory slot
or any mmio address.  The guest may malfunction if it accesses this memory
region.

This ioctl is required on Intel-based hosts.  This is needed on Intel hardware
because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).

 

 

4.41 KVM_SET_BOOT_CPU_ID

Capability: KVM_CAP_SET_BOOT_CPU_ID
Architectures: x86
Type: vm ioctl
Parameters: unsigned long vcpu_id
Returns: 0 on success, -1 on error

Define which vcpu is the Bootstrap Processor (BSP).  Values are the same
as the vcpu id in KVM_CREATE_VCPU.  If this ioctl is not called, the default
is vcpu 0.

 

 

4.42 KVM_GET_XSAVE

 

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (out)
Returns: 0 on success, -1 on error

 

struct kvm_xsave {

        __u32 region[1024];

};

 

This ioctl copies the current vcpu's xsave struct to userspace.

 

 

4.43 KVM_SET_XSAVE

 

Capability: KVM_CAP_XSAVE
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xsave (in)
Returns: 0 on success, -1 on error

 

struct kvm_xsave {

        __u32 region[1024];

};

 

This ioctl copies userspace's xsave struct to the kernel.

 

 

4.44 KVM_GET_XCRS

 

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (out)
Returns: 0 on success, -1 on error

 

struct kvm_xcr {

        __u32 xcr;

        __u32 reserved;

        __u64 value;

};

 

struct kvm_xcrs {

        __u32 nr_xcrs;

        __u32 flags;

        struct kvm_xcr xcrs[KVM_MAX_XCRS];

        __u64 padding[16];

};

 

This ioctl copies the current vcpu's xcrs to userspace.

 

 

4.45 KVM_SET_XCRS

 

Capability: KVM_CAP_XCRS
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_xcrs (in)
Returns: 0 on success, -1 on error

 

struct kvm_xcr {

        __u32 xcr;

        __u32 reserved;

        __u64 value;

};

 

struct kvm_xcrs {

        __u32 nr_xcrs;

        __u32 flags;

        struct kvm_xcr xcrs[KVM_MAX_XCRS];

        __u64 padding[16];

};

 

This ioctl sets the vcpu's xcr to the value userspace specified.

 

 

4.46 KVM_GET_SUPPORTED_CPUID

Capability: KVM_CAP_EXT_CPUID
Architectures: x86
Type: system ioctl
Parameters: struct kvm_cpuid2 (in/out)
Returns: 0 on success, -1 on error

 

struct kvm_cpuid2 {

        __u32 nent;

        __u32 padding;

        struct kvm_cpuid_entry2 entries[0];

};

 

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX         BIT(0)
#define KVM_CPUID_FLAG_STATEFUL_FUNC            BIT(1)
#define KVM_CPUID_FLAG_STATE_READ_NEXT          BIT(2)

struct kvm_cpuid_entry2 {

        __u32 function;

        __u32 index;

        __u32 flags;

        __u32 eax;

        __u32 ebx;

        __u32 ecx;

        __u32 edx;

        __u32 padding[3];

};

 

This ioctl returns x86 cpuid features which are supported by both the hardware
and kvm.  Userspace can use the information returned by this ioctl to
construct cpuid information (for KVM_SET_CPUID2) that is consistent with
hardware, kernel, and userspace capabilities, and with user requirements (for
example, the user may wish to constrain cpuid to emulate older hardware,
or for feature consistency across a cluster).

 

Userspace invokes KVM_GET_SUPPORTED_CPUID by passing a kvm_cpuid2 structure
with the 'nent' field indicating the number of entries in the variable-size
array 'entries'.  If the number of entries is too low to describe the cpu
capabilities, an error (E2BIG) is returned.  If the number is too high,
the 'nent' field is adjusted and an error (ENOMEM) is returned.  If the
number is just right, the 'nent' field is adjusted to the number of valid
entries in the 'entries' array, which is then filled.
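
A minimal sketch of sizing the variable-size argument described above.  The
two structures are local mirrors of the layouts listed in this section
(cpuid2_buf and cpuid_entry2 are hypothetical names, used so that the block
compiles without <linux/kvm.h>); a real caller would use the kernel's
definitions, pass the buffer to the ioctl, and retry with a larger 'nent'
when E2BIG is reported.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct kvm_cpuid_entry2 as listed above (10 x __u32). */
struct cpuid_entry2 {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
	uint32_t eax, ebx, ecx, edx;
	uint32_t padding[3];
};

/* Local mirror of struct kvm_cpuid2: 8-byte header, then nent entries. */
struct cpuid2_buf {
	uint32_t nent;
	uint32_t padding;
	struct cpuid_entry2 entries[];
};

/* Bytes needed for a buffer able to hold nent entries. */
size_t cpuid2_bytes(uint32_t nent)
{
	return sizeof(struct cpuid2_buf) +
	       (size_t)nent * sizeof(struct cpuid_entry2);
}

/* Allocate a zeroed buffer and record its capacity in nent; NULL on failure. */
struct cpuid2_buf *cpuid2_alloc(uint32_t nent)
{
	struct cpuid2_buf *buf;

	/* Sanity check that the mirror has the expected 40-byte entry layout. */
	assert(sizeof(struct cpuid_entry2) == 40);

	buf = calloc(1, cpuid2_bytes(nent));
	if (buf)
		buf->nent = nent;
	return buf;
}
```

The caller owns the returned buffer and releases it with free().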

 

The entries returned are the host cpuid as returned by the cpuid instruction,
with unknown or unsupported features masked out.  Some features (for example,
x2apic), may not be present in the host cpu, but are exposed by kvm if it can
emulate them efficiently. The fields in each entry are defined as follows:

 

  function: the eax value used to obtain the entry
  index: the ecx value used to obtain the entry (for entries that are
         affected by ecx)
  flags: an OR of zero or more of the following:
        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
           if the index field is valid
        KVM_CPUID_FLAG_STATEFUL_FUNC:
           if cpuid for this function returns different values for successive
           invocations; there will be several entries with the same function,
           all with this flag set
        KVM_CPUID_FLAG_STATE_READ_NEXT:
           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
           the first entry to be read by a cpu
  eax, ebx, ecx, edx: the values returned by the cpuid instruction for
         this function/index combination

 

The TSC deadline timer feature (CPUID leaf 1, ecx[24]) is always returned
as false, since the feature depends on KVM_CREATE_IRQCHIP for local APIC
support.  Instead it is reported via

  ioctl(KVM_CHECK_EXTENSION, KVM_CAP_TSC_DEADLINE_TIMER)

if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
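
The last step can be sketched as below, assuming the caller has already
decided that the timer is usable (the capability check returned true and an
in-kernel irqchip is in use, or userspace emulates the feature).  struct
cpuid_leaf is a hypothetical stand-in carrying only the kvm_cpuid_entry2
fields needed here, and enable_tsc_deadline() is an illustrative helper, not
part of the KVM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_LEAF_FEATURES      1u          /* CPUID function 1 */
#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)  /* ecx[24], as stated above */

/* Stand-in holding only the kvm_cpuid_entry2 fields used in this sketch. */
struct cpuid_leaf {
	uint32_t function;
	uint32_t index;
	uint32_t ecx;
};

/*
 * Set ecx[24] in every leaf-1 entry when the deadline timer is usable.
 * Returns the number of entries that were modified.
 */
int enable_tsc_deadline(struct cpuid_leaf *e, size_t n, int timer_usable)
{
	int changed = 0;

	assert(e != NULL || n == 0);
	if (!timer_usable)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (e[i].function == CPUID_LEAF_FEATURES &&
		    !(e[i].ecx & CPUID_1_ECX_TSC_DEADLINE)) {
			e[i].ecx |= CPUID_1_ECX_TSC_DEADLINE;
			changed++;
		}
	}
	return changed;
}
```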

 

 

4.47 KVM_PPC_GET_PVINFO

Capability: KVM_CAP_PPC_GET_PVINFO
Architectures: ppc
Type: vm ioctl
Parameters: struct kvm_ppc_pvinfo (out)
Returns: 0 on success, !0 on error

struct kvm_ppc_pvinfo {

        __u32 flags;

        __u32 hcall[4];

        __u8 pad[108];

};

 

This ioctl fetches PV specific information that needs to be passed to the guest
using the device tree or other means from vm context.

The hcall array defines 4 instructions that make up a hypercall.

If any additional field gets added to this structure later on, a bit for that
additional piece of information will be set in the flags bitmap.

The flags bitmap is defined as:

   /* the host supports the ePAPR idle hcall */
   #define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)

 

4.48 KVM_ASSIGN_PCI_DEVICE

Capability: none
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_assigned_pci_dev (in)
Returns: 0 on success, -1 on error

Assigns a host PCI device to the VM.

struct kvm_assigned_pci_dev {

        __u32 assigned_dev_id;

        __u32 busnr;

        __u32 devfn;

        __u32 flags;

        __u32 segnr;

        union {

               __u32 reserved[11];

        };

};

 

The PCI device is specified by the triple segnr, busnr, and devfn.
Identification in succeeding service requests is done via assigned_dev_id. The
following flags are specified:

/* Depends on KVM_CAP_IOMMU */
#define KVM_DEV_ASSIGN_ENABLE_IOMMU   (1 << 0)
/* The following two depend on KVM_CAP_PCI_2_3 */
#define KVM_DEV_ASSIGN_PCI_2_3        (1 << 1)
#define KVM_DEV_ASSIGN_MASK_INTX      (1 << 2)

If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx interrupts
via the PCI-2.3-compliant device-level mask, thus enabling IRQ sharing with
other assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifies the
guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for details.

The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure
isolation of the device.  Usages not specifying this flag are deprecated.

Only PCI header type 0 devices with PCI BAR resources are supported by
device assignment.  The user requesting this ioctl must have read/write
access to the PCI sysfs resource files associated with the device.

 

Errors:

  ENOTTY: kernel does not support this ioctl

 

  Other error conditions may be defined by individual device types or

  have their standard meanings.

 

 

4.49 KVM_DEASSIGN_PCI_DEVICE

Capability: none
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_assigned_pci_dev (in)
Returns: 0 on success, -1 on error

Ends PCI device assignment, releasing all associated resources.

See KVM_ASSIGN_PCI_DEVICE for the data structure. Only assigned_dev_id is
used in kvm_assigned_pci_dev to identify the device.

 

Errors:

  ENOTTY: kernel does not support this ioctl

 

  Other error conditions may be defined by individual device types or

  have their standard meanings.

 

4.50 KVM_ASSIGN_DEV_IRQ

Capability: KVM_CAP_ASSIGN_DEV_IRQ
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_assigned_irq (in)
Returns: 0 on success, -1 on error

Assigns an IRQ to a passed-through device.

struct kvm_assigned_irq {
        __u32 assigned_dev_id;
        __u32 host_irq; /* ignored (legacy field) */
        __u32 guest_irq;
        __u32 flags;
        union {
               __u32 reserved[12];
        };
};

The following flags are defined:

#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)

#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)

It is not valid to specify multiple types per host or guest IRQ. However, the
IRQ type of host and guest can differ or can even be null.
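
The rule above can be checked in userspace before issuing the ioctl.  The
following is a sketch only: the flag values are copied from the listing above
rather than taken from <linux/kvm.h>, assigned_irq_flags_valid() is a
hypothetical helper, and rejecting undefined bits is an assumption of this
sketch, not something the text above states.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the listing above. */
#define KVM_DEV_IRQ_HOST_INTX    (1 << 0)
#define KVM_DEV_IRQ_HOST_MSI     (1 << 1)
#define KVM_DEV_IRQ_HOST_MSIX    (1 << 2)
#define KVM_DEV_IRQ_GUEST_INTX   (1 << 8)
#define KVM_DEV_IRQ_GUEST_MSI    (1 << 9)
#define KVM_DEV_IRQ_GUEST_MSIX   (1 << 10)

#define HOST_TYPE_MASK  (KVM_DEV_IRQ_HOST_INTX | KVM_DEV_IRQ_HOST_MSI | \
			 KVM_DEV_IRQ_HOST_MSIX)
#define GUEST_TYPE_MASK (KVM_DEV_IRQ_GUEST_INTX | KVM_DEV_IRQ_GUEST_MSI | \
			 KVM_DEV_IRQ_GUEST_MSIX)

/* True if v has zero or one bit set. */
int at_most_one_bit(uint32_t v)
{
	return (v & (v - 1)) == 0;
}

/*
 * Returns 1 if flags name at most one host IRQ type and at most one guest
 * IRQ type (either may be absent), 0 otherwise.  Undefined bits are
 * rejected; that part is an assumption of this sketch.
 */
int assigned_irq_flags_valid(uint32_t flags)
{
	if (flags & ~(uint32_t)(HOST_TYPE_MASK | GUEST_TYPE_MASK))
		return 0;
	return at_most_one_bit(flags & HOST_TYPE_MASK) &&
	       at_most_one_bit(flags & GUEST_TYPE_MASK);
}
```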

 

Errors:

  ENOTTY: kernel does not support this ioctl

 

  Other error conditions may be defined by individual device types or

  have their standard meanings.

 

 

4.51 KVM_DEASSIGN_DEV_IRQ

Capability: KVM_CAP_ASSIGN_DEV_IRQ
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_assigned_irq (in)
Returns: 0 on success, -1 on error

Ends an IRQ assignment to a passed-through device.

See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
by assigned_dev_id, flags must correspond to the IRQ type specified on
KVM_ASSIGN_DEV_IRQ. Partial deassignment of host or guest IRQ is allowed.

 

 

4.52 KVM_SET_GSI_ROUTING

Capability: KVM_CAP_IRQ_ROUTING
Architectures: x86 s390
Type: vm ioctl
Parameters: struct kvm_irq_routing (in)
Returns: 0 on success, -1 on error

Sets the GSI routing table entries, overwriting any previously set entries.

struct kvm_irq_routing {

        __u32 nr;

        __u32 flags;

        struct kvm_irq_routing_entry entries[0];

};

 

No flags are specified so far, the corresponding field must be set to zero.

struct kvm_irq_routing_entry {

        __u32 gsi;

        __u32 type;

        __u32 flags;

        __u32 pad;

        union {

               struct kvm_irq_routing_irqchip irqchip;
               struct kvm_irq_routing_msi msi;
               struct kvm_irq_routing_s390_adapter adapter;

               __u32 pad[8];

        } u;

};

 

/* gsi routing entry types */
#define KVM_IRQ_ROUTING_IRQCHIP 1
#define KVM_IRQ_ROUTING_MSI 2
#define KVM_IRQ_ROUTING_S390_ADAPTER 3

No flags are specified so far, the corresponding field must be set to zero.

struct kvm_irq_routing_irqchip {

        __u32 irqchip;

        __u32 pin;

};

 

struct kvm_irq_routing_msi {

        __u32 address_lo;

        __u32 address_hi;

        __u32 data;

        __u32 pad;

};

 

struct kvm_irq_routing_s390_adapter {

        __u64 ind_addr;

        __u64 summary_addr;

        __u64 ind_offset;

        __u32 summary_offset;

        __u32 adapter_id;

};

 

 

4.53 KVM_ASSIGN_SET_MSIX_NR

Capability: none
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_assigned_msix_nr (in)
Returns: 0 on success, -1 on error

Set the number of MSI-X interrupts for an assigned device. The number is
reset again by terminating the MSI-X assignment of the device via
KVM_DEASSIGN_DEV_IRQ. Calling this service more than once at any earlier
point will fail.

struct kvm_assigned_msix_nr {

        __u32 assigned_dev_id;

        __u16 entry_nr;

        __u16 padding;

};

 

#define KVM_MAX_MSIX_PER_DEV          256

 

 

4.54 KVM_ASSIGN_SET_MSIX_ENTRY

Capability: none
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_assigned_msix_entry (in)
Returns: 0 on success, -1 on error

Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting
the GSI vector to zero means disabling the interrupt.

struct kvm_assigned_msix_entry {

        __u32 assigned_dev_id;

        __u32 gsi;

        __u16 entry; /* The index of entry in the MSI-X table */

        __u16 padding[3];

};

 

Errors:

  ENOTTY: kernel does not support this ioctl

 

  Other error conditions may be defined by individual device types or

  have their standard meanings.

 

 

4.55 KVM_SET_TSC_KHZ

 

Capability: KVM_CAP_TSC_CONTROL
Architectures: x86
Type: vcpu ioctl
Parameters: virtual tsc_khz
Returns: 0 on success, -1 on error

Specifies the tsc frequency for the virtual machine. The unit of the
frequency is KHz.

 

 

4.56 KVM_GET_TSC_KHZ

 

Capability: KVM_CAP_GET_TSC_KHZ
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: virtual tsc-khz on success, negative value on error

Returns the tsc frequency of the guest. The unit of the return value is
KHz. If the host has unstable tsc this ioctl returns -EIO instead as an
error.

 

 

4.57 KVM_GET_LAPIC

 

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (out)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {

        char regs[KVM_APIC_REG_SIZE];

};

 

Reads the Local APIC registers and copies them into the input argument.  The
data format and layout are the same as documented in the architecture manual.

 

 

4.58 KVM_SET_LAPIC

 

Capability: KVM_CAP_IRQCHIP
Architectures: x86
Type: vcpu ioctl
Parameters: struct kvm_lapic_state (in)
Returns: 0 on success, -1 on error

#define KVM_APIC_REG_SIZE 0x400
struct kvm_lapic_state {

        char regs[KVM_APIC_REG_SIZE];

};

 

Copies the input argument into the Local APIC registers.  The data format
and layout are the same as documented in the architecture manual.

 

 

4.59 KVM_IOEVENTFD

 

Capability: KVM_CAP_IOEVENTFD
Architectures: all
Type: vm ioctl
Parameters: struct kvm_ioeventfd (in)
Returns: 0 on success, !0 on error

This ioctl attaches or detaches an ioeventfd to a legal pio/mmio address
within the guest.  A guest write in the registered address will signal the
provided event instead of triggering an exit.

struct kvm_ioeventfd {

        __u64 datamatch;

        __u64 addr;        /* legal pio/mmio address */

        __u32 len;         /* 1, 2, 4, or 8 bytes    */

        __s32 fd;

        __u32 flags;

        __u8 pad[36];

};

 

For the special case of virtio-ccw devices on s390, the ioevent is matched
to a subchannel/virtqueue tuple instead.

The following flags are defined:

#define KVM_IOEVENTFD_FLAG_DATAMATCH (1 << kvm_ioeventfd_flag_nr_datamatch)
#define KVM_IOEVENTFD_FLAG_PIO       (1 << kvm_ioeventfd_flag_nr_pio)
#define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
#define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
        (1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)

If datamatch flag is set, the event will be signaled only if the written value
to the registered address is equal to datamatch in struct kvm_ioeventfd.

For virtio-ccw devices, addr contains the subchannel id and datamatch the
virtqueue index.

 

 

4.60 KVM_DIRTY_TLB

 

Capability: KVM_CAP_SW_TLB

Architectures: ppc

Type: vcpu ioctl

Parameters: struct kvm_dirty_tlb (in)
Returns: 0 on success, -1 on error

struct kvm_dirty_tlb {

        __u64 bitmap;

        __u32 num_dirty;

};

 

This must be called whenever userspace has changed an entry in the shared
TLB, prior to calling KVM_RUN on the associated vcpu.

The "bitmap" field is the userspace address of an array.  This array
consists of a number of bits, equal to the total number of TLB entries as
determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
nearest multiple of 64.

Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
array.

The array is little-endian: the bit 0 is the least significant bit of the
first byte, bit 8 is the least significant bit of the second byte, etc.
This avoids any complications with differing word sizes.

The "num_dirty" field is a performance hint for KVM to determine whether it
should skip processing the bitmap and just invalidate everything.  It must
be set to the number of set bits in the bitmap.
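
The bitmap layout above can be handled with byte-wise helpers, which sidestep
host word size and endianness.  The following is a sketch with hypothetical
helper names; the caller would store the buffer's address in "bitmap" and the
result of tlb_bitmap_count() in "num_dirty".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for num_entries bits, rounded up to a multiple of 64 bits. */
size_t tlb_bitmap_bytes(size_t num_entries)
{
	return ((num_entries + 63) / 64) * 8;
}

/* Mark TLB entry idx dirty: bit 0 is the LSB of byte 0, bit 8 of byte 1. */
void tlb_bitmap_set(uint8_t *bitmap, size_t idx)
{
	assert(bitmap != NULL);
	bitmap[idx / 8] |= (uint8_t)(1u << (idx % 8));
}

/* Count the set bits; this is the value required in num_dirty. */
uint32_t tlb_bitmap_count(const uint8_t *bitmap, size_t bytes)
{
	uint32_t n = 0;

	for (size_t i = 0; i < bytes; i++)
		for (uint8_t b = bitmap[i]; b; b &= (uint8_t)(b - 1))
			n++;	/* each iteration clears the lowest set bit */
	return n;
}
```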

 

 

4.61 KVM_ASSIGN_SET_INTX_MASK

Capability: KVM_CAP_PCI_2_3
Architectures: x86
Type: vm ioctl
Parameters: struct kvm_assigned_pci_dev (in)
Returns: 0 on success, -1 on error

Allows userspace to mask PCI INTx interrupts from the assigned device.  The
kernel will not deliver INTx interrupts to the guest between setting and
clearing of KVM_ASSIGN_SET_INTX_MASK via this interface.  This enables use of
and emulation of PCI 2.3 INTx disable command register behavior.

This may be used for both PCI 2.3 devices supporting INTx disable natively and
older devices lacking this support. Userspace is responsible for emulating the
read value of the INTx disable bit in the guest visible PCI command register.
When modifying the INTx disable state, userspace should precede updating the
physical device command register by calling this ioctl to inform the kernel of
the new intended INTx mask state.

Note that the kernel uses the device INTx disable bit to internally manage the
device interrupt state for PCI 2.3 devices.  Reads of this register may
therefore not match the expected value.  Writes should always use the guest
intended INTx disable value rather than attempting to read-copy-update the
current physical device state.  Races between user and kernel updates to the
INTx disable bit are handled lazily in the kernel.  It's possible the device
may generate unintended interrupts, but they will not be injected into the
guest.

 

See KVM_ASSIGN_PCI_DEVICE for the data structure. The target device is specified
by assigned_dev_id.  In the flags field, only KVM_DEV_ASSIGN_MASK_INTX is
evaluated.

 

 

4.62 KVM_CREATE_SPAPR_TCE

Capability: KVM_CAP_SPAPR_TCE
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_create_spapr_tce (in)
Returns: file descriptor for manipulating the created TCE table

This creates a virtual TCE (translation control entry) table, which
is an IOMMU for PAPR-style virtual I/O.  It is used to translate
logical addresses used in virtual I/O into guest physical addresses,
and provides a scatter/gather capability for PAPR virtual I/O.

/* for KVM_CAP_SPAPR_TCE */
struct kvm_create_spapr_tce {

        __u64 liobn;

        __u32 window_size;

};

 

The liobn field gives the logical IO bus number for which to create a
TCE table.  The window_size field specifies the size of the DMA window
which this TCE table will translate - the table will contain one 64
bit TCE entry for every 4kiB of the DMA window.
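
As a worked example of the sizing rule above, the helpers below compute the
number of entries and the table size for a given window.  They are
hypothetical and assume window_size is a multiple of 4kiB; how the kernel
treats other values, or rounds the mapping, is not covered here.

```c
#include <assert.h>
#include <stdint.h>

#define TCE_PAGE_SIZE 4096u	/* one TCE entry per 4kiB of DMA window */

/* Number of 64-bit TCE entries covering a window of window_size bytes. */
uint64_t tce_entries(uint32_t window_size)
{
	assert(window_size % TCE_PAGE_SIZE == 0);
	return window_size / TCE_PAGE_SIZE;
}

/* Size in bytes of the entries themselves. */
uint64_t tce_table_bytes(uint32_t window_size)
{
	return tce_entries(window_size) * sizeof(uint64_t);
}
```

For a 1GiB window this gives 262144 entries, occupying 2MiB.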

 

When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
table has been created using this ioctl(), the kernel will handle it
in real mode, updating the TCE table.  H_PUT_TCE calls for other
liobns will cause a vm exit and must be handled by userspace.

The return value is a file descriptor which can be passed to mmap(2)
to map the created TCE table into userspace.  This lets userspace read
the entries written by kernel-handled H_PUT_TCE calls, and also lets
userspace update the TCE table directly which is useful in some
circumstances.

 

 

4.63 KVM_ALLOCATE_RMA

 

Capability: KVM_CAP_PPC_RMA
Architectures: powerpc
Type: vm ioctl
Parameters: struct kvm_allocate_rma (out)
Returns: file descriptor for mapping the allocated RMA

This allocates a Real Mode Area (RMA) from the pool allocated at boot
time by the kernel.  An RMA is a physically-contiguous, aligned region
of memory used on older POWER processors to provide the memory which
will be accessed by real-mode (MMU off) accesses in a KVM guest.
POWER processors support a set of sizes for the RMA that usually
includes 64MB, 128MB, 256MB and some larger powers of two.

/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {

        __u64 rma_size;

};

 

The return value is a file descriptor which can be passed to mmap(2)
to map the allocated RMA into userspace.  The mapped area can then be
passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
RMA for a virtual machine.  The size of the RMA in bytes (which is
fixed at host kernel boot time) is returned in the rma_size field of
the argument structure.

The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
is supported; 2 if the processor requires all virtual machines to have
an RMA, or 1 if the processor can use an RMA but doesn't require it,
because it supports the Virtual RMA (VRMA) facility.

 

 

4.64 KVM_NMI

 

Capability: KVM_CAP_USER_NMI
Architectures: x86
Type: vcpu ioctl
Parameters: none
Returns: 0 on success, -1 on error

Queues an NMI on the thread's vcpu.  Note this is well defined only
when KVM_CREATE_IRQCHIP has not been called, since this is an interface
between the virtual cpu core and virtual local APIC.  After KVM_CREATE_IRQCHIP
has been called, this interface is completely emulated within the kernel.

To use this to emulate the LINT1 input with KVM_CREATE_IRQCHIP, use the
following algorithm:

  - pause the vcpu
  - read the local APIC's state (KVM_GET_LAPIC)
  - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
  - if so, issue KVM_NMI
  - resume the vcpu

Some guests configure the LINT1 NMI input to cause a panic, aiding in
debugging.
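
The "check whether changing LINT1 will queue an NMI" step of the algorithm
above can be sketched as follows, operating on the regs[] array returned by
KVM_GET_LAPIC.  This is an illustration only: the register offset (0x360 for
LVT LINT1), the delivery mode field (bits 8-10, 100b for NMI) and the mask
bit (bit 16) come from the x86 architecture manual rather than from this
document and should be verified against it.  lapic_read32() and
lint1_delivers_nmi() are hypothetical helpers, and the sketch does not
consider whether the APIC is software-enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define APIC_REG_SIZE    0x400		/* KVM_APIC_REG_SIZE */
#define APIC_LVT_LINT1   0x360		/* LVT LINT1 register offset */
#define APIC_LVT_MASKED  (1u << 16)	/* entry is masked */
#define APIC_DM_MASK     (7u << 8)	/* delivery mode field */
#define APIC_DM_NMI      (4u << 8)	/* delivery mode 100b: NMI */

/* Read a 32-bit register from the raw register page (host byte order). */
uint32_t lapic_read32(const char *regs, unsigned int offset)
{
	uint32_t v;

	assert(offset + sizeof(v) <= APIC_REG_SIZE);
	memcpy(&v, regs + offset, sizeof(v));
	return v;
}

/* Returns 1 if LINT1 is unmasked and programmed for NMI delivery. */
int lint1_delivers_nmi(const char *regs)
{
	uint32_t lvt = lapic_read32(regs, APIC_LVT_LINT1);

	if (lvt & APIC_LVT_MASKED)
		return 0;
	return (lvt & APIC_DM_MASK) == APIC_DM_NMI;
}
```

Userspace would issue KVM_NMI only when lint1_delivers_nmi() returns 1.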

 

 

4.65 KVM_S390_UCAS_MAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:

        struct kvm_s390_ucas_mapping {

               __u64 user_addr;

               __u64 vcpu_addr;

               __u64 length;

        };

 

This ioctl maps the memory at "user_addr" with the length "length" to
the vcpu's address space starting at "vcpu_addr". All parameters need to
be aligned by 1 megabyte.
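
A caller can verify the alignment requirement before issuing the ioctl.  The
helper below is a hypothetical sketch; it checks alignment only and does not
judge whether a zero length is meaningful.

```c
#include <assert.h>
#include <stdint.h>

#define UCAS_ALIGN (1ull << 20)	/* 1 megabyte */

/* Returns 1 if all three fields are multiples of 1 megabyte, else 0. */
int ucas_mapping_aligned(uint64_t user_addr, uint64_t vcpu_addr,
			 uint64_t length)
{
	/* OR-ing the values lets one mask test cover all three fields. */
	return ((user_addr | vcpu_addr | length) & (UCAS_ALIGN - 1)) == 0;
}
```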

 

 

4.66 KVM_S390_UCAS_UNMAP

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_ucas_mapping (in)
Returns: 0 in case of success

The parameter is defined like this:

        struct kvm_s390_ucas_mapping {

               __u64 user_addr;

               __u64 vcpu_addr;

               __u64 length;

        };

 

This ioctl unmaps the memory in the vcpu's address space starting at
"vcpu_addr" with the length "length". The field "user_addr" is ignored.
All parameters need to be aligned by 1 megabyte.

 

 

4.67 KVM_S390_VCPU_FAULT

Capability: KVM_CAP_S390_UCONTROL
Architectures: s390
Type: vcpu ioctl
Parameters: vcpu absolute address (in)
Returns: 0 in case of success

This call creates a page table entry on the virtual cpu's address space
(for user controlled virtual machines) or the virtual machine's address
space (for regular virtual machines). This only works for minor faults,
thus it's recommended to access subject memory page via the user page
table upfront. This is useful to handle validity intercepts for user
controlled virtual machines to fault in the virtual cpu's lowcore pages
prior to calling the KVM_RUN ioctl.

 

 

4.68 KVM_SET_ONE_REG

 

Capability: KVM_CAP_ONE_REG
Architectures: all
Type: vcpu ioctl
Parameters: struct kvm_one_reg (in)
Returns: 0 on success, negative value on failure

 

struct kvm_one_reg {

       __u64 id;

       __u64 addr;

};

 

Using this ioctl, a single vcpu register can be set to a specific value
defined by user space with the passed in struct kvm_one_reg, where id
refers to the register identifier as described below and addr is a pointer
to a variable with the respective size. There can be architecture agnostic
and architecture specific registers. Each has its own range of operation
and its own constants and width. To keep track of the implemented
registers, find a list below:

 

  Arch |           Register            | Width (bits)

        |                               |

  PPC   |KVM_REG_PPC_HIOR              | 64

  PPC   |KVM_REG_PPC_IAC1              | 64

  PPC   |KVM_REG_PPC_IAC2              | 64

  PPC   |KVM_REG_PPC_IAC3              | 64

  PPC   |KVM_REG_PPC_IAC4              | 64

  PPC   |KVM_REG_PPC_DAC1              | 64

  PPC   |KVM_REG_PPC_DAC2              | 64

  PPC   |KVM_REG_PPC_DABR              | 64

  PPC   |KVM_REG_PPC_DSCR              | 64

  PPC   |KVM_REG_PPC_PURR              | 64

  PPC   |KVM_REG_PPC_SPURR             | 64

  PPC   |KVM_REG_PPC_DAR               | 64

  PPC   |KVM_REG_PPC_DSISR             | 32

  PPC   |KVM_REG_PPC_AMR               | 64

  PPC   |KVM_REG_PPC_UAMOR             | 64

  PPC   |KVM_REG_PPC_MMCR0             | 64

  PPC   |KVM_REG_PPC_MMCR1             | 64

  PPC   |KVM_REG_PPC_MMCRA             | 64

  PPC   |KVM_REG_PPC_MMCR2             | 64

  PPC   |KVM_REG_PPC_MMCRS             | 64

  PPC   |KVM_REG_PPC_SIAR              | 64

  PPC   |KVM_REG_PPC_SDAR              | 64

  PPC   |KVM_REG_PPC_SIER              | 64

  PPC   |KVM_REG_PPC_PMC1              | 32

  PPC   |KVM_REG_PPC_PMC2              | 32

  PPC   |KVM_REG_PPC_PMC3              | 32

  PPC   |KVM_REG_PPC_PMC4              | 32

  PPC   |KVM_REG_PPC_PMC5              | 32

  PPC   |KVM_REG_PPC_PMC6              | 32

  PPC   |KVM_REG_PPC_PMC7              | 32

  PPC   |KVM_REG_PPC_PMC8              | 32

  PPC   |KVM_REG_PPC_FPR0              | 64

          ...

  PPC   |KVM_REG_PPC_FPR31             | 64

  PPC   |KVM_REG_PPC_VR0               | 128

          ...

  PPC   |KVM_REG_PPC_VR31              | 128

  PPC   |KVM_REG_PPC_VSR0              | 128

          ...

  PPC   |KVM_REG_PPC_VSR31             | 128

  PPC   |KVM_REG_PPC_FPSCR             | 64

  PPC   |KVM_REG_PPC_VSCR              | 32

  PPC   |KVM_REG_PPC_VPA_ADDR          | 64

  PPC   |KVM_REG_PPC_VPA_SLB           | 128

  PPC   |KVM_REG_PPC_VPA_DTL           | 128

  PPC   |KVM_REG_PPC_EPCR              | 32

  PPC   |KVM_REG_PPC_EPR               | 32

  PPC   |KVM_REG_PPC_TCR               | 32

  PPC   |KVM_REG_PPC_TSR               | 32

  PPC   |KVM_REG_PPC_OR_TSR            | 32

  PPC   |KVM_REG_PPC_CLEAR_TSR         | 32

  PPC   |KVM_REG_PPC_MAS0              | 32

  PPC   |KVM_REG_PPC_MAS1              | 32

  PPC   |KVM_REG_PPC_MAS2              | 64

  PPC   |KVM_REG_PPC_MAS7_3            | 64

  PPC   |KVM_REG_PPC_MAS4              | 32

  PPC   |KVM_REG_PPC_MAS6              | 32

  PPC   |KVM_REG_PPC_MMUCFG            | 32

  PPC   |KVM_REG_PPC_TLB0CFG           | 32

  PPC   |KVM_REG_PPC_TLB1CFG           | 32

  PPC   |KVM_REG_PPC_TLB2CFG           | 32

  PPC   |KVM_REG_PPC_TLB3CFG           | 32

  PPC   |KVM_REG_PPC_TLB0PS            | 32

  PPC   |KVM_REG_PPC_TLB1PS            | 32

  PPC   |KVM_REG_PPC_TLB2PS            | 32

  PPC   |KVM_REG_PPC_TLB3PS            | 32

  PPC   |KVM_REG_PPC_EPTCFG            | 32

  PPC   |KVM_REG_PPC_ICP_STATE         | 64

  PPC   |KVM_REG_PPC_TB_OFFSET         | 64

  PPC   |KVM_REG_PPC_SPMC1             | 32

  PPC   |KVM_REG_PPC_SPMC2             | 32

  PPC   |KVM_REG_PPC_IAMR              | 64

  PPC   |KVM_REG_PPC_TFHAR             | 64

  PPC   |KVM_REG_PPC_TFIAR             | 64

  PPC   |KVM_REG_PPC_TEXASR            | 64

  PPC   |KVM_REG_PPC_FSCR              | 64

  PPC   |KVM_REG_PPC_PSPB              | 32

  PPC   |KVM_REG_PPC_EBBHR             | 64

  PPC   |KVM_REG_PPC_EBBRR             | 64

  PPC   |KVM_REG_PPC_BESCR             | 64

  PPC   |KVM_REG_PPC_TAR               | 64

  PPC   |KVM_REG_PPC_DPDES             | 64

  PPC   |KVM_REG_PPC_DAWR              | 64

  PPC   |KVM_REG_PPC_DAWRX             | 64

  PPC   |KVM_REG_PPC_CIABR             | 64

  PPC   |KVM_REG_PPC_IC                | 64

  PPC   |KVM_REG_PPC_VTB               | 64

  PPC   |KVM_REG_PPC_CSIGR             | 64

  PPC   |KVM_REG_PPC_TACR              | 64

  PPC   |KVM_REG_PPC_TCSCR             | 64

  PPC   |KVM_REG_PPC_PID               | 64

  PPC   |KVM_REG_PPC_ACOP              | 64

  PPC   |KVM_REG_PPC_VRSAVE            | 32

  PPC   |KVM_REG_PPC_LPCR              | 32

  PPC   |KVM_REG_PPC_LPCR_64           | 64

  PPC   |KVM_REG_PPC_PPR               | 64

  PPC   |KVM_REG_PPC_ARCH_COMPAT       | 32

  PPC   |KVM_REG_PPC_DABRX             | 32

  PPC   |KVM_REG_PPC_WORT              | 64

  PPC   |KVM_REG_PPC_SPRG9             | 64

  PPC   |KVM_REG_PPC_DBSR              | 32

  PPC   |KVM_REG_PPC_TM_GPR0           | 64

          ...

  PPC   |KVM_REG_PPC_TM_GPR31          | 64

  PPC   |KVM_REG_PPC_TM_VSR0           | 128

          ...

  PPC   |KVM_REG_PPC_TM_VSR63          | 128

  PPC   |KVM_REG_PPC_TM_CR             | 64

  PPC   |KVM_REG_PPC_TM_LR             | 64

  PPC   |KVM_REG_PPC_TM_CTR            | 64

  PPC   |KVM_REG_PPC_TM_FPSCR          | 64

  PPC   |KVM_REG_PPC_TM_AMR            | 64

  PPC   |KVM_REG_PPC_TM_PPR            | 64

  PPC   |KVM_REG_PPC_TM_VRSAVE         | 64

  PPC   |KVM_REG_PPC_TM_VSCR           | 32

  PPC   |KVM_REG_PPC_TM_DSCR           | 64

  PPC   |KVM_REG_PPC_TM_TAR            | 64

        |                               |

  MIPS  |KVM_REG_MIPS_R0               | 64

          ...

  MIPS  |KVM_REG_MIPS_R31              | 64

  MIPS  |KVM_REG_MIPS_HI               | 64

  MIPS  |KVM_REG_MIPS_LO               | 64

  MIPS  |KVM_REG_MIPS_PC               | 64

  MIPS  |KVM_REG_MIPS_CP0_INDEX        | 32

  MIPS  |KVM_REG_MIPS_CP0_CONTEXT      | 64

  MIPS  |KVM_REG_MIPS_CP0_USERLOCAL    | 64

  MIPS  |KVM_REG_MIPS_CP0_PAGEMASK     | 32

  MIPS  |KVM_REG_MIPS_CP0_WIRED        | 32

  MIPS  |KVM_REG_MIPS_CP0_HWRENA       | 32

  MIPS  |KVM_REG_MIPS_CP0_BADVADDR     | 64

  MIPS  |KVM_REG_MIPS_CP0_COUNT        | 32

  MIPS  |KVM_REG_MIPS_CP0_ENTRYHI      | 64

  MIPS  |KVM_REG_MIPS_CP0_COMPARE      | 32

  MIPS  |KVM_REG_MIPS_CP0_STATUS       | 32

  MIPS  |KVM_REG_MIPS_CP0_CAUSE        | 32

  MIPS  |KVM_REG_MIPS_CP0_EPC          | 64

  MIPS  |KVM_REG_MIPS_CP0_PRID         | 32

  MIPS  |KVM_REG_MIPS_CP0_CONFIG       | 32

  MIPS  |KVM_REG_MIPS_CP0_CONFIG1      | 32

  MIPS  |KVM_REG_MIPS_CP0_CONFIG2      | 32

  MIPS  |KVM_REG_MIPS_CP0_CONFIG3      | 32

  MIPS  |KVM_REG_MIPS_CP0_CONFIG4      | 32

  MIPS  |KVM_REG_MIPS_CP0_CONFIG5      | 32

  MIPS  |KVM_REG_MIPS_CP0_CONFIG7      | 32

  MIPS  |KVM_REG_MIPS_CP0_ERROREPC     | 64

  MIPS  |KVM_REG_MIPS_COUNT_CTL        | 64

  MIPS  |KVM_REG_MIPS_COUNT_RESUME     | 64

  MIPS  |KVM_REG_MIPS_COUNT_HZ         | 64

  MIPS  |KVM_REG_MIPS_FPR_32(0..31)    | 32

  MIPS  |KVM_REG_MIPS_FPR_64(0..31)    | 64

  MIPS  |KVM_REG_MIPS_VEC_128(0..31)   | 128

  MIPS  |KVM_REG_MIPS_FCR_IR           | 32

  MIPS  |KVM_REG_MIPS_FCR_CSR          | 32

  MIPS  |KVM_REG_MIPS_MSA_IR           | 32

  MIPS  |KVM_REG_MIPS_MSA_CSR          | 32

 

ARM registers are mappedusing the lower 32 bits.  The upper 16 ofthat

is the register grouptype, or coprocessor number:

 

ARM core registershave the following id bit patterns:

  0x4020 0000 0010 <index into the kvm_regsstruct:16>

 

ARM 32-bit CP15registers have the following id bit patterns:

  0x4020 0000 000F <zero:1> <crn:4><crm:4> <opc1:4> <opc2:3>

 

ARM 64-bit CP15registers have the following id bit patterns:

  0x4030 0000 000F <zero:1><zero:4> <crm:4> <opc1:4> <zero:3>

 

ARM CCSIDR registersare demultiplexed by CSSELR value:

  0x4020 0000 0011 00 <csselr:8>

 

ARM 32-bit VFPcontrol registers have the following id bit patterns:

  0x4020 0000 0012 1 <regno:12>

 

ARM 64-bit FPregisters have the following id bit patterns:

  0x4030 0000 0012 0 <regno:12>
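
As an illustration of the 32-bit CP15 pattern above, the following sketch
packs the crn, crm, opc1 and opc2 fields into the low 16 bits of the id.
The helper name arm_cp15_32_id and the macro ARM_CP15_32_BASE are
hypothetical and only mirror the bit layout shown in this section; real
code would normally use the macros from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed upper part of a 32-bit CP15 register id, copied from the pattern
 * above: 0x4020 0000 000F followed by 16 bits of encoding. */
#define ARM_CP15_32_BASE 0x40200000000F0000ULL

/* Hypothetical helper: pack the fields in the order given by the pattern
 * <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>. */
static uint64_t arm_cp15_32_id(unsigned crn, unsigned crm,
                               unsigned opc1, unsigned opc2)
{
    assert(crn < 16 && crm < 16 && opc1 < 16 && opc2 < 8);
    return ARM_CP15_32_BASE |
           ((uint64_t)crn  << 11) |  /* <crn:4>  occupies bits 14..11 */
           ((uint64_t)crm  << 7)  |  /* <crm:4>  occupies bits 10..7  */
           ((uint64_t)opc1 << 3)  |  /* <opc1:4> occupies bits 6..3   */
           (uint64_t)opc2;           /* <opc2:3> occupies bits 2..0   */
}
```

The shifts follow directly from the field widths: each field starts where
the sum of the widths to its right ends.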

 

 

arm64 registers are mapped using the lower 32 bits. The upper 16 of

that is the register group type, or coprocessor number:

 

arm64 core/FP-SIMD registers have the following id bit patterns. Note

that the size of the access is variable, as the kvm_regs structure

contains elements ranging from 32 to 128 bits. The index is a 32-bit

value in the kvm_regs structure seen as a 32-bit array.

  0x60x0 0000 0010 <index into the kvm_regs struct:16>

 

arm64 CCSIDR registers are demultiplexed by CSSELR value:

  0x6020 0000 0011 00 <csselr:8>

 

arm64 system registers have the following id bit patterns:

  0x6030 0000 0013 <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>
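
The arm64 system register pattern above can be sketched in the same way.
The helper arm64_sysreg_id and the macro ARM64_SYSREG_BASE are
hypothetical names used for illustration; only the bit layout is taken
from this document.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed upper part of an arm64 system register id: 0x6030 0000 0013. */
#define ARM64_SYSREG_BASE 0x6030000000130000ULL

/* Hypothetical helper: pack <op0:2> <op1:3> <crn:4> <crm:4> <op2:3>. */
static uint64_t arm64_sysreg_id(unsigned op0, unsigned op1, unsigned crn,
                                unsigned crm, unsigned op2)
{
    assert(op0 < 4 && op1 < 8 && crn < 16 && crm < 16 && op2 < 8);
    return ARM64_SYSREG_BASE |
           ((uint64_t)op0 << 14) |  /* bits 15..14 */
           ((uint64_t)op1 << 11) |  /* bits 13..11 */
           ((uint64_t)crn << 7)  |  /* bits 10..7  */
           ((uint64_t)crm << 3)  |  /* bits 6..3   */
           (uint64_t)op2;           /* bits 2..0   */
}
```

For example, MPIDR_EL1 is architecturally encoded as op0=3, op1=0, CRn=0,
CRm=0, op2=5, so arm64_sysreg_id(3, 0, 0, 0, 5) yields the id to pass in
the "id" field of kvm_one_reg.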

 

 

MIPS registers are mapped using the lower 32 bits.  The upper 16 of that is

the register group type:

 

MIPS core registers (see above) have the following id bit patterns:

  0x7030 0000 0000 <reg:16>

 

MIPS CP0 registers (see KVM_REG_MIPS_CP0_* above) have the following id bit

patterns depending on whether they're 32-bit or 64-bit registers:

  0x7020 0000 0001 00 <reg:5> <sel:3>   (32-bit)

  0x7030 0000 0001 00 <reg:5> <sel:3>   (64-bit)

 

MIPS KVM control registers (see above) have the following id bit patterns:

  0x7030 0000 0002 <reg:16>

 

MIPS FPU registers (see KVM_REG_MIPS_FPR_{32,64}() above) have the following

id bit patterns depending on the size of the register being accessed. They are

always accessed according to the current guest FPU mode (Status.FR and

Config5.FRE), i.e. as the guest would see them, and they become unpredictable

if the guest FPU mode is changed. MIPS SIMD Architecture (MSA) vector

registers (see KVM_REG_MIPS_VEC_128() above) have similar patterns as they

overlap the FPU registers:

  0x7020 0000 0003 00 <0:3> <reg:5> (32-bit FPU registers)

  0x7030 0000 0003 00 <0:3> <reg:5> (64-bit FPU registers)

  0x7040 0000 0003 00 <0:3> <reg:5> (128-bit MSA vector registers)

 

MIPS FPU control registers (see KVM_REG_MIPS_FCR_{IR,CSR} above) have the

following id bit patterns:

  0x7020 0000 0003 01 <0:3> <reg:5>

 

MIPS MSA control registers (see KVM_REG_MIPS_MSA_{IR,CSR} above) have the

following id bit patterns:

  0x7020 0000 0003 02 <0:3> <reg:5>
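
A minimal sketch of the MIPS CP0 and FPU/MSA patterns above follows. The
helper names mips_cp0_id and mips_fpr_id are hypothetical; they only
reproduce the layouts listed in this section and are not taken from the
kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: CP0 register id, 0x70x0 0000 0001 00 <reg:5> <sel:3>.
 * The size nibble is 2 for 32-bit registers and 3 for 64-bit registers. */
static uint64_t mips_cp0_id(unsigned reg, unsigned sel, bool is64)
{
    assert(reg < 32 && sel < 8);
    uint64_t base = is64 ? 0x7030000000010000ULL : 0x7020000000010000ULL;
    return base | ((uint64_t)reg << 3) | sel;
}

/* Hypothetical helper: FPU/MSA data register id,
 * 0x70x0 0000 0003 00 <0:3> <reg:5>, for 32-, 64- or 128-bit accesses. */
static uint64_t mips_fpr_id(unsigned reg, unsigned bits)
{
    assert(reg < 32);
    assert(bits == 32 || bits == 64 || bits == 128);
    uint64_t base = bits == 32 ? 0x7020000000030000ULL :
                    bits == 64 ? 0x7030000000030000ULL :
                                 0x7040000000030000ULL;
    return base | reg;
}
```

With this layout, the 32-bit CP0 Status register (register 12, select 0)
maps to mips_cp0_id(12, 0, false).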

 

 

4.69 KVM_GET_ONE_REG

 

Capability: KVM_CAP_ONE_REG

Architectures: all

Type: vcpu ioctl

Parameters: struct kvm_one_reg (in and out)

Returns: 0 on success, negative value on failure

 

This ioctl allows reading the value of a single register implemented

in a vcpu. The register to read is indicated by the "id" field of the

kvm_one_reg struct passed in. On success, the register value can be found

at the memory location pointed to by "addr".

 

The list of registers accessible using this interface is identical to the

list in 4.68.

 

 

4.70 KVM_KVMCLOCK_CTRL

 

Capability: KVM_CAP_KVMCLOCK_CTRL

Architectures: Any that implement pvclocks (currently x86 only)

Type: vcpu ioctl

Parameters: None

Returns: 0 on success, -1 on error

 

This signals to the host kernel that the specified guest is being paused by

userspace.  The host will set a flag in the pvclock structure that is checked

from the soft lockup watchdog.  The flag is part of the pvclock structure that

is shared between guest and host, specifically the second bit of the flags

field of the pvclock_vcpu_time_info structure.  It will be set exclusively by

the host and read/cleared exclusively by the guest.  The guest operation of

checking and clearing the flag must be an atomic operation, so

load-link/store-conditional or an equivalent must be used.  There are two cases

where the guest will clear the flag: when the soft lockup watchdog timer resets

itself or when a soft lockup is detected.  This ioctl can be called any time

after pausing the vcpu, but before it is resumed.

 

 

4.71 KVM_SIGNAL_MSI

 

Capability: KVM_CAP_SIGNAL_MSI

Architectures: x86

Type: vm ioctl

Parameters: struct kvm_msi (in)

Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error

 

Directly inject an MSI message. Only valid with an in-kernel irqchip that

handles MSI messages.

 

struct kvm_msi {

        __u32 address_lo;

        __u32 address_hi;

        __u32 data;

        __u32 flags;

        __u8 pad[16];

};

 

No flags are defined so far. The corresponding field must be 0.
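
The following sketch shows one way the fields above might be filled for a
plain x86 MSI. It assumes the usual x86 MSI layout (address 0xFEExxxxx with
the destination APIC ID in bits 19..12, vector in the low byte of the data
word, fixed delivery, edge triggered). The type msi_sketch is a local
mirror of struct kvm_msi and make_x86_msi is a hypothetical helper; real
code would use the definition from <linux/kvm.h> and pass the result to
KVM_SIGNAL_MSI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_msi as listed above (illustration only). */
struct msi_sketch {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    uint32_t flags;
    uint8_t  pad[16];
};

/* Hypothetical helper: build a fixed-delivery, edge-triggered MSI aimed at
 * one APIC ID. flags and pad stay zero, as the text above requires. */
static struct msi_sketch make_x86_msi(uint8_t dest_apic_id, uint8_t vector)
{
    struct msi_sketch m;

    assert(vector >= 16);            /* vectors 0..15 are not valid here */
    memset(&m, 0, sizeof(m));
    m.address_lo = 0xFEE00000u | ((uint32_t)dest_apic_id << 12);
    m.data = vector;
    return m;
}
```

Zeroing the whole structure first keeps the flags and padding fields at 0
even if the structure grows new members later.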

 

 

4.71 KVM_CREATE_PIT2

 

Capability: KVM_CAP_PIT2

Architectures: x86

Type: vm ioctl

Parameters: struct kvm_pit_config (in)

Returns: 0 on success, -1 on error

 

Creates an in-kernel device model for the i8254 PIT. This call is only valid

after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following

parameters have to be passed:

 

struct kvm_pit_config {

        __u32 flags;

        __u32 pad[15];

};

 

Valid flags are:

 

#define KVM_PIT_SPEAKER_DUMMY     1 /* emulate speaker port stub */

 

PIT timer interrupts may use a per-VM kernel thread for injection. If it

exists, this thread will have a name of the following pattern:

 

kvm-pit/<owner-process-pid>

 

When running a guest with elevated priorities, the scheduling parameters of

this thread may have to be adjusted accordingly.

 

This IOCTL replaces the obsolete KVM_CREATE_PIT.

 

 

4.72 KVM_GET_PIT2

 

Capability: KVM_CAP_PIT_STATE2

Architectures: x86

Type: vm ioctl

Parameters: struct kvm_pit_state2 (out)

Returns: 0 on success, -1 on error

 

Retrieves the state of the in-kernel PIT model. Only valid after

KVM_CREATE_PIT2. The state is returned in the following structure:

 

struct kvm_pit_state2 {

        struct kvm_pit_channel_state channels[3];

        __u32 flags;

        __u32 reserved[9];

};

 

Valid flags are:

 

/* disable PIT in HPET legacy mode */

#define KVM_PIT_FLAGS_HPET_LEGACY  0x00000001

 

This IOCTL replaces the obsolete KVM_GET_PIT.

 

 

4.73 KVM_SET_PIT2

 

Capability: KVM_CAP_PIT_STATE2

Architectures: x86

Type: vm ioctl

Parameters: struct kvm_pit_state2 (in)

Returns: 0 on success, -1 on error

 

Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.

See KVM_GET_PIT2 for details on struct kvm_pit_state2.

 

This IOCTL replaces the obsolete KVM_SET_PIT.

 

 

4.74 KVM_PPC_GET_SMMU_INFO

 

Capability: KVM_CAP_PPC_GET_SMMU_INFO

Architectures: powerpc

Type: vm ioctl

Parameters: None

Returns: 0 on success, -1 on error

 

This populates and returns a structure describing the features of

the "Server" class MMU emulation supported by KVM.

This can in turn be used by userspace to generate the appropriate

device-tree properties for the guest operating system.

 

The structure contains some global information, followed by an

array of supported segment page sizes:

 

      struct kvm_ppc_smmu_info {

            __u64 flags;

            __u32 slb_size;

            __u32 pad;

            struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];

      };

 

The supported flags are:

 

    - KVM_PPC_PAGE_SIZES_REAL:

        When that flag is set, guest page sizes must "fit" the backing

        store page sizes. When not set, any page size in the list can

        be used regardless of how they are backed by userspace.

 

    - KVM_PPC_1T_SEGMENTS

        The emulated MMU supports 1T segments in addition to the

        standard 256M ones.

 

The "slb_size" field indicates how many SLB entries are supported.

 

The "sps" array contains 8 entries indicating the supported base

page sizes for a segment in increasing order. Each entry is defined

as follows:

 

   struct kvm_ppc_one_seg_page_size {

        __u32 page_shift;      /* Base page shift of segment (or 0) */

        __u32 slb_enc;         /* SLB encoding for BookS */

        struct kvm_ppc_one_page_size enc[KVM_PPC_PAGE_SIZES_MAX_SZ];

   };

 

An entry with a "page_shift" of 0 is unused. Because the array is

organized in increasing order, a lookup can stop when encountering

such an entry.

 

The "slb_enc" field provides the encoding to use in the SLB for the

page size. The bits are in positions such that the value can directly

be OR'ed into the "vsid" argument of the slbmte instruction.

 

The "enc" array is a list which, for each of those segment base page

sizes, provides the list of supported actual page sizes (which can

only be larger than or equal to the base page size), along with the

corresponding encoding in the hash PTE. Similarly, the array has

8 entries sorted by increasing size, and an entry with a "0" shift

is an empty entry and a terminator:

 

   struct kvm_ppc_one_page_size {

        __u32 page_shift;      /* Page shift (or 0) */

        __u32 pte_enc;         /* Encoding in the HPTE (>>12) */

   };

 

The "pte_enc" field provides a value that can be OR'ed into the hash

PTE's RPN field (i.e., it needs to be shifted left by 12 to OR it

into the hash PTE second double word).
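
The lookup rule described above (both arrays are sorted and terminated by
a zero page_shift) could be sketched as follows. The types are local
mirrors of the structures in this section, PAGE_SIZES_MAX stands in for
KVM_PPC_PAGE_SIZES_MAX_SZ using the entry count of 8 stated in the text,
and find_page_size is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZES_MAX 8  /* assumption: matches the 8 entries stated above */

struct one_page_size {
    uint32_t page_shift;   /* page shift (or 0) */
    uint32_t pte_enc;      /* encoding in the HPTE (>>12) */
};

struct one_seg_page_size {
    uint32_t page_shift;   /* base page shift of segment (or 0) */
    uint32_t slb_enc;      /* SLB encoding */
    struct one_page_size enc[PAGE_SIZES_MAX];
};

/* Hypothetical helper: look up a (base, actual) page size pair.
 * Returns 1 and stores both encodings when found, 0 otherwise. */
static int find_page_size(const struct one_seg_page_size *sps,
                          uint32_t base_shift, uint32_t actual_shift,
                          uint32_t *slb_enc, uint32_t *pte_enc)
{
    assert(sps != NULL && slb_enc != NULL && pte_enc != NULL);

    /* A zero page_shift terminates the outer array. */
    for (int i = 0; i < PAGE_SIZES_MAX && sps[i].page_shift != 0; i++) {
        if (sps[i].page_shift != base_shift)
            continue;
        /* A zero page_shift terminates the inner array as well. */
        for (int j = 0; j < PAGE_SIZES_MAX &&
                        sps[i].enc[j].page_shift != 0; j++) {
            if (sps[i].enc[j].page_shift == actual_shift) {
                *slb_enc = sps[i].slb_enc;
                *pte_enc = sps[i].enc[j].pte_enc;
                return 1;
            }
        }
        return 0;
    }
    return 0;
}
```

The returned pte_enc still has to be shifted left by 12 before it is OR'ed
into the second doubleword of the hash PTE, as noted above.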

 

4.75 KVM_IRQFD

 

Capability: KVM_CAP_IRQFD

Architectures: x86 s390 arm arm64

Type: vm ioctl

Parameters: struct kvm_irqfd (in)

Returns: 0 on success, -1 on error

 

Allows setting an eventfd to directly trigger a guest interrupt.

kvm_irqfd.fd specifies the file descriptor to use as the eventfd and

kvm_irqfd.gsi specifies the irqchip pin toggled by this event.  When

an event is triggered on the eventfd, an interrupt is injected into

the guest using the specified gsi pin.  The irqfd is removed using

the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd

and kvm_irqfd.gsi.

 

With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify

mechanism allowing emulation of level-triggered, irqfd-based

interrupts.  When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an

additional eventfd in the kvm_irqfd.resamplefd field.  When operating

in resample mode, posting of an interrupt through kvm_irqfd.fd asserts

the specified gsi in the irqchip.  When the irqchip is resampled, such

as from an EOI, the gsi is de-asserted and the user is notified via

kvm_irqfd.resamplefd.  It is the user's responsibility to re-queue

the interrupt if the device making use of it still requires service.

Note that closing the resamplefd is not sufficient to disable the

irqfd.  The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment

and need not be specified with KVM_IRQFD_FLAG_DEASSIGN.

 

On ARM/ARM64, the gsi field in the kvm_irqfd struct specifies the Shared

Peripheral Interrupt (SPI) index, such that the GIC interrupt ID is

given by gsi + 32.

 

4.76 KVM_PPC_ALLOCATE_HTAB

 

Capability: KVM_CAP_PPC_ALLOC_HTAB

Architectures: powerpc

Type: vm ioctl

Parameters: Pointer to u32 containing hash table order (in/out)

Returns: 0 on success, -1 on error

 

This requests the host kernel to allocate an MMU hash table for a

guest using the PAPR paravirtualization interface.  This only does

anything if the kernel is configured to use the Book 3S HV style of

virtualization.  Otherwise the capability doesn't exist and the ioctl

returns an ENOTTY error.  The rest of this description assumes Book 3S

HV.

 

There must be no vcpus running when this ioctl is called; if there

are, it will do nothing and return an EBUSY error.

 

The parameter is a pointer to a 32-bit unsigned integer variable

containing the order (log base 2) of the desired size of the hash

table, which must be between 18 and 46.  On successful return from the

ioctl, it will have been updated with the order of the hash table that

was allocated.
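
Since the parameter is an order rather than a byte count, userspace has to
convert a desired size first. The sketch below rounds a size up to the
next power of two and clamps it to the 18..46 range stated above; the
helper htab_order_for_size and the two macros are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define HTAB_ORDER_MIN 18  /* lower bound stated above */
#define HTAB_ORDER_MAX 46  /* upper bound stated above */

/* Hypothetical helper: smallest order in [18, 46] whose table size
 * (2^order bytes) is at least the requested number of bytes. */
static uint32_t htab_order_for_size(uint64_t bytes)
{
    uint32_t order = HTAB_ORDER_MIN;

    assert(bytes > 0);
    while (order < HTAB_ORDER_MAX && ((uint64_t)1 << order) < bytes)
        order++;
    return order;
}
```

A 16 MB request (the default size mentioned in this section) corresponds
to order 24. The kernel may still allocate a different order, which is why
the value must be read back after the ioctl returns.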

 

If no hash table has been allocated when any vcpu is asked to run

(with the KVM_RUN ioctl), the host kernel will allocate a

default-sized hash table (16 MB).

 

If this ioctl is called when a hash table has already been allocated,

the kernel will clear out the existing hash table (zero all HPTEs) and

return the hash table order in the parameter.  (If the guest is using

the virtualized real-mode area (VRMA) facility, the kernel will

re-create the VRMA HPTEs on the next KVM_RUN of any vcpu.)

 

4.77 KVM_S390_INTERRUPT

 

Capability: basic

Architectures: s390

Type: vm ioctl, vcpu ioctl

Parameters: struct kvm_s390_interrupt (in)

Returns: 0 on success, -1 on error

 

Allows injecting an interrupt into the guest. Interrupts can be floating

(vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type.

 

Interrupt parameters are passed via kvm_s390_interrupt:

 

struct kvm_s390_interrupt {

        __u32 type;

        __u32 parm;

        __u64 parm64;

};

 

type can be one of the following:

 

KVM_S390_SIGP_STOP (vcpu) - sigp stop; optional flags in parm

KVM_S390_PROGRAM_INT (vcpu) - program check; code in parm

KVM_S390_SIGP_SET_PREFIX (vcpu) - sigp set prefix; prefix address in parm

KVM_S390_RESTART (vcpu) - restart

KVM_S390_INT_CLOCK_COMP (vcpu) - clock comparator interrupt

KVM_S390_INT_CPU_TIMER (vcpu) - CPU timer interrupt

KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt

                           parameters in parm and parm64

KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm

KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm

KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm

KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an

    I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);

    I/O interruption parameters in parm (subchannel) and parm64 (intparm,

    interruption subclass)

KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm,

                           machine check interrupt code in parm64 (note that

                           machine checks needing further payload are not

                           supported by this ioctl)

 

Note that the vcpu ioctl is asynchronous to vcpu execution.

 

4.78 KVM_PPC_GET_HTAB_FD

 

Capability: KVM_CAP_PPC_HTAB_FD

Architectures: powerpc

Type: vm ioctl

Parameters: Pointer to struct kvm_get_htab_fd (in)

Returns: file descriptor number (>= 0) on success, -1 on error

 

This returns a file descriptor that can be used either to read out the

entries in the guest's hashed page table (HPT), or to write entries to

initialize the HPT.  The returned fd can only be written to if the

KVM_GET_HTAB_WRITE bit is set in the flags field of the argument, and

can only be read if that bit is clear.  The argument struct looks like

this:

 

/* For KVM_PPC_GET_HTAB_FD */

struct kvm_get_htab_fd {

        __u64   flags;

        __u64   start_index;

        __u64   reserved[2];

};

 

/* Values for kvm_get_htab_fd.flags */

#define KVM_GET_HTAB_BOLTED_ONLY      ((__u64)0x1)

#define KVM_GET_HTAB_WRITE            ((__u64)0x2)

 

The `start_index' field gives the index in the HPT of the entry at

which to start reading.  It is ignored when writing.

 

Reads on the fd will initially supply information about all

"interesting" HPT entries.  Interesting entries are those with the

bolted bit set, if the KVM_GET_HTAB_BOLTED_ONLY bit is set, otherwise

all entries.  When the end of the HPT is reached, the read() will

return.  If read() is called again on the fd, it will start again from

the beginning of the HPT, but will only return HPT entries that have

changed since they were last read.

 

Data read or written is structured as a header (8 bytes) followed by a

series of valid HPT entries (16 bytes each).  The header indicates how

many valid HPT entries there are and how many invalid entries follow

the valid entries.  The invalid entries are not represented explicitly

in the stream.  The header format is:

 

struct kvm_get_htab_header {

        __u32   index;

        __u16   n_valid;

        __u16   n_invalid;

};

 

Writes to the fd create HPT entries starting at the index given in the

header; first `n_valid' valid entries with contents from the data

written, then `n_invalid' invalid entries, invalidating any previously

valid entries found.
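
The stream layout described above (an 8-byte header followed by n_valid
16-byte entries, with invalid entries only counted) could be walked as in
the following sketch. The header type is a local mirror of struct
kvm_get_htab_header, scan_htab_stream is a hypothetical helper, and the
buffer is assumed to hold data in host byte order exactly as read() from
the fd returned it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_get_htab_header (illustration only). */
struct htab_header_sketch {
    uint32_t index;
    uint16_t n_valid;
    uint16_t n_invalid;
};

#define HPTE_SIZE 16  /* each valid HPT entry occupies 16 bytes */

/* Hypothetical helper: walk a buffer of header+entries chunks.
 * Returns the number of complete chunks, or -1 if the buffer is
 * truncated. Totals of valid and invalid entries are accumulated. */
static int scan_htab_stream(const uint8_t *buf, size_t len,
                            uint64_t *valid, uint64_t *invalid)
{
    size_t off = 0;
    int chunks = 0;

    assert(buf != NULL && valid != NULL && invalid != NULL);
    *valid = 0;
    *invalid = 0;
    while (off < len) {
        struct htab_header_sketch h;
        size_t body;

        if (len - off < sizeof(h))
            return -1;                    /* partial header */
        memcpy(&h, buf + off, sizeof(h));
        off += sizeof(h);

        /* Only valid entries carry payload in the stream. */
        body = (size_t)h.n_valid * HPTE_SIZE;
        if (len - off < body)
            return -1;                    /* partial entry data */
        off += body;

        *valid += h.n_valid;
        *invalid += h.n_invalid;
        chunks++;
    }
    return chunks;
}
```

The same walk applies to data prepared for writing, since writes use the
identical header and entry layout.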

 

4.79 KVM_CREATE_DEVICE

 

Capability: KVM_CAP_DEVICE_CTRL

Type: vm ioctl

Parameters: struct kvm_create_device (in/out)

Returns: 0 on success, -1 on error

Errors:

  ENODEV: The device type is unknown or unsupported

  EEXIST: Device already created, and this type of device may not

          be instantiated multiple times

 

  Other error conditions may be defined by individual device types or

  have their standard meanings.

 

Creates an emulated device in the kernel.  The file descriptor returned

in fd can be used with KVM_SET/GET/HAS_DEVICE_ATTR.

 

If the KVM_CREATE_DEVICE_TEST flag is set, only test whether the

device type is supported (not necessarily whether it can be created

in the current vm).

 

Individual devices should not define flags.  Attributes should be used

for specifying any behavior that is not implied by the device type

number.

 

struct kvm_create_device {

        __u32   type;   /* in: KVM_DEV_TYPE_xxx */

        __u32   fd;     /* out: device handle */

        __u32   flags;  /* in: KVM_CREATE_DEVICE_xxx */

};

 

4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR

 

Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device

Type: device ioctl, vm ioctl

Parameters: struct kvm_device_attr

Returns: 0 on success, -1 on error

Errors:

  ENXIO: The group or attribute is unknown/unsupported for this device

  EPERM: The attribute cannot (currently) be accessed this way

          (e.g. read-only attribute, or attribute that only makes

          sense when the device is in a different state)

 

  Other error conditions may be defined by individual device types.

 

Gets/sets a specified piece of device configuration and/or state.  The

semantics are device-specific.  See individual device documentation in

the "devices" directory.  As with ONE_REG, the size of the data

transferred is defined by the particular attribute.

 

struct kvm_device_attr {

        __u32   flags;         /* no flags currently defined */

        __u32   group;         /* device-defined */

        __u64   attr;          /* group-defined */

        __u64   addr;          /* userspace address of attr data */

};

 

4.81 KVM_HAS_DEVICE_ATTR

 

Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device

Type: device ioctl, vm ioctl

Parameters: struct kvm_device_attr

Returns: 0 on success, -1 on error

Errors:

  ENXIO: The group or attribute is unknown/unsupported for this device

 

Tests whether a device supports a particular attribute.  A successful

return indicates the attribute is implemented.  It does not necessarily

indicate that the attribute can be read or written in the device's

current state.  "addr" is ignored.

 

4.82 KVM_ARM_VCPU_INIT

 

Capability: basic

Architectures: arm, arm64

Type: vcpu ioctl

Parameters: struct kvm_vcpu_init (in)

Returns: 0 on success; -1 on error

Errors:

  EINVAL:    the target is unknown, or the combination of features is invalid.

  ENOENT:    a features bit specified is unknown.

 

This tells KVM what type of CPU to present to the guest, and what

optional features it should have.  This will cause a reset of the cpu

registers to their initial values.  If this is not called, KVM_RUN will

return ENOEXEC for that vcpu.

 

Note that because some registers reflect machine topology, all vcpus

should be created before this ioctl is invoked.

 

Userspace can call this function multiple times for a given vcpu, including

after the vcpu has been run. This will reset the vcpu to its initial

state. All calls to this function after the initial call must use the same

target and same set of feature flags, otherwise EINVAL will be returned.

 

Possible features:

        - KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.

          Depends on KVM_CAP_ARM_PSCI.  If not set, the CPU will be powered on

          and execute guest code when KVM_RUN is called.

        - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32-bit mode.

          Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).

        - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU.

          Depends on KVM_CAP_ARM_PSCI_0_2.

 

 

4.83 KVM_ARM_PREFERRED_TARGET

 

Capability: basic

Architectures: arm, arm64

Type: vm ioctl

Parameters: struct kvm_vcpu_init (out)

Returns: 0 on success; -1 on error

Errors:

  ENODEV:   no preferred target available for the host

 

This queries KVM for the preferred CPU target type which can be emulated

by KVM on the underlying host.

 

The ioctl returns a struct kvm_vcpu_init instance containing information

about the preferred CPU target type and recommended features for it.  The

kvm_vcpu_init->features bitmap returned will have feature bits set if

the preferred target recommends setting these features, but this is

not mandatory.

 

The information returned by this ioctl can be used to prepare an instance

of struct kvm_vcpu_init for the KVM_ARM_VCPU_INIT ioctl, which will result

in a VCPU matching the underlying host.

 

 

4.84 KVM_GET_REG_LIST

 

Capability: basic

Architectures: arm, arm64, mips

Type: vcpu ioctl

Parameters: struct kvm_reg_list (in/out)

Returns: 0 on success; -1 on error

Errors:

  E2BIG:     the reg index list is too big to fit in the array specified by

             the user (the number required will be written into n).

 

struct kvm_reg_list {

        __u64 n; /* number of registers in reg[] */

        __u64 reg[0];

};

 

This ioctl returns the guest registers that are supported for the

KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.

 

 

4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)

 

Capability: KVM_CAP_ARM_SET_DEVICE_ADDR

Architectures: arm, arm64

Type: vm ioctl

Parameters: struct kvm_arm_device_addr (in)

Returns: 0 on success, -1 on error

Errors:

  ENODEV: The device id is unknown

  ENXIO: Device not supported on current system

  EEXIST: Address already set

  E2BIG: Address outside guest physical address space

  EBUSY: Address overlaps with other device range

 

struct kvm_arm_device_addr {

        __u64 id;

        __u64 addr;

};

 

Specify a device address in the guest's physical address space where guests

can access emulated or directly exposed devices, which the host kernel needs

to know about. The id field is an architecture specific identifier for a

specific device.

 

ARM/arm64 divides the id field into two parts, a device id and an

address type id specific to the individual device.

 

  bits:  | 63        ...       32 | 31    ...   16 | 15    ...    0 |

  field: |        0x00000000      |    device id   |  addr type id  |

 

ARM/arm64 currently only require this when using the in-kernel GIC

support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2

as the device id.  When setting the base address for the guest's

mapping of the VGIC virtual CPU and distributor interface, the ioctl

must be called after calling KVM_CREATE_IRQCHIP, but before calling

KVM_RUN on any of the VCPUs.  Calling this ioctl twice for any of the

base addresses will return -EEXIST.
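
The split of the id field shown in the bit diagram above can be expressed
as a small helper. arm_device_addr_id is a hypothetical name; the numeric
values of the device id and address type constants are not given in this
section, so they are taken as plain parameters here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compose the 64-bit id from the layout above.
 * Bits 63..32 are zero, bits 31..16 hold the device id and bits 15..0
 * hold the address type id. */
static uint64_t arm_device_addr_id(uint32_t device_id, uint32_t addr_type)
{
    assert(device_id <= 0xFFFF && addr_type <= 0xFFFF);
    return ((uint64_t)device_id << 16) | addr_type;
}
```

The result would be stored in the id member of struct kvm_arm_device_addr,
with the guest physical base address in addr.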

 

Note, this IOCTL is deprecated and the more flexible SET/GET_DEVICE_ATTR API

should be used instead.

 

 

4.86 KVM_PPC_RTAS_DEFINE_TOKEN

 

Capability: KVM_CAP_PPC_RTAS

Architectures: ppc

Type: vm ioctl

Parameters: struct kvm_rtas_token_args

Returns: 0 on success, -1 on error

 

Defines a token value for a RTAS (Run Time Abstraction Services)

service in order to allow it to be handled in the kernel.  The

argument struct gives the name of the service, which must be the name

of a service that has a kernel-side implementation.  If the token

value is non-zero, it will be associated with that service, and

subsequent RTAS calls by the guest specifying that token will be

handled by the kernel.  If the token value is 0, then any token

associated with the service will be forgotten, and subsequent RTAS

calls by the guest for that service will be passed to userspace to be

handled.

 

4.87 KVM_SET_GUEST_DEBUG

 

Capability: KVM_CAP_SET_GUEST_DEBUG

Architectures: x86, s390, ppc

Type: vcpu ioctl

Parameters: struct kvm_guest_debug (in)

Returns: 0 on success; -1 on error

 

struct kvm_guest_debug {

       __u32 control;

       __u32 pad;

       struct kvm_guest_debug_arch arch;

};

 

Set up the processor specific debug registers and configure the vcpu for

handling guest debug events. There are two parts to the structure. The

first, a control bitfield, indicates the type of debug events to handle

when running. Common control bits are:

 

  - KVM_GUESTDBG_ENABLE:        guest debugging is enabled

  - KVM_GUESTDBG_SINGLESTEP:    the next run should single-step

 

The top 16 bits of the control field are architecture specific control

flags which can include the following:

 

  - KVM_GUESTDBG_USE_SW_BP:     using software breakpoints [x86]

  - KVM_GUESTDBG_USE_HW_BP:     using hardware breakpoints [x86, s390]

  - KVM_GUESTDBG_INJECT_DB:     inject DB type exception [x86]

  - KVM_GUESTDBG_INJECT_BP:     inject BP type exception [x86]

  - KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]

 

For example, KVM_GUESTDBG_USE_SW_BP indicates that software breakpoints

are enabled in memory, so we need to ensure breakpoint exceptions are

correctly trapped and the KVM run loop exits at the breakpoint rather than

running off into the normal guest vector. For KVM_GUESTDBG_USE_HW_BP

we need to ensure the guest vCPU's architecture specific registers are

updated to the correct (supplied) values.

 

The second part of the structure is architecture specific and

typically contains a set of debug registers.

 

When debug events exit the main run loop, the exit reason is

KVM_EXIT_DEBUG and the kvm_debug_exit_arch part of the kvm_run

structure contains architecture specific debug information.

 

4.88 KVM_GET_EMULATED_CPUID

 

Capability: KVM_CAP_EXT_EMUL_CPUID

Architectures: x86

Type: system ioctl

Parameters: struct kvm_cpuid2 (in/out)

Returns: 0 on success, -1 on error

 

struct kvm_cpuid2 {

        __u32 nent;

        __u32 flags;

        struct kvm_cpuid_entry2 entries[0];

};

 

The member 'flags' is used for passing flags from userspace.

 

#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX         BIT(0)

#define KVM_CPUID_FLAG_STATEFUL_FUNC            BIT(1)

#define KVM_CPUID_FLAG_STATE_READ_NEXT          BIT(2)

 

struct kvm_cpuid_entry2 {

        __u32 function;

        __u32 index;

        __u32 flags;

        __u32 eax;

        __u32 ebx;

        __u32 ecx;

        __u32 edx;

        __u32 padding[3];

};

 

This ioctl returns x86 cpuid features which are emulated by

kvm. Userspace can use the information returned by this ioctl to query

which features are emulated by kvm instead of being present natively.

 

Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2

structure with the 'nent' field indicating the number of entries in

the variable-size array 'entries'. If the number of entries is too low

to describe the cpu capabilities, an error (E2BIG) is returned. If the

number is too high, the 'nent' field is adjusted and an error (ENOMEM)

is returned. If the number is just right, the 'nent' field is adjusted

to the number of valid entries in the 'entries' array, which is then

filled.
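
Because 'entries' is a variable-size array, the caller has to size the
allocation from 'nent' before issuing the ioctl. The sketch below shows
the arithmetic with local mirrors of the two structures above; the type
names and the helpers cpuid2_bytes and cpuid2_alloc are hypothetical, and
real code would use the definitions from <linux/kvm.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirrors of kvm_cpuid_entry2 and kvm_cpuid2 (illustration only). */
struct cpuid_entry2_sketch {
    uint32_t function, index, flags;
    uint32_t eax, ebx, ecx, edx;
    uint32_t padding[3];
};

struct cpuid2_sketch {
    uint32_t nent;
    uint32_t flags;
    struct cpuid_entry2_sketch entries[];  /* variable-size tail */
};

/* Bytes needed for a header plus nent entries. */
static size_t cpuid2_bytes(uint32_t nent)
{
    return sizeof(struct cpuid2_sketch) +
           (size_t)nent * sizeof(struct cpuid_entry2_sketch);
}

/* Hypothetical helper: allocate a zeroed buffer and record its capacity
 * in nent, ready to be handed to the ioctl. Returns NULL on failure. */
static struct cpuid2_sketch *cpuid2_alloc(uint32_t nent)
{
    struct cpuid2_sketch *c;

    assert(nent > 0);
    c = calloc(1, cpuid2_bytes(nent));
    if (c != NULL)
        c->nent = nent;
    return c;
}
```

A caller that receives E2BIG would typically free the buffer, allocate a
larger one and retry, since the required count is not known in advance.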

 

The entries returned are the set CPUID bits of the respective features

which kvm emulates, as returned by the CPUID instruction, with unknown

or unsupported feature bits cleared.

 

Features like x2apic, for example, may not be present in the host cpu

but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be

emulated efficiently, and are thus not included here.

 

The fields in each entry are defined as follows:

 

  function: the eax value used to obtain the entry

  index: the ecx value used to obtain the entry (for entries that are

         affected by ecx)

  flags: an OR of zero or more of the following:

        KVM_CPUID_FLAG_SIGNIFCANT_INDEX:

           if the index field is valid

        KVM_CPUID_FLAG_STATEFUL_FUNC:

           if cpuid for this function returns different values for successive

           invocations; there will be several entries with the same function,

           all with this flag set

        KVM_CPUID_FLAG_STATE_READ_NEXT:

           for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is

           the first entry to be read by a cpu

   eax, ebx, ecx, edx: the values returned by the cpuid instruction for

         this function/index combination

 

4.89 KVM_S390_MEM_OP

 

Capability: KVM_CAP_S390_MEM_OP

Architectures: s390

Type: vcpu ioctl

Parameters: struct kvm_s390_mem_op (in)

Returns: = 0 on success,

         < 0 on generic error (e.g. -EFAULT or -ENOMEM),

         > 0 if an exception occurred while walking the page tables

 

Read or write data from/to the logical (virtual) memory of a VCPU.

 

Parameters are specified via the following structure:

 

struct kvm_s390_mem_op {

        __u64 gaddr;           /* the guest address */

        __u64 flags;           /* flags */

        __u32 size;            /* amount of bytes */

        __u32 op;              /* type of operation */

        __u64 buf;             /* buffer in userspace */

        __u8 ar;               /* the access register number */

        __u8 reserved[31];     /* should be set to 0 */

};

 

The type of operation is specified in the "op" field. It is either

KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or

KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The

KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the "flags" field to check

whether the corresponding memory access would create an access exception

(without touching the data in the memory at the destination). In case an

access exception occurred while walking the MMU tables of the guest, the

ioctl returns a positive error number to indicate the type of exception.

This exception is also raised directly at the corresponding VCPU if the

flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the "flags" field.

 

The start address of the memory region has to be specified in the "gaddr"

field, and the length of the region in the "size" field. "buf" is the buffer

supplied by the userspace application where the read data should be written

to for KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written

is stored for a KVM_S390_MEMOP_LOGICAL_WRITE. "buf" is unused and can be NULL

when KVM_S390_MEMOP_F_CHECK_ONLY is specified. "ar" designates the access

register number to be used.

 

The "reserved" field is meant for future extensions. It is not used by

KVM with the currently defined set of flags.
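
As a sketch of how the request structure and the three-way return value
described in this section fit together, the following mirrors the fields
of struct kvm_s390_mem_op locally. The names mem_op_sketch, prepare_memop,
memop_outcome and classify_memop are hypothetical; the op and flags values
are passed through unchanged because their numeric definitions are not
listed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_s390_mem_op (illustration only). */
struct mem_op_sketch {
    uint64_t gaddr;        /* the guest address */
    uint64_t flags;        /* flags */
    uint32_t size;         /* amount of bytes */
    uint32_t op;           /* type of operation */
    uint64_t buf;          /* buffer in userspace */
    uint8_t  ar;           /* the access register number */
    uint8_t  reserved[31]; /* should be set to 0 */
};

/* Hypothetical helper: fill a request with the reserved bytes zeroed. */
static struct mem_op_sketch prepare_memop(uint64_t gaddr, void *buf,
                                          uint32_t size, uint32_t op,
                                          uint64_t flags, uint8_t ar)
{
    struct mem_op_sketch m;

    assert(size > 0 && ar < 16);   /* s390 has 16 access registers */
    memset(&m, 0, sizeof(m));
    m.gaddr = gaddr;
    m.flags = flags;
    m.size = size;
    m.op = op;
    m.buf = (uint64_t)(uintptr_t)buf;
    m.ar = ar;
    return m;
}

enum memop_outcome { MEMOP_OK, MEMOP_GENERIC_ERROR, MEMOP_GUEST_EXCEPTION };

/* Map the ioctl return value onto the three cases listed under Returns. */
static enum memop_outcome classify_memop(int ret)
{
    if (ret == 0)
        return MEMOP_OK;
    return ret < 0 ? MEMOP_GENERIC_ERROR : MEMOP_GUEST_EXCEPTION;
}
```

Keeping the positive case separate matters because it reports a guest
access exception rather than a host-side failure.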

 

4.90 KVM_S390_GET_SKEYS

 

Capability: KVM_CAP_S390_SKEYS

Architectures: s390

Type: vm ioctl

Parameters: struct kvm_s390_skeys

Returns: 0 on success, KVM_S390_GET_KEYS_NONE if guest is not using storage

         keys, negative value on error

 

This ioctl is used to get guest storage key values on the s390

architecture. The ioctl takes parameters via the kvm_s390_skeys struct.

 

struct kvm_s390_skeys {

        __u64 start_gfn;

        __u64 count;

        __u64 skeydata_addr;

        __u32 flags;

        __u32 reserved[9];

};

 

The start_gfn field is the number of the first guest frame whose storage keys

you want to get.

 

The count field is the number of consecutive frames (starting from start_gfn)

whose storage keys to get. The count field must be at least 1 and the maximum

allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range

will cause the ioctl to return -EINVAL.

 

The skeydata_addr field is the address of a buffer large enough to hold count

bytes. This buffer will be filled with storage key data by the ioctl.
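
The constraints above (count in range, one byte of buffer per frame) can
be checked in userspace before the ioctl is issued. In this sketch,
skeys_sketch is a local mirror of struct kvm_s390_skeys and skeys_prepare
is a hypothetical helper; max_count stands in for
KVM_S390_SKEYS_ALLOC_MAX, whose value is not given in this section.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct kvm_s390_skeys (illustration only). */
struct skeys_sketch {
    uint64_t start_gfn;
    uint64_t count;
    uint64_t skeydata_addr;
    uint32_t flags;
    uint32_t reserved[9];
};

/* Hypothetical helper: validate count, allocate one byte per frame and
 * fill the request. Returns 0 on success, -1 if count is out of range
 * or the allocation fails. The caller frees *keys_out. */
static int skeys_prepare(struct skeys_sketch *req, uint64_t start_gfn,
                         uint64_t count, uint64_t max_count,
                         uint8_t **keys_out)
{
    uint8_t *keys;

    assert(req != NULL && keys_out != NULL);
    if (count < 1 || count > max_count)
        return -1;              /* the ioctl would return -EINVAL */

    keys = calloc((size_t)count, 1);
    if (keys == NULL)
        return -1;

    memset(req, 0, sizeof(*req));
    req->start_gfn = start_gfn;
    req->count = count;
    req->skeydata_addr = (uint64_t)(uintptr_t)keys;
    *keys_out = keys;
    return 0;
}
```

The same request layout is reused by KVM_S390_SET_SKEYS, where the buffer
is filled by the caller instead of by the kernel.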

 

4.91 KVM_S390_SET_SKEYS

 

Capability: KVM_CAP_S390_SKEYS

Architectures: s390

Type: vm ioctl

Parameters: struct kvm_s390_skeys

Returns: 0 on success, negative value on error

 

This ioctl is used to set guest storage key values on the s390

architecture. The ioctl takes parameters via the kvm_s390_skeys struct.

See section on KVM_S390_GET_SKEYS for struct definition.

 

The start_gfn field is the number of the first guest frame whose storage keys

you want to set.

 

The count field is the number of consecutive frames (starting from start_gfn)

whose storage keys to set. The count field must be at least 1 and the maximum

allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range

will cause the ioctl to return -EINVAL.

 

The skeydata_addr field is the address of a buffer containing count bytes of

storage keys. Each byte in the buffer will be set as the storage key for a

single frame starting at start_gfn for count frames.

 

Note: If anyarchitecturally invalid key value is found in the given data then

the ioctl will return-EINVAL.

 

4.92 KVM_S390_IRQ

 

Capability: KVM_CAP_S390_INJECT_IRQ
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_irq (in)
Returns: 0 on success, -1 on error
Errors:
  EINVAL: interrupt type is invalid
          type is KVM_S390_SIGP_STOP and flag parameter is invalid value
          type is KVM_S390_INT_EXTERNAL_CALL and code is bigger
            than the maximum of VCPUs
  EBUSY:  type is KVM_S390_SIGP_SET_PREFIX and vcpu is not stopped
          type is KVM_S390_SIGP_STOP and a stop irq is already pending
          type is KVM_S390_INT_EXTERNAL_CALL and an external call interrupt
            is already pending

 

Allows injecting an interrupt into the guest.

Using struct kvm_s390_irq as a parameter allows injecting an additional
payload, which is not possible via KVM_S390_INTERRUPT.

Interrupt parameters are passed via kvm_s390_irq:

 

struct kvm_s390_irq {

        __u64 type;

        union {

               struct kvm_s390_io_info io;

               struct kvm_s390_ext_info ext;

               struct kvm_s390_pgm_info pgm;

               struct kvm_s390_emerg_info emerg;

               struct kvm_s390_extcall_info extcall;

               struct kvm_s390_prefix_info prefix;

               struct kvm_s390_stop_info stop;

               struct kvm_s390_mchk_info mchk;

               char reserved[64];

        } u;

};

 

type can be one of the following:

KVM_S390_SIGP_STOP - sigp stop; parameter in .stop
KVM_S390_PROGRAM_INT - program check; parameters in .pgm
KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix
KVM_S390_RESTART - restart; no parameters
KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters
KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters
KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg
KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall
KVM_S390_MCHK - machine check interrupt; parameters in .mchk

Note that the vcpu ioctl is asynchronous to vcpu execution.

 

4.94 KVM_S390_GET_IRQ_STATE

 

Capability: KVM_CAP_S390_IRQ_STATE
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_irq_state (out)
Returns: number of bytes copied into buffer (>= 0) on success,
         -EINVAL if buffer size is 0,
         -ENOBUFS if buffer size is too small to fit all pending interrupts,
         -EFAULT if the buffer address was invalid

 

This ioctl allows userspace to retrieve the complete state of all currently
pending interrupts in a single buffer. Use cases include migration
and introspection. The parameter structure contains the address of a
userspace buffer and its length:

struct kvm_s390_irq_state {

        __u64 buf;

        __u32 flags;

        __u32 len;

        __u32 reserved[4];

};

 

Userspace passes in the above struct and for each pending interrupt a
struct kvm_s390_irq is copied to the provided buffer.

If -ENOBUFS is returned the buffer provided was too small and userspace
may retry with a bigger buffer.
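The retry on -ENOBUFS can be sketched as a loop that doubles the buffer until
the call succeeds or a caller-chosen cap is reached. To keep the sketch
independent of a real vcpu, the ioctl is abstracted behind a callback;
irq_state_fn, fetch_irq_state() and fake_get() are hypothetical names, and
fake_get() only imitates the documented return values.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the ioctl: returns bytes copied (>= 0) or a negative errno. */
typedef int (*irq_state_fn)(void *buf, uint32_t len, void *ctx);

/*
 * Calls `get` with a growing buffer until it stops returning -ENOBUFS.
 * On success the malloc()ed buffer is stored in *out (caller frees) and the
 * byte count is returned; on failure *out is NULL and a negative errno is
 * returned.
 */
static int fetch_irq_state(irq_state_fn get, void *ctx, uint32_t first_len,
			   uint32_t max_len, void **out)
{
	uint32_t len = first_len;

	*out = NULL;
	if (len == 0 || len > max_len)
		return -EINVAL;

	for (;;) {
		void *buf = malloc(len);
		int ret;

		if (!buf)
			return -ENOMEM;
		ret = get(buf, len, ctx);
		if (ret >= 0) {
			*out = buf;
			return ret;
		}
		free(buf);
		if (ret != -ENOBUFS)
			return ret;
		if (len > max_len / 2)
			return -ENOBUFS;	/* doubling would exceed the cap */
		len *= 2;
	}
}

/*
 * Demo stand-in for the kernel side: pretends that *(uint32_t *)ctx bytes
 * of interrupt state are pending and rejects smaller buffers.
 */
static int fake_get(void *buf, uint32_t len, void *ctx)
{
	uint32_t need = *(uint32_t *)ctx;

	if (len < need)
		return -ENOBUFS;
	memset(buf, 0, need);
	return (int)need;
}
```

In real code the callback would fill a struct kvm_s390_irq_state with the
buffer address and length and issue KVM_S390_GET_IRQ_STATE on the vcpu fd.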

 

4.95 KVM_S390_SET_IRQ_STATE

 

Capability: KVM_CAP_S390_IRQ_STATE
Architectures: s390
Type: vcpu ioctl
Parameters: struct kvm_s390_irq_state (in)
Returns: 0 on success,
         -EFAULT if the buffer address was invalid,
         -EINVAL for an invalid buffer length (see below),
         -EBUSY if there were already interrupts pending,
         errors occurring when actually injecting the
          interrupt. See KVM_S390_IRQ.

 

This ioctl allows userspace to set the complete state of all cpu-local
interrupts currently pending for the vcpu. It is intended for restoring
interrupt state after a migration. The input parameter is a userspace buffer
containing a struct kvm_s390_irq_state:

 

struct kvm_s390_irq_state {
        __u64 buf;
        __u32 flags;
        __u32 len;
        __u32 reserved[4];
};

 

The userspace memory referenced by buf contains a struct kvm_s390_irq
for each interrupt to be injected into the guest.
If one of the interrupts could not be injected for some reason the
ioctl aborts.

len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
which is the maximum number of possibly pending cpu-local interrupts.
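The three conditions on len can be checked in userspace before issuing the
ioctl. The sketch below is an illustration under stated assumptions:
struct irq_mirror only reproduces the size of struct kvm_s390_irq (a 64-bit
type followed by the 64-byte union), irq_state_len_ok() is a hypothetical
helper, and max_vcpus is supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local mirror of struct kvm_s390_irq with only the members needed to
 * reproduce its size: the 64-bit type and the 64-byte union.
 */
struct irq_mirror {
	uint64_t type;
	union {
		char reserved[64];
	} u;
};

/* True if len satisfies all three documented conditions. */
static bool irq_state_len_ok(uint32_t len, unsigned int max_vcpus)
{
	const uint64_t unit = sizeof(struct irq_mirror);
	const uint64_t limit = ((uint64_t)max_vcpus + 32) * unit;

	return len > 0 && len % unit == 0 && len <= limit;
}
```

Real code would use sizeof(struct kvm_s390_irq) from <linux/kvm.h> instead of
the mirror.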

 

5. The kvm_run structure

------------------------

 

Application code obtains a pointer to the kvm_run structure by
mmap()ing a vcpu fd.  From that point, application code can control
execution by changing fields in kvm_run prior to calling the KVM_RUN
ioctl, and obtain information about the reason KVM_RUN returned by
looking up structure members.

 

struct kvm_run {

        /* in */

        __u8 request_interrupt_window;

 

Request that KVM_RUN return when it becomes possible to inject external
interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.

 

        __u8 padding1[7];

 

        /* out */

        __u32 exit_reason;

 

When KVM_RUN has returned successfully (return value 0), this informs
application code why KVM_RUN has returned.  Allowable values for this
field are detailed below.

 

        __u8 ready_for_interrupt_injection;

 

If request_interrupt_window has been specified, this field indicates
an interrupt can be injected now with KVM_INTERRUPT.

 

        __u8 if_flag;

 

The value of the current interrupt flag.  Only valid if in-kernel
local APIC is not used.

 

        __u8 padding2[2];

 

        /* in (pre_kvm_run), out (post_kvm_run) */

        __u64 cr8;

 

The value of the cr8 register.  Only valid if in-kernel local APIC is
not used.  Both input and output.

 

        __u64 apic_base;

 

The value of the APIC BASE msr.  Only valid if in-kernel local
APIC is not used.  Both input and output.

 

        union {

               /* KVM_EXIT_UNKNOWN */

               struct {

                       __u64 hardware_exit_reason;

               } hw;

 

If exit_reason is KVM_EXIT_UNKNOWN, the vcpu has exited due to unknown
reasons.  Further architecture-specific information is available in
hardware_exit_reason.

 

               /* KVM_EXIT_FAIL_ENTRY */

               struct {

                       __u64 hardware_entry_failure_reason;

               } fail_entry;

 

If exit_reason is KVM_EXIT_FAIL_ENTRY, the vcpu could not be run due
to unknown reasons.  Further architecture-specific information is
available in hardware_entry_failure_reason.

 

               /* KVM_EXIT_EXCEPTION */

               struct {

                       __u32 exception;

                       __u32 error_code;

               } ex;

 

Unused.

 

               /* KVM_EXIT_IO */

               struct {

#define KVM_EXIT_IO_IN  0

#define KVM_EXIT_IO_OUT 1

                       __u8 direction;

                       __u8 size; /* bytes */

                       __u16 port;

                       __u32 count;

                       __u64 data_offset; /* relative to kvm_run start */

               } io;

 

If exit_reason is KVM_EXIT_IO, then the vcpu has
executed a port I/O instruction which could not be satisfied by kvm.
data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
where kvm expects application code to place the data for the next
KVM_RUN invocation (KVM_EXIT_IO_IN).  Data format is a packed array.
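Locating the data for a KVM_EXIT_IO exit amounts to adding data_offset to the
start of the mmap()ed kvm_run area. The sketch below uses a local mirror of
the io member (struct io_exit) and two hypothetical helpers; it assumes that
the packed array holds count items of size bytes each.

```c
#include <stddef.h>
#include <stdint.h>

#define IO_IN  0	/* mirrors KVM_EXIT_IO_IN */
#define IO_OUT 1	/* mirrors KVM_EXIT_IO_OUT */

/* Local mirror of the io member of struct kvm_run shown above. */
struct io_exit {
	uint8_t  direction;
	uint8_t  size;		/* bytes per item */
	uint16_t port;
	uint32_t count;		/* number of items (assumption, see lead-in) */
	uint64_t data_offset;	/* relative to kvm_run start */
};

/* run_base is the start of the mmap()ed kvm_run area. */
static uint8_t *io_data(void *run_base, const struct io_exit *io)
{
	return (uint8_t *)run_base + io->data_offset;
}

/* Total length of the packed array. */
static size_t io_data_len(const struct io_exit *io)
{
	return (size_t)io->size * io->count;
}
```

For IO_OUT the application reads io_data_len() bytes from io_data(); for
IO_IN it writes them there before the next KVM_RUN.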

 

               struct {

                       struct kvm_debug_exit_arch arch;

               } debug;

 

Unused.

 

               /* KVM_EXIT_MMIO */

               struct {

                       __u64 phys_addr;

                       __u8  data[8];

                       __u32 len;

                       __u8  is_write;

               } mmio;

 

If exit_reason is KVM_EXIT_MMIO, then the vcpu has
executed a memory-mapped I/O instruction which could not be satisfied
by kvm.  The 'data' member contains the written data if 'is_write' is
true, and should be filled by application code otherwise.

The 'data' member contains, in its first 'len' bytes, the value as it would
appear if the VCPU performed a load or store of the appropriate width directly
to the byte array.
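A minimal sketch of servicing such an exit, assuming a hypothetical device
that consists of a single 8-byte register. struct mmio_exit mirrors the mmio
member above; struct toy_reg and toy_mmio() are illustrative names only.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the mmio member of struct kvm_run shown above. */
struct mmio_exit {
	uint64_t phys_addr;
	uint8_t  data[8];
	uint32_t len;
	uint8_t  is_write;
};

/* Hypothetical one-register device backing the access. */
struct toy_reg {
	uint8_t bytes[8];
};

/* Returns 0 on success, -1 if len does not fit the data[] array. */
static int toy_mmio(struct mmio_exit *m, struct toy_reg *reg)
{
	if (m->len == 0 || m->len > sizeof(m->data))
		return -1;
	if (m->is_write)
		memcpy(reg->bytes, m->data, m->len);	/* guest store */
	else
		memcpy(m->data, reg->bytes, m->len);	/* guest load */
	return 0;
}
```

Only the first len bytes of data are touched in either direction, matching
the description above.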

 

NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR_HCALL and
KVM_EXIT_EPR the corresponding operations are complete (and guest state is
consistent) only after userspace has re-entered the kernel with KVM_RUN.
The kernel side will first finish incomplete operations and then check for
pending signals.  Userspace can re-enter the guest with an unmasked signal
pending to complete pending operations.

 

               /* KVM_EXIT_HYPERCALL */

               struct {

                       __u64 nr;

                       __u64 args[6];

                       __u64 ret;

                       __u32 longmode;

                       __u32 pad;

               } hypercall;

 

Unused.  This was once used for 'hypercall to userspace'.  To implement
such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO (all except s390).
Note KVM_EXIT_IO is significantly faster than KVM_EXIT_MMIO.

 

               /* KVM_EXIT_TPR_ACCESS */

               struct {

                       __u64 rip;

                       __u32 is_write;

                       __u32 pad;

               } tpr_access;

 

To be documented (KVM_TPR_ACCESS_REPORTING).

 

               /* KVM_EXIT_S390_SIEIC */

               struct {

                       __u8 icptcode;

                       __u64 mask; /* psw upper half */

                       __u64 addr; /* psw lower half */

                       __u16 ipa;

                       __u32 ipb;

               } s390_sieic;

 

s390 specific.

 

               /* KVM_EXIT_S390_RESET */

#define KVM_S390_RESET_POR       1

#define KVM_S390_RESET_CLEAR     2

#define KVM_S390_RESET_SUBSYSTEM 4

#define KVM_S390_RESET_CPU_INIT  8

#define KVM_S390_RESET_IPL       16

               __u64 s390_reset_flags;

 

s390 specific.

 

               /* KVM_EXIT_S390_UCONTROL */

               struct {

                       __u64 trans_exc_code;

                       __u32 pgm_code;

               } s390_ucontrol;

 

s390 specific. A page fault has occurred for a user controlled virtual
machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
resolved by the kernel.
The program code and the translation exception code that were placed
in the cpu's lowcore are presented here as defined by the z Architecture
Principles of Operation Book in the Chapter for Dynamic Address Translation
(DAT).

 

               /* KVM_EXIT_DCR */

               struct {

                       __u32 dcrn;

                       __u32 data;

                       __u8  is_write;

               } dcr;

 

Deprecated - was used for 440 KVM.

 

               /* KVM_EXIT_OSI */

               struct {

                       __u64 gprs[32];

               } osi;

 

MOL uses a special hypercall interface it calls 'OSI'. To enable it, we catch
hypercalls and exit with this exit struct that contains all the guest gprs.

If exit_reason is KVM_EXIT_OSI, then the vcpu has triggered such a hypercall.
Userspace can now handle the hypercall and when it's done modify the gprs as
necessary. Upon guest entry all guest GPRs will then be replaced by the values
in this struct.

 

               /* KVM_EXIT_PAPR_HCALL */

               struct {

                       __u64 nr;

                       __u64 ret;

                       __u64 args[9];

               } papr_hcall;

 

This is used on 64-bit PowerPC when emulating a pSeries partition,
e.g. with the 'pseries' machine type in qemu.  It occurs when the
guest does a hypercall using the 'sc 1' instruction.  The 'nr' field
contains the hypercall number (from the guest R3), and 'args' contains
the arguments (from the guest R4 - R12).  Userspace should put the
return code in 'ret' and any extra returned values in args[].
The possible hypercalls are defined in the Power Architecture Platform
Requirements (PAPR) document available from www.power.org (free
developer registration required to access it).

 

               /* KVM_EXIT_S390_TSCH */

               struct {

                       __u16 subchannel_id;

                       __u16 subchannel_nr;

                       __u32 io_int_parm;

                       __u32 io_int_word;

                       __u32 ipb;

                       __u8 dequeued;

               } s390_tsch;

 

s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled
and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O
interrupt for the target subchannel has been dequeued and subchannel_id,
subchannel_nr, io_int_parm and io_int_word contain the parameters for that
interrupt. ipb is needed for instruction parameter decoding.

 

               /* KVM_EXIT_EPR */

               struct {

                       __u32 epr;

               } epr;

 

On FSL BookE PowerPC chips, the interrupt controller has a fast
interrupt acknowledge path to the core. When the core successfully
delivers an interrupt, it automatically populates the EPR register with
the interrupt vector number and acknowledges the interrupt inside
the interrupt controller.

In case the interrupt controller lives in user space, we need to do
the interrupt acknowledge cycle through it to fetch the next to be
delivered interrupt vector using this exit.

It gets triggered whenever KVM_CAP_PPC_EPR is enabled and an
external interrupt has just been delivered into the guest. User space
should put the acknowledged interrupt vector into the 'epr' field.

 

               /* KVM_EXIT_SYSTEM_EVENT */

               struct {

#define KVM_SYSTEM_EVENT_SHUTDOWN       1

#define KVM_SYSTEM_EVENT_RESET          2

                       __u32 type;

                       __u64 flags;

               } system_event;

 

If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered
a system-level event using some architecture specific mechanism (hypercall
or some special instruction). In case of ARM/ARM64, this is triggered using
HVC instruction based PSCI call from the vcpu. The 'type' field describes
the system-level event type. The 'flags' field describes architecture
specific flags for the system-level event.

Valid values for 'type' are:
  KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
   VM. Userspace is not obliged to honour this, and if it does honour
   this does not need to destroy the VM synchronously (ie it may call
   KVM_RUN again before shutdown finally occurs).
  KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
   As with SHUTDOWN, userspace can choose to ignore the request, or
   to schedule the reset to occur in the future and may call KVM_RUN again.
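Because userspace is free to honour or ignore these requests, the handling
reduces to a policy decision keyed on 'type'. The sketch below shows one
possible policy; enum vmm_action and on_system_event() are hypothetical, and
the two constants mirror the values defined in the struct above.

```c
#include <stdint.h>

#define SYS_EVENT_SHUTDOWN 1	/* mirrors KVM_SYSTEM_EVENT_SHUTDOWN */
#define SYS_EVENT_RESET    2	/* mirrors KVM_SYSTEM_EVENT_RESET */

/* What the hypothetical VMM main loop should do next. */
enum vmm_action { VMM_CONTINUE, VMM_POWER_OFF, VMM_RESET };

/*
 * Example policy: honour both requests. Unknown event types are ignored
 * and the vcpu simply keeps running.
 */
static enum vmm_action on_system_event(uint32_t type)
{
	switch (type) {
	case SYS_EVENT_SHUTDOWN:
		return VMM_POWER_OFF;
	case SYS_EVENT_RESET:
		return VMM_RESET;
	default:
		return VMM_CONTINUE;
	}
}
```

A VMM that prefers to defer the action may record the result and still call
KVM_RUN again, as the text above permits.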

 

               /* Fix the size of the union. */

               char padding[256];

        };

 

        /*

         * shared registers between kvm and userspace.

         * kvm_valid_regs specifies the register classes set by the host

         * kvm_dirty_regs specifies the register classes dirtied by userspace

         * struct kvm_sync_regs is architecture specific, as well as the

         * bits for kvm_valid_regs and kvm_dirty_regs

         */

        __u64 kvm_valid_regs;

        __u64 kvm_dirty_regs;

        union {

               struct kvm_sync_regs regs;

               char padding[1024];

        } s;

 

If KVM_CAP_SYNC_REGS is defined, these fields allow userspace to access
certain guest registers without having to call SET/GET_*REGS. Thus we can
avoid some system call overhead if userspace has to handle the exit.
Userspace can query the validity of the structure by checking
kvm_valid_regs for specific bits. These bits are architecture specific
and usually define the validity of a group of registers (e.g. one bit
for general purpose registers).

Please note that the kernel is allowed to use the kvm_run structure as the
primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.

 

};

 

 

 

6. Capabilities that can be enabled on vCPUs

--------------------------------------------

 

There are certain capabilities that change the behavior of the virtual CPU or
the virtual machine when enabled. To enable them, please see section 4.37.
Below you can find a list of capabilities and what their effect on the vCPU or
the virtual machine is when enabling them.

The following information is provided along with the description:

  Architectures: which instruction set architectures provide this ioctl.
      x86 includes both i386 and x86_64.

  Target: whether this is a per-vcpu or per-vm capability.

  Parameters: what parameters are accepted by the capability.

  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.

 

 

6.1 KVM_CAP_PPC_OSI

 

Architectures: ppc

Target: vcpu

Parameters: none

Returns: 0 on success; -1 on error

This capability enables interception of OSI hypercalls that otherwise would
be treated as normal system calls to be injected into the guest. OSI hypercalls
were invented by Mac-on-Linux to have a standardized communication mechanism
between the guest and the host.

When this capability is enabled, KVM_EXIT_OSI can occur.

 

 

6.2 KVM_CAP_PPC_PAPR

 

Architectures: ppc

Target: vcpu

Parameters: none

Returns: 0 on success; -1 on error

This capability enables interception of PAPR hypercalls. PAPR hypercalls are
done using the hypercall instruction "sc 1".

It also sets the guest privilege level to "supervisor" mode. Usually the guest
runs in "hypervisor" privilege mode with a few missing features.

In addition to the above, it changes the semantics of SDR1. In this mode, the
HTAB address part of SDR1 contains an HVA instead of a GPA, as PAPR keeps the
HTAB invisible to the guest.

When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur.

 

 

6.3 KVM_CAP_SW_TLB

 

Architectures: ppc

Target: vcpu

Parameters: args[0] is the address of a struct kvm_config_tlb
Returns: 0 on success; -1 on error

struct kvm_config_tlb {

        __u64 params;

        __u64 array;

        __u32 mmu_type;

        __u32 array_len;

};

 

Configures the virtual CPU's TLB array, establishing a shared memory area
between userspace and KVM.  The "params" and "array" fields are userspace
addresses of mmu-type-specific data structures.  The "array_len" field is a
safety mechanism, and should be set to the size in bytes of the memory that
userspace has reserved for the array.  It must be at least the size dictated
by "mmu_type" and "params".

While KVM_RUN is active, the shared region is under control of KVM.  Its
contents are undefined, and any modification by userspace results in
boundedly undefined behavior.

On return from KVM_RUN, the shared region will reflect the current state of
the guest's TLB.  If userspace makes any changes, it must call KVM_DIRTY_TLB
to tell KVM which entries have been changed, prior to calling KVM_RUN again
on this vcpu.

For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
 - The "params" field is of type "struct kvm_book3e_206_tlb_params".
 - The "array" field points to an array of type "struct
   kvm_book3e_206_tlb_entry".
 - The array consists of all entries in the first TLB, followed by all
   entries in the second TLB.
 - Within a TLB, entries are ordered first by increasing set number.  Within a
   set, entries are ordered by way (increasing ESEL).
 - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
   where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
 - The tsize field of mas1 shall be set to 4K on TLB0, even though the
   hardware ignores this value for TLB0.
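The ordering rules and the TLB0 hash above determine where a given entry sits
in the shared array. A minimal sketch, with hypothetical helper names and the
assumption (implied by the mask form of the hash) that num_sets is a power of
two:

```c
#include <stdint.h>

/*
 * Set number for a TLB0 entry: (MAS2 >> 12) & (num_sets - 1), where
 * num_sets is the TLB0 size divided by its number of ways.
 */
static uint32_t tlb0_set(uint64_t mas2, uint32_t tlb_size, uint32_t tlb_ways)
{
	uint32_t num_sets = tlb_size / tlb_ways;

	return (uint32_t)((mas2 >> 12) & (num_sets - 1));
}

/*
 * Position in the shared array: sets in increasing order, ways within a
 * set by increasing ESEL. TLB0 comes first, so no base offset is added.
 */
static uint32_t tlb0_array_index(uint64_t mas2, uint32_t esel,
				 uint32_t tlb_size, uint32_t tlb_ways)
{
	return tlb0_set(mas2, tlb_size, tlb_ways) * tlb_ways + esel;
}
```

For a 512-entry, 4-way TLB0 there are 128 sets, so a MAS2 value of 0x12345000
maps to set 0x45; way 2 of that set is array element 278.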

 

6.4 KVM_CAP_S390_CSS_SUPPORT

 

Architectures: s390

Target: vcpu

Parameters: none

Returns: 0 on success; -1 on error

This capability enables support for handling of channel I/O instructions.

TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are
handled in-kernel, while the other I/O instructions are passed to userspace.

When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
SUBCHANNEL intercepts.

Note that even though this capability is enabled per-vcpu, the complete
virtual machine is affected.

 

6.5 KVM_CAP_PPC_EPR

 

Architectures: ppc

Target: vcpu

Parameters: args[0] defines whether the proxy facility is active
Returns: 0 on success; -1 on error

This capability enables or disables the delivery of interrupts through the
external proxy facility.

When enabled (args[0] != 0), every time the guest gets an external interrupt
delivered, it automatically exits into user space with a KVM_EXIT_EPR exit
to receive the topmost interrupt vector.

When disabled (args[0] == 0), behavior is as if this facility is unsupported.

When this capability is enabled, KVM_EXIT_EPR can occur.

 

6.6 KVM_CAP_IRQ_MPIC

 

Architectures: ppc

Parameters: args[0] is the MPIC device fd
            args[1] is the MPIC CPU number for this vcpu

This capability connects the vcpu to an in-kernel MPIC device.

 

6.7 KVM_CAP_IRQ_XICS

 

Architectures: ppc

Target: vcpu

Parameters: args[0] is the XICS device fd
            args[1] is the XICS CPU number (server ID) for this vcpu

This capability connects the vcpu to an in-kernel XICS device.

 

6.8 KVM_CAP_S390_IRQCHIP

 

Architectures: s390

Target: vm

Parameters: none

 

This capability enables the in-kernel irqchip for s390. Please refer to
"4.24 KVM_CREATE_IRQCHIP" for details.

 

6.9 KVM_CAP_MIPS_FPU

 

Architectures: mips

Target: vcpu

Parameters: args[0] is reserved for future use (should be 0).

This capability allows the use of the host Floating Point Unit by the guest. It
allows the Config1.FP bit to be set to enable the FPU in the guest. Once this is
done the KVM_REG_MIPS_FPR_* and KVM_REG_MIPS_FCR_* registers can be accessed
(depending on the current guest FPU register mode), and the Status.FR,
Config5.FRE bits are accessible via the KVM API and also from the guest,
depending on them being supported by the FPU.

 

6.10 KVM_CAP_MIPS_MSA

 

Architectures: mips

Target: vcpu

Parameters: args[0] is reserved for future use (should be 0).

This capability allows the use of the MIPS SIMD Architecture (MSA) by the guest.
It allows the Config3.MSAP bit to be set to enable the use of MSA by the guest.
Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
the guest.

 

7. Capabilities that can be enabled on VMs

------------------------------------------

 

There are certain capabilities that change the behavior of the virtual
machine when enabled. To enable them, please see section 4.37. Below
you can find a list of capabilities and what their effect on the VM
is when enabling them.

The following information is provided along with the description:

  Architectures: which instruction set architectures provide this ioctl.
      x86 includes both i386 and x86_64.

  Parameters: what parameters are accepted by the capability.

  Returns: the return value.  General error numbers (EBADF, ENOMEM, EINVAL)
      are not detailed, but errors with specific meanings are.

 

 

7.1 KVM_CAP_PPC_ENABLE_HCALL

 

Architectures: ppc

Parameters: args[0] is the sPAPR hcall number
            args[1] is 0 to disable, 1 to enable in-kernel handling

This capability controls whether individual sPAPR hypercalls (hcalls)
get handled by the kernel or not.  Enabling or disabling in-kernel
handling of an hcall is effective across the VM.  On creation, an
initial set of hcalls are enabled for in-kernel handling, which
consists of those hcalls for which in-kernel handlers were implemented
before this capability was implemented.  If disabled, the kernel will
not attempt to handle the hcall, but will always exit to userspace
to handle it.  Note that it may not make sense to enable some and
disable others of a group of related hcalls, but KVM does not prevent
userspace from doing that.

If the hcall number specified is not one that has an in-kernel
implementation, the KVM_ENABLE_CAP ioctl will fail with an EINVAL
error.
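The argument layout can be sketched as a helper that fills the KVM_ENABLE_CAP
parameter. This is an illustration only: struct enable_cap is a local mirror
of struct kvm_enable_cap (described in section 4.37, outside this section;
its layout here is an assumption to be checked against <linux/kvm.h>), the
helper name is hypothetical, and cap_id is passed in rather than hard-coded.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct kvm_enable_cap; layout assumed, see lead-in. */
struct enable_cap {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

/*
 * Hypothetical helper: request that hcall_nr be handled in the kernel
 * (in_kernel != 0) or always passed to userspace (in_kernel == 0).
 */
static void hcall_cap_request(struct enable_cap *ec, uint32_t cap_id,
			      uint64_t hcall_nr, int in_kernel)
{
	memset(ec, 0, sizeof(*ec));
	ec->cap = cap_id;	/* KVM_CAP_PPC_ENABLE_HCALL in real code */
	ec->args[0] = hcall_nr;	/* sPAPR hcall number */
	ec->args[1] = in_kernel ? 1 : 0;
}
```

The filled struct would then be passed to the KVM_ENABLE_CAP ioctl on the VM
file descriptor; an EINVAL error indicates that the kernel has no handler for
the given hcall number.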

 

7.2 KVM_CAP_S390_USER_SIGP

 

Architectures: s390

Parameters: none

 

This capability controls which SIGP orders will be handled completely in user
space. With this capability enabled, all fast orders will be handled completely
in the kernel:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL

All other orders will be handled completely in user space.

Only privileged operation exceptions will be checked for in the kernel (or even
in the hardware prior to interception). If this capability is not enabled, the
old way of handling SIGP orders is used (partially in kernel and user space).

 

7.3 KVM_CAP_S390_VECTOR_REGISTERS

 

Architectures: s390

Parameters: none

Returns: 0 on success, negative value on error

Allows use of the vector registers introduced with the z13 processor, and
provides for the synchronization between host and user space. Will
return -EINVAL if the machine does not support vectors.

 

7.4 KVM_CAP_S390_USER_STSI

 

Architectures: s390

Parameters: none

 

This capability allows post-handlers for the STSI instruction. After
initial handling in the kernel, KVM exits to user space with
KVM_EXIT_S390_STSI to allow user space to insert further data.

Before exiting to userspace, kvm handlers should fill in s390_stsi field of
vcpu->run:

struct {

        __u64 addr;

        __u8 ar;

        __u8 reserved;

        __u8 fc;

        __u8 sel1;

        __u16 sel2;

} s390_stsi;

 

@addr - guest address of STSI SYSIB
@fc   - function code
@sel1 - selector 1
@sel2 - selector 2
@ar   - access register number

KVM handlers should exit to userspace with rc = -EREMOTE.

 

 

8. Other capabilities.

----------------------

 

This section lists capabilities that give information about other
features of the KVM implementation.

 

8.1 KVM_CAP_PPC_HWRNG

 

Architectures: ppc

 

This capability, if KVM_CHECK_EXTENSION indicates that it is
available, means that the kernel has an implementation of the
H_RANDOM hypercall backed by a hardware random-number generator.
If present, the kernel H_RANDOM handler can be enabled for guest use
with the KVM_CAP_PPC_ENABLE_HCALL capability.

 
