Who is stealing virtual CPU time?

Hi! In this article, I want to explain in layman’s terms how steal appears in VMs. I also want to tell you about some of the less-than-obvious artifacts we found while researching the topic, which I was involved in as CTO of the Mail.ru Cloud Solutions platform. The platform runs KVM.

CPU steal time is the time during which a VM doesn’t receive the CPU resources it needs to operate. This time can only be measured from inside a guest OS in a virtualized environment. Just as with theft in real life, it is extremely unclear where the allocated resources go. However, we decided to figure it out and even performed a series of tests to do so. That is not to say that we know everything about steal, but there are some fascinating things that we would like to share with you.

1. What is steal?

Steal is a metric that indicates a lack of CPU time for VM processes. The KVM kernel patch defines steal as the time a hypervisor spends running other processes in the host OS while the VM process is in the run queue. In other words, steal is the difference between the moment a process is ready to run and the moment CPU time is allocated to it.
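
On the host, this definition corresponds to the scheduler’s per-thread run-delay counter: the time a thread was runnable and waiting in a run queue, but not running. As far as we understand, KVM on x86 derives the steal it reports to a guest from the run delay of the vCPU thread. The sketch below assumes a kernel that exposes `/proc/<pid>/task/<tid>/schedstat`, with three fields: time on CPU in ns, run-queue wait in ns, and timeslices run. The helper names are our own, and the PID/TID are placeholders you would look up for your VM process.

```python
import time


def parse_schedstat(text: str) -> tuple[int, int, int]:
    """Split a schedstat line into: time on CPU (ns), run-queue wait (ns), timeslices."""
    on_cpu_ns, run_delay_ns, timeslices = (int(f) for f in text.split()[:3])
    return on_cpu_ns, run_delay_ns, timeslices


def runnable_wait_pct(before: str, after: str, interval_ns: int) -> float:
    """Percentage of an interval the thread was ready to run but not running."""
    _, wait_before, _ = parse_schedstat(before)
    _, wait_after, _ = parse_schedstat(after)
    return 100.0 * (wait_after - wait_before) / interval_ns


def sample_vcpu_thread(pid: int, tid: int, seconds: float = 1.0) -> float:
    """Sample one (hypothetical) vCPU thread of a VM process on the host."""
    path = f"/proc/{pid}/task/{tid}/schedstat"
    with open(path) as f:
        before = f.read()
    start = time.monotonic_ns()
    time.sleep(seconds)
    with open(path) as f:
        after = f.read()
    return runnable_wait_pct(before, after, time.monotonic_ns() - start)
```

Suppose the run-delay counter grows by 0.2 s during a 1 s interval. The thread then spent 20% of that second waiting for a CPU, and the guest would see that time as steal.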

The VM kernel gets the steal metric from the hypervisor. The hypervisor doesn’t specify which processes it is running; it just says: «I’m busy, and can’t allocate any time to you.» In KVM, steal accounting is supported through kernel patches. Two main points follow from this:

  • A VM learns about steal from the hypervisor. This means that, as a measure of lost CPU time, steal is an indirect measurement that can be distorted in several ways.

  • The hypervisor doesn’t tell the VM what it is busy with; it only reports that it didn’t allocate time to the VM. Therefore, the VM itself cannot detect distortions in the steal metric, which could otherwise be estimated from the nature of the competing processes.
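
How does the number actually reach the guest? On x86, KVM uses a paravirtual "steal time" interface. The guest registers a per-vCPU memory area through the `MSR_KVM_STEAL_TIME` MSR. The host adds to an accumulated steal counter in that area, and a version field lets the guest detect a read that raced with an update. The Python below is only a single-threaded model of that read-and-retry protocol, loosely named after `struct kvm_steal_time`. It also illustrates the points above: all the guest receives is one growing number, with nothing about what the host was busy with.

```python
from dataclasses import dataclass


@dataclass
class StealTimeArea:
    """Toy stand-in for the per-vCPU memory area the host writes steal into."""
    steal_ns: int = 0   # total stolen time reported so far
    version: int = 0    # odd while the host is in the middle of an update

    def host_record(self, delta_ns: int) -> None:
        self.version += 1          # odd: update in progress
        self.steal_ns += delta_ns
        self.version += 1          # even: update finished


def guest_read_steal(area: StealTimeArea) -> int:
    """Read steal_ns, retrying if the host was updating it at the same time."""
    while True:
        version = area.version
        steal = area.steal_ns
        if version % 2 == 0 and version == area.version:
            return steal
```

A real reader also needs memory barriers between these loads, which this model leaves out.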

2. What affects steal?

2.1. Calculating steal

Essentially, steal is calculated in more or less the same way as CPU utilization time. There isn’t a great deal of information about how utilization is calculated, probably because most professionals think it’s obvious. However, there are some pitfalls. The process is described in an article by Brendan Gregg, who discusses a whole host of nuances regarding how to calculate utilization and the scenarios in which the calculation will be wrong:

  • CPU overheating and throttling.

  • Turning Turbo Boost on/off, resulting in a change in CPU clock rate.

  • The time slice change that occurs when CPU power-saving technologies, e.g. SpeedStep, are used.

  • Problems related to calculating averages: a one-minute utilization average of 80% could hide a short burst at 100%.

  • A spinlock, which leads to a scenario where the processor is busy but the user process doesn’t progress. As a result, the calculated CPU utilization will be 100%, but the process will not actually do any useful work.
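
The averaging pitfall is easy to reproduce with a few numbers. The hypothetical helper below assumes one utilization sample per second. It compares the mean of a whole minute with the busiest short window inside that minute.

```python
from statistics import mean


def mean_and_peak(samples_pct: list[float], window: int) -> tuple[float, float]:
    """Mean over all samples vs. the highest mean of any `window`-sample slice."""
    if not 0 < window <= len(samples_pct):
        raise ValueError("window must be between 1 and len(samples_pct)")
    peak = max(
        mean(samples_pct[i:i + window])
        for i in range(len(samples_pct) - window + 1)
    )
    return mean(samples_pct), peak
```

Take 48 seconds at 75% followed by 12 seconds at 100%. `mean_and_peak(samples, 5)` returns `(80.0, 100.0)`: the minute looks like 80%, yet the CPU was saturated for 12 seconds.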

I haven’t come across any articles describing such calculations of steal (if you know of any, please share them in the comments section). As you can see from the source code, the calculation mechanism is the same as for utilization. The only difference is an additional counter specifically for the KVM process (the VM process), which calculates how long the KVM process has been waiting for CPU time. The counter takes the CPU’s data from its specification and checks whether all of its ticks are being used by the VM process. If all the ticks are being used, the CPU was busy only with the VM process. Otherwise, the CPU was doing something else, and steal appears.
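
The tick-based idea above can be made concrete with a toy model. It assumes we already know, for each tick, whether the VM’s vCPU was ready to run and what the physical CPU actually executed. The `Tick` type and the owner labels are invented for illustration, and the real kernel does not receive its data in this form.

```python
from typing import NamedTuple


class Tick(NamedTuple):
    vm_runnable: bool  # the VM's vCPU was ready to run during this tick
    owner: str         # what the physical CPU actually executed in this tick


def account_ticks(ticks: list[Tick], vm: str = "vm") -> dict[str, int]:
    """Classify each tick as used by the VM, stolen from it, or idle for it."""
    counts = {"used": 0, "steal": 0, "idle": 0}
    for tick in ticks:
        if tick.owner == vm:
            counts["used"] += 1
        elif tick.vm_runnable:
            counts["steal"] += 1   # ready to run, but the CPU did something else
        else:
            counts["idle"] += 1    # the VM did not want the CPU anyway
    return counts
```

The key design point is the `vm_runnable` check. A tick spent on someone else only counts as steal if the VM actually wanted the CPU at that moment.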

The process by which steal is calculated is subject to the same issues as the regular calculation of utilization. These issues are not that common, but they can be rather confusing.

2.2. Types of KVM virtualization

In general, there are three types of virtualization, and KVM supports all of them. The mechanism by which steal occurs may depend on the type of virtualization.

Translation. In this case, the VM OS works with the hypervisor’s physical devices as follows:

  1. The guest OS sends a command to its guest device.

  2. The guest device driver accepts the command, creates a BIOS device request, and sends the command to the hypervisor.

  3. The hypervisor process translates the command into a physical device command, making it more secure, among other things.

  4. The physical device driver accepts the modified command and forwards it to the physical device itself.

  5. The results of the command return along the same path.
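
Step 3 can be made concrete with a purely illustrative sketch of what "translating" a command might mean for an emulated disk. The command classes, the 512-byte sector size, and the mapping onto an image file are all assumptions for this example. They do not describe how QEMU or KVM actually structure their code.

```python
from dataclasses import dataclass

SECTOR_SIZE = 512  # bytes per sector of the emulated disk (an assumption)


@dataclass(frozen=True)
class GuestRead:
    """Hypothetical command the guest driver sends to its emulated disk."""
    sector: int
    count: int


@dataclass(frozen=True)
class HostRead:
    """Hypothetical request the hypervisor process makes to the host."""
    image_path: str
    offset: int
    length: int


def translate(cmd: GuestRead, image_path: str, disk_sectors: int) -> HostRead:
    """Validate the guest command, then rewrite it as a host-level read."""
    if cmd.count <= 0 or cmd.sector < 0 or cmd.sector + cmd.count > disk_sectors:
        raise ValueError("request falls outside the virtual disk")
    return HostRead(image_path, cmd.sector * SECTOR_SIZE, cmd.count * SECTOR_SIZE)
```

The bounds check is where "making it more secure" comes in. The guest can only ever reach bytes inside its own virtual disk, whatever it asks for.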

The advantage of translation is that it allows us to emulate any device and requires no special preparation of the OS kernel. But this comes at the expense of performance.

Hardware virtualization. In this case, a device receives commands from the OS on the hardware level. This is the fastest and overall best method. Unfortunately, not all physical devices, hypervisors, and guest OSs support it. For now, the main devices that support hardware virtualization are CPUs.

Paravirtualization. This is the most common option for device virtualization on KVM and the most widespread type of virtualization for guest OSs. Its main feature is that it works with some hypervisor subsystems (e.g. the network or drive stack) and allocates memory pages through a hypervisor API without translating low-level commands. The disadvantage of this method is that the guest OS’s kernel must be modified so it can interact with the hypervisor through the same API. The most common solution is to install special drivers into the guest OS. In KVM, this API is called the virtio API.

When paravirtualization is used, the path to the physical device is much shorter than with translation, because commands are sent directly from the VM to the hypervisor process in the host. This speeds up the execution of such operations within the VM. In KVM, the virtio API is responsible for this. It only works for some devices, such as network and drive adapters, which is why virtio drivers are installed in VMs.

The flip side of this acceleration is that not all processes executed in a VM stay within the VM. This results in a number of effects that might cause steal. If you would like to learn more, start with An API for virtual I/O: virtio.
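
A small model helps show why paravirtualized I/O leaves the VM. Below is a toy producer/consumer version of a virtio-style queue. The guest driver posts buffers to an "available" queue, notifies ("kicks") the host, and later collects completed buffers from a "used" queue. Real virtqueues are rings of descriptors in shared memory, laid out as the virtio specification defines. The class and method names here are invented for illustration.

```python
from collections import deque


class ToyVirtqueue:
    """Toy producer/consumer model of a virtio-style queue (not the real layout)."""

    def __init__(self) -> None:
        self.available: deque[bytes] = deque()         # guest -> host
        self.used: deque[tuple[bytes, int]] = deque()  # host -> guest
        self.kicks = 0                                 # notifications sent to the host

    def guest_add(self, buf: bytes) -> None:
        """The guest driver posts a buffer; nothing leaves the VM yet."""
        self.available.append(buf)

    def guest_kick(self) -> None:
        """Notify the host: this is where control passes to the hypervisor."""
        self.kicks += 1

    def host_process(self) -> int:
        """The host side drains posted buffers and hands them back as 'used'."""
        done = 0
        while self.available:
            buf = self.available.popleft()
            self.used.append((buf, len(buf)))
            done += 1
        return done

    def guest_collect(self) -> list[tuple[bytes, int]]:
        """The guest picks up completed buffers with the number of bytes handled."""
        completed = list(self.used)
        self.used.clear()
        return completed
```

In this model, everything between `guest_kick()` and `guest_collect()` happens outside the VM. That is the window in which the VM may be left waiting for the host.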

2.3. Fair scheduling

A VM on a hypervisor is, in fact, a regular process, and it is subject to the Linux kernel’s scheduling rules (how resources are distributed between processes). Let’s take a closer look at this.

Linux uses the so-called CFS, Completely Fair Scheduler, which became the default in kernel 2.6.23. To get a handle on this algorithm, read Linux Kernel Architecture or the source code. The essence of CFS lies in distributing CPU time between processes depending on their run time: the more CPU time a process requires, the less CPU time it gets. This guarantees «fair» execution of all processes. It prevents one process from occupying all of the processors all of the time and lets other processes run too.
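
Here is a heavily simplified sketch of the core CFS rule, assuming always-runnable tasks on a single CPU. Each task accumulates "virtual runtime": its actual runtime scaled by 1024 divided by its weight, where 1024 is the weight of a nice-0 task. The scheduler always picks the task with the smallest virtual runtime, so whoever has had the CPU longest waits. Sleeper handling, `min_vruntime`, load balancing and everything else in the real scheduler is left out.

```python
import heapq

NICE_0_WEIGHT = 1024  # weight of a nice-0 task in CFS


def simulate_cfs(weights: dict[str, int], slices: int,
                 slice_ns: int = 1_000_000) -> dict[str, int]:
    """Run always-runnable tasks on one CPU, always picking the smallest vruntime."""
    # (vruntime, insertion order as a tie-breaker, task name)
    queue = [(0, order, name) for order, name in enumerate(weights)]
    heapq.heapify(queue)
    ran = {name: 0 for name in weights}
    for _ in range(slices):
        vruntime, order, name = heapq.heappop(queue)
        ran[name] += 1
        # heavier tasks accumulate virtual runtime more slowly
        vruntime += slice_ns * NICE_0_WEIGHT // weights[name]
        heapq.heappush(queue, (vruntime, order, name))
    return ran
```

With equal weights, the tasks simply alternate. A task with twice the weight gets twice as many slices: `simulate_cfs({"a": 1024, "b": 2048}, 30)` returns `{"a": 10, "b": 20}`.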

Sometimes this paradigm results in interesting artifacts. Long-standing Linux users will no doubt remember how a regular text editor on the desktop would freeze while a resource-intensive application such as a compiler was running. This happened because resource-light tasks, such as desktop applications, were competing with resource-heavy tasks, such as a compiler. CFS considers this unfair, so it stops the text editor from time to time and lets the CPU process the compiler tasks. This was fixed with the sched_autogroup mechanism; there are, however, many other peculiarities of CPU time distribution. This article is not really about how bad CFS is. It is rather an attempt to draw attention to the fact that «fair» distribution of CPU time is far from a trivial task.
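
The effect of autogroup can be shown with idealized arithmetic. Assume one CPU, equal nice-0 weights, and every task always runnable; a real editor mostly sleeps, so this is a bound rather than a measurement. Without autogroup, all tasks share the CPU equally. With autogroup, each session group first gets an equal share, which is then split among the group’s tasks. The function and group names are our own.

```python
from fractions import Fraction


def per_task_share(groups: dict[str, int], autogroup: bool) -> dict[str, Fraction]:
    """CPU share of each task in each group, on one CPU, with all tasks runnable."""
    if autogroup:
        # each session group gets an equal slice, then splits it among its tasks
        return {g: Fraction(1, len(groups)) / n for g, n in groups.items()}
    total = sum(groups.values())
    return {g: Fraction(1, total) for g in groups}
```

Consider an editor next to an 8-job build. The editor’s share goes from 1/9 without autogroup to 1/2 with it, while each build job drops to 1/16.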

Another important aspect of a scheduler is preemption. It is needed to take the CPU away from processes that have had more than their share and let others work too. Replacing one process with another is called context switching. The entire task context (stack status, registers, and so on) is saved, after which the process is left to wait and another process takes its place. This is an expensive operation for an OS, but in itself it is not bad at all. Frequent context switching might point to an OS issue, but it usually happens continuously and is not a sign of any particular problem.
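
The kernel keeps per-task counters for both kinds of switches in `/proc/<pid>/status`. `voluntary_ctxt_switches` counts cases where the task gave up the CPU itself, for example to wait for I/O. `nonvoluntary_ctxt_switches` counts cases where the scheduler preempted it. The small parser below is a sketch with helper names of our own. On a host, you could point it at the per-thread files of a VM process’s vCPU threads (`/proc/<pid>/task/<tid>/status`) to watch preemption.

```python
def context_switches(status_text: str) -> dict[str, int]:
    """Extract the voluntary/nonvoluntary context-switch counters from a status file."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts: dict[str, int] = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            counts[key] = int(value)  # int() tolerates the surrounding whitespace
    return counts


def own_context_switches() -> dict[str, int]:
    """Counters for the current process, read from /proc/self/status."""
    with open("/proc/self/status") as f:
        return context_switches(f.read())
```

A steadily climbing `nonvoluntary_ctxt_switches` on a busy vCPU thread is consistent with the scheduler repeatedly pushing that thread off the CPU.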

This long discourse was necessary to explain one fact: in the fair Linux scheduler, the more CPU resources a process consumes, the sooner it will be stopped to let other processes work. Whether this is right or not is a complex question, and the answer depends on the load. Until recently, the Windows scheduler prioritized desktop applications, which slowed down background processes. Sun Solaris had five different scheduler classes. When virtualization was introduced, another one was added, the Fair Share Scheduler, because the others did not work properly with Solaris Zones virtualization. To dig deeper into this, I recommend starting with Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture or Understanding the Linux Kernel.

2.4. How can we monitor steal?

Just like any other CPU metric, steal is easy to monitor inside a VM, and any CPU metric measurement tool will do. The main requirement is that the VM runs Linux. For some reason, Windows doesn’t provide this information to the user. :(

top output: CPU load breakdown, with steal in the rightmost column
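
The steal value that top shows comes from the eighth counter on the `cpu` lines of `/proc/stat`, measured in USER_HZ ticks. The percentage is computed exactly like any other CPU-time share: the change in steal divided by the change in total time between two samples. The sketch below sums the first eight counters and leaves out `guest`/`guest_nice`, because the kernel already includes those in `user`/`nice`. It also assumes a kernel new enough to report the steal column.

```python
import time

# Order of the first eight counters on a "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> dict[str, int]:
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError(f"not a cpu line: {line!r}")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))


def steal_percent(before: str, after: str) -> float:
    """Share of all CPU time that was stolen between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: a[k] - b[k] for k in FIELDS}
    total = sum(delta.values())
    return 100.0 * delta["steal"] / total if total else 0.0


def sample_steal(seconds: float = 1.0) -> float:
    """Measure steal for the whole VM from inside a Linux guest."""
    with open("/proc/stat") as f:
        before = f.readline()  # the first line aggregates all CPUs
    time.sleep(seconds)
    with open("/proc/stat") as f:
        after = f.readline()
    return steal_percent(before, after)
```

Inside a guest, `print(f"steal: {sample_steal():.1f}%")` gives a one-second reading comparable to top’s column.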

Things become complicated when you try to get this information from a hypervisor. You can try to forecast steal on a host machine using, for example, Load Average (LA), the average number of processes in the run queue. The calculation method for this parameter is not a simple one. In general, though, if the LA normalized by the number of CPU threads is greater than 1, the Linux server is overloaded.
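
The normalization rule of thumb above takes only a few lines. The sketch reads the first three fields of `/proc/loadavg` (the 1-, 5- and 15-minute averages) and divides them by the number of CPU threads reported by `os.cpu_count()`. The function names and the threshold parameterization are our own.

```python
import os


def normalized_load(loadavg_text: str, cpu_threads: int) -> tuple[float, float, float]:
    """1-, 5- and 15-minute load averages divided by the number of CPU threads."""
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return one / cpu_threads, five / cpu_threads, fifteen / cpu_threads


def host_looks_overloaded(loadavg_text: str, cpu_threads: int) -> bool:
    """Rule of thumb: normalized 1-minute LA above 1."""
    return normalized_load(loadavg_text, cpu_threads)[0] > 1.0


def current_normalized_load() -> tuple[float, float, float]:
    with open("/proc/loadavg") as f:
        return normalized_load(f.read(), os.cpu_count() or 1)
```

As the following paragraphs explain, this is only a rough signal. The number mixes CPU waiters with tasks blocked on I/O.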

So, what are all these processes waiting for? Obviously, the CPU. This answer is not quite accurate, however, because sometimes the CPU is free while the LA is way too high. Remember how the LA climbs when NFS goes down. A similar situation might occur with drives and other input/output devices. In fact, the processes might be waiting for a lock to be released. It can be a physical lock, related to input/output devices (for example, waiting for a disk response). It can also be a logical lock: one of the so-called «locking primitives», which include adaptive and spin mutexes, semaphores, condition variables, rw locks, IPC locks, and so on.

Another feature of LA is that it is calculated as an average across the whole OS. For example, if 100 processes compete for one file, the LA is 50. Such a large number might suggest that the OS is in trouble. However, for poorly written code this can be normal: only that specific code is bad, and the rest of the OS might be fine.

Because of this averaging (over no less than a minute), using the LA to determine anything is not the best idea, as it can yield extremely ambiguous results in some instances. If you try to find out more, you’ll see that Wikipedia and other available resources describe only the simplest cases, without going into the details of the process. If you are interested, again, visit Brendan Gregg’s site and follow the links.
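
Linux’s load average is an exponentially damped moving average, not a plain mean. Roughly every 5 seconds, the kernel counts active tasks (running, runnable, or in uninterruptible sleep) and blends that count into the previous value. The float sketch below models only the 1-minute figure. The kernel itself uses fixed-point arithmetic, so treat this as a simplification.

```python
import math

LOAD_FREQ_S = 5.0                         # the kernel updates load averages every ~5 s
DECAY_1MIN = math.exp(-LOAD_FREQ_S / 60)  # per-update decay for the 1-minute figure


def one_minute_load(active_per_update: list[int], start: float = 0.0) -> float:
    """Exponentially damped average of the active-task count, one entry per update."""
    load = start
    for active in active_per_update:
        load = load * DECAY_1MIN + active * (1.0 - DECAY_1MIN)
    return load
```

A single task that keeps one CPU busy for 30 seconds on an otherwise idle machine leaves a 1-minute LA of only about 0.39 (1 - e^-0.5) at the end of the burst. Thirty seconds later, the LA has decayed to about 0.24, which is why short bursts of contention are easy to miss.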

3. Special effects

Now let’s get to the main cases of steal that we encountered. Allow me to explain how they follow from the above and how they correlate with hypervisor metrics.

Overutilization. This is the simplest and most common case: the hypervisor is overutilized. With a lot of VMs running and consuming a lot of CPU resources, competition is high, and the LA normalized by the number of CPU threads is greater than 1. Everything lags in all VMs, and the steal reported by the hypervisor grows as well. You have to redistribute the load or turn something off. On the whole, this is all logical and straightforward.

Paravirtualization vs single instances. There’s only one VM on a hypervisor. The VM consumes only a small part of the hypervisor’s resources but generates a high input/output load, for example, on a drive. Unexpectedly, a small steal of less than 10% appears (as some of the tests we conducted show).

This is a curious case. Here, steal appears because of locks at the level of the paravirtualized devices. Inside the VM, an interrupt is raised. It is processed by the driver and passed on to the hypervisor. While the interrupt is being processed on the hypervisor, the VM sees this as a sent request: it is ready to run and waits for the CPU, but receives no CPU time. The VM thinks that the time has been stolen.

This happens when the buffer is sent. The buffer goes to the hypervisor’s kernel space, and we wait for it. From the point of view of the VM, the call should return immediately. Therefore, according to our steal calculation algorithm, this time is considered stolen. Other mechanisms are likely involved as well (e.g. the processing of other system calls), but they should not differ to any significant degree.

Scheduler vs highly loaded VMs. When one VM suffers from steal more than the others, this is directly related to the scheduler. The greater the load a process puts on a CPU, the sooner the scheduler will preempt it to let other processes work. If the VM consumes little, it will experience almost no steal: its process has just been sitting and waiting, so it needs to be given more time. If the VM puts a maximum load on all cores, its process is preempted more often, and the VM gets less time.

It’s even worse when processes within the VM try to get more CPU because they can’t keep up with processing their data. The OS on the hypervisor will then provide even less CPU time because of fair scheduling. This process snowballs, and steal surges sky-high, while other VMs may not even notice it. The more cores the unfortunate VM has, the worse it gets. In short, highly loaded VMs with many cores suffer the most.

Low LA but steal is present. If the LA is about 0.7 (meaning that the hypervisor seems underloaded), but there’s steal in some VMs:

  • The aforementioned paravirtualization example applies. The VM might be receiving metrics that indicate steal while the hypervisor has no issues. According to our tests, such steal tends not to exceed 10% and doesn’t significantly affect application performance within the VM.

  • The LA parameter has been calculated incorrectly. More precisely, it is correct at any specific moment, but when averaged over one minute, it is lower than it should be. For example, if one VM (one-third of the hypervisor) consumes all CPUs for 30 seconds, the LA for that minute will be 0.15. Four such VMs working at the same time will result in a value of 0.6. From the LA alone, you wouldn’t be able to tell that each of them experienced almost 25% steal for those 30 seconds.

  • Again, this happened because of the scheduler, which decided that someone was «eating» too much and made them wait. Meanwhile, it switches contexts, handles interrupts, and attends to other important system matters. As a result, some VMs experience no issues, while others suffer significant performance losses.

4. Other distortions

There are a million possible reasons for the distortion of fair CPU time allocation on a VM. For example, hyperthreading and NUMA add complexity to the calculations. They complicate the choice of the core used to run a process, because the scheduler uses coefficients (that is, weights), which make the calculations even more complicated when switching contexts.

There are distortions that arise from technologies like Turbo Boost or its opposite, power saving mode, which might artificially increase or decrease CPU core speed and even time slice. Turning Turbo Boost on decreases the productivity of one CPU thread due to a performance increase in another one. At that moment, information regarding the current CPU clock speed is not sent to the VM, which thinks that someone is stealing its time (e.g. it requested 2 GHz and got half as much).

In fact, there can be many reasons for distortion. You may find something else entirely in any given system. I recommend starting with the books linked above and obtaining statistics from the hypervisor using tools such as perf, sysdig, systemtap, and dozens of others.

5. Conclusions

  1. Some steal may appear due to paravirtualization, and this can be considered normal. Online sources say that this value can be 5-10%. It depends on the application within the VM and the load the VM puts on its physical devices. It is important to pay attention to how applications feel inside the VM.

  2. The correlation between the load on the hypervisor and steal within a VM is not always clear-cut. Both calculations (of the hypervisor load and of steal) can be wrong in some cases and under different loads.

  3. The scheduler does not favor processes that request a lot of resources. It tries to give less to those that ask for more. Big instances are evil.

  4. A little steal can be normal without paravirtualization as well (taking into consideration the load within the VM, the particularities of neighbors’ loads, the distribution of the load between threads, and other factors).

  5. If you would like to calculate steal in a particular system, research the various possibilities, gather metrics, analyze them thoroughly, and think about how to distribute the load fairly. Even so, there can be deviations, which must be verified with tests or examined in a kernel debugger.

Source: https://habr.com/en/company/mailru/blog/453140/
