One of the most important features of the cloud is the sharing of resources among multiple tenants. Without sharing, and without being able to optimize resource utilization, the cloud operator can’t provide scalability or support the “economies of scale” its business depends on. A public IaaS cloud consists of its “cloud magic” as well as real hardware: compute, storage and network devices. The utilization of these resources should be optimized to meet demand as it varies over time, hence they must be shared between the cloud consumers.
The basic metric for how a server utilizes its CPU is the idle capacity – the share of CPU time that is free. CPU time as a whole is divided among the following categories:
User – the running application
System – the operating system
Interrupt – hardware interrupts
Wait – waiting for I/O jobs to finish
Steal – cycles that the hypervisor gave to something other than this virtual machine
Idle – no work is being done

Steal time (ST), also referred to as “stolen CPU”, exists only in virtualized computing environments. It is the time during which your virtual machine had work ready to run but had to wait, because the hypervisor was allocating the physical CPU cycles to other “external tasks”, probably caused by one of your noisy neighbors.
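To make the breakdown above concrete, here is a minimal sketch of how these categories can be read on a Linux guest. It assumes the standard `/proc/stat` layout, where the first line holds cumulative tick counters in the order user, nice, system, idle, iowait, irq, softirq, steal. The function names (`parse_cpu_line`, `cpu_percentages`, `sample`) are my own and purely illustrative.

```python
import time
from typing import Dict

# Field order of the aggregate "cpu" line in /proc/stat on Linux.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Very old kernels may omit trailing fields; treat those as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_percentages(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Share of CPU time per category between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval: float = 1.0, path: str = "/proc/stat") -> Dict[str, float]:
    """Take two readings `interval` seconds apart (Linux only)."""
    def read() -> Dict[str, int]:
        with open(path) as f:
            return parse_cpu_line(f.readline())

    first = read()
    time.sleep(interval)
    return cpu_percentages(first, read())
```

On a Linux instance, `sample()["steal"]` gives the steal percentage over the last second; this is the same figure that `top` shows in its `st` column.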
I researched this subject on the AWS forums and found that when CPU utilization spikes for longer than some period (configured by the cloud operator), the system automatically throttles the instance back to a few percent of CPU while “stealing” the rest. This makes sense, as the cloud must protect itself from overload and the threat of a crash.

You can find more information on the Micro instance type in the Amazon AWS FAQs: “Micro instances provide a small amount of consistent CPU resources and allow you to burst CPU capacity up to 2 ECUs when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically but very little CPU at other times for background processes, daemons”

On the Amazon developer forums you can find reports such as the following: “For example, when the occasion comes where I might need to do a “yum update” the system becomes unresponsive within one minute. I would have expected it to do this at three or five minutes, as it has always done, but today this throttling happens at about thirty seconds to one minute.”
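Amazon does not publish the actual mechanism, so the following is only a toy model of the behavior described above, not AWS's implementation. It assumes a hypothetical burst budget that is spent whenever the instance runs above a baseline and refilled while it stays at or below it. The function name `simulate_throttling` and every parameter value are made up for illustration.

```python
from typing import List


def simulate_throttling(
    demand: List[float],
    baseline: float = 0.25,      # CPU fraction that is always granted (assumed)
    burst_budget: float = 1.5,   # above-baseline CPU available for a burst (assumed)
    refill: float = 0.5,         # budget regained per quiet time step (assumed)
) -> List[float]:
    """Return the CPU fraction granted in each time step of a toy burst model."""
    budget = burst_budget
    granted = []
    for want in demand:
        extra = max(0.0, want - baseline)
        if extra > 0:
            # Bursting: spend budget for everything above the baseline.
            spend = min(extra, budget)
            budget -= spend
            granted.append(baseline + spend)
        else:
            # Quiet step: the full demand is met and the budget recovers.
            budget = min(burst_budget, budget + refill)
            granted.append(want)
    return granted
```

For a workload that asks for a full CPU in every step, the model grants the full CPU until the budget runs out and then drops to the baseline. In such a model, the gap between demanded and granted CPU is roughly what the guest would see as steal time.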
Amazon doesn’t detail the actual Xen configuration, though they do say that: “The instance is designed to operate with its CPU usage at essentially only two levels: the normal low background level, and then at brief spiked levels much higher than the background level.”

From what I have learned, monitoring CPU with a standard monitoring tool can mislead the cloud user. For example, Linux instances may not report accurate values for CPU usage because of the virtualization layer on the underlying infrastructure. For accurate CPU usage values on EC2 instances, the cloud user should rely on the CloudWatch metrics.

Another important aspect of CPU utilization is the workload model. I learned that you should differentiate between two workload models: batch workload and real-time workload. The former tolerates shortages better and can wait for available capacity. The batch model describes a task that generates a steady or aggregated amount of CPU usage, so a period of heavy utilization can be compensated for later on. A real-time workload, by contrast, can never make up for lost capacity later, and its overloads will be restrained by the cloud operator. Moreover, cloud operators such as Amazon AWS tend to favor a batch-like workload model in order to control the load on their physical layer.

To make good use of AWS micro instances, you need to be able to control the behavior of your online resources. You could try tuning your web server configuration, for example by limiting the number of clients. You should use S3 for hosting static files such as images, video and audio. Using other AWS services to support your application’s performance needs moves some of the load to other cloud resources, thereby lowering the overall CPU consumption of your EC2 instances. In any case, it is important to leverage the elastic environment and deploy horizontal (or vertical) scaling methods to protect the environment.
Given that different instance types run under different Xen configurations, such as the rules for micro instance CPU, I wonder: do you need different auto-scaling CPU thresholds for different instance types?
Translated from: https://www.sitepoint.com/who-stole-my-cpu/