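This section starts with performance monitoring tools at the operating-system level.
Performance work typically involves three essential steps: performance monitoring => performance profiling => performance tuning.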
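Performance monitoring: when no performance data is available yet, start by monitoring the system.
Performance profiling: collect performance data from the application and analyze it.
Performance tuning: change source code, configuration files, and so on to improve performance.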
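CPU utilization is one of the most important things to watch during tuning, and it has to be observed at the operating-system level.
On most operating systems, CPU utilization is split into user CPU utilization and kernel (system) CPU utilization. User CPU utilization is the share of CPU time the application spends running its own code. Kernel CPU utilization is the share of CPU time spent running kernel code on the application's behalf. High kernel CPU utilization can point to contention for shared resources or to heavy I/O. The ideal is 0% kernel CPU utilization, so that every CPU clock cycle is available to application code.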
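For compute-intensive applications, that is not enough. You also need to look at IPC (instructions per clock) and CPI (cycles per instruction), and to watch for stalls. Most operating-system tools do not separate out the clock cycles in which the CPU executes no instructions, so time the CPU spends waiting for data is still reported as utilized. A single stall typically wastes several hundred CPU cycles, so a compute-intensive application should be tuned to stall less often.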
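As a small illustration of how IPC and CPI relate, here is a minimal Python sketch. The counter values are made up for the example; in practice they would come from hardware performance counters, which this sketch does not read.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions executed per clock cycle (higher is better)."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def cpi(instructions: int, cycles: int) -> float:
    """Clock cycles spent per instruction (lower is better); the inverse of IPC."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


# Hypothetical counter readings, chosen only for illustration.
sample_instructions = 2_000_000_000
sample_cycles = 5_000_000_000

print(f"IPC = {ipc(sample_instructions, sample_cycles):.2f}")
print(f"CPI = {cpi(sample_instructions, sample_cycles):.2f}")
```

A CPU that reports 100% utilization but shows a low IPC is probably spending many of its cycles stalled rather than doing useful work.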
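Linux tools for monitoring CPU usage:
1. vmstat shows system-wide user CPU time, kernel CPU time, idle time, and more.
2. pidstat shows CPU and I/O statistics for a given process and for the threads inside it.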
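To make the user/kernel/idle split concrete, the following Python sketch computes the percentages from two samples of the aggregate "cpu" line in /proc/stat on Linux. It is only an approximation of what a tool like vmstat reports, and the grouping of columns (nice counted as user, irq and softirq counted as system) is an assumption of this sketch. The sample strings are invented.

```python
from typing import Dict, List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate "cpu ..." line of /proc/stat into tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    ticks = [int(value) for value in fields[1:]]
    # Older kernels report fewer columns; pad so the unpacking below is valid.
    return ticks + [0] * max(0, 8 - len(ticks))


def cpu_breakdown(before: str, after: str) -> Dict[str, float]:
    """Return user/system/idle/iowait percentages between two samples."""
    # Columns used: user nice system idle iowait irq softirq steal.
    # Guest columns are skipped because guest time is already part of user/nice.
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    delta = [b - a for a, b in zip(first, second)][:8]
    total = sum(delta)
    if total <= 0:
        raise ValueError("the two samples must be taken at different times")
    user, nice, system, idle, iowait, irq, softirq, _steal = delta
    return {
        "user": 100.0 * (user + nice) / total,
        "system": 100.0 * (system + irq + softirq) / total,
        "idle": 100.0 * idle / total,
        "iowait": 100.0 * iowait / total,
    }


# Invented samples; on a real system, read the first line of /proc/stat twice,
# a few seconds apart.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  200 0 100 1400 100 0 0 0 0 0"
print(cpu_breakdown(sample_before, sample_after))
```

Working from deltas between two samples matters because the counters in /proc/stat are cumulative since boot.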
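Memory utilization: paging or swapping activity
Swapping hurts the JVM garbage collector badly. To reclaim unreachable objects, the collector has to traverse a large part of the heap, including memory that has already been swapped out. Those pages must be swapped back in before the garbage collector can scan them, which greatly lengthens collection time. If garbage collection is taking a long time, one likely cause is that the system is swapping.
vmstat shows memory usage, and its si and so columns show swap-in and swap-out activity.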
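As a rough way to check for swapping programmatically, this Python sketch compares the pswpin and pswpout counters from two snapshots of /proc/vmstat on Linux. The snapshot text below is invented, and the function names are hypothetical helpers for this example.

```python
from typing import Tuple


def swap_counters(vmstat_text: str) -> Tuple[int, int]:
    """Extract cumulative (pages swapped in, pages swapped out) from /proc/vmstat text."""
    values = {"pswpin": 0, "pswpout": 0}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in values:
            values[parts[0]] = int(parts[1])
    return values["pswpin"], values["pswpout"]


def swap_rates(before: str, after: str, interval_s: float) -> Tuple[float, float]:
    """Return (pages swapped in per second, pages swapped out per second)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    in_before, out_before = swap_counters(before)
    in_after, out_after = swap_counters(after)
    return (in_after - in_before) / interval_s, (out_after - out_before) / interval_s


# Invented snapshots taken 5 seconds apart.
snapshot_before = "nr_free_pages 12345\npswpin 1000\npswpout 4000\n"
snapshot_after = "nr_free_pages 12000\npswpin 1500\npswpout 4250\n"

pages_in, pages_out = swap_rates(snapshot_before, snapshot_after, 5.0)
print(f"swap in: {pages_in:.1f} pages/s, swap out: {pages_out:.1f} pages/s")
```

Sustained non-zero rates while the JVM is running are a hint that long garbage collection pauses may be swap-related.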
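top also shows memory usage. Its memory summary has two fields, buffers and cached, and they are not the same thing.
My understanding is that buffers are what the operating system uses when reading and writing block devices, and that they also hold some filesystem metadata. Cached is the page cache, which holds the contents of files. The swap cache is a separate structure: Linux uses it to locate pages that need to be swapped in and, when pages have to be swapped out, to decide whether they actually need to be written to the swap file. Cached data lives in RAM and can be reclaimed when memory runs short, so after Linux has been running for a long time, more and more pages are cached and very little memory shows up as free.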
Buffers are associated with a specific block device, and cover caching of filesystem metadata as well as tracking in-flight pages. The cache only contains parked file data. That is, the buffers remember what's in directories, what file permissions are, and keep track of what memory is being written from or read to for a particular block device. The cache only contains the contents of the files themselves.
The buffer
Buffers are in-memory block I/O buffers. They are relatively short-lived. Prior to Linux kernel version 2.4, Linux had separate page and buffer caches. Since 2.4, the page and buffer cache are unified and Buffers is raw disk blocks not represented in the page cache—i.e., not file data. The Buffers metric is thus of minimal importance. On most systems, Buffers is often only tens of megabytes.
The Swap Cache
When swapping pages out to the swap files, Linux avoids writing pages if it does not have to. There are times when a page is both in a swap file and in physical memory. This happens when a page that was swapped out of memory was then brought back into memory when it was again accessed by a process. So long as the page in memory is not written to, the copy in the swap file remains valid.
Linux uses the swap cache to track these pages. The swap cache is a list of page table entries, one per physical page in the system. This is a page table entry for a swapped out page and describes which swap file the page is being held in together with its location in the swap file. If a swap cache entry is non-zero, it represents a page which is being held in a swap file that has not been modified. If the page is subsequently modified (by being written to), its entry is removed from the swap cache.
When Linux needs to swap a physical page out to a swap file it consults the swap cache and, if there is a valid entry for this page, it does not need to write the page out to the swap file. This is because the page in memory has not been modified since it was last read from the swap file.
The entries in the swap cache are page table entries for swapped out pages. They are marked as invalid but contain information which allow Linux to find the right swap file and the right page within that swap file.
The same question comes up again and again: no matter how much RAM you install, free RAM seems to shrink very quickly. Is free RAM being miscalculated? No.
Before answering, look at the memory summary in top's display. There you will find two fields: buffers and cached. "Buffers" is how much RAM is dedicated to caching disk blocks. "Cached" is similar, except that it caches pages read from files. For a thorough understanding of these terms, refer to a Linux kernel book such as Linux Kernel Development by Robert M. Love.
It is enough to understand that "buffers" and "cached" together represent the size of the system cache. They grow and shrink dynamically as the kernel's internal mechanisms require.
Besides the cache, RAM is also occupied by application data and code. So free RAM here means RAM that is occupied by neither cache nor application data and code. In general, you can treat the cache as additional "free" RAM, since it shrinks gradually when applications demand more memory.
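The idea that cache counts as "free" memory can be sketched in Python by parsing text in the format of /proc/meminfo on Linux and adding MemFree, Buffers, and Cached together. The sample text is invented, and this simple sum is only a rough estimate under the assumption that all of the cache can be reclaimed, which is not always true.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a mapping of field name to value (kB for most fields)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, separator, rest = line.partition(":")
        fields = rest.split()
        if separator and fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_view(info: Dict[str, int]) -> Dict[str, int]:
    """Summarize truly free memory, cache, and the two combined, in kB."""
    cache_kb = info["Buffers"] + info["Cached"]
    return {
        "free_kb": info["MemFree"],
        "cache_kb": cache_kb,
        "free_plus_cache_kb": info["MemFree"] + cache_kb,
    }


# Invented sample; on a real system, read the text from /proc/meminfo.
sample_meminfo = """MemTotal:       16000000 kB
MemFree:          500000 kB
Buffers:          200000 kB
Cached:          9300000 kB
"""

print(memory_view(parse_meminfo(sample_meminfo)))
```

In this invented sample only a small part of RAM is strictly free, yet most of it could be handed back to applications on demand.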
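Monitoring lock contention in a Java program is not easy, and the tooling is limited. Before Java 5, the HotSpot VM implemented almost all of its locking logic on top of operating-system primitives, so OS tools such as Solaris mpstat could monitor Java lock contention well. Since Java 5, the HotSpot VM has been optimized to implement much of the locking logic in user-level code, so a different approach is needed.
Context switches give an indirect view of lock contention in a Java program. There are two kinds: voluntary context switches and involuntary context switches.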
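A Java program with heavy lock contention performs many voluntary context switches, and a voluntary context switch is expensive: roughly 8,000 clock cycles by the estimate used here.
As a rule of thumb, a Java program that spends more than 5% of its clock cycles on voluntary context switches may be suffering from excessive lock contention.
pidstat -w reports voluntary context switches in the cswch/s column.
To estimate the cost, divide that value by the number of CPU cores, multiply by 8,000, and divide by the CPU clock frequency in Hz. The result is the fraction of clock cycles spent on voluntary context switches.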
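The calculation above can be sketched in Python as follows. The function name and the sample inputs are hypothetical. The cost per switch is passed in as a parameter because it is only an estimate and varies with hardware and workload.

```python
def context_switch_cycle_fraction(
    cswch_per_sec: float,
    cpu_cores: int,
    cycles_per_switch: float,
    cpu_hz: float,
) -> float:
    """Estimate the fraction of clock cycles spent on voluntary context switches."""
    if cpu_cores <= 0 or cpu_hz <= 0:
        raise ValueError("cpu_cores and cpu_hz must be positive")
    # Switches per second per core, times the cost of each switch in cycles,
    # divided by the cycles one core has available per second.
    return (cswch_per_sec / cpu_cores) * cycles_per_switch / cpu_hz


# Invented readings: cswch/s from pidstat -w, 2 cores, 3.0 GHz,
# and the 8,000-cycle estimate from the text.
for cswch in (3500, 40000):
    fraction = context_switch_cycle_fraction(cswch, 2, 8000, 3.0e9)
    verdict = "possible lock contention" if fraction > 0.05 else "below the 5% guideline"
    print(f"{cswch} cswch/s -> {fraction:.2%} of cycles ({verdict})")
```

Because the 5% threshold and the per-switch cost are both rules of thumb, the result is best read as a hint about where to look, not as a measurement.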
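Network monitoring
Distributed Java programs need their network I/O, such as bandwidth usage, monitored.
Use the nicstat tool.
In Java code, avoid issuing large numbers of network reads and writes; pack as much data as possible into each read or write. Consider a Java NIO framework such as https://grizzly.java.net/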
Disk I/O utilization
Use iostat -xm.
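To show roughly where a disk utilization percentage comes from, this Python sketch derives it from two snapshots of /proc/diskstats on Linux, using the counter of milliseconds the device spent doing I/O (the tenth statistic after the device name). This is an approximation written for illustration, not iostat's actual implementation, and the snapshot lines are invented.

```python
def io_busy_ms(diskstats_text: str, device: str) -> int:
    """Return the cumulative milliseconds the device has spent doing I/O."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, device name, then the statistics columns.
        # fields[12] is the tenth statistic: time spent doing I/O, in ms.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(f"device {device!r} not found")


def disk_utilization_percent(before: str, after: str, device: str, interval_s: float) -> float:
    """Percentage of the interval during which the device was busy with I/O."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    busy_ms = io_busy_ms(after, device) - io_busy_ms(before, device)
    return min(100.0, 100.0 * busy_ms / (interval_s * 1000.0))


# Invented snapshots for a device named sda, taken 5 seconds apart.
stats_before = "   8       0 sda 1000 10 20000 500 2000 20 40000 1500 0 1200 2000"
stats_after = "   8       0 sda 1400 12 28000 900 2600 25 52000 3100 0 3700 4500"

print(f"sda utilization: {disk_utilization_percent(stats_before, stats_after, 'sda', 5.0):.1f}%")
```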
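If disk utilization is very high, consider the following.
At the hardware and operating-system level:
1. Buy a faster device.
2. Spread files across different disks.
3. Tune the operating system to use a larger disk cache. (Some operating systems leave the disk cache disabled by default; enabling it can cause other problems, such as corrupted data.)
At the application level:
1. Use buffered input and output (this reduces kernel CPU usage).
2. Implement an application-level cache.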
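The two application-level suggestions can be sketched in Python as follows. This is an illustration of the idea only, not Java code: the explicit buffering argument plays the role that a buffered stream wrapper would play in Java, and functools.lru_cache stands in for a hand-written application cache. The function names are hypothetical.

```python
import functools
from typing import Iterable


def write_lines(path: str, lines: Iterable[str]) -> None:
    """Write many small records through a 64 KiB buffer.

    The buffer collects the small writes in user space, so far fewer
    system calls reach the kernel than one call per record.
    """
    with open(path, "w", buffering=64 * 1024, encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")


@functools.lru_cache(maxsize=128)
def load_text(path: str) -> str:
    """Read a file once and serve later requests for the same path from memory.

    Caveat: the cached copy goes stale if the file changes on disk; call
    load_text.cache_clear() when that can happen.
    """
    with open(path, "r", encoding="utf-8") as source:
        return source.read()
```

Calling load_text twice with the same path reads the disk only once; load_text.cache_info() reports the hits and misses, which is a quick way to confirm the cache is doing its job.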