首先, 我贴一段某位大神调优之后的JVM启动参数。
虽然很多时候使用框架部门给我们的默认JVM启动参数程序就能跑起来了,并且不太会有问题。 但是一旦出现问题,如果不了解这些参数和背后的原理, 那解决问题和系统调优就无从说起了。
名词释义
在学习之前, 首先我们需要了解几个名词, 如果你已经了解了的话, 可以跳过这一段,直接看GC的那一段。
Stop-the-world
JVM 因为要执行GC而停止了应用程序的执行。Stop-the-world会在任何一种GC算法中发生。当Stop-the-world发生时,除了GC所需的线程以外,所有线程都处于等待状态,直到GC任务完成。GC优化很多时候就是指减少Stop-the-world发生的时间。
Minor GC
是指新生代的GC过程。(分代机制这里就不讲了, 不清楚的同学可以看看书)。
Full GC
或者叫major GC,是指老年代和永久代的GC。正常情况下,Full GC要比Minor GC的次数少的多, 如果程序频繁的进行Full GC, 那就一定是哪里出现了问题,需要进行JVM参数调优或者修改应用程序的代码了。
有一个说法是Full GC的时候一定会先进行Minor GC,其实这个是不对的。默认Full GC之前并不会进行Minor GC,但是,因为老年代很多对象都会引用到新生代的对象,先进行一次Minor GC可以提高老年代GC的速度, 所以我们可以修改启动参数,让Full GC之前先进行一次Minor GC。比如老年代使用CMS时,设置CMSScavengeBeforeRemark优化,让CMS remark之前先进行一次Minor GC。
Bump-The-Pointer和TLABs(Thread-Local Allocation Buffers)
这两个技术都是HotSpot虚拟机用来加快内存分配的。
Bump-The-Pointer跟踪在Eden创建的最后一个对象。这个对象会被放在Eden区的顶部。如果之后再需要创建对象,只需要检查Eden区是否有足够的剩余空间。如果有足够的空间,对象就会被创建在Eden区,并且被放置在顶部。这样一来,每次创建新的对象时,只需要检查最后被创建的对象。这将极大地加快内存分配速度。但是,如果我们在多线程的情况下,事情将截然不同。如果想要以线程安全的方式以多线程在Eden空间存储对象,不可避免的需要加锁,而这将极大地的影响性能。
TLABs 是HotSpot虚拟机针对这一问题的解决方案。该方案为每一个线程在Eden空间分配一块独享的空间,这样每个线程只访问他们自己的TLAB空间,再与bump-the-pointer技术结合可以在不加锁的情况下分配内存。你可以通过‐XX:+UseTLAB和‐XX:TLABSize来开启和设置TLAB的大小。默认每个线程是32K。
Bump-The-Pointer和TLABs 的参考文章如下:
When new objectsare allocated on the heap, if TLAB ( Thread Local Allocation Buffers ) areenabled, the object will first be placed in the TLAB, this buffer only existswithin eden space. Each thread has its own TLAB to allow faster memoryallocation, as the thread is able to allocate additional memory within the bufferwithout a lock. The TLAB is pre allocated for each thread. As a thread usesmemory within the TLAB it moves a pointer accordingly.
To enable TLAB set‐XX:+UseTLAB, You can set the size allocated to the the TLAB via ‐XX:TLABSize,its default size is 32k or if you prefer you can use the ‐XX:+ResizeTLAB toallow dynamic resizing of the TLAB.
Using TLAB, usesmore of your Eden space, but you may get a slight performance benefit whencreating objects.
The amount ofmemory allocated to all your TLAB's will be proportional to the number ofthreads in your application.
GC
这个图是sun的一位JVM工程师08年的时候在他的博客中贴出来的。文章名:《Our Collectors》,博客地址:https://blogs.oracle.com/jonthecollector/our-collectors。
从上图我们可以清晰的看出, 新生代的收集器有三种Serial、ParNew、ParallelScavenge。老年代的收集器有CMS、Serial Old、Parallel Old。当然,因为博客比较早了,少了G1。(但其实,那个蓝色问号代表的就是G1,G1也是他们项目组那个时候正在开发中的收集器。)
下面的内容,基本上就是把博客翻译了一遍。
首先是新生代的3个收集器
Serial
Serial是一个单线程的GC。我们知道, 分代GC的过程如下:将Eden里的对象移动到S1或者S2,S1或者S2满了之后,就对S1或者S2进行GC,S1和S2切换达到一定的次数之后,如果对象还没有被回收,就进入老年代。所以,涉及到三个copy的过程:Eden到S0或S1,S0和S1之间相互copy, S0或S1到old。单线程指的是, 这个copy的过程是单线程的。
ParNew
ParNew, 跟Serial的区别就是copy的过程是多线程的,跟Parallel Scavenge的区别就是,他是Parallel Scavenge的增强版, 他可以和CMS一起使用。
Parallel Scavenge
Parallel Scavenge, 同样是Serial的多线程版本。
我们再来看下老年代的三个收集器:
Serial Old
Serial Old, 采用mark-sweep-compact算法来进行老年代的回收,并且是单线程的。mark-sweep-compact算法是啥, 我就不介绍了。
CMS
CMS, 是老年代的低延时的, 并发的收集器。
Parallel Old
Parallel Old, 是Serial Old的多线程版本。
JVM里的GC参数
下面我们再来看下JVM里关于使用某种收集器的参数,都是啥意思。
+UseSerialGC 表示使用Serial作为新生代收集器, Serial Old作为老年代收集器。
+UseParNewGC 表示ParNew作为新生带收集器,Serial Old作为老年代收集器。
+UseConcMarkSweepGC 表示采用"ParNew" +
"CMS" + "Serial Old"。其中ParNew作为新生代收集器。CMS和Serial Old作为老年代收集器。其中CMS默认,当CMS并发收集Fail的时候,采用Serial Old作为备选方案。
+UseParallelGC 表示Parallel Scavenge作为新生代收集器,Serial Old作为老年代收集器。
+UseParallelOldGC 表示Parallel Scavenge作为新生代收集器,Parallel Old作为老年代收集器。
Questions
问题1:
+UseParNewGC和+UseParallelGC的唯一区别就是UseParNewGC采用ParNew作为新生代收集器,UseParallelGC采用Parallel Scavenge作为新生代收集器。那么,同样是新生代的并发收集,哪一个更快呢?
答案:
不好说,大部分情况都是相等的, 但是我也见过不同场景下,一个比另一个表现的要好。
问题2:
为啥ParNew和Parallel Old,不能一起工作呢?
答案:
简单点说, 就是Parallel Scavenge和Parallel Old,不是用同一套接口实现的,而其他四个就是一套接口实现的。跟算法、技术啥都没关系,完全就是人为造成的。
翻译过来的东西可能大家不太好懂, 我就直接参考了知乎的一位大神的回答了,当然如果大家不感兴趣的话也可以跳过。
原本hotspot里是没有并行GC的, 只有串行GC,后来,准备要加入新生代的并行GC。本来HotSpot VM鼓励开发者尽量在这个框架内开发GC,但后来有个开发就是不愿意被这框架憋着,自己硬写了个没有使用已有框架的新并行GC,并拉拢性能测试团队用这个并行GC来跑分,成绩也还不错,于是这个GC就放进HotSpot VM里了。这就是我们现在看到的ParallelScavenge。其实最初的ParallelScavenge的目标只是并行收集young gen,而full GC的实际实现还是跟serial GC一样。只不过因为它没有用HotSpot VM的generational GC framework,自己实现了一个CollectedHeap的子类ParallelScavengeHeap,里面都弄了独立的一套接口,而跟HotSpot当时其它几个GC不兼容。其实真的有用的代码大部分就在PSScavenge(=“ParallelScavenge的Scavenge”)里,也就是负责minor GC的收集器;而负责full GC的收集器叫做PSMarkSweep(=“ParallelScavenge的MarkSweep”),其实只是在serial GC的核心外面套了层皮而已,骨子里是一样的LISP2算法的mark-compact收集器(别被名字骗了,它并不是一个mark-sweep收集器)。
当启用-XX:+UseParallelGC时,用的就是PSScavenge+PSMarkSweep的组合。
这是名副其实的“ParallelScavenge”——只并行化了“scavenge”。
所以其实非要说对应关系的话,PSScavenge才是真的跟ParNew对等的东西;ParallelScavenge这个名字既指代整套新GC,也可指代其真正卖点的PSScavenge。
不知道后来什么原因导致full
GC的并行化并没有在原本的generational GC framework上进行,而只在ParallelScavenge系上进行了。其成果就是使用了LISP2算法的并行版的full GC收集器,名为PSCompact(=“ParallelScavenge-MarkCompact”),收集整个GC堆。
问题3:
如果我想新生代用Serial,老年代用CMS,那我应该怎么配置?
答案:
-XX:+UseConcMarkSweepGC -XX:-UseParNewGC,千万不能用-XX:+UseConcMarkSweepGC and
-XX:+UseSerialGC.那样会导致JVM启动的时候报配置冲突的错误,尽管后面的配置看起来更加合理。好吧,我承认这个是我们的锅,我们设计的不好。
问题4:
那个蓝色框里的问号是什么鬼?
答案:
那个问号是我们正在开发的一个新的收集器,叫做Garbage First, 简称G1。
我们预期的G1,将会是一个比ParNew+CMS更少停顿的收集器。如果你想要问我啥时候这东西会出来的话, 我只能告诉你, 这是我们项目组最高机密。但是他会随着JDK7一起出来。
http://portal.acm.org/citation.cfm?id=1029879这个是我(Jon)发的G1的论文,如果你有ACM的账号的,你可以去下载。(坑爹的,居然要钱,那我还是看免费的介绍吧)
翻译就到这里了, 相信大家也明白了这几个GC和参数之间的关系。下一章,我们将会讲解下CMS和G1的工作过程。
为了方便大家阅读, 我把原文copy下来了。
Our Collectors
By: Guest Author
I drew this diagram on a white board for some customers recently. They seemed to
like it (or were just being very polite) so I thought I redraw it for your
amusement.
Each blue box represents a collector that is used to collect a generation. The
young generation is collected by the blue boxes in the yellow region and
the tenured generation is collected by the blue boxes in the gray region.
"Serial" is a stop-the-world, copying collector which uses a single GC thread.
"ParNew" is a stop-the-world, copying collector which uses multiple GC threads. It differs
from "Parallel Scavenge" in that it has enhancements that make it usable
with CMS. For example, "ParNew" does the
synchronization needed so that it can run during the
concurrent phases of CMS.
"Parallel Scavenge" is a stop-the-world, copying collector
which uses multiple GC threads.
"Serial Old" is a stop-the-world,
mark-sweep-compact collector that uses a single GC thread.
"CMS" is a mostly concurrent, low-pause collector.
"Parallel Old" is a compacting collector that uses multiple GC threads.
Using the -XX flags for our collectors for jdk6,
UseSerialGC is "Serial" + "Serial Old"
UseParNewGC is "ParNew" + "Serial Old"
UseConcMarkSweepGC is "ParNew" + "CMS" + "Serial Old". "CMS" is used most of the time to collect the tenured generation. "Serial Old" is used when a concurrent mode failure occurs.
UseParallelGC is "Parallel Scavenge" + "Serial Old"
UseParallelOldGC is "Parallel Scavenge" + "Parallel Old"
FAQ
1) UseParNew and UseParallelGC both collect the young generation using
multiple GC threads. Which is faster?
There's no one correct answer for
this questions. Mostly they perform equally well, but I've seen one
do better than the other in different situations. If you want to use
GC ergonomics, it is only supported by UseParallelGC (and UseParallelOldGC)
so that's what you'll have to use.
2) Why doesn't "ParNew" and "Parallel Old" work together?
"ParNew" is written in
a style where each generation being collected offers certain interfaces for its
collection. For example, "ParNew" (and "Serial") implements
space_iterate() which will apply an operation to every object
in the young generation. When collecting the tenured generation with
either "CMS" or "Serial Old", the GC can use space_iterate() to
do some work on the objects in the young generation.
This makes the mix-and-match of collectors work but adds some burden
to the maintenance of the collectors and to the addition of new
collectors. And the burden seems to be quadratic in the number
of collectors.
Alternatively, "Parallel Scavenge"
(at least with its initial implementation before "Parallel Old")
always knew how the tenured generation was being collected and
could call directly into the code in the "Serial Old" collector.
"Parallel Old" is not written in the "ParNew" style so matching it with
"ParNew" doesn't just happen without significant work.
By the way, we would like to match "Parallel Scavenge" only with
"Parallel Old" eventually and clean up any of the ad hoc code needed
for "Parallel Scavenge" to work with both.
Please don't think too much about the examples I used above. They
are admittedly contrived and not worth your time.
3) How do I use "CMS" with "Serial"?
-XX:+UseConcMarkSweepGC -XX:-UseParNewGC.
Don't use -XX:+UseConcMarkSweepGC and -XX:+UseSerialGC. Although that's seems like
a logical combination, it will result in a message saying something about
conflicting collector combinations and the JVM won't start. Sorry about that.
Our bad.
4) Is the blue box with the "?" a typo?
That box represents the new garbage collector that we're currently developing called
Garbage First or G1 for short. G1 will provide
More predictable GC pauses
Better GC ergonomics
Low pauses without fragmentation
Parallelism and concurrency in collections
Better heap utilization
G1 straddles the young generation - tenured generation boundary because it is
a generational collector only in the logical sense. G1 divides the
heap into regions and during a GC can collect a subset of the regions.
It is logically generational because it dynamically selects a set of
regions to act as a young generation which will then be collected at
the next GC (as the young generation would be).
The user can specify a goal for the pauses and G1
will do an estimate (based on past collections) of how many
regions can be collected in that time (the pause goal).
That set of regions is called a collection set and G1 will
collect it during the next GC.
G1 can choose the regions with the most garbage to collect first (Garbage First, get it?)
so gets the biggest bang for the collection buck.
G1 compacts so fragmentation is much less a problem. Why is it a problem at all?
There can be internal fragmentation due to partially filled regions.
The heap is not statically divided into
a young generation and a tenured generation so the problem of
an imbalance in their sizes is not there.
Along with a pause time goal the user can specify a goal on the fraction of
time that can be spent on GC during some period (e.g., during the next 100 seconds
don't spend more than 10 seconds collecting). For such goals (10 seconds of
GC in a 100 second period) G1 can choose a collection set that it expects it can collect in 10 seconds and schedules the collection 90 seconds (or more) from the previous collection. You can see how an evil user could specify 0 collection
time in the next century so again, this is just a goal,
not a promise.
If G1 works out as we expect, it will become our low-pause collector in place of
"ParNew" + "CMS". And if you're about to ask when will it be ready, please don't
be offended by my dead silence. It's the highest priority project for our team,
but it is software development so there are the usual unknowns. It will be out
by JDK7. The sooner the better as far as we're concerned.
Updated February 4. Yes, I can edit an already posted blog. Here's
a reference to the G1 paper if you have ACM portal access.
http://portal.acm.org/citation.cfm?id=1029879