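Production service hangs at startup: thread dump analysis

Background

At startup the service loads data from MySQL into Elasticsearch. This works in the test environment, but in production the startup hangs and makes no progress.

Inspecting the thread dump

Key excerpt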


"elasticsearch[_client_][generic][T#5]" #843 daemon prio=5 os_prio=0 tid=0x00007fb3ec007000 nid=0x601b waiting on condition [0x00007fb1b5596000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006ef4aee60> (a org.elasticsearch.common.util.concurrent.EsExecutors$ExecutorScalingQueue)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:734)
        at java.util.concurrent.LinkedTransferQueue.xfer(LinkedTransferQueue.java:647)
        at java.util.concurrent.LinkedTransferQueue.poll(LinkedTransferQueue.java:1273)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None

"MySQL Statement Cancellation Timer" #839 daemon prio=5 os_prio=0 tid=0x00007fb698005000 nid=0x5c16 in Object.wait() [0x00007fb1a266c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at java.util.TimerThread.mainLoop(Timer.java:526)
        - locked <0x00000006f728f0a0> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)

   Locked ownable synchronizers:
        - None

"MySQL Statement Cancellation Timer" #838 daemon prio=5 os_prio=0 tid=0x00007fb688008000 nid=0x5c15 in Object.wait() [0x00007fb1a276d000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at java.util.TimerThread.mainLoop(Timer.java:526)
        - locked <0x00000006f729d658> (a java.util.TaskQueue)
        at java.util.TimerThread.run(Timer.java:505)

   Locked ownable synchronizers:
        - None

"TotalParallelLoad-pool-parallelLoad-thread-200" #837 prio=5 os_prio=0 tid=0x00007fb47d15b800 nid=0x5c14 waiting on condition [0x00007fb1a286e000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000006f937a600> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
        at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - None

[Figure 1]

That is:

[Figure 2]

Analysis

This step only continues once the queue is non-empty. Here the queue stays empty, so the threads all sit waiting.
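To make the dump easier to read, here is a minimal sketch of how an idle pool worker ends up in the same state as the "TotalParallelLoad-pool-parallelLoad-thread-200" thread above. The class name, thread name, and pool size are made up for illustration, and the pool uses the default LinkedBlockingQueue rather than the LinkedBlockingDeque seen in the dump. The point is only that a worker with nothing left in its queue parks inside ThreadPoolExecutor.getTask and reports WAITING.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class IdleWorkerDemo {

    /** Runs one trivial task, then returns the worker thread's state once it has gone idle. */
    static Thread.State idleWorkerState() {
        AtomicReference<Thread> worker = new AtomicReference<>();
        // Hypothetical single-thread pool; the factory only records the worker thread.
        ExecutorService pool = Executors.newFixedThreadPool(1, r -> {
            Thread t = new Thread(r, "demo-parallelLoad-thread-1");
            worker.set(t);
            return t;
        });
        try {
            pool.submit(() -> { }).get(); // wait until the only task has finished
            Thread t = worker.get();
            // After the task, the worker calls getTask() and blocks on the empty queue.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (t.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return t.getState();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("idle worker state: " + idleWorkerState());
    }
}
```

A worker in this state is not stuck on a lock; it simply has no task to take. In a dump with thousands of such threads, the interesting question is why so many workers were created for so little work.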

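I load the MySQL data with multiple threads, and the number of load threads is computed as the table's maximum id / 1000 (each load fetches 1000 rows).
This time the dump showed more than 10,000 threads, yet the table holds only about 400,000 rows in total. 10,000 threads × 1000 rows would mean about 10 million rows, so I suspected the ids in the table. That turned out to be the cause: the ids start above 10 million rather than at 0, so for all the earlier id ranges the number of queued data tasks was always 0, and that is what caused the waiting. After fixing this, the program went back to normal.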
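The post does not show the actual fix. As one possible approach, the sketch below plans the load ranges from the smallest and largest id instead of from the largest id alone. IdRangePlanner, its method names, and the id values are all hypothetical; in a real service the two bounds would presumably come from a query such as SELECT MIN(id), MAX(id) on the source table.

```java
import java.util.ArrayList;
import java.util.List;

class IdRangePlanner {

    /** The scheme described above: one task per batchSize ids, counted from 0 up to maxId. */
    static long taskCountFromMaxId(long maxId, int batchSize) {
        return (maxId + batchSize - 1) / batchSize; // rounded up
    }

    /** Alternative: inclusive [start, end] ranges covering only minId..maxId. */
    static List<long[]> planRanges(long minId, long maxId, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += batchSize) {
            long end = Math.min(start + batchSize - 1, maxId);
            ranges.add(new long[] {start, end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Illustrative values only: about 400,000 rows whose ids start just above 10 million.
        long minId = 10_000_001L;
        long maxId = 10_400_000L;
        int batchSize = 1000;

        System.out.println("tasks from max id alone: " + taskCountFromMaxId(maxId, batchSize));
        System.out.println("tasks from [min id, max id]: " + planRanges(minId, maxId, batchSize).size());
    }
}
```

With these illustrative values the first calculation yields 10400 tasks and the second 400. If the ids have large gaps in the middle, some ranges will still come back empty, so paging by "id greater than the last id seen" may be a better fit in that case.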

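For reference, this is java.util.TimerThread.mainLoop from the JDK, the method in which the "MySQL Statement Cancellation Timer" threads in the dump are waiting (queue.wait()):

    private void mainLoop() {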
        while (true) {
            try {
                TimerTask task;
                boolean taskFired;
                synchronized(queue) {
                    // Wait for queue to become non-empty
                    while (queue.isEmpty() && newTasksMayBeScheduled)
                        queue.wait();
                    if (queue.isEmpty())
                        break; // Queue is empty and will forever remain; die

                    // Queue nonempty; look at first evt and do the right thing
                    long currentTime, executionTime;
                    task = queue.getMin();
                    synchronized(task.lock) {
                        if (task.state == TimerTask.CANCELLED) {
                            queue.removeMin();
                            continue;  // No action required, poll queue again
                        }
                        currentTime = System.currentTimeMillis();
                        executionTime = task.nextExecutionTime;
                        if (taskFired = (executionTime<=currentTime)) {
                            if (task.period == 0) { // Non-repeating, remove
                                queue.removeMin();
                                task.state = TimerTask.EXECUTED;
                            } else { // Repeating task, reschedule
                                queue.rescheduleMin(
                                  task.period<0 ? currentTime   - task.period
                                                : executionTime + task.period);
                            }
                        }
                    }
                    if (!taskFired) // Task hasn't yet fired; wait
                        queue.wait(executionTime - currentTime);
                }
                if (taskFired)  // Task fired; run it, holding no locks
                    task.run();
            } catch(InterruptedException e) {
            }
        }
    }
