1. What was the issue?
One day, after some changes went live, we hit a "TaskRejectedException" when our service tried to write new data into the database.
Here is an example of this exception:
2. What caused this?
Our client had a new requirement: we needed to widen the scope of a data search and then save the result set into the database.
Unfortunately, the result set turned out far larger than we expected, and we had not double-checked its size.
After digging in, we found that the original result set had fewer than 20,000 entries, while the new one reached 200,000, roughly 10 times bigger. No wonder we got that "rejected exception".
But why did we get this particular exception? How does saving data into the database work in our service?
3. Why did we hit this?
We were using a thread pool (executor) to save large result sets. A thread pool is governed by a few configuration parameters, which we can set on the executor. For example:
executor.setCorePoolSize(n);
executor.setMaxPoolSize(m);
executor.setQueueCapacity(x);
...
In our service, the queue capacity was set to 20,000, and that had always been enough. But once the new requirement made the result set 10 times bigger, the number of save tasks exceeded what the queue could hold, the pool could not accept any more, and the extra submissions were rejected.
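To make the overflow concrete, here is a minimal sketch using the plain JDK ThreadPoolExecutor rather than our real Spring configuration. The class name, the method name, and the numbers (2 threads, a queue of 20, 200 tasks, i.e. our situation scaled down by 1,000) are illustrative assumptions, not our production values. Every task is held on a latch so that no worker frees up while we are still submitting.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of the original service.
class QueueOverflowDemo {

    // Submits `tasks` blocking tasks to a pool with `threads` workers and a
    // bounded queue of `queueCapacity`, and returns how many were rejected.
    static int countRejected(int threads, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity)); // default handler: AbortPolicy
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // keep the worker busy so the queue fills up
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // workers busy and queue full: task refused
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 running + 20 queued = 22 accepted, so 200 - 22 are rejected.
        System.out.println(countRejected(2, 20, 200)); // prints 178
    }
}
```

With only 20 tasks the same pool rejects nothing, which matches how our service behaved before the change.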
4. How did we resolve this?
Luckily, the root cause was easy to find, and we raised the queue capacity to 300,000. The exception was gone. For example:
executor.setQueueCapacity(300000);
Is it time to cheer? For us, yes, because our server has enough RAM for a queue this large. But for some of you, this kind of problem is not always so easy to solve, because server resources are never "enough". And just "making it work" is not enough for any of us, not only for us developers but for everyone else in this world.
5. Better ways?
The answer is "yes".
If we dive deeper into the source code of "ThreadPoolTaskExecutor.class", we can see where the exception comes from:
See, if we do not configure anything else on the executor, it throws a TaskRejectedException directly, by "default".
Actually, there are four built-in policies for handling a rejected task:
(1)ThreadPoolExecutor.AbortPolicy //default policy: when a task cannot be accepted, throw a RejectedExecutionException.
(2)ThreadPoolExecutor.CallerRunsPolicy //run the task in the thread that called execute(). If the executor has been shut down, the task is dropped.
(3)ThreadPoolExecutor.DiscardPolicy //silently discard this task.
(4)ThreadPoolExecutor.DiscardOldestPolicy //discard the oldest unhandled task in the queue and retry executing this one. If the executor has been shut down, the task is dropped.
If we stop here, we can use "CallerRunsPolicy" to let the submitting thread (the main thread, in our case) run the task. But while it is busy doing that, it cannot submit any new tasks, so the pool's threads may run out of work and sit idle until the main thread finishes this one.
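A small sketch of what CallerRunsPolicy does, again with the plain JDK ThreadPoolExecutor. The class and method names are made up for illustration, and the pool is deliberately tiny (one thread, one queue slot) so that the third task is guaranteed to overflow. On Spring's ThreadPoolTaskExecutor the same policy would be supplied through setRejectedExecutionHandler.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of the original service.
class CallerRunsDemo {

    // Returns true if the task that did not fit into the pool was executed
    // by the submitting thread itself.
    static boolean overflowRunsOnCaller() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<Thread> overflowRunner = new AtomicReference<>();
        try {
            pool.execute(() -> {            // occupies the only worker thread
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool.execute(() -> { });        // fills the only queue slot
            // No room left: the policy runs this task synchronously, right here.
            pool.execute(() -> overflowRunner.set(Thread.currentThread()));
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return overflowRunner.get() == Thread.currentThread();
    }

    public static void main(String[] args) {
        System.out.println(overflowRunsOnCaller()); // prints true
    }
}
```

Because the third execute() call only returns after the task has finished, the submitter is throttled for as long as that task takes.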
So, we can wrap the submission in a simple retry loop, which acts like a waiting line in front of the pool, to try to avoid this exception and this blocking situation:
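int sleepCount = 0;
while (true) {
    try {
        executor.execute(() -> xxxx);
        break; // the pool accepted the task
    } catch (RejectedExecutionException e) {
        // Spring's TaskRejectedException is a subclass of RejectedExecutionException
        sleepCount++;
        if (sleepCount > 10) {
            log.error("Task was rejected " + sleepCount + " times and still failed, drop it.");
            break;
        }
        try {
            Thread.sleep(1000); // wait for the queue to drain a little
        } catch (InterruptedException ite) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
            log.error("Interrupted while waiting to resubmit, drop the task.", ite);
            break;
        }
    }
}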
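The retry loop above can also be packaged as a reusable helper. This is only a sketch: the class name, the method name and its parameters are hypothetical, and the limits you pass in should be tuned to your own workload.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical helper class, not part of the original service.
class RetrySubmit {

    // Hands `task` to `executor`, sleeping `backoffMillis` after each rejection.
    // Returns true if the task was accepted, false if it was dropped after
    // `maxRetries` retries or because the waiting thread was interrupted.
    static boolean submitWithRetry(Executor executor, Runnable task,
                                   int maxRetries, long backoffMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                executor.execute(task);
                return true;
            } catch (RejectedExecutionException e) {
                if (attempt >= maxRetries) {
                    return false; // give up: the caller decides what to do next
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    return false;
                }
            }
        }
    }

    public static void main(String[] args) {
        // An executor that refuses everything, to exercise the give-up path.
        Executor alwaysFull = r -> { throw new RejectedExecutionException("full"); };
        System.out.println(submitWithRetry(alwaysFull, () -> { }, 2, 10)); // prints false
        // An executor that runs the task directly, to exercise the happy path.
        System.out.println(submitWithRetry(Runnable::run, () -> { }, 2, 10)); // prints true
    }
}
```

Since ThreadPoolTaskExecutor implements java.util.concurrent.Executor, it can be passed in directly. Returning a boolean instead of only logging lets the caller decide whether a dropped task should be retried later, persisted elsewhere, or reported.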