In the Handler message mechanism, the Java-layer MessageQueue calls native methods to put the message loop to sleep and wake it up (that is, to block and wake the thread). The native method nativePollOnce puts the message loop to sleep, and nativeWake wakes it up.
At the native layer, the sleep and wake-up of the message loop are built on the Linux kernel's epoll mechanism plus a wake-up descriptor (historically a pipe; an eventfd in current code). Let us start with a brief introduction to what epoll is.
First, what is a file descriptor?
In Linux, everything is treated as a file. When a process opens an existing file or creates a new one, the kernel returns a file descriptor: a non-negative integer that serves as an index into the kernel's table of open files, created so that the kernel can manage open files efficiently. All system calls that perform I/O, including reads and writes, identify the target file through its file descriptor.
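As a trivial C++ sketch of that round-trip (the file path here is arbitrary): the kernel hands back an fd on open, every I/O call names that fd, and close releases the slot.

#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("/tmp/demo.txt", O_CREAT | O_WRONLY, 0644); // kernel returns an fd
    if (fd < 0) { perror("open"); return 1; }
    write(fd, "hello\n", 6);  // all I/O goes through the fd
    close(fd);                // release the entry in the kernel's open-file table
    return 0;
}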
A pipe is one of the most basic IPC mechanisms in Linux; it can be used for communication between processes or between threads.
Calling pipe creates two file descriptors: a write end (writeFD) and a read end (readFD). When thread A wants to wake thread B, it writes into writeFD, and thread B, blocked reading from readFD, returns from the read.
In other words, while the kernel buffer behind the descriptors is empty, the reading thread sleeps; as soon as another thread writes into the buffer, the reader is woken up.
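A minimal C++ sketch of this sleep/wake pattern (the one-byte token and thread setup are our own illustrative choices):

#include <unistd.h>
#include <cstdio>
#include <thread>

int main() {
    int fds[2];
    if (pipe(fds) != 0) return 1;  // fds[0] = read end, fds[1] = write end

    std::thread sleeper([&] {
        char token;
        // Blocks until the other thread writes: this is the "sleep".
        read(fds[0], &token, 1);
        printf("woken up\n");
    });

    // Wake the sleeping thread by writing a single byte.
    char token = 1;
    write(fds[1], &token, 1);

    sleeper.join();
    close(fds[0]);
    close(fds[1]);
    return 0;
}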
Linux's select and poll provide I/O monitoring over file streams: they can watch many streams at once, block the calling thread while nothing is happening, and wake it when one or more streams have I/O events. But select only tells us that some I/O event occurred, not which stream produced it, so we are left scanning all the streams indiscriminately.
To address this performance problem of select and poll, Linux introduced a mechanism called epoll, short for "event poll". epoll tells us exactly which stream had which I/O event; inside the kernel, a red-black tree backs the table that tracks the registered streams and events.
epoll appeared in the 2.6 kernel as an enhanced version of select and poll. Compared with them, epoll is more flexible and has no hard limit on the number of descriptors. It uses a single file descriptor to manage many others, storing the events for user-space descriptors in a kernel event table, so descriptors are copied between user space and kernel space only once. epoll is the most efficient I/O multiplexing mechanism on Linux: one wait covers the I/O events of many file handles.
For the Looper behind Handler, the I/O event of interest is data arriving on the monitored file descriptor.
The epoll events Looper deals with fall into four classes: EPOLLIN (readable), EPOLLOUT (writable), EPOLLERR (error), and EPOLLHUP (hang-up), as the pollInner code below confirms.
Note that once an epoll handle is created, it occupies an fd of its own, visible under /proc/<pid>/fd/ on Linux. You must therefore close() it when done, or fds may eventually be exhausted.
struct epoll_event {
    uint32_t     events;  /* epoll events (bit mask) */
    epoll_data_t data;    /* User data */
};
The events used most here are EPOLLIN (data is readable) and EPOLLOUT (data is writable).
typedef union epoll_data {
    void     *ptr;  /* Pointer to user-defined data */
    int       fd;   /* File descriptor */
    uint32_t  u32;  /* 32-bit integer */
    uint64_t  u64;  /* 64-bit integer */
} epoll_data_t;
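Putting these pieces together, here is a minimal, self-contained C++ sketch of the epoll workflow that Looper follows (create an instance, register a descriptor for EPOLLIN, then wait); the pipe and array sizes are illustrative:

#include <sys/epoll.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    int fds[2];
    pipe(fds);  // fds[0] will be monitored for readability

    int epollFd = epoll_create1(EPOLL_CLOEXEC);        // one fd manages many
    struct epoll_event item;
    memset(&item, 0, sizeof(item));                    // zero unused union members
    item.events  = EPOLLIN;                            // wake when readable
    item.data.fd = fds[0];
    epoll_ctl(epollFd, EPOLL_CTL_ADD, fds[0], &item);  // register once

    write(fds[1], "x", 1);                             // make fds[0] readable

    struct epoll_event out[8];
    int n = epoll_wait(epollFd, out, 8, -1);           // blocks until an event
    for (int i = 0; i < n; i++)
        printf("fd %d got events 0x%x\n", out[i].data.fd, out[i].events);

    close(epollFd);  // the epoll instance itself occupies an fd
    close(fds[0]);
    close(fds[1]);
    return 0;
}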
From the Java layer, the native method nativePollOnce implements the sleep of the message loop, and nativeWake implements its wake-up.
Let us see where the corresponding native methods are registered:
base/core/jni/android_os_MessageQueue.cpp
static const JNINativeMethod gMessageQueueMethods[] = {
    /* name, signature, funcPtr */
    { "nativeInit", "()J", (void*)android_os_MessageQueue_nativeInit },
    { "nativeDestroy", "(J)V", (void*)android_os_MessageQueue_nativeDestroy },
    { "nativePollOnce", "(JI)V", (void*)android_os_MessageQueue_nativePollOnce },
    { "nativeWake", "(J)V", (void*)android_os_MessageQueue_nativeWake },
    { "nativeIsPolling", "(J)Z", (void*)android_os_MessageQueue_nativeIsPolling },
    { "nativeSetFileDescriptorEvents", "(JII)V",
            (void*)android_os_MessageQueue_nativeSetFileDescriptorEvents },
};
In the constructor of the Java-layer MessageQueue, nativeInit() is called:
private native static long nativeInit();

MessageQueue(boolean quitAllowed) {
    mQuitAllowed = quitAllowed;
    mPtr = nativeInit();
}
nativeInit() is a native method; it returns a long that the Java layer keeps in mPtr as a pointer handle to the native object.
Let us look at the native implementation.
static jlong android_os_MessageQueue_nativeInit(JNIEnv* env, jclass clazz) {
    NativeMessageQueue* nativeMessageQueue = new NativeMessageQueue();
    if (!nativeMessageQueue) {
        jniThrowRuntimeException(env, "Unable to allocate native queue");
        return 0;
    }
    nativeMessageQueue->incStrong(env);
    return reinterpret_cast<jlong>(nativeMessageQueue);
}
NativeMessageQueue::NativeMessageQueue() :
        mPollEnv(NULL), mPollObj(NULL), mExceptionObj(NULL) {
    mLooper = Looper::getForThread();  // fetch the Looper bound to the current thread
    if (mLooper == NULL) {
        mLooper = new Looper(false);
        Looper::setForThread(mLooper); // bind the new Looper to this thread
    }
}
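getForThread()/setForThread() maintain a per-thread Looper singleton (AOSP implements this with a pthread TLS key). A rough C++ sketch of the idea; MyLooper and the use of thread_local are our own simplification, not the real implementation:

#include <memory>
#include <utility>

struct MyLooper { /* message-loop state */ };

// Each thread sees its own copy of this pointer.
thread_local std::shared_ptr<MyLooper> gThreadLooper;

std::shared_ptr<MyLooper> getForThread() {
    return gThreadLooper;
}

void setForThread(std::shared_ptr<MyLooper> looper) {
    gThreadLooper = std::move(looper);
}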
system/core/libutils/Looper.cpp

Looper::Looper(bool allowNonCallbacks)
    : mAllowNonCallbacks(allowNonCallbacks),
      mSendingMessage(false),
      mPolling(false),
      mEpollRebuildRequired(false),
      mNextRequestSeq(0),
      mResponseIndex(0),
      mNextMessageUptime(LLONG_MAX) {
    mWakeEventFd.reset(eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC)); // create the event fd used to wake the Looper
    LOG_ALWAYS_FATAL_IF(mWakeEventFd.get() < 0, "Could not make wake event fd: %s", strerror(errno));

    AutoMutex _l(mLock);
    rebuildEpollLocked();
}
The constructor creates the descriptor used to wake the Looper. mWakeEventFd is an android::base::unique_fd; in current code it wraps an eventfd rather than a pipe. Older versions of Looper used a pipe pair here; an eventfd provides the same sleep/wake semantics with a single descriptor backed by a simple 64-bit counter.
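A small C++ sketch of eventfd's semantics (writes add to an 8-byte counter, a read returns the counter and clears it); the variable names are ours:

#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

int main() {
    int efd = eventfd(0, EFD_CLOEXEC);  // counter starts at 0

    uint64_t inc = 1;
    write(efd, &inc, sizeof(inc));      // counter becomes 1; a blocked reader wakes
    write(efd, &inc, sizeof(inc));      // counter becomes 2

    uint64_t counter;
    read(efd, &counter, sizeof(counter));  // returns 2 and resets the counter to 0
    printf("counter=%llu\n", (unsigned long long)counter);

    close(efd);
    return 0;
}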
void Looper::rebuildEpollLocked() {
    // Close the old epoll instance, if any.
    if (mEpollFd >= 0) {
#if DEBUG_CALLBACKS
        ALOGD("%p ~ rebuildEpollLocked - rebuilding epoll set", this);
#endif
        mEpollFd.reset();
    }

    // Allocate the new epoll instance and register the wake pipe.
    mEpollFd.reset(epoll_create1(EPOLL_CLOEXEC));
    LOG_ALWAYS_FATAL_IF(mEpollFd < 0, "Could not create epoll instance: %s", strerror(errno));

    struct epoll_event eventItem;                // the event epoll should monitor
    memset(&eventItem, 0, sizeof(epoll_event));  // zero out unused members of data field union
    eventItem.events = EPOLLIN;                  // EPOLLIN: readable event
    eventItem.data.fd = mWakeEventFd.get();      // the fd this event belongs to
    int result = epoll_ctl(mEpollFd.get(), EPOLL_CTL_ADD, mWakeEventFd.get(), &eventItem); // add the wake fd to the epoll set
    LOG_ALWAYS_FATAL_IF(result != 0, "Could not add wake event fd to epoll instance: %s",
                        strerror(errno));

    for (size_t i = 0; i < mRequests.size(); i++) {
        const Request& request = mRequests.valueAt(i);
        struct epoll_event eventItem;
        request.initEventItem(&eventItem);

        int epollResult = epoll_ctl(mEpollFd.get(), EPOLL_CTL_ADD, request.fd, &eventItem);
        if (epollResult < 0) {
            ALOGE("Error adding epoll events for fd %d while rebuilding epoll set: %s",
                  request.fd, strerror(errno));
        }
    }
}
rebuildEpollLocked() creates an epoll instance and registers the monitored file descriptors with it. The wake fd is registered for the readable event: when data arrives on it, the thread is woken.
base/core/jni/android_os_MessageQueue.cpp
static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj,
        jlong ptr, jint timeoutMillis) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
}
void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
    mPollEnv = env;
    mPollObj = pollObj;
    mLooper->pollOnce(timeoutMillis);
    mPollObj = NULL;
    mPollEnv = NULL;

    if (mExceptionObj) {
        env->Throw(mExceptionObj);
        env->DeleteLocalRef(mExceptionObj);
        mExceptionObj = NULL;
    }
}
This calls pollOnce on the Looper object, which is where the thread actually goes to sleep.
system/core/libutils/include/utils/Looper.h
system/core/libutils/Looper.cpp
int pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData);
inline int pollOnce(int timeoutMillis) {
    return pollOnce(timeoutMillis, nullptr, nullptr, nullptr);
}
int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
    int result = 0;
    for (;;) {
        // First handle the Response entries that carry no callback.
        while (mResponseIndex < mResponses.size()) {
            const Response& response = mResponses.itemAt(mResponseIndex++);
            int ident = response.request.ident;
            if (ident >= 0) { // no callback: POLL_CALLBACK has the value -2
                int fd = response.request.fd;
                int events = response.events;
                void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE
                ALOGD("%p ~ pollOnce - returning signalled identifier %d: "
                        "fd=%d, events=0x%x, data=%p",
                        this, ident, fd, events, data);
#endif
                if (outFd != nullptr) *outFd = fd;
                if (outEvents != nullptr) *outEvents = events;
                if (outData != nullptr) *outData = data;
                return ident;
            }
        }

        if (result != 0) {
#if DEBUG_POLL_AND_WAKE
            ALOGD("%p ~ pollOnce - returning result %d", this, result);
#endif
            if (outFd != nullptr) *outFd = 0;
            if (outEvents != nullptr) *outEvents = 0;
            if (outData != nullptr) *outData = nullptr;
            return result;
        }

        result = pollInner(timeoutMillis);
    }
}
The one-argument pollOnce delegates to the four-argument overload, passing nullptr for everything except the timeout.
The four-argument version runs an infinite loop: on each iteration it first drains the Response entries that have no callback, then calls pollInner to wait for and dispatch file descriptor events.
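Why does ident >= 0 mean "no callback"? All of Looper's result constants are negative, and a request registered with a callback stores POLL_CALLBACK as its ident, so only callback-less requests carry a non-negative ident and are returned directly to the caller. The constants, as declared in Looper.h:

enum {
    POLL_WAKE = -1,
    POLL_CALLBACK = -2,
    POLL_TIMEOUT = -3,
    POLL_ERROR = -4,
};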
int Looper::pollInner(int timeoutMillis) {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
#endif

    // Adjust the timeout based on when the next message is due.
    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
        if (messageTimeoutMillis >= 0
                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
            timeoutMillis = messageTimeoutMillis;
        }
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - next message in %" PRId64 "ns, adjusted timeout: timeoutMillis=%d",
                this, mNextMessageUptime - now, timeoutMillis);
#endif
    }

    // Poll.
    int result = POLL_WAKE;
    mResponses.clear();
    mResponseIndex = 0;

    // We are about to idle.
    mPolling = true;

    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
    // Wait for events on the fds registered with epoll; any events that occurred
    // are returned in the eventItems array. The thread sleeps here.
    int eventCount = epoll_wait(mEpollFd.get(), eventItems, EPOLL_MAX_EVENTS, timeoutMillis);

    // No longer idling.
    mPolling = false;

    // Acquire lock.
    mLock.lock();

    // Rebuild epoll set if needed.
    if (mEpollRebuildRequired) {
        mEpollRebuildRequired = false;
        rebuildEpollLocked();
        goto Done;
    }

    // Check for poll error.
    if (eventCount < 0) {
        if (errno == EINTR) {
            goto Done;
        }
        ALOGW("Poll failed with an unexpected error: %s", strerror(errno));
        result = POLL_ERROR;
        goto Done;
    }

    // Check for poll timeout: no event occurred before the timeout expired.
    if (eventCount == 0) {
#if DEBUG_POLL_AND_WAKE
        ALOGD("%p ~ pollOnce - timeout", this);
#endif
        result = POLL_TIMEOUT;
        goto Done;
    }

    // Handle all events.
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
#endif

    for (int i = 0; i < eventCount; i++) {
        int fd = eventItems[i].data.fd;
        uint32_t epollEvents = eventItems[i].events;
        if (fd == mWakeEventFd.get()) { // an event on the wake fd
            if (epollEvents & EPOLLIN) { // the read event we registered
                awoken(); // drain the wake fd associated with this thread
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on wake event fd.", epollEvents);
            }
        } else { // events on other registered fds
            ssize_t requestIndex = mRequests.indexOfKey(fd);
            if (requestIndex >= 0) {
                int events = 0;
                if (epollEvents & EPOLLIN) events |= EVENT_INPUT;
                if (epollEvents & EPOLLOUT) events |= EVENT_OUTPUT;
                if (epollEvents & EPOLLERR) events |= EVENT_ERROR;
                if (epollEvents & EPOLLHUP) events |= EVENT_HANGUP;
                // Turn the Request into a Response and push it onto the response vector.
                pushResponse(events, mRequests.valueAt(requestIndex));
            } else {
                ALOGW("Ignoring unexpected epoll events 0x%x on fd %d that is "
                        "no longer registered.", epollEvents, fd);
            }
        }
    }
Done: ;

    // Invoke pending message callbacks: process native Messages and call their handlers.
    mNextMessageUptime = LLONG_MAX;
    while (mMessageEnvelopes.size() != 0) {
        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
        const MessageEnvelope& messageEnvelope = mMessageEnvelopes.itemAt(0);
        if (messageEnvelope.uptime <= now) {
            // Remove the envelope from the list.
            // We keep a strong reference to the handler until the call to handleMessage
            // finishes. Then we drop it so that the handler can be deleted *before*
            // we reacquire our lock.
            { // obtain handler
                sp<MessageHandler> handler = messageEnvelope.handler;
                Message message = messageEnvelope.message;
                mMessageEnvelopes.removeAt(0);
                mSendingMessage = true;
                mLock.unlock();

#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
                ALOGD("%p ~ pollOnce - sending message: handler=%p, what=%d",
                        this, handler.get(), message.what);
#endif
                handler->handleMessage(message); // dispatch the message
            } // release handler

            mLock.lock();
            mSendingMessage = false;
            result = POLL_CALLBACK; // report that a callback ran
        } else {
            // The last message left at the head of the queue determines the next wakeup time.
            mNextMessageUptime = messageEnvelope.uptime;
            break;
        }
    }

    // Release lock.
    mLock.unlock();

    // Handle Response entries whose Request carries a callback, invoking each callback.
    for (size_t i = 0; i < mResponses.size(); i++) {
        Response& response = mResponses.editItemAt(i);
        if (response.request.ident == POLL_CALLBACK) {
            int fd = response.request.fd;
            int events = response.events;
            void* data = response.request.data;
#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
                    this, response.request.callback.get(), fd, events, data);
#endif
            // Invoke the callback. Note that the file descriptor may be closed by
            // the callback (and potentially even reused) before the function returns so
            // we need to be a little careful when removing the file descriptor afterwards.
            int callbackResult = response.request.callback->handleEvent(fd, events, data);
            if (callbackResult == 0) {
                removeFd(fd, response.request.seq);
            }

            // Clear the callback reference in the response structure promptly because we
            // will not clear the response vector itself until the next poll.
            response.request.callback.clear();
            result = POLL_CALLBACK; // report that a callback ran
        }
    }
    return result;
}
void Looper::awoken() {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ awoken", this);
#endif
    uint64_t counter;
    TEMP_FAILURE_RETRY(read(mWakeEventFd.get(), &counter, sizeof(uint64_t)));
}
awoken() reads the pending data out of the wake fd associated with the current thread, clearing it so that epoll will not keep reporting it as readable. TEMP_FAILURE_RETRY retries the wrapped system call whenever it fails with EINTR, that is, when it was interrupted by a signal.
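For reference, TEMP_FAILURE_RETRY is a standard macro; this is essentially its bionic/glibc definition (it relies on the GNU statement-expression extension and errno from <errno.h>):

#define TEMP_FAILURE_RETRY(exp)                \
    ({                                         \
        __typeof__(exp) _rc;                   \
        do {                                   \
            _rc = (exp);                       \
        } while (_rc == -1 && errno == EINTR); \
        _rc;                                   \
    })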
In the Java-layer MessageQueue (for example, in enqueueMessage), the native method nativeWake(mPtr) is called to wake the message loop. Let us look at the native implementation.
base/core/jni/android_os_MessageQueue.cpp
static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
    NativeMessageQueue* nativeMessageQueue = reinterpret_cast<NativeMessageQueue*>(ptr);
    nativeMessageQueue->wake();
}

void NativeMessageQueue::wake() {
    mLooper->wake();
}
android_os_MessageQueue_nativeWake calls NativeMessageQueue::wake(), which in turn calls Looper::wake().
void Looper::wake() {
#if DEBUG_POLL_AND_WAKE
    ALOGD("%p ~ wake", this);
#endif
    uint64_t inc = 1;
    ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd.get(), &inc, sizeof(uint64_t)));
    if (nWrite != sizeof(uint64_t)) {
        if (errno != EAGAIN) {
            LOG_ALWAYS_FATAL("Could not write wake signal to fd %d (returned %zd): %s",
                    mWakeEventFd.get(), nWrite, strerror(errno));
        }
    }
}
The implementation is trivial: it writes the 64-bit value 1 into mWakeEventFd, incrementing the eventfd counter. EAGAIN is tolerated because it only occurs when the counter is already saturated, in which case a wake-up is pending anyway.
wake() is normally called from a different thread, for instance a worker thread posting a message to the main thread's queue. Once data is written to mWakeEventFd, the epoll instance monitoring it reports the readable event, epoll_wait in the main thread returns, and the main thread wakes up.
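A condensed end-to-end C++ sketch of the whole mechanism (one thread sleeping in epoll_wait, another waking it through an eventfd); all names here are our own, chosen to mirror the Looper methods they stand in for:

#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <chrono>

int main() {
    int wakeFd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    int epollFd = epoll_create1(EPOLL_CLOEXEC);

    struct epoll_event item = {};
    item.events = EPOLLIN;
    item.data.fd = wakeFd;
    epoll_ctl(epollFd, EPOLL_CTL_ADD, wakeFd, &item);  // like rebuildEpollLocked()

    std::thread looperThread([&] {
        struct epoll_event out[16];
        // Like pollInner(): the thread sleeps here until a wake arrives.
        int n = epoll_wait(epollFd, out, 16, -1);
        uint64_t counter;
        read(wakeFd, &counter, sizeof(counter));       // like awoken(): drain the fd
        printf("woke with %d event(s), counter=%llu\n", n, (unsigned long long)counter);
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    uint64_t inc = 1;
    write(wakeFd, &inc, sizeof(inc));                  // like Looper::wake()

    looperThread.join();
    close(epollFd);
    close(wakeFd);
    return 0;
}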
This article has walked through the native implementation of Looper's sleep and wake-up. It also explains why the UI thread's infinite Looper loop does not cause an ANR: underneath, Linux's epoll mechanism together with the wake descriptor puts the thread to sleep whenever it is idle, so the loop consumes no CPU while there is nothing to do.
The key points of this chapter are as follows: