CTS问题分析9

CTS/GTS问题分析9

一个由kernel层驱动错误导致的错误,查明的过程比较曲折,因此记录一下

问题初探

测试命令: run cts -m CtsHardwareTestCases -t android.hardware.input.cts.tests.GamepadTestCase#testButtonA

报错如下:


  
    
      
        java.lang.AssertionError: expected:<1> but was:<0>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:834)
  at org.junit.Assert.assertEquals(Assert.java:645)
  at org.junit.Assert.assertEquals(Assert.java:631)
  at android.hardware.input.cts.tests.InputTestCase.assertReceivedKeyEvent(InputTestCase.java:145)
  at android.hardware.input.cts.tests.GamepadTestCase.testButtonA(GamepadTestCase.java:41)

测试case code如下:

32    @Test
33    public void testButtonA() throws Exception {
34        registerInputDevice(R.raw.gamepad_register_device);
35
36        sendHidCommands(R.raw.gamepad_button_a_down);
37        sendHidCommands(R.raw.gamepad_delay);
38        assertReceivedKeyEvent(KeyEvent.ACTION_DOWN, KeyEvent.KEYCODE_BUTTON_A);
39
40        sendHidCommands(R.raw.gamepad_button_a_up);
41        assertReceivedKeyEvent(KeyEvent.ACTION_UP, KeyEvent.KEYCODE_BUTTON_A);
42
43        assertNoMoreEvents();
44    }

那么看出流程大致如下: 首先注册一个名为Odie (Test)的hid device,首先发送一个down事件,延迟10ms,检查测试界面是否接收到了down事件;然后再传递一个up事件,然后检查测试界面是否接收到了up事件

case栈报错的原因是应该接收到up事件时接收到的还是down事件,因此case报错

问题分析

(a)分析过程1:
首先直觉怀疑是不是case的时间设置问题导致发送的down事件被测试页面当作了长按,这个验证很简单,简单修改下case

206    private class InputListener implements InputCallback {
207        @Override
208        public void onKeyEvent(KeyEvent ev) {
209            boolean done = false;
210            do {
211                try {
212                    //mKeys.put(new KeyEvent(ev));
                       //改为:
                       if(ev.getRepeatCount()==0){
                          mKeys.put(new KeyEvent(ev));
                       }
213                    done = true;
214                } catch (InterruptedException ignore) { }
215            } while (!done);
216        }
   结果发现case还是fail,报错变为不能得到up事件,因此这里就可以证明,cts的测试页面InputCtsActivity并没有接收到up事件,侧面证明下,我们在测试过之前打开/dev/input,执行getevent -ltr,发现:

异常F1:

add device 8: /dev/input/event7
  name:     "Odie (Test)"
[   24251.158735] /dev/input/event7: EV_LED       LED_NUML             00000001           
[   24251.158735] /dev/input/event7: EV_MSC       MSC_SCAN             00090001           
[   24251.158735] /dev/input/event7: EV_KEY       BTN_GAMEPAD          DOWN               
[   24251.158735] /dev/input/event7: EV_ABS       ABS_X                0000007f           
[   24251.158735] /dev/input/event7: EV_ABS       ABS_Y                0000007f           
[   24251.158735] /dev/input/event7: EV_ABS       ABS_Z                0000007f           
[   24251.158735] /dev/input/event7: EV_ABS       ABS_RZ               0000007f           
[   24251.158735] /dev/input/event7: EV_SYN       SYN_REPORT           00000000             rate 0
case 卡住,强制结束case
[   24416.424598] /dev/input/event7: EV_KEY       BTN_GAMEPAD          UP                 
[   24416.424612] /dev/input/event7: EV_SYN       SYN_REPORT           00000001             rate 0
remove device 8: /dev/input/event7

正常F5:

add device 8: /dev/input/event7
  name:     "Odie (Test)"
[    9500.572331] /dev/input/event7: EV_LED       LED_NUML             00000001           
[    9500.572331] /dev/input/event7: EV_MSC       MSC_SCAN             00090001         //hid码   
[    9500.572331] /dev/input/event7: EV_KEY       BTN_GAMEPAD          DOWN               
[    9500.572331] /dev/input/event7: EV_ABS       ABS_X                0000007f           
[    9500.572331] /dev/input/event7: EV_ABS       ABS_Y                0000007f           
[    9500.572331] /dev/input/event7: EV_ABS       ABS_Z                0000007f           
[    9500.572331] /dev/input/event7: EV_ABS       ABS_RZ               0000007f           
[    9500.572331] /dev/input/event7: EV_SYN       SYN_REPORT           00000000 (上报) //相当于一个按键结束      
[    9500.582252] /dev/input/event7: EV_MSC       MSC_SCAN             00090001           
[    9500.582252] /dev/input/event7: EV_KEY       BTN_GAMEPAD          UP                 
[    9500.582252] /dev/input/event7: EV_SYN       SYN_REPORT           00000000              rate 100
remove device 8: /dev/input/event7

可见,up事件在case运行过程中并没有被正常接收到,直到case fail,测试页面InputCtsActivity pause才发出了up事件

  11-07 03:03:58.819 30370 30370 D LifecycleMonitor: Lifecycle status change: android.hardware.input.cts.InputCtsActivity@f112acb in: PAUSED
  11-07 03:03:58.820  4792 17838 D ActivityTrigger: activityResumeTrigger: The activity in ApplicationInfo{9d04809 com.android.launcher3} is now in focus and seems to be in full-screen mode
  11-07 03:03:58.820  4792 17838 E ActivityTrigger: activityResumeTrigger: not whiteListedcom.android.launcher3/com.android.launcher3.Launcher/28
  11-07 03:03:58.824 30370 30370 W ViewRootImpl[InputCtsActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_BUTTON_A, scanCode=304, metaState=0, flags=0xa8, repeatCount=0, eventTime=12083079, downTime=12082572, deviceId=14, source=0x501 }

(b)分析过程2:
上面说到up事件没有被正常接收到,首先看一下底层驱动是否有正常上传up事件,首先在input event从内核上传的必经之路input_event加一些log

http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/kernel/msm-4.14/drivers/input/input.c#510

void input_event(struct input_dev *dev,
         unsigned int type, unsigned int code, int value)
{
    unsigned long flags;
    printk("weijuncheng input_event, dev = %s\n",dev->name);
    printk("weijuncheng input_event, type = %x\n",type);
    printk("weijuncheng input_event, code = %x\n",code);
    printk("weijuncheng input_event, value = %x\n",value);
    printk("---------------------------------------------------->a3");
    dump_stack();
    printk("---------------------------------------------------->a4");
    if (is_event_supported(type, dev->evbit, EV_MAX)) {
         
        spin_lock_irqsave(&dev->event_lock, flags);
        printk("---------------------------------------------------->a5");
        printk("weijuncheng input_event, dev = %s\n",dev->name);
        printk("weijuncheng input_event, type = %x\n",type);
        printk("weijuncheng input_event, code = %x\n",code);
        printk("weijuncheng input_event, value = %x\n",value);
        input_handle_event(dev, type, code, value);
        printk("---------------------------------------------------->a6");
        spin_unlock_irqrestore(&dev->event_lock, flags);
#ifdef CONFIG_LAST_TOUCH_EVENTS
        touch_press_release_events_collect(dev, type, code, value);
#endif
    }
}
EXPORT_SYMBOL(input_event);

相关log:

11-08 03:16:25.894  root     0     0 I         : ---------------------------------------------------->a3
11-08 03:16:25.894  root     0     0 I         : CPU: 1 PID: 4803 Comm: HidDeviceHandle Tainted: G S              4.14.76-perf-g92d7574-dirty #3
11-08 03:16:25.894  root     0     0 I Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 CEPHEUS (DT)
11-08 03:16:25.894  root     0     0 I Call trace: 
11-08 03:16:25.894  root     0     0 I         : dump_backtrace+0x0/0x188
11-08 03:16:25.894  root     0     0 I         : show_stack+0x18/0x20
11-08 03:16:25.894  root     0     0 I         : dump_stack+0xc4/0x100
11-08 03:16:25.894  root     0     0 I         : input_event+0x78/0x268
11-08 03:16:25.894  root     0     0 I         : hidinput_hid_event+0x498/0x530  //加工为input系统可识别的event
11-08 03:16:25.894  root     0     0 I         : hid_process_event+0x12c/0x168
11-08 03:16:25.894  root     0     0 I         : hid_report_raw_event+0x540/0x5d8
11-08 03:16:25.894  root     0     0 I         : hid_input_report+0x178/0x1a8
11-08 03:16:25.894  root     0     0 I         : uhid_char_write+0x390/0x5b8
11-08 03:16:25.894  root     0     0 I         : __vfs_write+0x44/0x138   //从用户空间读取输入数据
11-08 03:16:25.894  root     0     0 I         : vfs_write+0xc8/0x188
11-08 03:16:25.894  root     0     0 I         : SyS_write+0x54/0xb8
11-08 03:16:25.894  root     0     0 I         : el0_svc_naked+0x34/0x38
11-08 03:16:25.894  root     0     0 I         : ---------------------------------------------------->a4
11-08 03:16:25.894  root     0     0 I         : ---------------------------------------------------->a5
11-08 03:16:25.894  root     0     0 I         : weijuncheng input_event, dev = Odie (Test)
11-08 03:16:25.894  root     0     0 I         : weijuncheng input_event, type = 4
11-08 03:16:25.894  root     0     0 I         : weijuncheng input_event, code = 4
11-08 03:16:25.894  root     0     0 I         : weijuncheng input_event, value = 90001
11-08 03:16:25.894  root     0     0 I         : ---------------------------------------------------->a6

最后发现,并没有up input event上传的迹象,而down事件是正常上传的,那么传递event事件流程中出问题的概率就很小了,那么首先怀疑up事件是否就根本没有从用户空间传下去,那么先看传输流程

case是通过hid command命令进行事件传输的,我们找到相关位置http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/frameworks/base/cmds/hid/src/com/android/commands/hid/Hid.java#46

整理关键代码如下:

93    private void process(Event e) {
94        final int index = mDevices.indexOfKey(e.getId());
95        if (index >= 0) {
96            Device d = mDevices.valueAt(index);
97            if (Event.COMMAND_DELAY.equals(e.getCommand())) {
98                d.addDelay(e.getDuration());
99            } else if (Event.COMMAND_REPORT.equals(e.getCommand())) {
100                d.sendReport(e.getReport());
101            } else {
102                if (Event.COMMAND_REGISTER.equals(e.getCommand())) {
103                    error("Device id=" + e.getId() + " is already registered. Ignoring event.");
104                } else {
105                    error("Unknown command \"" + e.getCommand() + "\". Ignoring event.");
106                }
107            }
108        } else if (Event.COMMAND_REGISTER.equals(e.getCommand())) {
109            registerDevice(e);
110        } else {
111            Log.e(TAG, "Unknown device id specified. Ignoring event.");
112        }
113    }

frameworks/base/cmds/hid/src/com/android/commands/hid/Device.java

72    public void sendReport(byte[] report) {
73        Message msg = mHandler.obtainMessage(MSG_SEND_REPORT, report);
74        // if two messages are sent at identical time, they will be processed in order received
75        mHandler.sendMessageAtTime(msg, mTimeToSend);
76    }
77
78    public void addDelay(int delay) {
79        mTimeToSend = Math.max(SystemClock.uptimeMillis(), mTimeToSend) + delay;
80    }
 
 
100        @Override
101        public void handleMessage(Message msg) {
102            switch (msg.what) {
103                case MSG_OPEN_DEVICE:
104                    SomeArgs args = (SomeArgs) msg.obj;
105                    mPtr = nativeOpenDevice((String) args.arg1, args.argi1, args.argi2, args.argi3,
106                            (byte[]) args.arg2, new DeviceCallback());
107                    pauseEvents();
108                    break;
109                case MSG_SEND_REPORT:
110                    if (mPtr != 0) {
111                        nativeSendReport(mPtr, (byte[]) msg.obj);
112                    } else {
113                        Log.e(TAG, "Tried to send report to closed device.");
114                    }
115                    break;
116                case MSG_CLOSE_DEVICE:
                   ...
128                default:
129                    throw new IllegalArgumentException("Unknown device message");
130            }
131        }

frameworks/base/cmds/hid/jni/com_android_commands_hid_Device.cpp

240static JNINativeMethod sMethods[] = {
241    { "nativeOpenDevice",
242            "(Ljava/lang/String;III[B"
243            "Lcom/android/commands/hid/Device$DeviceCallback;)J",
244            reinterpret_cast(openDevice) },
245    { "nativeSendReport", "(J[B)V", reinterpret_cast(sendReport) },
246    { "nativeCloseDevice", "(J)V", reinterpret_cast(closeDevice) },
247};
 
 
222static void sendReport(JNIEnv* env, jclass /* clazz */, jlong ptr, jbyteArray rawReport) {
223    size_t size;
224    std::unique_ptr report = getData(env, rawReport, size);
225    uhid::Device* d = reinterpret_cast(ptr);
226    if (d) {
227        d->sendReport(report.get(), size);
228    } else {
229        LOGE("Could not send report, Device* is null!");
230    }
231}
 
 
159void Device::sendReport(uint8_t* report, size_t reportSize) {
160    struct uhid_event ev;
161    memset(&ev, 0, sizeof(ev));
162    ev.type = UHID_INPUT2;
163    ev.u.input2.size = reportSize;
164    memcpy(&ev.u.input2.data, report, reportSize);
165    ssize_t ret = TEMP_FAILURE_RETRY(::write(mFd, &ev, sizeof(ev))); //在这里将相应event写到内核中
166    if (ret < 0 || ret != sizeof(ev)) {
167        LOGE("Failed to send hid event: %s", strerror(errno));
168    }
169}
通过加log,发现up事件的handleMessage没有如期处理,那么up事件就没有被向下传递,问题的分析又被转回到上层,下面就要分析handleMessage没有被处理的原因

(c)分析过程3:
首先如果要继续分析,需要对Handler有个大致的了解,简单来说就是一个线程向另一个持有looper的线程发送消息,持有looper的线程通过loop()循环选取Message进行处理

那么只有两种可能: 1.消息没入队 2.入队没处理

首先看消息没入队的情况,本例中是通过sendMessageAtTime来向队列中投递消息的,大体流程

frameworks/base/core/java/android/os/Handler.java

689    public boolean sendMessageAtTime(Message msg, long uptimeMillis) {
690        MessageQueue queue = mQueue;
691        if (queue == null) {
692            RuntimeException e = new RuntimeException(
693                    this + " sendMessageAtTime() called with no mQueue");
694            Log.w("Looper", e.getMessage(), e);
695            return false;
696        }
697        return enqueueMessage(queue, msg, uptimeMillis);
698    }
 
 
740    private boolean enqueueMessage(MessageQueue queue, Message msg, long uptimeMillis) {
741        msg.target = this;
742        if (mAsynchronous) {
743            msg.setAsynchronous(true);
744        }
745        return queue.enqueueMessage(msg, uptimeMillis);
746    }

frameworks/base/core/java/android/os/MessageQueue.java

    boolean enqueueMessage(Message msg, long when) {
537        if (msg.target == null) {
538            throw new IllegalArgumentException("Message must have a target.");
539        }
540        if (msg.isInUse()) {
541            throw new IllegalStateException(msg + " This message is already in use.");
542        }
543
544        synchronized (this) {
545            if (mQuitting) {
546                IllegalStateException e = new IllegalStateException(
547                        msg.target + " sending message to a Handler on a dead thread");
548                Log.w(TAG, e.getMessage(), e);
549                msg.recycle();
550                return false;
551            }
552
553            msg.markInUse();
554            msg.when = when;
555            Message p = mMessages;
556            boolean needWake;
557            if (p == null || when == 0 || when < p.when) {
558                // New head, wake up the event queue if blocked.
559                msg.next = p;
560                mMessages = msg;
561                needWake = mBlocked; //如果入队前队列为空,或者入队Message为对头,需要唤醒queue
562            } else {
563                // Inserted within the middle of the queue.  Usually we don't have to wake
564                // up the event queue unless there is a barrier at the head of the queue
565                // and the message is the earliest asynchronous message in the queue.
566                needWake = mBlocked && p.target == null && msg.isAsynchronous(); //如果当前队列中已经有其他消息,只有特殊情况下才需要wake queue (异步message)
567                Message prev;
568                for (;;) {
569                    prev = p;
570                    p = p.next;
571                    if (p == null || when < p.when) {
572                        break;
573                    }
574                    if (needWake && p.isAsynchronous()) {
575                        needWake = false;
576                    }
577                }
578                msg.next = p; // invariant: p == prev.next
579                prev.next = msg; //插入到适当位置
580            }
581
582            // We can assume mPtr != 0 because mQuitting is false.
583            if (needWake) {
584                nativeWake(mPtr); //唤醒队列
585            }
586        }
587        return true;
588    }

我们注意到enqueueMessage是会返回是否入队成功的,那么问题就很简单了,在case里把是否入队成功打出来就行了,结果发现入队是成功的;那么就剩下第二种可能,消息虽然在队中,但没有被取出来执行,这里的流程在http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/frameworks/base/core/java/android/os/Looper.java#137:

137    public static void loop() {
138        final Looper me = myLooper();
          ...
142        final MessageQueue queue = me.mQueue;
143
           ...
158
159        for (;;) {
160            Message msg = queue.next(); // might block
               ...
192            try {
193                msg.target.dispatchMessage(msg);
194                dispatchEnd = needEndTime ? SystemClock.uptimeMillis() : 0;
195            } finally {
196                if (traceTag != 0) {
197                    Trace.traceEnd(traceTag);
198                }
199            }
               ...
233            msg.recycleUnchecked();
234        }
235    }

那么肯定是卡在Message msg = queue.next(); // might block ,这也解释了为什么测试报错之后会卡住不生成log的原因,直到触发了CTS框架的整体超时机制,那么下面看一下MessageQueue.next()的操作

Message next() {
311        // Return here if the message loop has already quit and been disposed.
312        // This can happen if the application tries to restart a looper after quit
313        // which is not supported.
314        final long ptr = mPtr;
315        if (ptr == 0) {
316            return null;
317        }
318
319        int pendingIdleHandlerCount = -1; // -1 only during first iteration
320        int nextPollTimeoutMillis = 0;
321        for (;;) {
322            if (nextPollTimeoutMillis != 0) {
323                Binder.flushPendingCommands();
324            }
325
326            nativePollOnce(ptr, nextPollTimeoutMillis); //nextPollTimeoutMillis 队列为空时一直等待,直到被唤醒(下一个Message入队)
327
328            synchronized (this) {
329                // Try to retrieve the next message.  Return if found.
330                final long now = SystemClock.uptimeMillis();
331                Message prevMsg = null;
332                Message msg = mMessages;
333                if (msg != null && msg.target == null) {
334                    // Stalled by a barrier.  Find the next asynchronous message in the queue.
335                    do {
336                        prevMsg = msg;
337                        msg = msg.next;
338                    } while (msg != null && !msg.isAsynchronous());
339                }
340                if (msg != null) {
341                    if (now < msg.when) {
342                        // Next message is not ready.  Set a timeout to wake up when it is ready.
343                        nextPollTimeoutMillis = (int) Math.min(msg.when - now, Integer.MAX_VALUE); //队列中有Message了,但最快需要执行的Message还没到时间
344                    } else {
345                        // Got a message.
346                        mBlocked = false;
347                        if (prevMsg != null) {
348                            prevMsg.next = msg.next;
349                        } else {
350                            mMessages = msg.next;
351                        }
352                        msg.next = null;
353                        if (DEBUG) Log.v(TAG, "Returning message: " + msg);
354                        msg.markInUse();
355                        return msg; //到了时间后,取出Message,并将queue的状态改为blocked = false;
356                    }
357                } else {
358                    // No more messages.
359                    nextPollTimeoutMillis = -1; //队列为空时一直阻塞,直到被唤醒
360                }
361
362                // Process the quit message now that all pending messages have been handled.
363                if (mQuitting) {
364                    dispose();
365                    return null;
366                }
367
368                // If first time idle, then get the number of idlers to run.
369                // Idle handles only run if the queue is empty or if the first message
370                // in the queue (possibly a barrier) is due to be handled in the future.
371                if (pendingIdleHandlerCount < 0
372                        && (mMessages == null || now < mMessages.when)) {
373                    pendingIdleHandlerCount = mIdleHandlers.size();
374                }
375                if (pendingIdleHandlerCount <= 0) {
376                    // No idle handlers to run.  Loop and wait some more.
377                    mBlocked = true; //除非顺利返回了Message,否则 nextPollTimeoutMillis >0或-1,分别代表首个Message没到时间或队列为空,此时队列阻塞,并进入下一个等待                             循环
378                    continue;
379                }
380
381                if (mPendingIdleHandlers == null) {
382                    mPendingIdleHandlers = new IdleHandler[Math.max(pendingIdleHandlerCount, 4)];
383                }
384                mPendingIdleHandlers = mIdleHandlers.toArray(mPendingIdleHandlers);
385            }
386
387            // Run the idle handlers.
388            // We only ever reach this code block during the first iteration.
389            for (int i = 0; i < pendingIdleHandlerCount; i++) {
390                final IdleHandler idler = mPendingIdleHandlers[i];
391                mPendingIdleHandlers[i] = null; // release the reference to the handler
392
393                boolean keep = false;
394                try {
395                    keep = idler.queueIdle();
396                } catch (Throwable t) {
397                    Log.wtf(TAG, "IdleHandler threw exception", t);
398                }
399
400                if (!keep) {
401                    synchronized (this) {
402                        mIdleHandlers.remove(idler);
403                    }
404                }
405            }
406
407            // Reset the idle handler count to 0 so we do not run them again.
408            pendingIdleHandlerCount = 0;
409
410            // While calling an idle handler, a new message could have been delivered
411            // so go back and look again for a pending message without waiting.
412            nextPollTimeoutMillis = 0;
413        }
414    }

因为case卡住了,那么很方便我们抓trace,先判断卡在哪里,发现果然卡在nativePollOnce

"main" prio=10 tid=1 Waiting
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x12c01ee8 self=0x7758e14c00
  | sysTid=11568 nice=-8 cgrp=default sched=0/0 handle=0x77dec83548
  | state=S schedstat=( 171782606 895523 56 ) utm=11 stm=5 core=7 HZ=100
  | stack=0x7fde185000-0x7fde187000 stackSize=8MB
  | held mutexes=
  at java.lang.Object.wait(Native method)
  - waiting on <0x0d54e4d2> (a java.lang.Object)
  at com.android.commands.hid.Device.close(Device.java:107)
  - locked <0x0d54e4d2> (a java.lang.Object)
  at com.android.commands.hid.Hid.run(Hid.java:90)
  at com.android.commands.hid.Hid.main(Hid.java:60)
  at com.android.internal.os.RuntimeInit.nativeFinishInit(Native method)
  at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:388)
 
 
 
 
"HidDeviceHandler" prio=10 tid=10 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x12c31a98 self=0x7758f8f000
  | sysTid=11583 nice=0 cgrp=default sched=0/0 handle=0x7751a484f0
  | state=S schedstat=( 8710574 6102134 7 ) utm=0 stm=0 core=4 HZ=100
  | stack=0x7751945000-0x7751947000 stackSize=1041KB
  | held mutexes=
  kernel: (couldn't read /proc/self/task/11583/stack)
  native: #00 pc 000000000007aacc  /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 0000000000014d48  /system/lib64/libutils.so (android::Looper::pollInner(int)+148)
  native: #02 pc 0000000000014c18  /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+60)
  native: #03 pc 00000000001271b8  /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:326)
  at android.os.Looper.loop(Looper.java:162)
  at android.os.HandlerThread.run(HandlerThread.java:65)

队列为空时timeout为-1,会一直等待,但是等到下一个Message入队时就应该被激活啊,而我们前面已经验证了up Message是入队的,因此这里就矛盾了,所以肯定还有什么我们没考虑的地方

先看下nativePollOnce的等待流程

frameworks/base/core/jni/android_os_MessageQueue.cpp

212static const JNINativeMethod gMessageQueueMethods[] = {
213    /* name, signature, funcPtr */
       ...
216    { "nativePollOnce", "(JI)V", (void*)android_os_MessageQueue_nativePollOnce },
217    { "nativeWake", "(J)V", (void*)android_os_MessageQueue_nativeWake },
       ...
221};
 
 
188static void android_os_MessageQueue_nativePollOnce(JNIEnv* env, jobject obj,
189        jlong ptr, jint timeoutMillis) {
190    NativeMessageQueue* nativeMessageQueue = reinterpret_cast(ptr);
191    nativeMessageQueue->pollOnce(env, obj, timeoutMillis);
192}
 
 
107void NativeMessageQueue::pollOnce(JNIEnv* env, jobject pollObj, int timeoutMillis) {
108    mPollEnv = env;
109    mPollObj = pollObj;
110    mLooper->pollOnce(timeoutMillis);
111    mPollObj = NULL;
112    mPollEnv = NULL;
    ...
119}

/system/core/libutils/Looper.cpp

178int Looper::pollOnce(int timeoutMillis, int* outFd, int* outEvents, void** outData) {
179    int result = 0;
180    for (;;) {
181        while (mResponseIndex < mResponses.size()) {
182            const Response& response = mResponses.itemAt(mResponseIndex++);
183            int ident = response.request.ident;
184            if (ident >= 0) {
185                int fd = response.request.fd;
186                int events = response.events;
187                void* data = response.request.data;
188#if DEBUG_POLL_AND_WAKE
189                ALOGD("%p ~ pollOnce - returning signalled identifier %d: "
190                        "fd=%d, events=0x%x, data=%p",
191                        this, ident, fd, events, data);
192#endif
193                if (outFd != NULL) *outFd = fd;
194                if (outEvents != NULL) *outEvents = events;
195                if (outData != NULL) *outData = data;
196                return ident;  //针对回调函数为空的情况,ident值必为一个大于等于0的值
197            }
198        }
199
200        if (result != 0) {
201#if DEBUG_POLL_AND_WAKE
202            ALOGD("%p ~ pollOnce - returning result %d", this, result);
203#endif
204            if (outFd != NULL) *outFd = 0;
205            if (outEvents != NULL) *outEvents = 0;
206            if (outData != NULL) *outData = NULL;
207            return result; //根据pollInner的结果进行处理,实际上是针对设置了回调函数的情况,因为设置了回调函数,所以已经对发生的事件做了处理了,所以,不需要将发生事件的相关信息再返回                给调用者了
208        }
209
210        result = pollInner(timeoutMillis);
211    }
212}
214int Looper::pollInner(int timeoutMillis) {
215#if DEBUG_POLL_AND_WAKE
216    ALOGD("%p ~ pollOnce - waiting: timeoutMillis=%d", this, timeoutMillis);
217#endif
218
219    // Adjust the timeout based on when the next message is due.
220    if (timeoutMillis != 0 && mNextMessageUptime != LLONG_MAX) {
221        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
222        int messageTimeoutMillis = toMillisecondTimeoutDelay(now, mNextMessageUptime);
223        if (messageTimeoutMillis >= 0
224                && (timeoutMillis < 0 || messageTimeoutMillis < timeoutMillis)) {
225            timeoutMillis = messageTimeoutMillis; //重新计算超时时间
226        }
227#if DEBUG_POLL_AND_WAKE
228        ALOGD("%p ~ pollOnce - next message in %" PRId64 "ns, adjusted timeout: timeoutMillis=%d",
229                this, mNextMessageUptime - now, timeoutMillis);
230#endif
231    }
232
233    // Poll.
234    int result = POLL_WAKE;
235    mResponses.clear();
236    mResponseIndex = 0;
237
238    // We are about to idle.
239    mPolling = true;
240
241    struct epoll_event eventItems[EPOLL_MAX_EVENTS];
242    int eventCount = epoll_wait(mEpollFd, eventItems, EPOLL_MAX_EVENTS, timeoutMillis);
243
244    // No longer idling.
245    mPolling = false;
246
247    // Acquire lock.
248    mLock.lock();
249
250    // Rebuild epoll set if needed.
251    if (mEpollRebuildRequired) {
252        mEpollRebuildRequired = false;
253        rebuildEpollLocked();
254        goto Done;
255    }
256
257    // Check for poll error.
258    if (eventCount < 0) { //返回值小于零,表示发生错误
259        if (errno == EINTR) {
260            goto Done;
261        }
262        ALOGW("Poll failed with an unexpected error: %s", strerror(errno));
263        result = POLL_ERROR;
264        goto Done;
265    }
266
267    // Check for poll timeout.
268    if (eventCount == 0) { //eventCount为零,表示发生超时,因此直接跳转到Done
269#if DEBUG_POLL_AND_WAKE
270        ALOGD("%p ~ pollOnce - timeout", this);
271#endif
272        result = POLL_TIMEOUT;
273        goto Done;
274    }
275
276    // Handle all events.
277#if DEBUG_POLL_AND_WAKE
278    ALOGD("%p ~ pollOnce - handling events from %d fds", this, eventCount);
279#endif
280
281    for (int i = 0; i < eventCount; i++) {
282        int fd = eventItems[i].data.fd;
283        uint32_t epollEvents = eventItems[i].events;
284        if (fd == mWakeEventFd) {
285            if (epollEvents & EPOLLIN) {
                  //通过wake唤醒队列
286                awoken();
287            } else {
288                ALOGW("Ignoring unexpected epoll events 0x%x on wake event fd.", epollEvents);
289            }
290        } else {
291            ssize_t requestIndex = mRequests.indexOfKey(fd);
292            if (requestIndex >= 0) {
293                int events = 0;
294                if (epollEvents & EPOLLIN) events |= EVENT_INPUT;
295                if (epollEvents & EPOLLOUT) events |= EVENT_OUTPUT;
296                if (epollEvents & EPOLLERR) events |= EVENT_ERROR;
297                if (epollEvents & EPOLLHUP) events |= EVENT_HANGUP;
298                pushResponse(events, mRequests.valueAt(requestIndex)); //如果是其他FD发生事件,则根据Request构造Response,并push到Response数组
                              //当某个监控fd上发生事件后,就会把对应的Request取出来调用。
299            } else {
300                ALOGW("Ignoring unexpected epoll events 0x%x on fd %d that is "
301                        "no longer registered.", epollEvents, fd);
302            }
303        }
304    }
305Done: ;
306
307    // Invoke pending message callbacks.
308    mNextMessageUptime = LLONG_MAX;
309    while (mMessageEnvelopes.size() != 0) {
310        nsecs_t now = systemTime(SYSTEM_TIME_MONOTONIC);
311        const MessageEnvelope& messageEnvelope = mMessageEnvelopes.itemAt(0);
312        if (messageEnvelope.uptime <= now) {
313            // Remove the envelope from the list.
314            // We keep a strong reference to the handler until the call to handleMessage
315            // finishes.  Then we drop it so that the handler can be deleted *before*
316            // we reacquire our lock.
317            { // obtain handler
318                sp handler = messageEnvelope.handler;
319                Message message = messageEnvelope.message;
320                mMessageEnvelopes.removeAt(0);
321                mSendingMessage = true;
322                mLock.unlock();
323
324#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
325                ALOGD("%p ~ pollOnce - sending message: handler=%p, what=%d",
326                        this, handler.get(), message.what);
327#endif
328                handler->handleMessage(message);
329            } // release handler
330
331            mLock.lock();
332            mSendingMessage = false;
333            result = POLL_CALLBACK;
334        } else {
335            // The last message left at the head of the queue determines the next wakeup time.
336            mNextMessageUptime = messageEnvelope.uptime;
337            break;
338        }
339    }
340
341    // Release lock.
342    mLock.unlock();
343
344    // Invoke all response callbacks.
345    for (size_t i = 0; i < mResponses.size(); i++) {
346        Response& response = mResponses.editItemAt(i);
347        if (response.request.ident == POLL_CALLBACK) {
348            int fd = response.request.fd;
349            int events = response.events;
350            void* data = response.request.data;
351#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
352            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
353                    this, response.request.callback.get(), fd, events, data);
354#endif
355            // Invoke the callback.  Note that the file descriptor may be closed by
356            // the callback (and potentially even reused) before the function returns so
357            // we need to be a little careful when removing the file descriptor afterwards.
358            int callbackResult = response.request.callback->handleEvent(fd, events, data);//回调其他fd定义的callback
359            if (callbackResult == 0) {
360                removeFd(fd, response.request.seq);
361            }
362
363            // Clear the callback reference in the response structure promptly because we
364            // will not clear the response vector itself until the next poll
365            response.request.callback.clear();
366            result = POLL_CALLBACK;
367        }
368    }
369    return result;
370}

Looper初始化时会初始化一个fd,**mWakeEventFd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC); **

前面说道,当一个队列中没有Message时,会陷入无限等待,当新消息入队时,我们将MessageQueue重新唤醒:

enqueueMessage中

583            if (needWake) {
584                nativeWake(mPtr);
585            }
194static void android_os_MessageQueue_nativeWake(JNIEnv* env, jclass clazz, jlong ptr) {
195    NativeMessageQueue* nativeMessageQueue = reinterpret_cast(ptr);
196    nativeMessageQueue->wake();
197}
 
 
121void NativeMessageQueue::wake() {
122    mLooper->wake();
123}
398void Looper::wake() {
399#if DEBUG_POLL_AND_WAKE
400    ALOGD("%p ~ wake", this);
401#endif
402
403    uint64_t inc = 1;
404    ssize_t nWrite = TEMP_FAILURE_RETRY(write(mWakeEventFd, &inc, sizeof(uint64_t)));
405    if (nWrite != sizeof(uint64_t)) {
406        if (errno != EAGAIN) {
407            LOG_ALWAYS_FATAL("Could not write wake signal to fd %d: %s",
408                    mWakeEventFd, strerror(errno));
409        }
410    }
411}

通过epoll机制自动唤醒mWakeEventFd,那么没有被唤醒说明needWake为false,如果MessageQueue为空,这个值只能根据当前是否block来决定,那么必然会wake,只能怀疑Messagequeue不为空

在根据这里,强烈怀疑设置了一个屏障Message导致queue没有被wakeup

needWake = mBlocked && p.target == null && msg.isAsynchronous(); //如果当前队列中已经有其他消息,只有特殊情况下才需要wake queue (异步message)

再看case:

100        @Override
101        public void handleMessage(Message msg) {
102            switch (msg.what) {
103                case MSG_OPEN_DEVICE:
104                    SomeArgs args = (SomeArgs) msg.obj;
105                    mPtr = nativeOpenDevice((String) args.arg1, args.argi1, args.argi2, args.argi3,
106                            (byte[]) args.arg2, new DeviceCallback());
107                    pauseEvents();
 
 
143    private class DeviceCallback {
144        public void onDeviceOpen() {
145            mHandler.resumeEvents();
146        }

处理Message (MSG_OPEN_DEVICE)设置了屏障,在DeviceCallback中的onDeviceOpen被消除,如果这个没有回调,就会发生这个up消息入队却无法执行的现象,case中加log发现果然没有被回调。那么问题又回到底层,需要找出Callback没有被回调的原因。

(d)分析过程4:
首先看这个callback被调用的位置

171int Device::handleEvents(int events) {
172    if (events & (ALOOPER_EVENT_ERROR | ALOOPER_EVENT_HANGUP)) {
173        LOGE("uhid node was closed or an error occurred. events=0x%x", events);
174        mDeviceCallback->onDeviceError();
175        return 0;
176    }
177    struct uhid_event ev;
178    ssize_t ret = TEMP_FAILURE_RETRY(::read(mFd, &ev, sizeof(ev)));
179    if (ret < 0) {
180        LOGE("Failed to read from uhid node: %s", strerror(errno));
181        mDeviceCallback->onDeviceError();
182        return 0;
183    }
184
185    if (ev.type == UHID_OPEN) {
186        mDeviceCallback->onDeviceOpen();
187    }
188
189    return 1;
190}
52static int handleLooperEvents(int /* fd */, int events, void* data) {
53    Device* d = reinterpret_cast(data);
54    return d->handleEvents(events);
55}
133Device::Device(int32_t id, int fd, std::unique_ptr callback) :
134            mId(id), mFd(fd), mDeviceCallback(std::move(callback)) {
135    ALooper* aLooper = ALooper_forThread();
136    if (aLooper == NULL) {
137        LOGE("Could not get ALooper, ALooper_forThread returned NULL");
138        aLooper = ALooper_prepare(ALOOPER_PREPARE_ALLOW_NON_CALLBACKS);
139    }
140    ALooper_addFd(aLooper, fd, 0, ALOOPER_EVENT_INPUT, handleLooperEvents,
141                  reinterpret_cast(this));
142}

在这里add_Fd,然后在pollInner中进行处理

345    for (size_t i = 0; i < mResponses.size(); i++) {
346        Response& response = mResponses.editItemAt(i);
347        if (response.request.ident == POLL_CALLBACK) {
348            int fd = response.request.fd;
349            int events = response.events;
350            void* data = response.request.data;
351#if DEBUG_POLL_AND_WAKE || DEBUG_CALLBACKS
352            ALOGD("%p ~ pollOnce - invoking fd event callback %p: fd=%d, events=0x%x, data=%p",
353                    this, response.request.callback.get(), fd, events, data);
354#endif
355            // Invoke the callback.  Note that the file descriptor may be closed by
356            // the callback (and potentially even reused) before the function returns so
357            // we need to be a little careful when removing the file descriptor afterwards.
               //在这里进行callback的回调
358            int callbackResult = response.request.callback->handleEvent(fd, events, data);
359            if (callbackResult == 0) {
360                removeFd(fd, response.request.seq);
361            }
362
363            // Clear the callback reference in the response structure promptly because we
364            // will not clear the response vector itself until the next poll.
365            response.request.callback.clear();
366            result = POLL_CALLBACK;
367        }
368    }
369    return result;
370}

可见onDeviceOpen这个callback的回调与这个event UHID_OPEN息息相关,可见是没有从内核上传UHID_OPEN上来,我们看一下uhid.txt中的说明:

47The first thing you should do is sending an UHID_CREATE2 event. This will
48register the device. UHID will respond with an UHID_START event. You can now
49start sending data to and reading data from UHID. However, unless UHID sends the
50UHID_OPEN event, the internally attached HID Device Driver has no user attached.
51That is, you might put your device asleep unless you receive the UHID_OPEN
52event. If you receive the UHID_OPEN event, you should start I/O. If the last
53user closes the HID device, you will receive an UHID_CLOSE event. This may be
54followed by an UHID_OPEN event again and so on. There is no need to perform
55reference-counting in user-space. That is, you will never receive multiple
56UHID_OPEN events without an UHID_CLOSE event. The HID subsystem performs
57ref-counting for you.
58You may decide to ignore UHID_OPEN/UHID_CLOSE, though. I/O is allowed even
59though the device may have no users.

The first thing you should do is sending an UHID_CREATE2 event. This will register the device. UHID will respond with an UHID_START event.

从用户态发送UHID_CREATE2到内核态(UHID),然后内核态(UHID)返回UHID_START和UHID_OPEN到用户态

然后通过添加log,大致理清楚了流程

首先在http://guard.pt.miui.com/opengrok/xref/v10-p-dipper-dev/kernel/msm-4.9/drivers/hid/uhid.c#642中初始化工作队列:

623static int uhid_char_open(struct inode *inode, struct file *file)
624{ 625 struct uhid_device *uhid;
626
627 uhid = kzalloc(sizeof(*uhid), GFP_KERNEL);
628 if (!uhid)
629 return -ENOMEM;
630
631 mutex_init(&uhid->devlock);
632 mutex_init(&uhid->report_lock);
633 spin_lock_init(&uhid->qlock);
634 init_waitqueue_head(&uhid->waitq);
635 init_waitqueue_head(&uhid->report_wait); 636 uhid->running = false;
637 INIT_WORK(&uhid->worker, uhid_device_add_worker);
638
639 file->private_data = uhid;
640 nonseekable_open(inode, file);
641
642 return 0;
643}
 
 
11-15 01:51:18.455 root 0 0 I : uhid_char_open+0xb4/0x110
11-15 01:51:18.455 root 0 0 I : misc_open+0x12c/0x168
11-15 01:51:18.455 root 0 0 I : chrdev_open+0x12c/0x200
11-15 01:51:18.455 root 0 0 I : do_dentry_open+0x12c/0x2e8
11-15 01:51:18.455 root 0 0 I : vfs_open+0x54/0x88
11-15 01:51:18.455 root 0 0 I : path_openat+0x91c/0x14d0
11-15 01:51:18.455 root 0 0 I : do_filp_open+0x74/0x120
11-15 01:51:18.455 root 0 0 I : do_sys_open+0x134/0x248
11-15 01:51:18.455 root 0 0 I : SyS_openat+0x10/0x18
11-15 01:51:18.455 root 0 0 I : el0_svc_naked+0x34/0x38

然后从用户态向内核态发送event UHID_CREATE2

92Device* Device::open(int32_t id, const char* name, int32_t vid, int32_t pid,
93        std::unique_ptr descriptor, size_t descriptorSize,
94        std::unique_ptr callback) {
95
96    int fd = ::open(UHID_PATH, O_RDWR | O_CLOEXEC);
97    if (fd < 0) {
98        LOGE("Failed to open uhid: %s", strerror(errno));
99        return nullptr;
100    }
101
102    struct uhid_event ev;
103    memset(&ev, 0, sizeof(ev));
104    ev.type = UHID_CREATE2;
105    strncpy((char*)ev.u.create2.name, name, UHID_MAX_NAME_LENGTH);
106    memcpy(&ev.u.create2.rd_data, descriptor.get(),
107            descriptorSize * sizeof(ev.u.create2.rd_data[0]));
108    ev.u.create2.rd_size = descriptorSize;
109    ev.u.create2.bus = BUS_BLUETOOTH;
110    ev.u.create2.vendor = vid;
111    ev.u.create2.product = pid;
112    ev.u.create2.version = 0;
113    ev.u.create2.country = 0;
114
115    errno = 0;
116    ssize_t ret = TEMP_FAILURE_RETRY(::write(fd, &ev, sizeof(ev))); //向内核态写
117    if (ret < 0 || ret != sizeof(ev)) {
118        ::close(fd);
119        LOGE("Failed to create uhid node: %s", strerror(errno));
120        return nullptr;
121    }
122
123    // Wait for the device to actually be created.
124    ret = TEMP_FAILURE_RETRY(::read(fd, &ev, sizeof(ev))); //等待内核态的返回结果
125    if (ret < 0 || ev.type != UHID_START) {
126        ::close(fd);
127        LOGE("uhid node failed to start: %s", strerror(errno));
128        return nullptr;
129    }
130    return new Device(id, fd, std::move(callback));
131}

然后内核态监听到这个event后,开始uhid_dev_create2创建操作

734 case UHID_CREATE2:
735     ret = uhid_dev_create2(uhid, &uhid->input_buf);
736     break;
478static int uhid_dev_create2(struct uhid_device *uhid,
479             const struct uhid_event *ev)
480{
481 struct hid_device *hid;
482 size_t rd_size, len;
483 void *rd_data;
484 int ret;
485
486 if (uhid->running)
487     return -EALREADY;
488
489 rd_size = ev->u.create2.rd_size;
490 if (rd_size <= 0 || rd_size > HID_MAX_DESCRIPTOR_SIZE)
491     return -EINVAL;
492
493 rd_data = kmemdup(ev->u.create2.rd_data, rd_size, GFP_KERNEL);
494 if (!rd_data)
495     return -ENOMEM;
496
497 uhid->rd_size = rd_size;
498 uhid->rd_data = rd_data;
499
500 hid = hid_allocate_device();
501 if (IS_ERR(hid)) {
502     ret = PTR_ERR(hid);
503     goto err_free;
504 }
505
506 len = min(sizeof(hid->name), sizeof(ev->u.create2.name)) - 1;
507 strncpy(hid->name, ev->u.create2.name, len);
508 len = min(sizeof(hid->phys), sizeof(ev->u.create2.phys)) - 1;
509 strncpy(hid->phys, ev->u.create2.phys, len);
510 len = min(sizeof(hid->uniq), sizeof(ev->u.create2.uniq)) - 1;
511 strncpy(hid->uniq, ev->u.create2.uniq, len);
512
513 hid->ll_driver = &uhid_hid_driver;
514 hid->bus = ev->u.create2.bus;
515 hid->vendor = ev->u.create2.vendor;
516 hid->product = ev->u.create2.product;
517 hid->version = ev->u.create2.version;
518 hid->country = ev->u.create2.country;
519 hid->driver_data = uhid;
520 hid->dev.parent = uhid_misc.this_device;
521
522 uhid->hid = hid;
523 uhid->running = true;
524
525 /* Adding of a HID device is done through a worker, to allow HID drivers
526  * which use feature requests during .probe to work, without they would
527  * be blocked on devlock, which is held by uhid_char_write.
528  */
529 schedule_work(&uhid->worker);
530
531 return 0;
532
533err_free:
534 kfree(uhid->rd_data);
535 uhid->rd_data = NULL;
536 uhid->rd_size = 0;
537 return ret;
538}

注意这里schedule_work(&uhid->worker); 将工作结构变量添加到系统共享队列,等待执行,其工作处理函数为uhid_device_add_worker,在其中发送UHID_START和UHID_OPEN,大体流程如下:

<4>[  643.029505] weijuncheng ---------------------------------------------------->uhid_queue UHID_START start
<4>[  643.029510] CPU: 4 PID: 7339 Comm: kworker/4:0 Tainted: G        W  O    4.9.112-perf-g3dd9240-dirty #2
<4>[  643.029511] Hardware name: Xiaomi Technologies, Inc. Perseus P3 v2.1 (DT)
<4>[  643.029519] Workqueue: events uhid_device_add_worker
<4>[  643.029522] Call trace:
<4>[  643.029527] [] dump_backtrace+0x0/0x2ac
<4>[  643.029529] [] show_stack+0x14/0x1c
<4>[  643.029533] [] dump_stack+0x94/0xb4
<4>[  643.029534] [] uhid_hid_start+0xb8/0xe4
<4>[  643.029536] [] hid_device_probe+0xfc/0x140
<4>[  643.029539] [] driver_probe_device+0x24c/0x430
<4>[  643.029541] [] __device_attach_driver+0xd4/0x148
<4>[  643.029542] [] bus_for_each_drv+0x60/0xb0
<4>[  643.029543] [] __device_attach+0x100/0x17c
<4>[  643.029544] [] device_initial_probe+0x10/0x18
<4>[  643.029545] [] bus_probe_device+0x90/0x98
<4>[  643.029547] [] device_add+0x420/0x58c
<4>[  643.029548] [] hid_add_device+0xe0/0x2a0
<4>[  643.029550] [] uhid_device_add_worker+0x1c/0x58
<4>[  643.029552] [] process_one_work+0x168/0x45c
<4>[  643.029553] [] worker_thread+0x54/0x46c
<4>[  643.029555] [] kthread+0xe0/0xf4
<4>[  643.029556] [] ret_from_fork+0x10/0x30
<4>[  643.029558] weijuncheng ---------------------------------------------------->uhid_queue UHID_START end
<4>[  643.029762] weijuncheng ---------------------------------------------------->uhid_queue UHID_OPEN start
<4>[  643.029764] CPU: 4 PID: 7339 Comm: kworker/4:0 Tainted: G        W  O    4.9.112-perf-g3dd9240-dirty #2
<4>[  643.029765] Hardware name: Xiaomi Technologies, Inc. Perseus P3 v2.1 (DT)
<4>[  643.029767] Workqueue: events uhid_device_add_worker
<4>[  643.029768] Call trace:
<4>[  643.029770] [] dump_backtrace+0x0/0x2ac
<4>[  643.029772] [] show_stack+0x14/0x1c
<4>[  643.029774] [] dump_stack+0x94/0xb4
<4>[  643.029775] [] uhid_hid_open+0x6c/0x9c
<4>[  643.029776] [] hidinput_open+0x1c/0x24
<4>[  643.029779] [] input_open_device+0x80/0xac
<4>[  643.029781] [] input_leds_connect+0xd0/0x2ac
<4>[  643.029782] [] input_attach_handler+0x19c/0x1fc
<4>[  643.029783] [] input_register_device+0x36c/0x4b4
<4>[  643.029784] [] hidinput_connect+0x43c/0x3d18
<4>[  643.029785] [] hid_connect+0x1f4/0x334
<4>[  643.029786] [] hid_device_probe+0x110/0x140
<4>[  643.029787] [] driver_probe_device+0x24c/0x430
<4>[  643.029788] [] __device_attach_driver+0xd4/0x148
<4>[  643.029789] [] bus_for_each_drv+0x60/0xb0
<4>[  643.029791] [] __device_attach+0x100/0x17c
<4>[  643.029792] [] device_initial_probe+0x10/0x18
<4>[  643.029793] [] bus_probe_device+0x90/0x98
<4>[  643.029794] [] device_add+0x420/0x58c
<4>[  643.029795] [] hid_add_device+0xe0/0x2a0
<4>[  643.029796] [] uhid_device_add_worker+0x1c/0x58
<4>[  643.029798] [] process_one_work+0x168/0x45c
<4>[  643.029799] [] worker_thread+0x54/0x46c
<4>[  643.029800] [] kthread+0xe0/0xf4
<4>[  643.029801] [] ret_from_fork+0x10/0x30
<4>[  643.029802] weijuncheng ---------------------------------------------------->uhid_queue UHID_OPEN end

然后排查这一段逻辑:

 <4>[  643.029775] [] uhid_hid_open+0x6c/0x9c
 <4>[  643.029776] [] hidinput_open+0x1c/0x24
 <4>[  643.029779] [] input_open_device+0x80/0xac
 <4>[  643.029781] [] input_leds_connect+0xd0/0x2ac
 <4>[  643.029782] [] input_attach_handler+0x19c/0x1fc
 <4>[  643.029783] [] input_register_device+0x36c/0x4b4

最终发现错误在http://guard.pt.miui.com/opengrok/xref/v10-p-cepheus-dev/kernel/msm-4.14/drivers/hid/hid-core.c?r=aaac082d#1804:

1804int hid_hw_open(struct hid_device *hdev)
1805{
1806    int ret;
1807
1808    ret = mutex_lock_killable(&hdev->ll_open_lock);
1809    if (ret)
1810        return ret;
1811
1812    if (!hdev->ll_open_count++) {
1813        ret = hdev->ll_driver->open(hdev);
1814        if (ret)
1815            hdev->ll_open_count--;
1816    }
1817
1818    mutex_unlock(&hdev->ll_open_lock);
1819    return ret;
1820}
1821EXPORT_SYMBOL_GPL(hid_hw_open);

注意

    1812    if (!hdev->ll_open_count++) {
    1813        ret = hdev->ll_driver->open(hdev);

而调用的driver->open里面

142static int uhid_hid_open(struct hid_device *hid)
143{
144 struct uhid_device *uhid = hid->driver_data;
145 int retval = 0;
146
147 if (!hid->ll_open_count++) {
148     retval = uhid_queue_event(uhid, UHID_OPEN);
149     if (retval)
150         hid->ll_open_count--;
151 }
152 return retval;
153}
    147 if (!hid->ll_open_count++) {
    148     retval = uhid_queue_event(uhid, UHID_OPEN);

hid->ll_open_count这个值加了两次,导致无法进入retval = uhid_queue_event(uhid, UHID_OPEN),UHID_OPEN这个event无法向上传递,最终没有调用到定义的callback,导致messagequeue被阻塞,up消息无法发送,case fail,本地修改hid_hw_open为:

int hid_hw_open(struct hid_device *hdev)
{
    /*
    int ret;
 
    ret = mutex_lock_killable(&hdev->ll_open_lock);
    if (ret)
        return ret;
 
    if (!hdev->ll_open_count++) {
        ret = hdev->ll_driver->open(hdev);
        if (ret)
            hdev->ll_open_count--;
    }
 
    mutex_unlock(&hdev->ll_open_lock);
    return ret;
    */
    return hdev->ll_driver->open(hdev);
}

再编bootimage刷一下,就可以通过测试了

问题总结

这个问题是kernel更新之后里面的hid driver实现逻辑有问题造成的,中间回顾了Handler实现逻辑,用到了linux的epoll,workqueue机制(只有基本了解),有待后续进一步详细了解。

你可能感兴趣的:(CTS问题分析9)