Inferring Motion Direction using Commodity Wi-Fi for Interactive Exergames (WiDance)
# Abstract
In-air interaction is a key enabler for ambient intelligence and augmented reality. As an increasingly popular example, exergames and similar gesture recognition applications have attracted extensive research into accurate, pervasive, and low-cost user interfaces. Recent advances in wireless sensing show promise for a ubiquitous gesture-based interaction interface built on Wi-Fi. In this work, we extract complete information about motion-induced Doppler shifts using only commodity Wi-Fi. The key insight is to harness antenna diversity to carefully eliminate random phase shifts while retaining the relevant Doppler shifts. We further correlate Doppler shifts with motion directions, and propose a light-weight pipeline to detect, segment, and recognize motions without training. On this basis, we present WiDance, a Wi-Fi-based user interface, which we use to design and prototype a contactless dance-pad exergame. Experimental results in typical indoor environments demonstrate an accuracy of 92%, remarkably outperforming prior approaches.
Exergames, where players are compelled to get up and exercise (e.g., dancing, kick-boxing, sports moves), bring more than just fun [8, 15]. Researchers find that exergames can improve the fitness, health, and social involvement of players [20, 27]. Owing to these health benefits, various exergame interfaces have been developed in both industry (e.g., Kinect Sports and Wii Fit) and academia [7, 24]. Most interfaces are based on computer vision, sensors, or sonic technology. Despite their high accuracy in tracking player motions, they suffer from limitations such as sensitivity to lighting and line-of-sight conditions, the need for attached devices, and costly installation and instrumentation. We argue that a more ubiquitous exergame interface with fewer environmental constraints is essential to fit into the fragmented free time and space of modern life. For instance, a white-collar worker may play a 5-minute exergame in the office to refresh, and a housekeeper may take a quick workout in the kitchen while waiting for dishes to cook.
The need for a low-cost, non-invasive, and ubiquitous user interface has triggered extensive research on in-air human sensing, especially using the almost-everywhere Wi-Fi infrastructure [4, 18, 31, 6, 28]. The main idea is to model and extract motion-induced variations in Wi-Fi signals to infer human activities. In principle, it is possible to obtain all parameters of the incident signals, including amplitudes, phases, and frequency shifts, and to relate these parameters to human actions. Pioneering works [4, 22, 12] extract accurate signal parameters to derive motion-induced Doppler shift and time-of-flight, which are used to estimate the speed and distance of motions. However, they require specialized hardware, because commodity Wi-Fi devices suffer from random phase shifts caused by lack of synchronization, limited frequency bandwidth, and multipath effects.
Other works [31, 6, 28, 30] apply machine learning to the coarse-grained signal parameters available on commodity Wi-Fi devices to infer user activities. Yet the training effort involved and the less interpretable features extracted make them unfavorable as a robust and light-weight gesture interface.
This paper seeks to advance the state of the art in wireless interaction interfaces by accurately deriving motion-induced Doppler shifts from the Channel State Information (CSI) available on unmodified Wi-Fi devices, and by further extracting motion directions for exergame designs. As a proof of concept, we present WiDance, a dance-pad-like exergame built on commercial Wi-Fi devices. As shown in Figure 1, it tracks the directions in which players move their legs by monitoring the minute Doppler shifts in the received CSI of Wi-Fi signals, and matches the estimated directions against the ones shown on a screen. Technically, WiDance addresses two critical challenges.
(1) How to obtain full information about Doppler shifts from imperfect off-the-shelf Wi-Fi devices? While some previous works [30] have extracted Doppler-related features from commodity Wi-Fi devices, they extract only the absolute values of Doppler shifts, without arithmetic signs, and thus fail to identify the direction of motion. Instead, WiDance extracts accurate and comprehensive Doppler shifts, including direction information, from CSI by leveraging the multiple antennas on commodity Wi-Fi devices. The key insight is that while antennas at the same receiver experience different channel distortions due to spatial diversity, they suffer from the same noise sources. Therefore, we propose a series of signal processing steps that combine signals from multiple antennas so as to eliminate random noise while retaining the Doppler shifts of interest.
(2) How to detect, segment, and recognize complex player actions from a series of Doppler shifts? To robustly recognize player motions without training, we first verify that a single link is insufficient for tracking player actions, and resolve the inherent ambiguities at minimum cost by adopting one more link. We propose an effective light-weight model that relates the Doppler shifts observed jointly on two wireless links to player actions, and develop a series of data processing steps to achieve robust detection, segmentation, and finally recognition of player actions.
We prototype WiDance with commodity Wi-Fi infrastructure and evaluate its performance in various indoor environments. Experimental results show that WiDance recognizes player actions with an accuracy of 92%. Compared with the state of the art, the Doppler shift features obtained by WiDance can differentiate all eight actions required by dancing games, while those in existing approaches [30, 6] can only classify actions into three coarse categories. The motion recognition accuracy of WiDance is comparable to that of popular classifiers such as HMM, even without training. We envision WiDance as a promising step towards a practical wireless human-computer interaction interface, which offers new insights for future wireless sensing applications.
In summary, the main contributions are as follows:
• We design a novel algorithm to extract complete information about motion-induced Doppler shifts (both absolute values and signs) by leveraging antenna diversity on commodity Wi-Fi devices. To the best of our knowledge, it is the first work that obtains accurate arithmetic signs of Doppler shifts on Wi-Fi infrastructure without modification.
• We model the relation between Doppler shifts and motion directions, and develop a wireless interactive exergame, i.e., a dance pad with eight types of inputs. It operates via a light-weight yet effective signal processing pipeline that detects, segments, and recognizes player actions from Doppler shift series without prior training. In addition to interactive exergames, the core techniques in WiDance are applicable to various gesture recognition applications, including, but not limited to, fall detection for the elderly and gait recognition for user identification.
• We implement WiDance on commodity Wi-Fi devices and validate its effectiveness in various indoor settings. Experimental results demonstrate that WiDance achieves a recognition accuracy of 92%. By exploiting complete information about Doppler shifts for motion recognition, WiDance outperforms previous feature-based approaches, which fail to derive the direction of motion.
The rest of the paper is organized as follows. We first provide an overview of WiDance, followed by the principles of Doppler shift extraction and player action recognition. Then, the performance evaluation and user study of WiDance are presented. Finally, related work is reviewed and conclusions are drawn.
WiDance is a passively interactive, dance-pad-like exergame that uses off-the-shelf Wi-Fi devices. Figure 2 shows the logical flow of WiDance. The game starts with the selection of a piece of music. For each note in the music, WiDance rhythmically displays an arrow pointing in a certain direction on the screen. The player follows these visual notes and moves his/her legs in the indicated directions. WiDance continuously records and processes CSI to recognize the player's reactions over the whole gaming period. Each recognized reaction is compared with the corresponding visual note; the comparison result is displayed on the screen and the reaction is scored.
The main technical challenge for WiDance is to promptly and robustly recognize player reactions from noisy CSI data. Towards this goal, WiDance leverages the motion-induced Doppler effect observed in CSI and proposes a two-step reaction recognition procedure. As shown in Figure 2, the first step extracts the Doppler effect from CSI. It recovers a spectrogram of Doppler frequency shifts in the presence of random phase offsets, burst noise, and interference, using a series of signal processing techniques including antenna selection, data sanitization, and time-frequency analysis. The second step recognizes player reactions from the spectrogram of Doppler frequency shifts. The challenge here is to robustly recognize each individual reaction from continuous Doppler frequency shifts that may span multiple reactions. The operations adopted in this step are movement detection, trace segmentation, and motion classification. The output of this step is the series of reactions recognized by WiDance.
WiDance extracts the Doppler effect from Wi-Fi signals to recognize the dancing actions of players. This section provides the technical preliminaries, the fundamental model, and the practical issues of identifying Doppler frequency shifts from noisy Wi-Fi signals on commercial devices.
Doppler shift is the change in the frequency of a wave as perceived by an observer. It is caused by changes in the relative locations of sources, observers, and reflectors. In the context of contactless sensing, both transmitters (sources) and receivers (observers) are statically deployed, while target objects (reflectors) move and alter the wireless transmission. As shown in Figure 3a, when the target object moves towards the transmitter and the receiver, the crests and troughs of the reflected signal arrive at the receiver at a faster rate. Conversely, when the object moves away from the receiver, the crests and troughs arrive at a slower rate. In general, for a point object, the Doppler frequency shift of the signal reflected off the object is:
$$f_D = -\frac{1}{\lambda}\frac{d}{dt}d(t) \tag{1}$$
where $\lambda$ is the wavelength of the signal and $d(t)$ is the length of the reflected path.
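As a quick numerical illustration of this relation, the following minimal sketch estimates the Doppler shift from a sampled path length. The function name `doppler_shift`, the 0.05 m wavelength, and the 1 m/s path-length change are assumptions chosen for illustration only.

```python
import numpy as np

def doppler_shift(path_length_m, t_s, wavelength_m):
    """Estimate f_D(t) = -(1/lambda) * d/dt d(t) from sampled path lengths."""
    rate = np.gradient(path_length_m, t_s)   # numerical d/dt d(t), in m/s
    return -rate / wavelength_m

t = np.linspace(0.0, 1.0, 1001)
wavelength = 0.05                 # assumed wavelength in metres
d_towards = 4.0 - 1.0 * t         # reflected path shrinking at 1 m/s
d_away = 3.0 + 1.0 * t            # reflected path growing at 1 m/s

f_towards = doppler_shift(d_towards, t, wavelength)
f_away = doppler_shift(d_away, t, wavelength)
print(f_towards[0], f_away[0])
```

A shrinking path gives a positive shift and a growing path gives a negative one, which is the sign behaviour the text relies on.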
As an illustrative example, we prototype a wireless transceiver system using two USRPs synchronized by an external clock. The two USRPs are placed together near the ground, and a participant strides with his right leg at a moderate rate, in the direction orthogonal to the link, as in Figure 3a. The Doppler effect caused by striding is obtained by tracking the phase of the received signal. Figure 3b shows the spectrogram of the Doppler effect of striding. Clearly, positive Doppler shifts appear as the user strides towards the link, while negative Doppler shifts appear as the user strides away from the link. Thus, it is possible to track target motion (both speed and direction) by exploiting the Doppler effect.
In reality, instead of the single reflected path in Figure 3a, there are multiple paths along which the signal propagates from the transmitter to the receiver. This phenomenon is known as multipath. As a result, the response of the wireless channel at frequency $f$ and time $t$ is the superposition of the responses of the individual paths [21]:
$$H(f,t) = \sum_{k=1}^{K} \alpha_k(t)\, e^{-j 2\pi f \tau_k(t)} \tag{2}$$
where $K$ is the total number of paths, and $\alpha_k(t)$ and $\tau_k(t)$ are the complex attenuation factor and the time of flight of the $k$-th path, respectively.
For the $k$-th path, the time of flight $\tau_k(t)$ is the time light takes to travel the path length $d_k(t)$, i.e., $d_k(t) = c\,\tau_k(t)$, where $c$ is the speed of light. Thus, according to Equation 1, the channel response can be expressed in terms of the Doppler frequency shift on each path, with the paths divided into two categories:
$$H(f,t) = H_s(f) + \sum_{k \in P_d} \alpha_k(t)\, e^{j 2\pi \int_{-\infty}^{t} f_{D_k}(u)\,du} \tag{3}$$
where $H_s(f)$ is the sum of the responses of all static paths ($f_D = 0$), and $P_d$ is the set of dynamic paths ($f_D \neq 0$).
Assuming that $\alpha_k(t)$ and $f_{D_k}(t)$ are nearly constant during a short time interval, Doppler frequency shifts can be obtained from the spectrogram with time-frequency analysis:
where $B(\cdot)$ is the window function for cutting out the signal segment of interest.
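One standard way to write such a spectrogram is the squared magnitude of a windowed Fourier transform of the channel response. The symbols $S$, $t'$, and $f'$ below are notation introduced here for illustration and may differ from the original equation:

```latex
S(t', f') = \left| \int_{-\infty}^{\infty} H(f, t)\, B(t - t')\, e^{-j 2\pi f' t}\, dt \right|^{2}
```

With $\alpha_k$ and $f_{D_k}$ roughly constant inside the window, each dynamic path contributes a peak near $f' = f_{D_k}$, and the static part contributes a peak at $f' = 0$.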
CSI is the sampled version of the channel response in Equations 2 and 3. It is available from upper layers on off-the-shelf Wi-Fi Network Interface Cards (NICs) with only slight driver modification [9]. However, lack of synchronization between Wi-Fi NICs induces unknown phase shifts in raw CSI:
$$\hat{H}(f,t) = H(f,t)\, e^{j 2\pi (\Delta f\, t + \Delta t\, f)} \tag{5}$$
where $2\pi(\Delta f\, t + \Delta t\, f)$ is the phase shift caused by the carrier frequency offset and the timing offset. Therefore, it is infeasible to extract Doppler components directly from actual CSI measurements. Prior works [30, 6] eliminate phase noise by calculating CSI power, i.e., $|\hat{H}(f,t)|^2 = |H(f,t)|^2$. However, this process also eliminates the imaginary part of the CSI, and thus loses the signs of the Doppler shifts. That is, with only CSI power, we cannot tell whether a reflector moves towards or away from the link. As a result, CSI power can only be used to recognize what the target does (e.g., activity types), but not how the target does it (e.g., activity directions), through an upper-layer learning-based framework.
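The loss of sign can be seen in a small simulation. The sketch below uses a synthetic channel with one strong static path and one weak dynamic path; the sampling rate, amplitudes, and the helper names `channel` and `dominant_frequency` are all assumptions made for illustration.

```python
import numpy as np

fs = 1000.0                        # assumed sampling rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)
f_d = 20.0                         # assumed Doppler shift in Hz

def channel(f_doppler):
    """Strong static path plus one weak dynamic path."""
    return 1.0 + 0.1 * np.exp(2j * np.pi * f_doppler * t)

def dominant_frequency(x):
    """Frequency (Hz) of the strongest bin of the mean-removed two-sided spectrum."""
    spec = np.abs(np.fft.fft(x - np.mean(x)))
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spec)]

# The complex response keeps the sign of the Doppler shift.
f_pos = dominant_frequency(channel(+f_d))
f_neg = dominant_frequency(channel(-f_d))
print(f_pos, f_neg)

# The power |H|^2 is identical for +f_d and -f_d, so the sign is gone.
power_pos = np.abs(channel(+f_d)) ** 2
power_neg = np.abs(channel(-f_d)) ** 2
print(np.allclose(power_pos, power_neg))
```

Because the power series for the two directions are indistinguishable, any feature computed from power alone cannot separate motion towards the link from motion away from it.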
To remove unknown phase shifts while still retaining complete Doppler frequency shifts, WiDance uses the multiple antennas available on Wi-Fi NICs. Since all antennas on the same NIC experience the same phase shifts, the conjugate multiplication of the CSI of a pair of antennas also eliminates the phase offset. Specifically, denoting the CSI of the $i$-th antenna as $H^{(i)}(f,t)$, we have the following product:
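A sketch of how this product expands, assuming each antenna's response is a static part plus a sum over dynamic paths and that the measured CSI $\hat{H}^{(i)}$ of both antennas carries the same unknown phase factor. The grouping labels and the shorthand $\varphi_k(t)$ are introduced here for readability; the exact layout of the original equation may differ.

```latex
\begin{aligned}
\hat{H}^{(1)}(f,t)\,\hat{H}^{(2)}(f,t)^{*}
 &= H^{(1)}(f,t)\,H^{(2)}(f,t)^{*} \\
 &= \underbrace{H_s^{(1)}(f)\,H_s^{(2)}(f)^{*}}_{\text{static term}} \\
 &\quad + \underbrace{H_s^{(2)}(f)^{*} \sum_{k \in P_d^{(1)}} \alpha_k^{(1)}(t)\, e^{j\varphi_k(t)}
        + H_s^{(1)}(f) \sum_{k \in P_d^{(2)}} \alpha_k^{(2)}(t)^{*}\, e^{-j\varphi_k(t)}}_{\text{target terms}} \\
 &\quad + \underbrace{\sum_{k \in P_d^{(1)}} \sum_{l \in P_d^{(2)}}
        \alpha_k^{(1)}(t)\,\alpha_l^{(2)}(t)^{*}\, e^{j(\varphi_k(t) - \varphi_l(t))}}_{\text{cross terms}},
 \qquad \varphi_k(t) = 2\pi \int_{-\infty}^{t} f_{D_k}(u)\,du
\end{aligned}
```

The first equality holds because the common phase factor has unit magnitude and cancels against its own conjugate.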
As in Figure 3c, by placing antennas A1 and A2 of the receiver close together (e.g., within half a wavelength of each other), the CSI of the two antennas may contain the same major multipaths (i.e., $P_d^{(1)} = P_d^{(2)}$) with different complex attenuation factors but similar Doppler shifts. The terms in Equation 6 can then be divided into three categories:
Static terms. The static term $H_s^{(1)}(f)\,H_s^{(2)}(f)^{*}$ is calculated by multiplying the static responses (e.g., LOS) of the two antennas. This term does not contain Doppler shifts and can be filtered out with a high-pass filter.
Target terms. Target terms are calculated by multiplying the static response of one antenna by the responses of the other antenna that contain target Doppler shifts. They contain the Doppler shifts of interest and their negatives (the same magnitudes with opposite signs). A sufficient condition for extracting the Doppler shifts rather than their negatives is that the terms containing the Doppler shifts have larger amplitudes. Specifically, for the $k$-th path, the following condition should be fulfilled:
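Under one possible sign convention, in which the product is $H^{(1)}H^{(2)*}$ so that the dynamic response of the first antenna carries $+f_{D_k}$ and that of the second carries $-f_{D_k}$, the condition can be sketched as follows. The assignment of the indices 1 and 2 is an assumption made here for illustration; conjugating the other antenna swaps the roles.

```latex
\left| \alpha_k^{(1)}(t)\, H_s^{(2)}(f) \right| \;>\; \left| \alpha_k^{(2)}(t)\, H_s^{(1)}(f) \right|
```

In words, the antenna that contributes the dynamic response should see the target relatively strongly, while the conjugated antenna should have the dominant static response.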
Although it is not possible to separate static and dynamic responses in CSI and verify the condition directly, CSI still offers some clues that can guide us towards multiplication results that satisfy the condition. The issue of antenna selection is discussed later in this section.
Cross terms. Cross terms are products of the dynamic responses of the two antennas. They contain only differences of Doppler shifts and may obscure the real Doppler shifts. Fortunately, static responses indoors are likely to dominate dynamic responses, owing to strong LOS signals or large static reflectors such as walls. As a result, the cross terms are orders of magnitude weaker than the target terms. Furthermore, in WiDance only one player moves his legs in the monitored area at any time, and the differences between the Doppler shifts caused by different body parts are usually small and can be filtered out with a high-pass filter. Thus, the negative effect of cross terms can be tolerated.
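The effect of conjugate multiplication can be checked on synthetic data. The following sketch builds a hypothetical two-antenna channel, applies the same random phase to both antennas on every packet, and looks at the spectrum of the product. All amplitudes, the packet rate, and the helper name `dominant_frequency` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000.0                                  # assumed packet rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)
f_d = 20.0                                   # assumed Doppler shift in Hz
doppler = np.exp(2j * np.pi * f_d * t)

# Hypothetical channel: antenna 1 sees the target more strongly,
# antenna 2 has the stronger static response.
h1 = 1.0 + 0.30 * doppler
h2 = 2.0 + 0.05 * np.exp(1j * 0.7) * doppler

# The same unknown phase offset hits both antennas on every packet.
random_phase = np.exp(1j * rng.uniform(0, 2 * np.pi, size=t.size))
csi1, csi2 = h1 * random_phase, h2 * random_phase

def dominant_frequency(x):
    """Frequency (Hz) of the strongest bin of the mean-removed two-sided spectrum."""
    spec = np.abs(np.fft.fft(x - np.mean(x)))
    return np.fft.fftfreq(len(x), d=1.0 / fs)[np.argmax(spec)]

product = csi1 * np.conj(csi2)               # common phase offset cancels
phase_removed = np.allclose(product, h1 * np.conj(h2))
f_forward = dominant_frequency(product)
f_swapped = dominant_frequency(csi2 * np.conj(csi1))
print(phase_removed, f_forward, f_swapped)
```

In this synthetic setup, swapping which antenna is conjugated mirrors the spectrum and flips the sign of the dominant frequency, which is why the order of the antenna pair matters.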
To convert noisy CSI into a spectrogram of Doppler frequency shifts, WiDance introduces a series of processing steps.
Antenna Selection. Recall that to correctly extract the Doppler frequency shift, the condition in Equation 7 should be satisfied. Thus, we should properly select the pair of antennas and assign their order in the conjugate multiplication. Despite the mixture of static and dynamic responses, CSI itself reveals some clues that help verify the condition.
Observation I: CSI with higher amplitude is likely to possess larger static responses. This is because the amplitudes of static responses are orders of magnitude larger than those of dynamic responses, owing to strong LOS signals and large static reflectors such as walls. Consequently, even if slightly disturbed by dynamic responses, the averaged CSI can be used to approximate the static response in Equation 7.
Observation II: CSI with higher variance is likely to possess larger dynamic responses. This is because only dynamic responses contribute to the variation of CSI. As a result, the standard deviation of CSI can be used to indicate the dynamic responses in Equation 7.
Figure 4a illustrates the criterion for selecting the pair of antennas. The boxes show the distributions of CSI of different antennas across subcarriers over time. In this example, the 2nd antenna has the largest variances with relatively small amplitudes, while the 3rd antenna has the largest amplitudes with relatively small variances. By comparing the ratios of amplitude to standard deviation of CSI, the 3rd and 2nd antennas are selected, in that order.
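The selection criterion can be sketched as a ranking of antennas by the ratio of mean amplitude to amplitude standard deviation. The function name `select_antenna_pair`, the array layout, and the synthetic three-antenna data are assumptions for illustration; how the selected pair is then ordered in the conjugate multiplication is not modelled here.

```python
import numpy as np

def select_antenna_pair(csi, eps=1e-12):
    """csi: complex array of shape (antennas, subcarriers, packets).

    Returns (static_dominant, dynamic_dominant) antenna indices, ranked by
    the ratio of mean amplitude to amplitude standard deviation.
    """
    amp = np.abs(csi)
    mean_amp = amp.mean(axis=2)                 # per antenna and subcarrier
    std_amp = amp.std(axis=2)
    ratio = (mean_amp / (std_amp + eps)).mean(axis=1)   # average over subcarriers
    return int(np.argmax(ratio)), int(np.argmin(ratio))

# Synthetic example with three antennas and four identical subcarriers.
t = np.arange(200) / 200.0
dyn = np.exp(2j * np.pi * 5.0 * t)
static_gain = np.array([1.0, 0.5, 2.0])      # third antenna: largest amplitude
dynamic_gain = np.array([0.1, 0.3, 0.05])    # second antenna: largest variation
csi = (static_gain[:, None, None]
       + dynamic_gain[:, None, None] * dyn[None, None, :] * np.ones((1, 4, 1)))

pair = select_antenna_pair(csi)
print(pair)
```

With zero-based indices, the synthetic data mirror the example in the text: the third antenna is picked as static-dominant and the second as dynamic-dominant.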
Data Sanitization. As noted in Equation 6, raw CSI contains significant static components, low-frequency interference, and burst noise, which obscure the Doppler shifts of interest. Thus, it is natural to remove these signal components with filters. Specifically, we adopt a Butterworth bandpass filter for its flat amplitude response in the passband, and apply it to the conjugate multiplication series of each CSI subcarrier. The upper and lower cutoff frequencies of the Butterworth filter are set to 40 Hz and 2 Hz, respectively. The upper cutoff frequency follows from the experimental observation that normal human striding velocities are no more than $v_m = 1$ m/s. For Wi-Fi devices working at 5.8 GHz, the upper bound of the Doppler frequency of PLCR is $f = 2v_m/\lambda \approx (2 \times 1\ \text{m/s}) / 0.05\ \text{m} = 40$ Hz. The lower cutoff frequency is a trade-off between fully eliminating interference and losing the low-frequency components of the Doppler effect. However, such loss has only a minor impact on WiDance, since WiDance can still leverage the high-frequency components, which are more stable against burst noise, for motion recognition.
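A minimal sketch of this filtering step using SciPy. The filter order, the packet rate, the use of zero-phase filtering (`sosfiltfilt`), and the function name `sanitize` are assumptions not stated in the text; only the 2 Hz and 40 Hz cutoffs come from the description above.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def sanitize(series, fs, low_hz=2.0, high_hz=40.0, order=4):
    """Zero-phase Butterworth band-pass applied to a complex series."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    # The filter has real coefficients, so the real and imaginary parts
    # can be filtered separately and recombined.
    return sosfiltfilt(sos, series.real) + 1j * sosfiltfilt(sos, series.imag)

fs = 1000.0                                    # assumed packet rate in Hz
t = np.arange(0, 4.0, 1.0 / fs)
static = 5.0                                   # static component at 0 Hz
target = 0.5 * np.exp(2j * np.pi * 20.0 * t)   # in-band Doppler component
burst = 0.5 * np.exp(2j * np.pi * 200.0 * t)   # out-of-band interference
clean = sanitize(static + target + burst, fs)

# Compare away from the edges, where filter transients have died out.
mid = slice(1000, 3000)
max_error = np.max(np.abs(clean[mid] - target[mid]))
print(max_error)
```

Filtering the complex series, rather than its magnitude, keeps the sign of the Doppler shift intact for the later time-frequency analysis.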
Figures 4b and 4c show the amplitude and phase of the CSI series before and after filtering. Clearly, burst noise and low-frequency interference are removed. Moreover, the cyclic phase changes that correspond to the target Doppler shifts at around 1 s and 1.5 s are accentuated.
Time-frequency analysis. To further denoise and compress the CSI data for time-frequency analysis, we perform Principal Component Analysis (PCA) on all CSI subcarriers and select the first principal component, which contains the major and consistent power variations caused by target motions. Then, the short-time Fourier transform (STFT) is applied to the first principal component to obtain the spectrogram of Doppler frequency shifts.
Specifically, a Gaussian window shorter than 0.15 s is applied in the STFT to meet the assumption of nearly constant amplitudes and Doppler shifts in Equation 4. Zero padding is further applied to generate a finer-grained spectrogram. Finally, the non-overlapping spectrograms of all CSI segments are spliced together to generate the whole spectrogram.
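The two operations can be sketched with NumPy alone. Applying PCA to complex-valued series via the SVD, the Gaussian window width, the 0.128 s window, the non-overlapping frames, and both function names are assumptions made for this illustration rather than details taken from the text.

```python
import numpy as np

def first_principal_component(x):
    """x: complex array (packets, subcarriers). Returns the first PC series."""
    centered = x - x.mean(axis=0)
    # Rows of vh are the conjugated principal directions across subcarriers.
    _, _, vh = np.linalg.svd(centered, full_matrices=False)
    return centered @ vh[0].conj()

def doppler_spectrogram(x, fs, win_s=0.128, nfft=1024):
    """Two-sided STFT power with a Gaussian window and zero padding."""
    n = int(round(win_s * fs))
    k = np.arange(n) - (n - 1) / 2.0
    window = np.exp(-0.5 * (k / (n / 6.0)) ** 2)   # assumed std of n/6 samples
    starts = np.arange(0, len(x) - n + 1, n)       # non-overlapping frames
    frames = np.stack([x[s:s + n] * window for s in starts])
    spec = np.fft.fftshift(np.fft.fft(frames, n=nfft, axis=1), axes=1)
    freqs = np.fft.fftshift(np.fft.fftfreq(nfft, d=1.0 / fs))
    times = (starts + n / 2.0) / fs
    return freqs, times, (np.abs(spec) ** 2).T     # shape (frequencies, frames)

# Synthetic data: +20 Hz for the first second, then -20 Hz, on 30 subcarriers.
rng = np.random.default_rng(0)
fs = 1000.0
t = np.arange(0, 2.0, 1.0 / fs)
phase = 2 * np.pi * np.cumsum(np.where(t < 1.0, 20.0, -20.0)) / fs
gains = rng.uniform(0.5, 1.5, 30) * np.exp(1j * rng.uniform(0, 2 * np.pi, 30))
noise = 0.05 * (rng.standard_normal((t.size, 30)) + 1j * rng.standard_normal((t.size, 30)))
x = np.exp(1j * phase)[:, None] * gains[None, :] + noise

pc = first_principal_component(x)
freqs, times, spec = doppler_spectrogram(pc, fs)
peak = freqs[np.argmax(spec, axis=0)]              # dominant frequency per frame
print(peak[0], peak[-1])
```

The principal component is defined only up to a constant complex factor, which leaves both the power and the sign of the dominant frequency unchanged.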
Note that there is a lower bound on the range of actions that WiDance can detect, due to the well-known uncertainty principle. Specifically, suppose the time length of the STFT data window is $T$; then the frequency resolution of the spectrum is $\Delta f = 1/T$. To correctly identify the frequency shift, its magnitude must fall into the non-DC bins, which correspond to a minimum frequency of $2/T$. For a signal segment with a constant frequency shift, the frequency shift should fulfil $F = V/\lambda \geq 2/T$, where $V$ is the rate of change of the reflected signal path and $\lambda$ is the wavelength of the signal. Thus the sensitivity of the action range $R$ is:
$$R = VT \geq 2\lambda \tag{8}$$
For $\lambda \approx 0.05$ m, this corresponds to a path-length change of about 0.1 m within one window. In reality, the sensitivity of WiDance is worse than the theoretical bound, because stride actions are complex, involving acceleration and deceleration, and because of environmental noise.
Figure 4d illustrates the spectrogram of Doppler shifts induced by striding (first towards and then away from the link), obtained from noisy CSI provided by a commercial Wi-Fi NIC. Though fluctuating, the spectrogram clearly reflects the trend of the signed Doppler effect (first positive and then negative), which we use to recognize the reaction performed by the player.
This section details the principles and practical issues of recognizing player reactions from the spectrogram of Doppler shifts.
We first derive the relation between the movements of the reflector (player) and Doppler shifts. As shown in Figure 5a, for a constant length of the reflected path, the reflector lies on an ellipse with the transceivers as foci. Based on its effect relative to the ellipse, the velocity of the reflector can be decomposed into a tangential velocity along the tangent and a radial velocity along the normal. Specifically, the tangential velocity moves the target along the ellipse, while the radial velocity drives the target off the ellipse. Evidently, the radial velocity is the only cause of change in the length of the reflected path, and thus of the Doppler frequency shift. That is, if the reflector moves in various directions at the same speed, the link will experience different changes in the length of the reflected path, according to the radial velocities projected onto the normal direction. As a result, it is possible to obtain the moving direction of the reflector from the level of radial velocity derived from the Doppler effect.
However, one link is insufficient to estimate the moving direction, for two reasons. First, the distribution of radial velocity is symmetric about the normal, so any two symmetric directions cannot be distinguished using the Doppler effect of a single link alone. Second, recognition using a single link assumes that the player performs reactions at constant speed, so that the level of radial velocity maps consistently to a moving direction. However, this assumption does not hold in practice because of the diverse actions performed by players. To address these challenges, we propose a recognition scheme that resolves the symmetric ambiguities at minimum cost by adding one more link. As shown in Figure 5b, the two links are placed orthogonal to each other. As the player stands at the intersection of the perpendicular bisectors of the two links, for any direction in which the player reacts, the velocity can be projected onto the 2D $V_1$-$V_2$ plane, where $V_i$ is the radial velocity for the $i$-th link. Even if the player performs reactions at various speeds, the moving direction is preserved by the radial velocities of the two orthogonal links, making it possible to recognize player reactions.
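The single-link ambiguity and its resolution can be illustrated with a small geometric sketch. The layout below (one transmitter in front of the player, one receiver on each side) is a hypothetical arrangement chosen so that the two links are orthogonal and the player stands on both perpendicular bisectors; the coordinates, speed, wavelength, and function names are all assumptions.

```python
import numpy as np

WAVELENGTH = 0.05   # assumed wavelength in metres

def link_doppler(pos, vel, tx, rx, wavelength=WAVELENGTH):
    """Doppler shift on one link for a point reflector at pos moving with vel."""
    u_tx = (pos - tx) / np.linalg.norm(pos - tx)
    u_rx = (pos - rx) / np.linalg.norm(pos - rx)
    path_rate = vel @ (u_tx + u_rx)      # d/dt of |pos - tx| + |pos - rx|
    return -path_rate / wavelength

# Hypothetical layout: the player at the origin faces the transmitter (+y).
tx = np.array([0.0, 1.0])
rx1, rx2 = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
player = np.zeros(2)

def doppler_pair(direction, speed=0.5):
    """Doppler shifts (link 1, link 2) for a step in the given direction."""
    v = speed * np.asarray(direction, dtype=float) / np.linalg.norm(direction)
    return (link_doppler(player, v, tx, rx1), link_doppler(player, v, tx, rx2))

front, left = doppler_pair([0, 1]), doppler_pair([-1, 0])
left_front = doppler_pair([-1, 1])
print(front, left, left_front)
```

In this layout a forward step and a leftward step produce the same shift on link 1 and opposite signs on link 2, while a left-front step is invisible to link 2, so the pair of signed shifts carries the direction that a single link cannot.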
Figure 6 shows an illustrative process of motion recognition in WiDance. In this example, a player stands at the intersection of the perpendicular bisectors of the two links, facing the transmitter. The player first halts for 1 s, then moves the leg in the left-rear direction within 2 s, and halts again for another 2 s. Next, the player moves forward, followed immediately by two left-front movements, which takes 6 s in total. Finally, the player halts for 1 s. The corresponding spectrogram is shown in Figure 6a.
Movement Detection. During the game, the player may occasionally halt, e.g., because of long waiting intervals between successive visual action notes (arrows) or because of tiredness. Periods during which the player halts do not contain valid actions and should be skipped for efficiency. Thus, it is necessary to perform movement detection before motion recognition. WiDance conducts movement detection based on the following intuition. When a player is static, no Doppler effect is observed and the spectrogram contains only noise. As a result, the power of the spectrogram spreads across the whole frequency band. In contrast, when a player moves, the spectrogram is dominated by the Doppler effect, and its power concentrates on the frequencies of interest. Therefore, WiDance calculates and smooths the variance of the power distribution in the frequency domain for movement detection. Note that only the filter used in CSI sanitization affects the variance, as the filter determines the extent to which out-of-band components are removed. Therefore, for a fixed filter, it is feasible to use a predefined threshold. Player movement is detected when the variance falls below the threshold, as shown in Figure 6b.
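One way to read this rule is to treat the normalized power of each spectrogram frame as a distribution over frequency and to compute its variance (its spectral spread). That interpretation, the moving-average smoothing, the threshold value, and the function name `detect_movement` are assumptions made for this sketch.

```python
import numpy as np

def detect_movement(freqs, spec, threshold, smooth=5):
    """spec: power spectrogram of shape (frequencies, frames); smooth must be odd.

    Returns a boolean array, True where a frame is judged to contain motion.
    """
    p = spec / spec.sum(axis=0, keepdims=True)          # power as a distribution
    mean_f = (freqs[:, None] * p).sum(axis=0)
    var_f = (((freqs[:, None] - mean_f) ** 2) * p).sum(axis=0)
    # Moving average with edge padding so the ends are not biased towards zero.
    padded = np.pad(var_f, smooth // 2, mode="edge")
    var_smooth = np.convolve(padded, np.ones(smooth) / smooth, mode="valid")
    return var_smooth < threshold

# Synthetic spectrogram: flat noise, except frames 10 to 19 with a 20 Hz peak.
freqs = np.linspace(-40.0, 40.0, 81)
spec = np.ones((81, 30))
peak_shape = np.exp(-0.5 * ((freqs - 20.0) / 3.0) ** 2) + 0.01
spec[:, 10:20] = peak_shape[:, None]

moving = detect_movement(freqs, spec, threshold=250.0)
print(np.flatnonzero(moving))
```

Smoothing trims a frame or so from each end of a detected interval, so a practical threshold would need to be tuned together with the smoothing length.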
Trace Segmentation. During an exergame, a player may frequently perform continuous actions, as in Figure 6. Thus, it is necessary for WiDance to correctly segment them into individual actions for recognition.
WiDance leverages the characteristics of the action patterns performed by players. Specifically, for each action, the player first stretches one leg in some direction and then retracts the leg to stand again. As a result, each action causes a pair of peaks or valleys in the Doppler frequency shifts, with significantly different amplitudes depending on the player's moving direction, as illustrated in Figure 6c. Because the two links are placed orthogonally, at least one link experiences large fluctuations no matter in which direction the player moves. Thus, WiDance computes the average of the absolute values of the Doppler frequency shifts of the two links and detects the prominent peaks. Then two adjacent peaks are grouped as the spectrum of one complete action. An illustrative segmentation result is shown in Figure 6c.
痕迹分割 在健身游戏中,玩家可能会频繁地执行连续动作,如图6所示。因此,WiDance有必要将它们正确地分割成单个的片段以供识别。
WiDance利用了玩家表演的动作模式的特点。具体来说,对于每个动作,玩家首先向某个方向伸展他/她的一条腿,然后将腿缩回以站立。结果,根据玩家的移动方向,玩家的动作导致多普勒频移的一对峰值或谷值具有显著不同的幅度,如图6c所示。对于每一个动作,通过正交放置两个链接,无论玩家朝哪个方向移动,至少有一个链接会经历很大的波动。因此,WiDance计算两条链路的多普勒频移绝对值的平均值和,并检测显著的峰值。然后,两个相邻的峰被分组为一个完整动作的频谱。图6c显示了一个说明性的分割结果。
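The peak-pairing idea above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how "prominent" peaks are chosen, so the sketch uses a simple local-maximum test with a hypothetical minimum height (`min_height`) and minimum spacing (`min_gap`). All function names are invented for the example.

```python
def combined_magnitude(d1, d2):
    """Average of the absolute Doppler shifts of the two links, per time sample."""
    return [(abs(a) + abs(b)) / 2 for a, b in zip(d1, d2)]


def find_peaks(signal, min_height, min_gap):
    """Indices of local maxima at least `min_height` high and `min_gap` samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if not (is_local_max and signal[i] >= min_height):
            continue
        if peaks and i - peaks[-1] < min_gap:
            # Two candidates are too close: keep the taller one.
            if signal[i] > signal[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


def segment_actions(d1, d2, min_height, min_gap):
    """Group adjacent peaks (stretch, retract) into one (start_peak, end_peak) pair per action."""
    peaks = find_peaks(combined_magnitude(d1, d2), min_height, min_gap)
    return [(peaks[i], peaks[i + 1]) for i in range(0, len(peaks) - 1, 2)]


if __name__ == "__main__":
    # Toy Doppler series (Hz): two actions, each a stretch followed by a retract.
    link1 = [0, 5, 20, 5, 0, -4, -18, -4, 0, 0, 6, 22, 6, 0, -5, -16, -5, 0]
    link2 = [0] * len(link1)
    print(segment_actions(link1, link2, min_height=5, min_gap=2))
```

Pairing peaks strictly two at a time means a single missed peak shifts every later pair, so a practical implementation would probably also check the time gap between the two peaks of a pair.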
Motion Classification. Finally, WiDance applies the recognition model in Figure 5b to the Doppler frequency segments to identify the corresponding actions. As Doppler frequency shifts vary even within one segment, due to the acceleration and deceleration of human motions, simply estimating the moving direction at each time sample may suffer from significant noise. Hence we propose a two-level rule-based classification scheme, which takes advantage of all data available within a Doppler frequency segment.
In the first step, WiDance classifies movement directions based on the ratio of the accumulated absolute values of the Doppler frequency shifts of the two links, as shown in Figure 6d. For clarity, we represent the ratio by its arctangent, which ranges over [0°, 90°]. Clearly, the movement directions can be classified into three coarse categories: LR/RF, front/right/rear/left, and LF/RR. The theoretical ratios of the three categories are 0°, 45° and 90°, respectively. However, due to noise and variations in player position, non-zero Doppler frequency shifts can still be observed even if the player moves in parallel with a link, which leads to practical ratios slightly larger than 0° for the LR/RF category and slightly smaller than 90° for the LF/RR category. Thus, we slightly adjust the thresholds separating the categories to 30° and 60°, respectively.
In the second step, WiDance further differentiates movement directions within each coarse category, based on the order in which positive and negative Doppler frequency shifts appear in the segment. Specifically, for either link, if a positive Doppler shift appears first, the player stretches the leg towards the link. In contrast, if a negative Doppler shift appears first, the player stretches the leg away from the link. Thus, the directions within each coarse category can be further classified given whether the player stretches the leg towards or away from each of the two links. With the two-step scheme above, the movement directions of actions can be identified.
运动分类 最后,WiDance将图5b中的识别模型应用于多普勒频率段,以识别相应的动作。由于多普勒频移甚至在一个片段内变化,由于人体运动的加速和减速,简单地估计对应于每个时间样本的运动方向可能会受到显著的噪声的影响。因此,我们提出了一种基于规则的两级分类方案,该方案综合利用了多普勒频率段内的所有可用数据。
第一步,WiDance根据两个环节多普勒频移累积绝对值的比值对运动方向进行分类,如图6d所示。为了清楚起见,我们用其反正切值来表示比率,其范围在[0,90 ]内。显然,运动方向可以分为三大类:左后/右前、前/右/后/左、左前/右后。三类的理论比值分别为0、45、90。然而,由于噪声和玩家位置的变化,即使玩家与链路平行移动,也仍然可以观察到非零多普勒频移,这导致实际比率对于左/右类别略大于0,对于左/右类别小于90。因此,我们将阈值分别略微调整到30和60。
在第二步中,基于片段中正多普勒频移和负多普勒频移的出现顺序,WiDance进一步区分每个粗略类别中的运动方向。具体来说,对于任何一个环节,如果正多普勒频移首先出现,球员就把腿伸向这个环节。相比之下,如果负多普勒频移首先出现,玩家将他的腿从连接处移开。因此,每个粗略方向上的方向可以进一步分类,知道玩家是将他的腿伸向还是远离两个链接。通过上述两步方案,可以识别动作的移动方向。
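The two-step rule can be sketched as follows, using the 30° and 60° thresholds from the text. Several details are assumptions made only for illustration. The text does not say which link is the numerator of the ratio, so the sketch puts link 1 on top, which means an angle near 0° corresponds to link 1 seeing almost no Doppler shift. The noise floor `min_shift` used to decide which sign "appears first" is hypothetical. The final mapping from (category, towards/away flags) to one of the eight directions depends on the geometry in Figure 5b, which is not reproduced here, so the sketch stops at the flags.

```python
import math


def coarse_category(d1, d2, low_deg=30.0, high_deg=60.0):
    """Step 1: classify by the arctangent of the accumulated |Doppler| ratio.

    Assumption: link 1 is the numerator, so an angle near 0 degrees means
    link 1 observes little Doppler shift.
    """
    acc1 = sum(abs(x) for x in d1)
    acc2 = sum(abs(x) for x in d2)
    angle = math.degrees(math.atan2(acc1, acc2))  # always within [0, 90]
    if angle < low_deg:
        return "LR/RF", angle
    if angle > high_deg:
        return "LF/RR", angle
    return "front/right/rear/left", angle


def stretches_toward(d, min_shift):
    """Step 2: True if a positive shift appears first, False if negative, None if none.

    Samples smaller than `min_shift` (a hypothetical noise floor) are ignored.
    """
    for x in d:
        if abs(x) >= min_shift:
            return x > 0
    return None


def classify(d1, d2, min_shift=5.0):
    """Return (coarse category, toward link 1?, toward link 2?) for one segment."""
    category, _ = coarse_category(d1, d2)
    return category, stretches_toward(d1, min_shift), stretches_toward(d2, min_shift)


if __name__ == "__main__":
    # Toy segment: both links see comparable shifts with opposite sign order.
    print(classify([10, 20, -20, -10], [-10, -20, 20, 10]))
```

Using accumulated values over the whole segment, rather than per-sample ratios, is what makes the first step robust to the acceleration and deceleration within a stride.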
This section presents the experimental settings and the detailed performance of WiDance.
本节介绍了实验设置和WiDance的详细性能。
Evaluation Setup. WiDance consists of one transmitter and two receivers equipped with wireless cards. As shown in Figure 7a, three ThinkPad T-series laptops equipped with Intel 5300 wireless NICs are used to establish two orthogonal links. For easier deployment, we connect the devices to external antennas. Specifically, the transmitter has one antenna, and each receiver has three antennas. The links are set up to work on Channel 165 at 5.825GHz. CSI is collected with a modified network driver [9] and then passed to a processing computer over TCP/IP. The processing computer uses an Intel i7-5600U 2.6GHz CPU and processes CSI data using MATLAB. The antennas of each receiver are placed loosely in a line, with a spacing of about one wavelength (5.2cm). The packet transmission rate is set to 1024Hz, which is later decimated to study the impact of sampling rates. The transmission power is set to 15dBm by default.
All experiments are conducted in rooms in academic buildings, where the experimental areas are surrounded by desks, chairs and other equipment. Players are asked to stand at the intersection of the midnormals of the two links, facing the transmitter. To interact with players, we write a program that randomly displays visual notes on the screen, guiding players to perform dancing actions. A rear stride is illustrated in Figure 7b. To make sure that players concentrate on the experiment, no music is incorporated in the current version.
To fully understand the impact of user diversity, we recruit 30 participants and ask them to play dancing games with WiDance. For preparation, we demonstrate the usage of WiDance. Then, participants are asked to practice dancing individually. During the experiment, each participant is asked to play a 2-minute dancing game 4 times. Games are played in turns to ensure that participants get enough rest before each game. All participants are rewarded after the experiment. The whole experiment lasts 2 days, 8 hours each day, and over 10,000 actions are recorded.
Baselines. To fairly demonstrate the performance of WiDance, we implement WiDance and two learning-based schemes, HMM-WiDance and CARM [30], for comparison. On the one hand, HMM-WiDance uses Doppler frequency shifts as WiDance does, yet trains HMMs for all eight actions. We compare WiDance with HMM-WiDance to evaluate the non-learning recognition scheme. On the other hand, CARM uses only the absolute values of Doppler frequency shifts, which are obtained from CSI power, and trains HMMs on these truncated features. We compare WiDance with CARM to evaluate the extraction of Doppler frequency shifts. The HMMs implemented in both schemes are similar to those in [30].
评估设置 WiDance由一个发射机和两个配有无线网卡的接收机组成。如图7a所示,三台配备英特尔5300无线网卡的ThinkPad T系列笔记本电脑用于建立正交链路。为了便于部署,我们将设备与外部天线相连。具体来说,发射机有一个天线,每个接收机有三个天线。链路设置为在5.825GHz的165通道上工作。CSI由修改的网络驱动程序[9]收集,然后通过TCP/IP协议传递给处理计算机。处理计算机使用Intel i7-5600U 2.6GHz CPU,使用MATLAB处理CSI数据。每个接收器的天线松散地排成一行,间距约为一个波长(5.2厘米)。并且分组传输速率被设置为1024Hz,稍后对其进行抽取以研究采样率的影响。默认情况下,传输功率设置为15dBm。
所有的实验都是在学术大楼的房间里进行的,那里的实验区被桌子、椅子和其他设备包围着。玩家被要求站在两个链接的中间法线的交叉点上,面向发射器。为了与玩家互动,我们编写了一个程序,在屏幕上随机显示视觉笔记,指导玩家进行舞蹈动作。后跨步的动作如图7b所示。为了确保玩家专注于实验,当前版本中没有音乐。
为了充分了解用户多样性的变化,我们招募了30名参与者,并要求他们使用WiDance玩跳舞游戏。为了做好准备,我们演示了WiDance的用法。然后,要求参与者单独练习舞蹈。在实验过程中,每个参与者被要求玩4次2分钟的舞蹈游戏。游戏是轮流进行的,以确保参与者在每次游戏前得到足够的休息。实验结束后,所有参与者都将得到奖励。整个实验每天共持续2天8小时,实验过程中记录了10000多个动作。
基线 为了公平地展示WiDance的性能,我们实现了WiDance和两个基于学习的方案,HMM-WiDance和CARM [30],用于比较。一方面,HMM-WiDance像WiDance一样使用多普勒频移,但为所有八个动作训练HMM。我们将WiDance与HMM-WiDance进行比较,以评估非学习识别方案。另一方面,CARM仅使用从CSI功率获得的多普勒频移的绝对值,并使用该截断特征来训练HMM。我们将WiDance与CARM进行比较,以评估多普勒频移的提取。在这两种方案中实现的HMM与[30]中的相似。
Overall performance. Taking all parameters into consideration, WiDance yields an overall accuracy of 92%. As the confusion matrix in Figure 8a shows, WiDance achieves consistently high recognition accuracy for all actions. Yet errors still exist for some directions. Based on their root causes, we divide errors into two categories: errors between adjacent directions and errors between non-adjacent directions. Recall that WiDance recognizes actions with two criteria: the amplitude ratio of Doppler shifts and the order in which positive and negative Doppler shifts appear. As shown in Figure 10a, errors in Doppler ratios make WiDance confuse actions with adjacent directions, while in Figure 10b, errors in moving directions make WiDance confuse actions with non-adjacent directions. Most errors come from misclassification between actions with adjacent directions. For example, about 10.6%, 10.2% and 8.7% of actions in the right-front, right-rear and left directions are misclassified as their adjacent directions: right, rear and left-rear, respectively. In contrast, errors in moving directions are negligible: only about 0.52% for actions in the left direction and 0.14% for actions in the front and right directions. The result demonstrates that the condition in Equation 7 is almost always fulfilled, so the true Doppler shifts are correctly recognized. It also means that, compared with the signs of Doppler shifts, the amplitudes of Doppler shifts are more difficult to estimate because of environmental noise.
整体性能 综合考虑所有参数,WiDance的整体精度为92%。如图8a中的混淆矩阵所示,WiDance对所有动作都达到了一致的高识别精度。然而,有些方向仍然存在误差。根据根本原因,我们将错误分为两类:相邻方向之间的错误和不相邻方向之间的错误。回想一下,WiDance识别动作有两个标准:多普勒频移的幅度比和正负多普勒频移的出现顺序。如图10a所示,多普勒比率的误差使WiDance将动作与相邻方向混淆,而在图10b中,移动方向的误差使WiDance将动作与非相邻方向混淆。大多数错误来自于相邻方向动作之间的错误分类。例如,大约10.6%、10.2%和8.7%的具有右前、右后和左方向的动作被错误地分类到它们相邻的方向:分别是右、后和左后。相比之下,移动方向的误差只造成可忽略的误差,对于向左方向的动作大约只有0.52%,对于向前和向右方向的动作大约只有0.14%。结果表明,等式7中的条件几乎总是能够满足,以正确识别真实的多普勒频移。此外,这意味着与多普勒频移的符号相比,由于环境噪声,多普勒频移的幅度更难估计。
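The split between adjacent and non-adjacent errors can be computed directly from a confusion matrix. The sketch below is illustrative only: it assumes the eight directions are indexed in circular order, so "adjacent" means one step apart around the circle, and the function name and direction ordering are hypothetical. It does not reproduce the data in Figure 8a.

```python
# Eight directions listed clockwise, so neighbours in the list are adjacent directions.
DIRECTIONS = ["front", "right-front", "right", "right-rear",
              "rear", "left-rear", "left", "left-front"]


def error_breakdown(confusion):
    """Return (accuracy, adjacent error rate, non-adjacent error rate).

    `confusion[i][j]` counts actions of true direction i recognized as direction j,
    with both indices following the circular order of DIRECTIONS.
    """
    n = len(confusion)
    total = correct = adjacent = 0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            total += count
            gap = min((i - j) % n, (j - i) % n)  # circular distance between directions
            if gap == 0:
                correct += count
            elif gap == 1:
                adjacent += count
    non_adjacent = total - correct - adjacent
    return correct / total, adjacent / total, non_adjacent / total


if __name__ == "__main__":
    # Made-up counts for illustration, not results from the paper.
    n = len(DIRECTIONS)
    toy = [[0] * n for _ in range(n)]
    for i in range(n):
        toy[i][i] = 9            # correct recognitions
        toy[i][(i + 1) % n] = 1  # confusion with the next direction clockwise
    print(error_breakdown(toy))
```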
Further, the confusion matrix reveals that WiDance statistically performs better in straight directions (front, right, rear, left) than in oblique directions (right-front, right-rear, left-rear, left-front). Note that each oblique direction is parallel with one of the two links. For simplicity of modelling, WiDance assumes that movement in parallel with a link causes zero Doppler shift in the signals of that link. However, this assumption holds only when the player moves strictly at the intersection of the midnormals of the two links. In reality, as the player strides within a small range, the link may experience significant Doppler shifts even if the player moves in parallel with it, which may lead to errors in Doppler amplitude ratios and thus incorrect recognition outputs. Such issues could be addressed by modelling the positions of the legs in addition to the movement directions in the current model, which we leave for future work.
此外,混淆矩阵显示,在统计上,直向(前、右、后、左)比斜向(右前、右后、左后、左前)表现更好。请注意,倾斜方向与两个连杆之一平行。为了简化建模,WiDance假设与链路平行的运动导致该链路信号的多普勒频移为零。然而,只有当玩家严格地在两个环节的midnormals交叉点上移动时,这样的假设才成立。实际上,当玩家在小范围内行走时,即使玩家与链路平行移动,链路也可能经历显著的多普勒频移,这可能导致多普勒幅度比的误差,从而导致不正确的识别输出。除了当前模型中的运动方向之外,这些问题可以通过腿的进一步模型设计来解决,这将留给将来的工作。
Performance of recognition scheme. To evaluate the performance of the non-learning recognition scheme implemented in WiDance, we compare WiDance with HMM-WiDance, which feeds the Doppler shift features to HMMs that model the eight actions. Even with carefully tuned HMM parameters, HMM-WiDance achieves an accuracy of 95%, which is only slightly higher than that of WiDance. Thus, we claim that the non-learning recognition scheme achieves high accuracy comparable with that of the more complex learning method.
Unlike WiDance, which exhibits slightly different performance for different directions, HMM-WiDance achieves relatively even accuracy across all directions. The accuracies of oblique directions and straight directions achieved by HMM-WiDance are comparable. As it is trained with real samples, HMM-WiDance is able to differentiate directions at a global scale while modelling small deviations at a local scale. Specifically, the fluctuations of Doppler shifts when the player moves in parallel with a link are successfully modelled by the HMM. However, HMM-WiDance suffers from the common over-fitting problem. For example, the accuracy of actions in the rear direction decreases when applying HMM-WiDance instead of WiDance.
识别方案的性能 为了评估WiDance中实现的非学习识别方案的性能,我们将WiDance与HMM-WiDance进行了比较,HMM-WiDance将多普勒频移特征应用于HMM,用于八个动作的建模。即使仔细调整了HMM的参数,HMM-WiDance的准确率也只有95%,仅略高于WiDance的准确率。因此,我们声称非学习识别方案可以获得与复杂学习方法相当的高精度。
不同于对不同方向表现出稍微不同的性能的WiDance,HMM-WiDance对所有方向都实现了适度的精确度。隐马尔可夫模型实现的斜向和直向的精度是相当的。当用真实样本训练时,隐马尔可夫模型能够在全局尺度上区分方向,而在局部尺度上建模小偏差。具体来说,通过隐马尔可夫模型成功地模拟了玩家与链路平行移动时多普勒频移的波动。但是,HMM-WiDance存在普遍的过拟合问题。例如,当应用HMM-WiDance而不是WiDance时,具有后方方向的动作的准确性降低。
Performance of extraction scheme. To demonstrate the uniqueness and evaluate the performance of the extraction of Doppler frequency shifts in WiDance, we compare WiDance with CARM. Note that we can ignore the impact of the difference between the recognition methods used by WiDance and CARM, as these methods have comparable performance, as indicated by the comparison between WiDance and HMM-WiDance above.
提取方案的性能 为了证明WiDance中多普勒频移提取的独特性并评估其性能,我们将WiDance与CARM进行了比较。请注意,我们可以忽略WiDance和CARM使用的不同识别方法的影响,因为这些方法具有可比较的性能,如上面WiDance和HMM-WiDance之间的比较所示。
Figure 8c shows the confusion matrix for CARM. CARM fails to recognize actions in several directions and achieves only 60% accuracy even after careful tuning of the HMM parameters. This is because CARM is based on CSI power and obtains only the absolute values of Doppler shifts, since the imaginary part of the signal is lost. Theoretically, if we use only absolute values of Doppler shifts in the non-learning recognition scheme, then only the Doppler ratio can be calculated. As a result, only the first step of the non-learning recognition scheme can be carried out, and actions can be classified into three coarse categories: LR/RF, front/right/rear/left, and LF/RR. There is no further clue to differentiate actions within each category. Clearly, besides the large adjacent errors, some directions are more often confused with directions in the same category. For example, the right direction is statistically more confused with the front, rear and left directions, and the left-front direction is more confused with the right-rear direction. In practice, however, directions in the same coarse category are not totally confused by CARM. For all directions, most actions are correctly classified by CARM. This is because the actions involve movements of the whole body rather than only the feet and legs, and these movements are partially captured by the learning method used in CARM, allowing it to recognize the majority of actions. However, even with these features, CARM cannot fully capture the moving directions of actions, which leads to low recognition accuracy.
图8c显示了CARM的混淆矩阵。CARM无法识别几个方向上的动作,即使在仔细调整HMM参数后,准确率也只有60%。这是因为CARM基于CSI功率,由于信号的虚部丢失,仅能获得多普勒频移的绝对值。理论上,如果我们只使用非学习识别方案中多普勒频移的绝对值,那么只能计算多普勒比。这样一来,只能进行非学习识别方案的第一步,动作可以分为三大类:LR/RF、前/右/后/左、LF/RR。然而,没有更多的线索来区分每个类别中的动作。显然,除了相邻误差大之外,有些方向更容易与同一类别的方向混淆。例如,从统计上看,右方向更容易与前、后、左方向混淆,左前方向更容易与右后方向混淆。然而,在实践中,同一粗略类别中的方向并不会被CARM完全混淆。就所有方向而言,大多数动作都可以被CARM正确分类。这是因为这些动作涉及全身的运动,而不仅仅是脚和腿的运动,CARM使用的学习方法可以部分地捕捉这些运动,从而正确识别大多数动作。然而,即使具有这些特征,CARM也不能完全勾勒出动作的运动方向,这导致识别精度低。
Performance of compound gestures. Currently, we choose 9 motion directions as the gesture set to fit the dance game. A natural way to scale the gesture set to enable more HCI applications is to construct compound gestures from the 9 motion directions as primitives, e.g. double-left, front-right. Figure 9 shows the performance of WiDance in recognizing compound gestures that are composed of two primitive actions. WiDance achieves an overall accuracy of 85%, which is slightly lower than that of recognizing primitive actions. The accuracy of recognizing individual compound gestures ranges from 71.4% to 100%. This spread shows that the recognition of WiDance is biased towards certain gestures. As a result, it is better for real users to conduct a pre-training step that selects gestures with high recognition accuracy for practical use. For example, as shown in Figure 9, 33 out of 64 gestures have recognition accuracy higher than 85% and can be selected to form a gesture set larger than that of the 9 primitive actions.
复合手势的表现 目前,我们选择9个动作方向作为适合舞蹈游戏的手势集。缩放手势集以启用更多人机交互应用的一种自然方式是从9个运动方向构建复合手势作为图元,例如双左、前右。图9显示了WiDance在识别由两个原始动作组成的复合手势方面的性能。WiDance整体准确率达到85%,略低于识别原始动作的准确率。识别复合手势的准确率在71.4%到100%之间。这种准确性的差异显示了WiDance手势识别的偏差。因此,真实用户最好进行预训练操作,选择识别精度高的手势进行实际使用。例如,如图9所示,64个手势中有33个手势的识别准确率高于85%,可以选择这些手势来形成比9个原始动作更大的手势集。
Impact of user diversity. To evaluate the robustness of WiDance across users, we recruit 30 participants (17 males, 13 females) to test WiDance. Figure 12 shows the statistics of participants. The participants have various heights, weights and somatotypes, as indicated by Body Mass Index (BMI). Note that the participants also have different levels of body coordination and familiarity with dancing games. Figure 12 shows the performance of WiDance with different participants. WiDance recognizes the actions of all participants with accuracy higher than 85%, without any per-person learning beforehand. However, the result shows no clear correlation between user types and the performance of WiDance; further study of this is left as future work.
用户多样性的影响 为了评估WiDance对不同用户的稳健性,我们招募了30名参与者(17名男性,13名女性)来测试WiDance。图12显示了参与者的统计数据。参与者有不同的身高、体重和体型,如体重指数所示。请注意,参与者的身体协调性和对舞蹈游戏的熟悉程度也不同。图12显示了不同参与者的WiDance的性能。WiDance以高于85%的准确率识别所有参与者的动作,无需任何预先的个人学习。然而,该结果显示用户类型和WiDance的性能之间没有明显的相关性,对其的进一步研究有待于将来的工作。
Impact of action range. We evaluate the sensitivity of WiDance by asking participants to perform actions with ranges from 0.3m to 0.9m. Note that a range of 0.3m is comparable to the working range of dancing mats, and a range of 0.9m is almost the extreme range that participants can achieve given the limits of leg length and note interval. As shown in Figure 13, WiDance maintains a consistently high accuracy of 92% when the action range is larger than 0.6m, and slightly degrades to 86% when the range decreases to 0.5m. However, as the action range decreases further, the accuracy drops dramatically. This is consistent with the analysis that smaller action ranges lead to shorter action times and lower action speeds, making it harder to extract Doppler frequency shifts from the spectrogram.
作用范围的影响 我们通过要求参与者在0.3m到0.9m的范围内进行各种动作来评估WiDance的灵敏度。请注意,0.3m的范围与跳舞垫的工作范围相当,0.9m的范围几乎是参与者在腿长和音符间隔长度的限制下可以达到的极限范围。如图13所示,当作用距离大于0.6m时,WiDance保持92%的一致高精度,当作用距离减小到0.5m时,精度略微下降到86%。然而,随着作用距离的进一步减小,精度急剧下降。与分析一致的是,较小的作用范围导致较短的作用时间和较小的作用速度,使得从频谱图中提取多普勒频移更加困难。
Impact of note interval. In real dancing exergames, visual notes appear at various intervals. To evaluate the performance of WiDance with different note intervals, we conduct experiments with intervals from 1s to 3s. Note that most novices need repeated practice to keep up with notes at intervals of less than 2s. For notes at intervals of about 1s, users have to perform actions promptly, at natural amplitudes and without stopping, to keep up with the notes. As shown in Figure 14, WiDance achieves the highest accuracy of 97% with an interval of 1.5s. When the note interval is set to 1s, the accuracy sharply decreases to 84%. The reason is that the interval is too short for players to complete each individual action. Instead, they struggle to keep up with the fast visual notes and move their bodies in an uncontrolled way, which interferes with the Doppler frequency shifts of interest. Conversely, when the time interval is longer than 1.5s, the accuracy slightly decreases to about 92%. This is because, with a larger interval, players are able to stride slowly, causing smaller Doppler frequency shifts that are more easily corrupted by noise.
音符间隔的影响 在真实的舞蹈练习游戏中,视觉音符以不同的间隔出现。为了评估不同音符间隔的WiDance的性能,我们进行了从1s到3s间隔的实验。请注意,大多数新手需要重复练习才能赶上间隔小于2s的音符。对于间隔大约为1秒的音符,用户必须以自然的振幅迅速地执行动作,而不能停下来跟上音符。如图14所示,WiDance以1.5s的间隔达到最高的97%的准确率,当音符间隔设置为1s时,准确率急剧下降到84%。原因是间隔时间太短,球员无法完成每个单独的动作。相反,他们努力跟上快速的视觉音符,并以不受控制的方式迈着身体大步,这干扰了感兴趣的多普勒频移。相反,当时间间隔长于1.5s时,精度会略微下降到92%左右。这是因为在较大的间隔时间内,球员可以在间隔时间内缓慢行走,导致较小的多普勒频移,这可能会受到噪声的干扰。
Impact of area size. We evaluate WiDance in monitoring areas with sizes of 2m×2m, 3m×3m and 4m×4m. Figure 15 plots the performance of WiDance. As shown, even with an area as large as 16m², WiDance still achieves an accuracy of 90%. While the size of existing dancing pads is only about 1m×1m, WiDance enables a large exercising area for players. Players can perform macro actions in the area, which is more helpful for their fitness and health.
面积大小的影响 我们在尺寸为2米×2米、3米×3米、4米×4米的监测区域内对WiDance进行评估。图15描绘了WiDance的性能。如图所示,即使面积高达16m2,WiDance仍能达到90%的精度。虽然现有的跳舞垫的尺寸只有大约1米×1米,但WiDance为玩家提供了一个大的锻炼区域。玩家可以在区域内进行宏观动作,对自己的健身健康更有帮助。
Impact of transmission rates. As WiDance requires packet transmission for sensing actions, which may occupy channels and interfere with normal communication links, we evaluate the performance of WiDance with different transmission rates. Initially, we set the transmission rate to 1024Hz and decimate the CSI series to 512, 256 and 128Hz. As shown in Figure 16, as the transmission rate decreases, the performance of WiDance slightly degrades, because high-frequency noise aliases into the Doppler frequencies of interest. However, WiDance still achieves acceptable performance at a transmission rate of 256Hz. Since only the CSI of packets is used, WiDance can even transmit short packets (e.g. RTS/CTS) to further reduce the impact on the normal communication channel.
A side effect of using a lower transmission rate is reduced processing time. Figure 17 plots the per-second computation cost of each step in WiDance. As shown, the major time cost comes from generating the spectrogram in the time-frequency analysis step. Halving the transmission rate halves the amount of data and thus the processing time. With a practical transmission rate of 256Hz, the processing time for a single action is only 25ms, thus enabling real-time processing and reaction in WiDance.
传输速率的影响 由于WiDance需要数据包传输来进行感知操作,这可能会占用信道并干扰正常的通信链路,因此我们评估了WiDance在不同传输速率下的性能。最初,我们将传输速率设置为1024Hz,并将CSI序列抽取为512,256,128Hz。如图16所示,随着传输速率的降低,WiDance的性能略有下降,因为高频噪声与感兴趣的多普勒频率混叠。然而,在256Hz的传输速率下,WiDance仍然获得了可接受的性能。由于只使用数据包的CSI,所以WiDance甚至可以传输短数据包(例如RTS/CTS),以进一步减少对正常通信信道的影响。
使用较低传输速率的副作用是降低了处理时间成本。图17绘制了WiDance中每个步骤的每秒计算成本。如图所示,主要的时间成本来自于在时频分析步骤中生成频谱图。通过将传输速率降低一半,数据量以及处理时间将减少一半。实际传输速率为256Hz,单个动作的处理时间仅为25毫秒,因此可以实现WiDance的实时处理和反应。
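The aliasing effect mentioned above can be illustrated with a small calculation. The sketch assumes that decimation keeps every k-th CSI sample with no anti-aliasing filter, which the text does not state, and the 100Hz noise component is a made-up example value. A tone above half the sampling rate folds back to a lower apparent frequency, where it can overlap the Doppler band.

```python
def decimate(samples, factor):
    """Keep every `factor`-th sample (no anti-aliasing filter is applied)."""
    return samples[::factor]


def alias_frequency(freq_hz, rate_hz):
    """Apparent frequency of a tone at `freq_hz` after sampling at `rate_hz`."""
    return abs(freq_hz - rate_hz * round(freq_hz / rate_hz))


if __name__ == "__main__":
    packets = list(range(1024))  # one second of CSI samples at 1024Hz
    for factor in (1, 2, 4, 8):
        rate = 1024 // factor
        kept = len(decimate(packets, factor))
        # Hypothetical 100Hz noise component: unchanged until the rate drops to 128Hz.
        print(rate, kept, alias_frequency(100, rate))
```

At 128Hz the hypothetical 100Hz component exceeds the 64Hz Nyquist limit and appears at 28Hz, whereas at 256Hz and above it stays at 100Hz, where a band filter can still separate it from the lower Doppler frequencies.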
Multiple moving objects. Targeting single-player dancing, WiDance is unfortunately vulnerable to movements nearby, since reflections from both the dancer and an intruder are superimposed at the receiver. WiTrack [2] enables tracking of multiple humans by successive silhouette cancellation using FMCW signals. However, it relies on separating multiple objects by distance and azimuth, which is not feasible for WiDance, in that the Doppler shift of the dancer is obfuscated by the similar shift of the intruder. Enabling recognition of multiple objects remains an open and challenging problem for future work.
多个移动物体 不幸的是,针对单人舞蹈,WiDance容易受到附近运动的影响,因为舞者和入侵者的反射都叠加在接收器上。WiTrack [2]通过使用FMCW信号的连续轮廓消除来实现多人跟踪。然而,它依赖于多个对象的距离和方位的分离,这对于WiDance是不可行的,因为舞者的多普勒频移被入侵者的类似频移模糊。实现对多个对象的识别仍然是一个开放且具有挑战性的问题。
Dependency on particular hardware cards. CSI is formulated in the 802.11 standards for OFDM and MIMO operations. However, access to CSI is limited to certain NICs with modified drivers (e.g. Intel 5300). An alternative that reduces the number of specialized devices is to configure a Wi-Fi device with an Intel 5300 NIC as a hotspot and send ICMP packets to collect CSI from multiple other ordinary Wi-Fi clients [16]. In this way, only one device with the specific NIC is needed. Moreover, as CSI-based Wi-Fi sensing applications continue to grow and mature, we envision that NIC manufacturers will expose CSI to upper layers on most NICs in 3-5 years.
对特定硬件卡的依赖性 CSI是在802.11标准中为正交频分复用和多输入多输出操作制定的。但是,访问CSI仅限于某些驱动程序经过修改的网卡(例如英特尔5300)。减少特定设备的另一种方法是将带有英特尔5300网卡的无线设备配置为热点,并发送ICMP数据包以从多个其他普通无线客户端收集CSI[[16]。这样,只需要一个带有特定网卡的设备。此外,随着基于CSI的无线传感应用的不断发展和成熟,我们预计未来的网卡制造商将在3-5年内将CSI暴露在大多数网卡的上层。
Detection range. While WiDance supports an interaction range of up to 4m×4m, achieving whole-home coverage is not an easy task. As the operating distance increases, (1) the reflected power attenuates exponentially while the interference from static signals and noise remains unchanged; (2) the range of directions whose Doppler shifts exceed the sensitivity continuously narrows. Given the lowest SINR and sensitivity of an NIC, these two factors determine the maximum coverage of the system. Nevertheless, whole-home coverage can still be achieved by deploying more systems in the area of interest, or by sensing actions with larger Doppler frequency shifts (e.g. walking).
检测范围 虽然WiDance支持高达4m×4m的交互范围,但实现全家庭覆盖并非易事。随着工作距离的增加,(1)反射功率呈指数衰减,而来自静态信号和噪声的干扰保持不变;(2)多普勒频移超过灵敏度的方向场不断变窄。考虑到网卡的最低SINR和灵敏度,这两个因素决定了系统的最大覆盖范围。然而,通过在感兴趣的区域部署更多的系统,或者通过感测具有更大多普勒频移的动作(例如,行走),仍然可以实现整个家庭。
Potential applications. The core technology of WiDance is to derive motion directions in a device-free manner, which can be applied to various scenarios. In addition to games, it facilitates smart home applications such as remote device selection and control. For example, a user can select a lamp by moving an arm towards it, and can control the volume of a speaker by pushing towards or pulling away from it. It also benefits localization by eliminating ambiguity in walking directions [17], which enables a range of location-based services such as emergency evacuation, virtual reality and activity tracking. We leave the study of these applications for future work.
潜在应用 WiDance的核心技术是以无设备的方式导出运动方向,可以应用于各种场景。除了游戏之外,它还有助于智能家居应用,如远程设备选择和控制。例如,用户可以通过向灯移动他/她的手臂来选择灯,并且他/她可以通过向灯靠近或远离它来控制扬声器的音量。它也有利于定位,以消除行走方向的模糊性[17],从而实现一系列基于位置的服务,如紧急疏散、虚拟现实和活动跟踪。我们把对这些应用的研究留给未来的工作。
Wireless Sensing Systems. As an alternative to computer vision in NLOS or dark environments, contactless sensing using wireless signals has attracted extensive interest for recognizing location [4, 3, 2, 25], body activities [22, 32, 12], and vital signs [5, 19, 29, 33]. These systems mainly adopt a radar principle, associating motions with physical measurements such as time-of-flight and Doppler effects, and enable fine-grained and interpretable motion tracking using specialized hardware. For instance, mTrack [32] accurately locates and tracks finger movement using customized millimeter-wave signals. WiSee [22], which extracts Doppler shifts from wide-band OFDM signals using USRPs, is the closest to our work. WiDance also recognizes motion directions by modelling and interpreting motion-induced Doppler effects. However, WiDance advances the state of the art by extracting Doppler shifts on commercial multi-antenna Wi-Fi devices without any modification.
无线传感系统 作为NLOS或黑暗环境中计算机视觉的替代方案,使用无线信号的非接触式传感在识别位置[4,3,2,25],身体活动[22,32,12]和生命体征[5,19,29,33]方面引起了广泛的兴趣。这些系统主要采用雷达原理,将运动与飞行时间和多普勒效应等物理测量联系起来,并使用专门的硬件实现细粒度和可解释的运动跟踪。例如,mTrack [32]使用定制的毫米波信号精确定位和跟踪手指运动。WiSee [22]是与我们最接近的工作,它使用USRP提取宽带OFDM信号中的多普勒频移。WiDance同样通过建模和解释运动引起的多普勒效应来识别运动方向。然而,WiDance通过在商用多天线Wi-Fi设备上提取多普勒频移而没有任何修改,从而提高了技术水平。
Wi-Fi-based Gesture Sensing Systems. To bring gesture recognition to commodity Wi-Fi devices, both modelling-based [1, 30] and pattern-matching-based principles [31, 28] have been adopted. E-eyes [31] exploits subcarriers of CSI to recognize household activities such as washing dishes and taking a shower. WiGest [1] maps changes in Wi-Fi RSSI into motion primitives, upon which a family of gestures is defined and accurately recognized for device interaction. WiDir [34] proposes to recognize human motion direction by calculating phase differences between CSI subcarriers. In contrast, WiDance directly extracts Doppler frequency shifts with the multiple antennas available on commercial Wi-Fi devices, which may serve a wider range of sensing applications than human motion direction alone. CARM [30] extracts speed-related features from CSI and proposes an effective machine learning framework for CSI-based activity recognition. WiDance builds upon this line of research and goes one step further by modelling motion directions and extracting the corresponding Doppler features from noisy CSI, enabling a contactless dancing exergame without the need for prior machine learning.
基于Wi-Fi的手势传感系统 为了将手势识别引入商用Wi-Fi设备,已经采用了基于建模[1,30]和基于模式匹配的原理[31,28]。E-eyes [31]利用CSI的子载波来识别家庭活动,如洗碗和洗澡。WiGest [1]将Wi-Fi RSSI中的变化映射为运动原语,在此基础上定义并准确识别一系列手势,用于设备交互。WiDir [34]提出通过计算CSI子载波之间的相位差来识别人体运动方向。相比之下,WiDance可以通过商用Wi-Fi设备上的多根天线直接提取多普勒频移,这可能适用于比人体运动方向更广泛的传感应用。CARM [30]从CSI中提取与速度相关的特征,并为基于CSI的活动识别提出了一个有效的机器学习框架。WiDance就是建立在这一研究趋势之上的,它通过对运动方向进行建模,并从有噪声的CSI中提取相应的多普勒特征,使无接触舞蹈健身游戏成为可能,而不需要预先的机器学习。
Interfaces for Exergames. Exergames enable physical interaction with users for exercise benefits [26, 10]. Mainstream exergame interfaces are based on either computer vision [24, 11] or controllers embedded with sensors [14, 7]. Kinect Sports and Wii Fit are the leading gaming consoles for indoor exergames. ACW [13] combines computer vision and interactive projected graphics for motivating and instructing indoor wall climbing. RetroFab [23] leverages 3D printing to agilely adapt physical controllers to arbitrary use. In contrast, wireless sensing with off-the-shelf devices compensates for the shortcomings of vision-based and sensor-based sensing. We develop a new wireless exergame interface that tracks body movements and reactions through the Doppler effect, and validate the efficiency of the interface by prototyping a dancing game on it. While preliminary, we believe that WiDance opens up a new direction for the design and development of exergame interfaces.
健身游戏的界面 健身游戏使用户能够通过身体互动获得锻炼益处[26,10]。主流的健身游戏界面基于计算机视觉[24,11]或嵌入传感器的控制器[14,7]。Kinect Sports和Wii Fit是室内健身游戏的领先游戏机。ACW [13]结合了计算机视觉和交互式投影图形,用于激励和指导室内攀岩。追溯[23]利用3D打印技术灵活地调整物理控制器以适应任意用途。相比之下,使用现成设备的无线传感弥补了基于视觉和基于传感器的传感的不足。我们开发了一个新的基于无线的健身游戏界面,通过多普勒效应跟踪身体运动和反应,并通过在其上制作一个跳舞游戏的原型来验证界面的效率。虽然是初步的,但我们相信WiDance为游戏界面的设计和开发开辟了一个新的方向。
In this paper, we propose WiDance, a Wi-Fi-based user interface for a contactless dance-pad exergame. First, we design a novel algorithm to extract motion-induced Doppler shifts by leveraging antenna diversity on commodity Wi-Fi devices. Then, we model the relation between Doppler shifts and motion directions, and propose a light-weight yet effective signal processing pipeline to translate the model into the interactive dancing exergame. Extensive experimental results show that WiDance achieves an overall recognition accuracy of 92% in various indoor environments. Requiring no hardware modifications, WiDance is envisioned as a promising step towards practical wireless human-computer interfaces, which offers new insights for future wireless sensing applications.
在这篇文章中,我们提出了一种基于无线网络的非接触式跳舞机用户界面。首先,我们设计了一种新的算法,利用商用无线设备上的天线分集来提取运动引起的多普勒频移。然后,我们对多普勒频移和运动方向之间的关系进行建模,并提出了一种重量轻但有效的信号处理流水线来将该模型转换成交互式舞蹈练习游戏。大量实验结果表明,在各种室内环境下,WiDance的整体识别准确率达到92%。WiDance不需要硬件修改,被认为是走向实用无线人机界面的一个有前途的步骤,这为未来的无线传感应用提供了新的见解。