Paper: Lexical representation explains cortical entrainment during speech comprehension, Stefan L. Frank, Jinbiao Yang, 2018

I. Background

1. Debate on the precise role of hierarchical syntactic structure during sentence comprehension

View 1: full hierarchical analysis is part and parcel of the comprehension process (rule-based)
View 2: more shallow or even non-hierarchical processing is common (statistics-based)

2. The Ding et al. study: Cortical tracking of hierarchical linguistic structures in connected speech

Participants in the Ding et al. study listened to a continuous stream of Chinese or English syllables, presented at a fixed rate of 250 ms per syllable (with no prosodic cues).
Depending on the experimental condition, each four-unit subsequence contained linguistic units at one or two hierarchically higher levels, or lacked meaningful structure beyond the syllable. In these sequences the next word is hard to predict from the preceding context, so they carry no word transition-probability information.
The MEG signal recorded during speech perception was subjected to a Fourier analysis to obtain a power spectrum, which showed peaks at exactly the presentation frequencies of the three levels of linguistic structure:
- words (or Chinese syllables) at 4 Hz
- phrases (or Chinese words) at 2 Hz
- sentences at 1 Hz

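Conclusion: the significant frequencies in the decomposed MEG signal coincide with the rates at which the linguistic units (words, phrases, sentences) occur in the auditory stimulus. Ding et al. therefore argue that the brain tracks every level of syntactic structure during language comprehension, which supports View 1: hierarchical syntactic processing is necessary for comprehension.

II. Motivation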

Goal: To investigate if the results could be explained without recourse to syntactic processing.

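Method: simulate the Ding et al. experiment with a computational model whose input consists of distributed word vectors only. The input is purely sequential and contains no syntactic-structure information.

Logic: if this model yields the same result (power spectrum) as the Ding et al. experiment, then non-hierarchical processing is also a possible account of language processing, and Ding's conclusion does not necessarily hold. A model that contains only lexical representations is simpler (it needs fewer abstract constructs), so by Occam's razor it should be preferred over the more complex model unless the complex model is supported by very strong evidence.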

III. Experimental conditions (auditory stimulus sequences)

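The model's test data are taken directly from Ding's auditory stimuli, which fall into five experimental conditions:

1. English four-word sequences with [NP VP] structure: 60 sequences, every word monosyllabic
   Examples: fat rats sensed fear, kind words warm hearts
   Linguistic structure: words, phrases, sentences
2. Chinese four-syllable sequences with [N V(P)] structure: 50 sequences, where N has 2 syllables and V(P) has 2 syllables
   Examples: 老牛耕地 lǎoniú gēngdì, 朋友请客 péngyǒu qǐngkè
   Linguistic structure: syllables, words/phrases, sentences
3. Chinese two-syllable sequences with [V(P)] structure: 50 sequences, obtained by taking the V(P) part of the sequences in condition 2
   Examples: 耕地 gēngdì, 请客 qǐngkè
   Linguistic structure: syllables, words/phrases
4. Chinese shuffled four-syllable sequences: 50 sequences, generated by randomly drawing syllables, position by position, from the sequences in condition 2; the new sequences have no word/phrase or sentence structure
   Example: bīng yǒu kàn zhāng 冰友看张
   Drawn from 冰雪融化, 朋友请客, 外公看报, 公司开张
   Linguistic structure: syllables
5. Chinese four-syllable sequences with [V N(P)] structure: 50 sequences, where V is monosyllabic and N(P) has 3 syllables
   Example: 蒸灌汤包 zhēng guàntāngbāo
   Linguistic structure: syllables, words/phrases, sentences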
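
The position-wise shuffling used for the scrambled Chinese condition can be sketched as follows. This is a minimal illustration, not the procedure Ding et al. actually used: the function name `shuffle_by_position` and the fixed seed are assumptions, and the sketch does not check that the result avoids real words or sentences.

```python
import random

def shuffle_by_position(sentences, rng):
    """Build one new sequence: position i takes the i-th syllable
    of a randomly chosen source sentence."""
    n_positions = len(sentences[0])
    return [rng.choice(sentences)[i] for i in range(n_positions)]

# The four source sentences named in the notes, as pinyin syllables.
sentences = [
    ["bīng", "xuě", "róng", "huà"],
    ["péng", "yǒu", "qǐng", "kè"],
    ["wài", "gōng", "kàn", "bào"],
    ["gōng", "sī", "kāi", "zhāng"],
]

rng = random.Random(1)  # fixed seed only to make the sketch reproducible
shuffled = shuffle_by_position(sentences, rng)
print(" ".join(shuffled))
```

Every syllable keeps its original position within the four-syllable frame, so syllable-level statistics per position are preserved while word and sentence structure is destroyed.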

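IV. Method and approach (model)

Overall idea: build a time series from high-dimensional word vectors to simulate the MEG signal, then use Fourier analysis to find which frequencies the signal contains and how strong they are. The resulting power spectrum is very similar to Ding's, which shows that lexical representation alone is sufficient to explain Ding's entrainment results.

1. Train distributed word vectors on large corpora with the skip-gram model

For each language, 12 skip-gram models (simulating 12 participants) are trained on a large corpus: an English corpus, and a Chinese corpus that is segmented with jieba and then converted to pinyin. The word-vector dimensionality N differs from model to model.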

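2. Represent syllable vectors following the Cohort model (for multi-syllable Chinese words)

Cortical entrainment is defined at the level of the syllable, so every syllable position needs a vector representation. The distributed vectors trained above, however, represent words rather than syllables, so an extra step is required. The approach borrows the idea of the Cohort model from spoken word recognition:

The syllable sequence (possibly containing only one syllable) activates all words that begin with that sequence; this set of words is called the cohort.
The vector at syllable position $k$, representing the sequence $s_1 \ldots s_k$, equals the average vector of words in the cohort, weighted by the words' corpus frequencies.

For example:
xīhóngshì (西红柿, "tomato") is a single word and therefore has a single word vector, but the Cohort model yields a vector for each of its 3 syllable positions: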

Syllable position	Sequence	Cohort	Syllable vector
1	xī	xīfāng, xībù, xīshū, xīhóngshì, ...	frequency-weighted average over the cohort
2	xīhóng	xīhóngshì, ...	frequency-weighted average over the cohort
3	xīhóngshì	xīhóngshì	the original word vector
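
The cohort computation in the table can be sketched as below. The lexicon, the corpus frequencies and the 2-dimensional vectors are made up for illustration, and `Word` and `cohort_vector` are hypothetical names rather than anything from the paper's code.

```python
from collections import namedtuple

# Hypothetical lexicon entry: pinyin syllables, corpus frequency, word vector.
Word = namedtuple("Word", ["syllables", "freq", "vector"])

def cohort_vector(prefix, lexicon):
    """Frequency-weighted mean vector of all words that start with `prefix`."""
    k = len(prefix)
    cohort = [w for w in lexicon if tuple(w.syllables[:k]) == tuple(prefix)]
    if not cohort:
        raise ValueError("empty cohort for prefix %r" % (prefix,))
    total = sum(w.freq for w in cohort)
    dim = len(cohort[0].vector)
    return [sum(w.freq * w.vector[i] for w in cohort) / total for i in range(dim)]

# Toy lexicon with invented frequencies and vectors.
lexicon = [
    Word(("xī", "fāng"), 30, [1.0, 0.0]),
    Word(("xī", "bù"), 10, [0.0, 1.0]),
    Word(("xī", "hóng", "shì"), 10, [1.0, 1.0]),
]

v1 = cohort_vector(("xī",), lexicon)               # cohort: all three words
v2 = cohort_vector(("xī", "hóng"), lexicon)        # cohort: xīhóngshì only
v3 = cohort_vector(("xī", "hóng", "shì"), lexicon)  # the word itself
```

With this toy lexicon the first-syllable vector is a blend dominated by the most frequent candidate, and once the cohort shrinks to a single word the syllable vector is simply that word's vector.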

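3. Convert the word/syllable vectors into a time series that simulates the MEG signal evoked by the auditory stimulus

Following Ding's setup, one syllable is presented every 250 ms (4 syllables per second). Lexical information is assumed not to become available immediately at word onset (t = 0 ms) but after a short delay $\tau$. The value of $\tau$ is sampled independently for every word/syllable, $\tau \sim \mathrm{Uniform}(15, 65)$ ms.
Let $\mathbf{w} = (w_1, \ldots, w_N)^\top$ be the N-dimensional vector representing the current English word or Chinese syllable. The available lexical information at t milliseconds after word onset (for $0 \le t < 250$) is represented by a column vector $\mathbf{v}(t) = (v_1(t), \ldots, v_N(t))^\top$ with

$$v_i(t) = \begin{cases} \varepsilon & \text{if } t < \tau \\ w_i + \varepsilon & \text{if } t \ge \tau \end{cases}$$

where $\varepsilon$ denotes Gaussian noise with mean 0 and standard deviation σ = 0.5.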
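
A minimal sketch of this signal for a single word, under the reading that the word vector is switched on after the delay (called `tau` here) and that noise is present at every time point. The function name `lexical_signal`, the 5 ms step and the toy 3-dimensional vector are assumptions made for illustration.

```python
import random

def lexical_signal(w, tau, sigma=0.5, step_ms=5, duration_ms=250, rng=random):
    """Samples of the lexical-information vector for one word.

    Before `tau` ms only Gaussian noise is present; from `tau` onward the
    word vector `w` is added to the noise. Returns a list of samples, each
    a list with one value per vector dimension.
    """
    samples = []
    for t in range(0, duration_ms, step_ms):
        gate = 1.0 if t >= tau else 0.0
        samples.append([gate * wi + rng.gauss(0.0, sigma) for wi in w])
    return samples

rng = random.Random(0)       # fixed seed only for reproducibility
tau = rng.uniform(15, 65)    # per-word delay in milliseconds
signal = lexical_signal([1.0, -2.0, 0.5], tau, rng=rng)
```

Setting `sigma=0.0` makes the step shape visible: all samples before the delay are zero and all samples after it equal the word vector.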

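4. Discretize the lexical-information time series and concatenate the matrices

Discretization: the lexical-information vector is sampled every 5 ms (a sampling rate of 200 Hz), so each syllable (250 ms) becomes a matrix with N rows and 50 columns.
Concatenation: the matrices of all stimulus sequences in one experimental condition are concatenated horizontally. For English this gives a matrix with N rows and 12000 columns (60 sequences × 4 words × 50 samples). The horizontal axis is time, simulating the sampled MEG signal; the vertical axis holds the dimensions of the word vector.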
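
The shape arithmetic can be checked with a small sketch. The per-word blocks are filled with placeholder values instead of real word vectors, and `concatenate` is a hypothetical helper name.

```python
def concatenate(blocks):
    """Join per-word sample blocks along time and return one row per dimension.

    `blocks` is a list of blocks; each block is a list of samples, and each
    sample is a list with N values. The result has N rows and T columns.
    """
    timeline = [sample for block in blocks for sample in block]
    return [list(row) for row in zip(*timeline)]

SAMPLES_PER_WORD = 250 // 5   # 250 ms per word sampled every 5 ms
N = 3                         # toy vector dimensionality
N_WORDS = 60 * 4              # English condition: 60 sequences of 4 words

# Placeholder data: word number b is represented by the constant value b.
blocks = [[[float(b)] * N for _ in range(SAMPLES_PER_WORD)] for b in range(N_WORDS)]
matrix = concatenate(blocks)
print(len(matrix), len(matrix[0]))
```

Each row of the result is the time course of one vector dimension, which is the unit the Fourier analysis operates on.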

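5. Use the discrete Fourier transform to move from the time domain to the frequency domain and analyze the frequency content

Each row of the concatenated matrix is Fourier-transformed and its power spectrum is computed. The time series is discrete and aperiodic, so its discrete-time Fourier transform gives a spectrum that is continuous and periodic. Sampling that spectrum in frequency and keeping one principal period gives the final spectrum, with frequency on the horizontal axis and amplitude on the vertical axis. Squaring the amplitude gives the power spectrum (justified by Parseval's theorem).
The power spectra are then averaged over the N word-vector dimensions to obtain the power in each frequency bin (bin width 1/9 Hz for Chinese, 1/11 Hz for English).
Following Ding, a one-sided t-test with false discovery rate correction checks whether the power in each frequency bin is significantly higher than the average power of its two surrounding bins.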
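
The spectral step can be sketched as follows. To stay self-contained the sketch uses a direct DFT sum in plain Python rather than an FFT library, takes two pure sinusoids as stand-ins for word-vector dimensions, and computes only the peak-versus-neighbours contrast; the t-test and the false discovery rate correction are left out. All function names are hypothetical.

```python
import cmath
import math

def power_spectrum(x, fs, max_freq):
    """Squared DFT magnitude of x for every frequency bin up to max_freq."""
    n = len(x)
    n_bins = int(max_freq * n / fs) + 1
    freqs = [k * fs / n for k in range(n_bins)]
    power = []
    for k in range(n_bins):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(x))
        power.append(abs(coeff) ** 2)
    return freqs, power

def mean_power(rows, fs, max_freq):
    """Average the power spectra of all rows (one row per vector dimension)."""
    freqs, total = None, None
    for row in rows:
        freqs, power = power_spectrum(row, fs, max_freq)
        total = power if total is None else [a + b for a, b in zip(total, power)]
    return freqs, [p / len(rows) for p in total]

def neighbour_contrast(power, k):
    """Power in bin k minus the mean power of its two adjacent bins."""
    return power[k] - (power[k - 1] + power[k + 1]) / 2.0

FS = 200        # sampling rate in Hz (one sample every 5 ms)
T = 4 * FS      # 4 seconds of signal, so the bin width is 0.25 Hz
rows = [
    [math.sin(2 * math.pi * 1.0 * t / FS) for t in range(T)],  # 1 Hz component
    [math.sin(2 * math.pi * 4.0 * t / FS) for t in range(T)],  # 4 Hz component
]
freqs, avg = mean_power(rows, FS, max_freq=5.0)
```

The bin width equals the inverse of the signal duration, which is why the analysis window determines how finely the 1 Hz, 2 Hz and 4 Hz peaks can be separated from their neighbours.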

V. Comparison of results

Power spectra from the human MEG signal (left) and the corresponding model predictions (right) in all five conditions

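Ignoring the difference in magnitude, the human results and the model results are very similar.
The differences in conditions 2 and 5 are most likely caused by harmonics and carry no real significance.
In condition 4 (shuffled Chinese) there is no sentence structure, so no peak should appear at 1 Hz. The model nevertheless reaches significance at 1 Hz (although the peak is very weak), and this 1 Hz peak clearly cannot be a harmonic, which is somewhat odd. The authors offer the following possible explanation (I find it unclear and rather forced, because two factors are confounded here):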

A possible explanation for the difference between model prediction and human data is that the model treats the shuffled syllable sequence and grammatical sentences exactly the same, in that at each point all possible word candidates are selected (see under ‘Representing incomplete words’ in the Methods section). In contrast, human participants are likely to forgo pro-active word activation when listening to non-word sequences, even though their task in this condition was to detect the occasional correct sentence.

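VI. Conclusions and discussion

Conclusion: a time series of word vectors alone produces a very similar power spectrum. This is a parsimonious model, and by Occam's razor the authors argue that it should be preferred.

Discussion: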
1. Why lexical representation?

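In psycholinguistics, word vectors have been shown to be useful for decoding neural activity during single-word comprehension and narrative reading; in both written and spoken language comprehension, distances between word vectors predict neural activation in the brain.
Distributed word vectors achieve state-of-the-art results on many computational-linguistics tasks.
Word vectors are statistical. They look purely sequential on the surface, but they implicitly encode information about syntactic structure: words that occur in similar contexts have similar vectors, and vector similarity reflects semantic and syntactic similarity. (This is the contest between the statistical school and the rule-based school.)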

2. What is the origin of the predicted power peaks at the presentation rates of phrases and sentences?

Words that share more syntactic/semantic properties are encoded by more similar vectors. Consequently, if certain lexical properties occur at a fixed rate in the stimulus sequence, this will be reflected as a recurring approximate numerical pattern in the model’s time-series of vectors.
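
A toy illustration of this argument, assuming two hypothetical binary features ("noun-like" and "verb-like") over sentences of the form adjective, noun, verb, noun. Real word vectors are dense and noisy, so this only shows the mechanism, not the paper's result. `power_at` and `hold` are made-up helper names.

```python
import cmath
import math

def power_at(x, fs, f):
    """Squared magnitude of the DFT component of x at frequency f (in Hz).

    f should be a multiple of fs / len(x) so that it falls on a DFT bin.
    """
    coeff = sum(v * cmath.exp(-2j * math.pi * f * t / fs) for t, v in enumerate(x))
    return abs(coeff) ** 2

def hold(values, samples_per_word=50):
    """Hold each per-word value for 250 ms (50 samples at 200 Hz)."""
    return [v for v in values for _ in range(samples_per_word)]

FS = 200
# Ten 4-word sentences; one feature value per word.
noun = hold([0, 1, 0, 1] * 10)   # recurs every 2 words: 2 Hz
verb = hold([0, 0, 1, 0] * 10)   # recurs every 4 words: 1 Hz

for name, x in (("noun", noun), ("verb", verb)):
    print(name, [round(power_at(x, FS, f), 3) for f in (1.0, 1.5, 2.0)])
```

The noun-like feature repeats twice per sentence and therefore contributes power at 2 Hz but none at 1 Hz, while the verb-like feature repeats once per sentence and contributes power at 1 Hz. No syntactic tree is involved; the peaks follow from the rate at which a lexical property recurs.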

3. When, and to what extent, does hierarchical structure play a role in cognitive processing at the level of syntax?

We have shown here that the cortical entrainment results DMTP present as evidence for hierarchical processing are also generated by a time-series of word representations. Our point is that hierarchical processing may be less important than is traditionally assumed because simpler, sequential strategies are often available.

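VII. Ideas worth borrowing

Using a different word-vector dimensionality for each simulated participant.
Converting the Chinese corpus to pinyin to mimic what a listener hears, instead of using written characters.
Using the idea of the Cohort model to represent individual syllables.
The construction of the lexical-information-over-time function (comprehension lags behind hearing, plus Gaussian white noise that is present throughout).
Using the Fourier transform to move time-series data into the frequency domain for analysis.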
