吴家行hang

A close reading of the PyTorch BERT classification code

1 First, load the tokenizer

There is a vocab.txt file with one token per line, for example:

abc
bcd
吴家行

The Tokenizer class then has several attributes:

vocab

A dictionary that maps each token in vocab.txt to its line index, so the file above becomes:
```
{
    "abc": 0,
    "bcd": 1,
    "吴家行": 2,
}
```
ids_to_tokens

The inverse of vocab, mapping each id back to its token:
```
{
    0: "abc",
    1: "bcd",
    2: "吴家行",
}
```
basic_tokenizer
wordpiece_tokenizer
max_len
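
The two dictionaries above can be sketched in a few lines. This is a minimal illustration and not the tokenizer's actual code: `build_vocab` and `vocab_lines` are hypothetical names, and the list of strings stands in for the lines read from vocab.txt.

```python
from collections import OrderedDict


def build_vocab(lines):
    """Map each token to its line index, in file order."""
    vocab = OrderedDict()
    for index, line in enumerate(lines):
        token = line.strip()
        vocab[token] = index
    return vocab


# Stand-in for the lines of vocab.txt (same toy tokens as above).
vocab_lines = ["abc\n", "bcd\n", "吴家行\n"]

vocab = build_vocab(vocab_lines)

# ids_to_tokens simply swaps keys and values.
ids_to_tokens = OrderedDict((idx, tok) for tok, idx in vocab.items())
```

Keeping both directions makes token-to-id lookup (for model input) and id-to-token lookup (for decoding) equally cheap.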

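2 Next, load the pretrained BERT model

Both the structure (the various sizes) and the pretrained parameters (the weights and biases of each module) are loaded.

2.1 Loading the configuration

As I understand it, these are the model's hyperparameters, stored as JSON in the following form: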

{
  "attention_probs_dropout_prob": 0.1, 
  "directionality": "bidi", 
  "hidden_act": "gelu", 
  "hidden_dropout_prob": 0.1, 
  "hidden_size": 768, 
  "initializer_range": 0.02, 
  "intermediate_size": 3072, 
  "max_position_embeddings": 512, 
  "num_attention_heads": 12, 
  "num_hidden_layers": 12, 
  "pooler_fc_size": 768, 
  "pooler_num_attention_heads": 12, 
  "pooler_num_fc_layers": 3, 
  "pooler_size_per_head": 128, 
  "pooler_type": "first_token_transform", 
  "type_vocab_size": 2, 
  "vocab_size": 21128
}
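
As a small sketch of how such a configuration can be read, the snippet below parses a subset of the fields above with the standard `json` module and derives two quantities that show up later in the model structure. The variable names are illustrative only; the real loader wraps these values in a config object.

```python
import json

# A subset of the configuration shown above.
config_json = """
{
  "hidden_size": 768,
  "intermediate_size": 3072,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "max_position_embeddings": 512,
  "vocab_size": 21128
}
"""

config = json.loads(config_json)

# Each attention head works on an equal slice of the hidden vector.
attention_head_size = config["hidden_size"] // config["num_attention_heads"]

# The intermediate (feed-forward) layer is wider than the hidden size.
expansion = config["intermediate_size"] // config["hidden_size"]
```

With these values each of the 12 heads sees 64 dimensions, and the intermediate layer is 4 times as wide as the hidden size.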

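2.2 Loading the weights

As I understand it, these are the trained model parameters, stored in a binary file that deserializes to a dict. Here are the keys I printed out: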

bert.embeddings.word_embeddings.weight
bert.embeddings.position_embeddings.weight
bert.embeddings.token_type_embeddings.weight
bert.embeddings.LayerNorm.weight
bert.embeddings.LayerNorm.bias
bert.encoder.layer.0.attention.self.query.weight
bert.encoder.layer.0.attention.self.query.bias
bert.encoder.layer.0.attention.self.key.weight
bert.encoder.layer.0.attention.self.key.bias
bert.encoder.layer.0.attention.self.value.weight
bert.encoder.layer.0.attention.self.value.bias
bert.encoder.layer.0.attention.output.dense.weight
bert.encoder.layer.0.attention.output.dense.bias
bert.encoder.layer.0.attention.output.LayerNorm.weight
bert.encoder.layer.0.attention.output.LayerNorm.bias
bert.encoder.layer.0.intermediate.dense.weight
bert.encoder.layer.0.intermediate.dense.bias
bert.encoder.layer.0.output.dense.weight
bert.encoder.layer.0.output.dense.bias
bert.encoder.layer.0.output.LayerNorm.weight
bert.encoder.layer.0.output.LayerNorm.bias
bert.encoder.layer.1.attention.self.query.weight
bert.encoder.layer.1.attention.self.query.bias
bert.encoder.layer.1.attention.self.key.weight
bert.encoder.layer.1.attention.self.key.bias
bert.encoder.layer.1.attention.self.value.weight
bert.encoder.layer.1.attention.self.value.bias
bert.encoder.layer.1.attention.output.dense.weight
bert.encoder.layer.1.attention.output.dense.bias
bert.encoder.layer.1.attention.output.LayerNorm.weight
bert.encoder.layer.1.attention.output.LayerNorm.bias
bert.encoder.layer.1.intermediate.dense.weight
bert.encoder.layer.1.intermediate.dense.bias
bert.encoder.layer.1.output.dense.weight
bert.encoder.layer.1.output.dense.bias
bert.encoder.layer.1.output.LayerNorm.weight
bert.encoder.layer.1.output.LayerNorm.bias
bert.encoder.layer.2.attention.self.query.weight
bert.encoder.layer.2.attention.self.query.bias
bert.encoder.layer.2.attention.self.key.weight
bert.encoder.layer.2.attention.self.key.bias
bert.encoder.layer.2.attention.self.value.weight
bert.encoder.layer.2.attention.self.value.bias
bert.encoder.layer.2.attention.output.dense.weight
bert.encoder.layer.2.attention.output.dense.bias
bert.encoder.layer.2.attention.output.LayerNorm.weight
bert.encoder.layer.2.attention.output.LayerNorm.bias
bert.encoder.layer.2.intermediate.dense.weight
bert.encoder.layer.2.intermediate.dense.bias
bert.encoder.layer.2.output.dense.weight
bert.encoder.layer.2.output.dense.bias
bert.encoder.layer.2.output.LayerNorm.weight
bert.encoder.layer.2.output.LayerNorm.bias
bert.encoder.layer.3.attention.self.query.weight
bert.encoder.layer.3.attention.self.query.bias
bert.encoder.layer.3.attention.self.key.weight
bert.encoder.layer.3.attention.self.key.bias
bert.encoder.layer.3.attention.self.value.weight
bert.encoder.layer.3.attention.self.value.bias
bert.encoder.layer.3.attention.output.dense.weight
bert.encoder.layer.3.attention.output.dense.bias
bert.encoder.layer.3.attention.output.LayerNorm.weight
bert.encoder.layer.3.attention.output.LayerNorm.bias
bert.encoder.layer.3.intermediate.dense.weight
bert.encoder.layer.3.intermediate.dense.bias
bert.encoder.layer.3.output.dense.weight
bert.encoder.layer.3.output.dense.bias
bert.encoder.layer.3.output.LayerNorm.weight
bert.encoder.layer.3.output.LayerNorm.bias
bert.encoder.layer.4.attention.self.query.weight
bert.encoder.layer.4.attention.self.query.bias
bert.encoder.layer.4.attention.self.key.weight
bert.encoder.layer.4.attention.self.key.bias
bert.encoder.layer.4.attention.self.value.weight
bert.encoder.layer.4.attention.self.value.bias
bert.encoder.layer.4.attention.output.dense.weight
bert.encoder.layer.4.attention.output.dense.bias
bert.encoder.layer.4.attention.output.LayerNorm.weight
bert.encoder.layer.4.attention.output.LayerNorm.bias
bert.encoder.layer.4.intermediate.dense.weight
bert.encoder.layer.4.intermediate.dense.bias
bert.encoder.layer.4.output.dense.weight
bert.encoder.layer.4.output.dense.bias
bert.encoder.layer.4.output.LayerNorm.weight
bert.encoder.layer.4.output.LayerNorm.bias
bert.encoder.layer.5.attention.self.query.weight
bert.encoder.layer.5.attention.self.query.bias
bert.encoder.layer.5.attention.self.key.weight
bert.encoder.layer.5.attention.self.key.bias
bert.encoder.layer.5.attention.self.value.weight
bert.encoder.layer.5.attention.self.value.bias
bert.encoder.layer.5.attention.output.dense.weight
bert.encoder.layer.5.attention.output.dense.bias
bert.encoder.layer.5.attention.output.LayerNorm.weight
bert.encoder.layer.5.attention.output.LayerNorm.bias
bert.encoder.layer.5.intermediate.dense.weight
bert.encoder.layer.5.intermediate.dense.bias
bert.encoder.layer.5.output.dense.weight
bert.encoder.layer.5.output.dense.bias
bert.encoder.layer.5.output.LayerNorm.weight
bert.encoder.layer.5.output.LayerNorm.bias
bert.encoder.layer.6.attention.self.query.weight
bert.encoder.layer.6.attention.self.query.bias
bert.encoder.layer.6.attention.self.key.weight
bert.encoder.layer.6.attention.self.key.bias
bert.encoder.layer.6.attention.self.value.weight
bert.encoder.layer.6.attention.self.value.bias
bert.encoder.layer.6.attention.output.dense.weight
bert.encoder.layer.6.attention.output.dense.bias
bert.encoder.layer.6.attention.output.LayerNorm.weight
bert.encoder.layer.6.attention.output.LayerNorm.bias
bert.encoder.layer.6.intermediate.dense.weight
bert.encoder.layer.6.intermediate.dense.bias
bert.encoder.layer.6.output.dense.weight
bert.encoder.layer.6.output.dense.bias
bert.encoder.layer.6.output.LayerNorm.weight
bert.encoder.layer.6.output.LayerNorm.bias
bert.encoder.layer.7.attention.self.query.weight
bert.encoder.layer.7.attention.self.query.bias
bert.encoder.layer.7.attention.self.key.weight
bert.encoder.layer.7.attention.self.key.bias
bert.encoder.layer.7.attention.self.value.weight
bert.encoder.layer.7.attention.self.value.bias
bert.encoder.layer.7.attention.output.dense.weight
bert.encoder.layer.7.attention.output.dense.bias
bert.encoder.layer.7.attention.output.LayerNorm.weight
bert.encoder.layer.7.attention.output.LayerNorm.bias
bert.encoder.layer.7.intermediate.dense.weight
bert.encoder.layer.7.intermediate.dense.bias
bert.encoder.layer.7.output.dense.weight
bert.encoder.layer.7.output.dense.bias
bert.encoder.layer.7.output.LayerNorm.weight
bert.encoder.layer.7.output.LayerNorm.bias
bert.encoder.layer.8.attention.self.query.weight
bert.encoder.layer.8.attention.self.query.bias
bert.encoder.layer.8.attention.self.key.weight
bert.encoder.layer.8.attention.self.key.bias
bert.encoder.layer.8.attention.self.value.weight
bert.encoder.layer.8.attention.self.value.bias
bert.encoder.layer.8.attention.output.dense.weight
bert.encoder.layer.8.attention.output.dense.bias
bert.encoder.layer.8.attention.output.LayerNorm.weight
bert.encoder.layer.8.attention.output.LayerNorm.bias
bert.encoder.layer.8.intermediate.dense.weight
bert.encoder.layer.8.intermediate.dense.bias
bert.encoder.layer.8.output.dense.weight
bert.encoder.layer.8.output.dense.bias
bert.encoder.layer.8.output.LayerNorm.weight
bert.encoder.layer.8.output.LayerNorm.bias
bert.encoder.layer.9.attention.self.query.weight
bert.encoder.layer.9.attention.self.query.bias
bert.encoder.layer.9.attention.self.key.weight
bert.encoder.layer.9.attention.self.key.bias
bert.encoder.layer.9.attention.self.value.weight
bert.encoder.layer.9.attention.self.value.bias
bert.encoder.layer.9.attention.output.dense.weight
bert.encoder.layer.9.attention.output.dense.bias
bert.encoder.layer.9.attention.output.LayerNorm.weight
bert.encoder.layer.9.attention.output.LayerNorm.bias
bert.encoder.layer.9.intermediate.dense.weight
bert.encoder.layer.9.intermediate.dense.bias
bert.encoder.layer.9.output.dense.weight
bert.encoder.layer.9.output.dense.bias
bert.encoder.layer.9.output.LayerNorm.weight
bert.encoder.layer.9.output.LayerNorm.bias
bert.encoder.layer.10.attention.self.query.weight
bert.encoder.layer.10.attention.self.query.bias
bert.encoder.layer.10.attention.self.key.weight
bert.encoder.layer.10.attention.self.key.bias
bert.encoder.layer.10.attention.self.value.weight
bert.encoder.layer.10.attention.self.value.bias
bert.encoder.layer.10.attention.output.dense.weight
bert.encoder.layer.10.attention.output.dense.bias
bert.encoder.layer.10.attention.output.LayerNorm.weight
bert.encoder.layer.10.attention.output.LayerNorm.bias
bert.encoder.layer.10.intermediate.dense.weight
bert.encoder.layer.10.intermediate.dense.bias
bert.encoder.layer.10.output.dense.weight
bert.encoder.layer.10.output.dense.bias
bert.encoder.layer.10.output.LayerNorm.weight
bert.encoder.layer.10.output.LayerNorm.bias
bert.encoder.layer.11.attention.self.query.weight
bert.encoder.layer.11.attention.self.query.bias
bert.encoder.layer.11.attention.self.key.weight
bert.encoder.layer.11.attention.self.key.bias
bert.encoder.layer.11.attention.self.value.weight
bert.encoder.layer.11.attention.self.value.bias
bert.encoder.layer.11.attention.output.dense.weight
bert.encoder.layer.11.attention.output.dense.bias
bert.encoder.layer.11.attention.output.LayerNorm.weight
bert.encoder.layer.11.attention.output.LayerNorm.bias
bert.encoder.layer.11.intermediate.dense.weight
bert.encoder.layer.11.intermediate.dense.bias
bert.encoder.layer.11.output.dense.weight
bert.encoder.layer.11.output.dense.bias
bert.encoder.layer.11.output.LayerNorm.weight
bert.encoder.layer.11.output.LayerNorm.bias
bert.pooler.dense.weight
bert.pooler.dense.bias
classifier.weight
classifier.bias

Judging from these keys, the pretrained model is made up roughly of the following layers:

word_embeddings
position_embeddings
token_type_embeddings
LayerNorm
encoder (12 layers)
- attention
  - query
  - key
  - value
- intermediate
- output
pooler
classifier
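
The same summary can be derived mechanically from the key names. The sketch below uses only a handful of keys copied from the listing (the real dict has many more); `summarize` and `layer_pattern` are hypothetical helpers, not part of the library.

```python
import re
from collections import Counter

# A few keys copied from the listing above.
keys = [
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.0.output.dense.bias",
    "bert.encoder.layer.11.attention.self.query.weight",
    "bert.pooler.dense.weight",
    "classifier.weight",
]

layer_pattern = re.compile(r"^bert\.encoder\.layer\.(\d+)\.")


def summarize(keys):
    """Count keys per top-level component and collect encoder layer indices."""
    components = Counter()
    layer_ids = set()
    for key in keys:
        match = layer_pattern.match(key)
        if match:
            layer_ids.add(int(match.group(1)))
            components["encoder"] += 1
        elif key.startswith("bert."):
            # e.g. bert.embeddings.* or bert.pooler.*
            components[key.split(".")[1]] += 1
        else:
            # e.g. classifier.*
            components[key.split(".")[0]] += 1
    return components, sorted(layer_ids)


components, layer_ids = summarize(keys)
```

Run on the full key list, the layer indices would cover 0 through 11, matching "num_hidden_layers": 12 in the configuration.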

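3 Set up the optimizer: BertAdam

BertAdam

This uses an optimization algorithm newer than plain Adam. It appears to add a regularization term, weight_decay (still to be studied)…

The relevant paper is: Decoupled Weight Decay Regularization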
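
To make the idea of decoupled weight decay concrete, here is a minimal sketch of one update step on a single scalar parameter. It follows the general scheme of the paper and is not BertAdam's actual code, which may differ in details such as bias correction and learning-rate scheduling. The function name and default values are assumptions chosen for illustration.

```python
import math


def adam_decoupled_step(param, grad, m, v, step, lr=1e-3, beta1=0.9,
                        beta2=0.999, eps=1e-8, weight_decay=0.01):
    """One Adam step with decoupled weight decay on a single scalar."""
    # Standard Adam moment estimates, computed from the raw gradient only.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    adam_update = m_hat / (math.sqrt(v_hat) + eps)
    # Decoupled decay: the weight is shrunk directly instead of adding
    # weight_decay * param to the gradient before the moment estimates.
    param = param - lr * (adam_update + weight_decay * param)
    return param, m, v


# With a zero gradient only the decay term acts on the parameter.
decayed, _, _ = adam_decoupled_step(1.0, 0.0, m=0.0, v=0.0, step=1)

# With a non-zero gradient both terms contribute.
updated, _, _ = adam_decoupled_step(1.0, 0.5, m=0.0, v=0.0, step=1)
```

The point of decoupling is that the decay is not rescaled by the adaptive denominator, so every weight shrinks at the same relative rate regardless of its gradient history.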

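4 Training

4.1 Converting examples to features

Each example is an object with four attributes: guid, text_a, text_b, label
train_examples is a list of many such examples, and it has to be converted into a feature representation.

First basic_tokenizer splits the text into tokens, then wordpiece_tokenizer splits each token into sub_tokens. In most cases the token and its sub_token are identical, but there are cases like these: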

token: あすみ
sub_token: ['あ', '##す', '##み']

token: clucl
sub_token: ['cl', '##uc', '##l']

token: ５０００
sub_token: ['５０', '##００']
...
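
The splits above are consistent with a greedy longest-match-first search over the vocabulary, where every piece after the first is prefixed with `##`. The following is a simplified sketch of that idea, not the library's implementation; `wordpiece` and `toy_vocab` are hypothetical, and the toy vocabulary contains only the pieces needed for the examples.

```python
def wordpiece(token, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one token into sub-tokens."""
    sub_tokens = []
    start = 0
    while start < len(token):
        end = len(token)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = token[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No prefix of the remainder is in the vocabulary.
            return [unk_token]
        sub_tokens.append(piece)
        start = end
    return sub_tokens


toy_vocab = {"cl", "##uc", "##l", "５０", "##００", "篮"}

split_latin = wordpiece("clucl", toy_vocab)
split_digits = wordpiece("５０００", toy_vocab)
split_single = wordpiece("篮", toy_vocab)
split_unknown = wordpiece("xyz", toy_vocab)
```

A token that is itself in the vocabulary comes back unchanged, which is why token and sub_token usually coincide.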

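When handling a sentence pair, the two sentences together may contain at most max_seq_length-3 tokens (the remaining three slots are for [CLS], [SEP], [SEP]); anything beyond that is truncated.

After this processing, the two sentences in an example look like this (exactly how tokens_a is produced still needs a closer look):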

tokens_a: ['喜', '欢', '打', '篮', '球', '的', '男', '生', '喜', '欢', '什', '么', '样', '的', '女', '生']
tokens_b: ['爱', '打', '篮', '球', '的', '男', '生', '喜', '欢', '什', '么', '样', '的', '女', '生']

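The two sentences are then joined, and segment ids are attached to tell them apart: 0 for the first sentence (including [CLS] and the first [SEP]), 1 for the second: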

tokens: ['[CLS]', '喜', '欢', '打', '篮', '球', '的', '男', '生', '喜', '欢', '什', '么', '样', '的', '女', '生', '[SEP]', '爱', '打', '篮', '球', '的', '男', '生', '喜', '欢', '什', '么', '样', '的', '女', '生', '[SEP]']

segment_ids: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

The vocab dictionary from earlier is then used to convert these tokens to ids:

input_ids: [101, 1599, 3614, 2802, 5074, 4413, 4638, 4511, 4495, 1599, 3614, 784, 720, 3416, 4638, 1957, 4495, 102, 4263, 2802, 5074, 4413, 4638, 4511, 4495, 1599, 3614, 784, 720, 3416, 4638, 1957, 4495, 102]

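input_ids, input_mask and segment_ids are then padded with 0 up to max_seq_length.

input_mask: 1 for real tokens, 0 for padding

There is also label_id. Since this is a binary classification problem, it is 0 or 1, and a dictionary stores the mapping between label and id: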

{
    "0": 0,
    "1": 1,
}

With that, the example has been converted into features.

Each feature has the following fields:

input_ids
input_mask
segment_ids
label_id
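
Putting the steps of this section together, a sentence pair can be turned into the three padded lists roughly as follows. This is a sketch under assumptions: `truncate_pair` and `to_features` are illustrative names, the truncation strategy (trim the longer sentence one token at a time) is one reasonable choice, and `toy_vocab` holds only a few of the ids that appear in the example above.

```python
def truncate_pair(tokens_a, tokens_b, max_length):
    """Trim the longer list one token at a time until the pair fits."""
    while len(tokens_a) + len(tokens_b) > max_length:
        longer = tokens_a if len(tokens_a) > len(tokens_b) else tokens_b
        longer.pop()


def to_features(tokens_a, tokens_b, vocab, max_seq_length):
    """Build padded input_ids, input_mask and segment_ids for one pair."""
    tokens_a, tokens_b = list(tokens_a), list(tokens_b)
    # Reserve three positions for [CLS], [SEP], [SEP].
    truncate_pair(tokens_a, tokens_b, max_seq_length - 3)

    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    input_ids = [vocab[t] for t in tokens]
    input_mask = [1] * len(input_ids)

    # Pad all three lists with 0 up to max_seq_length.
    padding = [0] * (max_seq_length - len(input_ids))
    return input_ids + padding, input_mask + padding, segment_ids + padding


toy_vocab = {"[CLS]": 101, "[SEP]": 102, "喜": 1599, "欢": 3614, "爱": 4263}

input_ids, input_mask, segment_ids = to_features(
    ["喜", "欢"], ["爱"], toy_vocab, max_seq_length=8)

# A pair that is too long for max_seq_length=6 gets truncated to 3 tokens.
short_ids, short_mask, short_segments = to_features(
    ["喜", "欢", "喜", "欢"], ["爱", "爱"], toy_vocab, max_seq_length=6)
```

All three lists always have length max_seq_length, which is what allows a batch of features to be stacked into a single tensor later.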

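4.2 Loading the features and training

The loss function and optimizer are already defined. The loss function is the objective; loss.backward() computes the gradients, optimizer.step() applies the update (gradient descent or a variant) to the parameters, and optimizer.zero_grad() clears the gradients before the next computation.

In a model, the submodules are usually defined first in __init__(), for example: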

self.bert = BertModel(config)

Calling the submodule instance directly then invokes its forward() method, for example:

_, pooled_output = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)

The statement above actually runs the forward() method of BertModel.
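
The pattern described in this section (define submodules in __init__(), call the module to run forward(), then backward / step / zero_grad) can be tried on a toy model. The sketch below assumes PyTorch is installed; `TinyClassifier` is a hypothetical stand-in for the real classification model, and the random features and labels are placeholders for real data.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for the real classification model."""

    def __init__(self, in_features, num_labels):
        super().__init__()
        # Submodules are defined in __init__ ...
        self.classifier = nn.Linear(in_features, num_labels)

    def forward(self, features):
        # ... and used in forward().
        return self.classifier(features)


torch.manual_seed(0)
features = torch.randn(20, 8)          # a batch of 20 toy examples
labels = torch.randint(0, 2, (20,))    # binary labels

model = TinyClassifier(in_features=8, num_labels=2)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for _ in range(5):
    logits = model(features)           # calling the module runs forward()
    loss = loss_fn(logits, labels)
    loss.backward()                    # compute gradients
    optimizer.step()                   # update the parameters
    optimizer.zero_grad()              # clear gradients for the next step
    losses.append(loss.item())
```

Plain SGD is used here only to keep the example short; the training script discussed in this post uses BertAdam instead.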

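input_ids, segment_ids and input_mask are then fed into the model, so next we dig into the model structure.

4.3 Model structure in detail

input_ids, token_type_ids and attention_mask are each batched into tensors of size [20, 128], where 20 is the batch size and 128 is the maximum sequence length mentioned earlier.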

BertForSequenceClassification
- bert(BertModel)
  
  Here attention_mask is expanded into extended_attention_mask with size [20, 1, 1, 128], which broadcasts against [batch_size, num_heads, from_seq_length, to_seq_length]
  
  attention_mask.unsqueeze(1) inserts a new dimension at position 1, and a following unsqueeze(2) inserts one at position 2
  
  Then the positions in extended_attention_mask that were 0 become -10000, and the positions that were 1 become 0
  - embeddings(BertEmbeddings)
    
    input_ids and token_type_ids are passed into the embedding layer
    
    This layer also builds position_ids, which number each position from 0 to seq_length-1. Their size is the same as that of input_ids, in the following form:
```
tensor([[  0,   1,   2,  ..., 125, 126, 127],
        [  0,   1,   2,  ..., 125, 126, 127],
        [  0,   1,   2,  ..., 125, 126, 127],
        ...,
        [  0,   1,   2,  ..., 125, 126, 127],
        [  0,   1,   2,  ..., 125, 126, 127],
        [  0,   1,   2,  ..., 125, 126, 127]], device='cuda:0')
```
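    input_ids, position_ids and token_type_ids are then fed to the corresponding word_embeddings, position_embeddings and token_type_embeddings layers to produce three embedding tensors, which are summed to give embeddings and passed into the LayerNorm layer.
    - Note: the size of embeddings has gone from the [20,128] of the token ids to [20,128,768], because each token is represented by a 768-dimensional feature vector. This is also why in_features=768 in the attention layers later.
    
    The output of the LayerNorm layer is: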
    
    $w\times \frac{embeddings-\overline{embeddings}}{\sqrt{\overline{{(embeddings- \overline{embeddings})}^2}+\epsilon}} +b$
  - encoder(BertEncoder)
    
    "num_hidden_layers" is 12, which means there are 12 layers like this one:
```
 BertLayer(
    (attention): BertAttention(
      (self): BertSelfAttention(
        (query): Linear(in_features=768, out_features=768, bias=True)
        (key): Linear(in_features=768, out_features=768, bias=True)
        (value): Linear(in_features=768, out_features=768, bias=True)
        (dropout): Dropout(p=0.1)
      )
      (output): BertSelfOutput(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (LayerNorm): BertLayerNorm()  (add, then normalize)
        (dropout): Dropout(p=0.1)
      )
    )
    (intermediate): BertIntermediate(
      (dense): Linear(in_features=768, out_features=3072, bias=True)
    )
    (output): BertOutput(
      (dense): Linear(in_features=3072, out_features=768, bias=True)
      (LayerNorm): BertLayerNorm()
      (dropout): Dropout(p=0.1)
    )
  )
```
    - BertAttention
    - BertIntermediate
      
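      A fully connected layer changes the size from [20,128,768] to [20,128,3072]
      
      followed by gelu activation
    - BertOutput
      
      The BertOutput layer changes the size from [20,128,3072] back to [20,128,768]
  - pooler(BertPooler)
    
    From the data of size [20,128,768], only the hidden state of the first token is kept, so the size becomes [20,768]
    
    followed by a dense layer and tanh activation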
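
Returning to the attention mask handling described at the top of this outline, the shape change and the 0 / -10000 conversion can be checked on a toy batch. This sketch assumes PyTorch is installed; the batch of 2 sequences of length 4 and the 3 heads are made-up sizes, and the all-zero `scores` tensor stands in for real attention scores.

```python
import torch

# Toy batch: 2 sequences of length 4; the second has one padding position.
attention_mask = torch.tensor([[1, 1, 1, 1],
                               [1, 1, 1, 0]])

# [batch, seq] -> [batch, 1, 1, seq]
extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2).float()

# 1 -> 0.0 (keep), 0 -> -10000.0 (effectively removed by the softmax)
extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0

# The mask is added to scores of shape [batch, heads, from_seq, to_seq];
# the two size-1 dimensions broadcast over heads and query positions.
scores = torch.zeros(2, 3, 4, 4)
probs = torch.softmax(scores + extended_attention_mask, dim=-1)
```

Because the mask is added before the softmax, a padded position receives a probability that is numerically zero, while the remaining positions still sum to 1.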
dropout(Dropout)

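In machine learning, a model with too many parameters and too few training samples easily overfits. Overfitting shows up often when training neural networks: the loss on the training data is small and the prediction accuracy high, but on the test data the loss is large and the accuracy low.

Dropout mitigates overfitting fairly effectively and acts, to some extent, as a regularizer.

Put simply, during forward propagation each activation is set to 0 with probability p, and the surviving values are scaled by $\frac{1}{1-p}$. This makes the model generalize better because it cannot rely too heavily on a few local features.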
```
>>> import torch
>>> from torch import nn
>>> m = nn.Dropout(p=0.2)  # p=0.2, so surviving values are scaled by 1/(1-0.2) = 1.25
>>> input = torch.arange(1., 11.)
>>> input
tensor([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])
>>> output = m(input)
>>> output
tensor([ 1.2500,  0.0000,  3.7500,  5.0000,  6.2500,  7.5000,  8.7500, 10.0000,
        11.2500, 12.5000])
```
classifier(Linear)

The classifier (Linear) layer then changes the size from [20,768] to [20,2], giving the classification results for the 20 samples
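
The last two stages (take the first token, dense + tanh, then a linear layer to 2 logits) can be traced by hand on tiny numbers. The sketch below is pure Python with a hidden size of 2 instead of 768; the weights are made up for illustration, and dropout is left out, as it would be at inference time.

```python
import math


def linear(x, weight, bias):
    """y = W x + b for one vector x; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def pool_and_classify(sequence_output, pooler_w, pooler_b, cls_w, cls_b):
    """Pooler (first token, dense, tanh) followed by the classifier."""
    first_token = sequence_output[0]  # hidden state of [CLS]
    pooled = [math.tanh(v) for v in linear(first_token, pooler_w, pooler_b)]
    logits = linear(pooled, cls_w, cls_b)
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return pooled, logits, predicted


# One sequence of 3 tokens with hidden size 2 (made-up values).
sequence_output = [[1.0, -1.0], [0.5, 0.5], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]

pooled, logits, predicted = pool_and_classify(
    sequence_output, identity, zeros, identity, zeros)
```

The index of the largest logit is the predicted label id, which maps back to a label through the label dictionary from section 4.1.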
