Transformers perform better on large datasets.
-> BERT: pre-training on a large corpus yields a language model.
ViT is insensitive to token order, so the input is supplemented with fixed positional encodings.
-> But why is the Transformer insensitive to positional information in the first place? Aren't its inputs and outputs arranged as ordered sequences?
-> Recall encoder self-attention: within a token sequence, later tokens absorb semantic information from earlier tokens, and earlier tokens likewise absorb information from later ones; unlike a simple RNN, extraction is not strictly left-to-right. The extracted features therefore become "all-encompassing": for the sentence "我爱你" ("I love you"), every position in some layer's output mixes in information from every other position, so after several self-attention layers the feature sequences for "我爱你" and "你爱我" ("you love me") should be hard to tell apart, even though the two sentences mean completely different things.
-> Question: RNNs forget over long texts, so for long inputs, superimposing the meanings from both directions seems reasonable and may work; but for short texts, would a bidirectional RNN suffer from the same problem as the Transformer, i.e., blurring the positional information of tokens in the sequence?
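A quick sanity check of the permutation point above (my own sketch, not from the original notes): self-attention without positional encodings is permutation-equivariant, so reordering the input tokens merely reorders the output features.
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

x = torch.randn(1, 3, 8)        # three token embeddings, e.g. 我 / 爱 / 你 (no positional encoding)
perm = torch.tensor([2, 1, 0])  # reversed order, e.g. 你 / 爱 / 我

out_x, _ = attn(x, x, x)
out_p, _ = attn(x[:, perm], x[:, perm], x[:, perm])

# The permuted input gives exactly the permuted output, so the two "sentences"
# produce the same set of token features.
print(torch.allclose(out_x[:, perm], out_p, atol=1e-5))  # True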
[correlation between tokens; similarity between K and Q (token * W)]
For a Transformer, an N-step input sequence [x] maps to an N-step output sequence [c]. An RNN can keep only the last state vector $h_i$, whereas the Transformer must keep all outputs, because parameters are not shared (the heads in multi-head self-attention do not share parameters either).
-> Guess: since different words occur with different frequencies at each position of a sentence, not sharing parameters may give better results (multi-head self-attention is easier to justify: if the heads shared parameters, there would be no point in having more than one).
https://zhuanlan.zhihu.com/p/98855346
https://www.cnblogs.com/gaowenxingxing/p/15005130.html (cnblogs)
https://www.zhihu.com/question/425387974 (Zhihu Q&A)
Background:
Many earlier models such as BERT and RoBERTa limit the input to at most 512 tokens.
Using the graph model proposed in this paper on top of BERT and pre-trained language models, the length limit imposed by the sequential structure can be lifted, enabling multi-document input.
[CLS]: classifier, [SEP]: separator, [UNK]: unknown
Introduction
Recommended reading:
https://www.pianshen.com/article/40441703264/
https://spaces.ac.cn/archives/7476
Recommended reading: https://www.freesion.com/article/1548803712/
Transformer:
$query = XW^Q,\ key = XW^K,\ value = XW^V$  ($query, key \in \mathbb{R}^{n \times d_1}$, $value \in \mathbb{R}^{n \times d_2}$)
$output \in \mathbb{R}^{n \times d_2}$
$softmax(A \cdot B) \neq softmax(A) \times softmax(B)$
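A quick numeric check of the inequality above (a toy example with assumed 4x4 matrices):
import torch

torch.manual_seed(0)
A, B = torch.randn(4, 4), torch.randn(4, 4)

lhs = torch.softmax(A @ B, dim=-1)                         # softmax of the product
rhs = torch.softmax(A, dim=-1) @ torch.softmax(B, dim=-1)  # product of the softmaxes
print(torch.allclose(lhs, rhs))  # False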
Another paper proposes letting the final attention coefficients take values in {-1, 0, 1, 2}. I also feel that richer coefficients should help, but this paper argues that negative values in the correlation matrix are redundant, and removing them improves the experimental results.
* Introduction to N-gram models in natural language processing
Relation extraction: extracting triples (subject, relation, object).
Since we need to build a knowledge graph, on top of entity recognition we need a model that identifies the relation between entities in the same sentence. Relation extraction is essentially a classification problem: given two entities and a sentence in which they co-occur, classify the relation between them.
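A minimal sketch of that formulation (my own toy illustration, not a model from any of the papers): encode the sentence and the two entity spans, then classify the concatenation.
import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, num_relations=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim * 3, num_relations)

    def forward(self, sent_ids, e1_ids, e2_ids):
        # mean-pool the sentence and the two entity spans, then classify the triple
        sent = self.embed(sent_ids).mean(dim=1)
        e1 = self.embed(e1_ids).mean(dim=1)
        e2 = self.embed(e2_ids).mean(dim=1)
        return self.fc(torch.cat([sent, e1, e2], dim=-1))

model = RelationClassifier()
logits = model(torch.randint(0, 1000, (2, 20)),   # sentence token ids
               torch.randint(0, 1000, (2, 3)),    # entity-1 token ids
               torch.randint(0, 1000, (2, 3)))    # entity-2 token ids
print(logits.shape)  # torch.Size([2, 5]) -- one score per relation type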
The Transformer encoder and decoder each have 6 layers.
& multi-head
BERT: 12 layers (base) / 24 layers (large)
If a word w has very high or very low frequency across all labels, we can assume that w contributes little to the classification task. Conversely, if a word occurs more frequently within a particular label class, we assume it carries label-specific information.
The TCoL dictionary V is built only from the training set, to prevent information leakage.
Maximum likelihood estimation
The observed data is X, and X is generated from the latent variable Z; the mapping Z -> X is the generative model $\theta$, i.e., the decoder;
the mapping X -> Z is the recognition model $\phi$, analogous to the encoder of an autoencoder.
z is the cause
$p(z)$: prior
$p(z|x)$: posterior
$p(x|z)$: likelihood
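For reference, in standard VAE notation (my addition, not part of the original notes) these pieces are tied together by the evidence lower bound:
$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - KL\big(q_\phi(z|x)\,\|\,p(z)\big)$
where $q_\phi(z|x)$ is the encoder (recognition model) and $p_\theta(x|z)$ is the decoder (generative model); training maximizes this bound instead of the intractable $\log p_\theta(x)$.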
(K · Q · V)
if os.path.exists(config.vocab_path):
vocab = pkl.load(open(config.vocab_path, 'rb'))  # load the cached vocabulary (corpus word table)
else:
vocab = build_vocab(config.train_path, tokenizer=tokenizer, max_size=MAX_VOCAB_SIZE, min_freq=1)
pkl.dump(vocab, open(config.vocab_path, 'wb'))
print(f"Vocab size: {len(vocab)}")
def build_vocab(file_path, tokenizer, max_size, min_freq):
vocab_dic = {}
with open(file_path, 'r', encoding='UTF-8') as f:
for line in tqdm(f):
lin = line.strip()
if not lin:
continue
content = lin.split('\t')[0]
for word in tokenizer(content):
vocab_dic[word] = vocab_dic.get(word, 0) + 1  # count token frequencies to build the vocabulary
# dict.get(key, default) returns default when the key is missing
vocab_list = sorted([_ for _ in vocab_dic.items() if _[1] >= min_freq], key=lambda x: x[1], reverse=True)[:max_size]
# keep only tokens with frequency >= min_freq, sort them, and truncate to at most max_size entries
# reverse=True sorts by frequency in descending order
vocab_dic = {word_count[0]: idx for idx, word_count in enumerate(vocab_list)}
# index vocab_list and return a dict such as {'iii': 0, 'sdf': 1} (the frequency counts are dropped)
# ids are assigned in descending frequency order -- reminiscent of a Huffman tree?
'''
tinydict = {'Name': 'Runoob', 'Age': 7}
tinydict2 = {'Sex': 'female' }
tinydict.update(tinydict2)
>>
tinydict : {'Name': 'Runoob', 'Age': 7, 'Sex': 'female'}
'''
vocab_dic.update({UNK: len(vocab_dic), PAD: len(vocab_dic) + 1})
# append the UNK and PAD tokens at the end of the vocabulary,
# with ids len(vocab_dic) and len(vocab_dic) + 1 respectively
return vocab_dic
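# Toy usage of build_vocab (my own example; assumes utils.py below is importable and
# that lines follow the THUCNews "text<TAB>label" format):
from utils import build_vocab

with open("toy.txt", "w", encoding="UTF-8") as f:
    f.write("今天天气很好\t3\n明天天气也很好\t3\n")

tokenizer = lambda x: [ch for ch in x]  # char-level
vocab = build_vocab("toy.txt", tokenizer, max_size=10000, min_freq=1)
print(vocab)  # e.g. {'天': 0, '好': 1, ...} with UNK and PAD appended last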
if config.embedding_pretrained is not None:
self.embedding = nn.Embedding.from_pretrained(config.embedding_pretrained, freeze=False)
# word/character embedding layer
else:
self.embedding = nn.Embedding(config.n_vocab, config.embed, padding_idx=config.n_vocab - 1)
out = self.embedding(x[0])
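# A standalone illustration of the two embedding branches above (toy sizes, my sketch):
import torch
import torch.nn as nn

pretrained = torch.randn(5000, 300)                   # stand-in for the Sogou char vectors
emb_pre = nn.Embedding.from_pretrained(pretrained, freeze=False)  # fine-tuned during training
emb_rand = nn.Embedding(5000, 300, padding_idx=4999)  # random init; PAD is the last vocab id

ids = torch.randint(0, 5000, (128, 32))               # (batch_size, pad_size)
print(emb_pre(ids).shape, emb_rand(ids).shape)        # both torch.Size([128, 32, 300])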
run.py
import time
import torch
import numpy as np
from train_eval import train, init_network, test
from importlib import import_module
import argparse
from tensorboardX import SummaryWriter
parser = argparse.ArgumentParser(description='Chinese Text Classification')
parser.add_argument('--model', default='TextCNN', type=str,
help='choose a model: TextCNN, TextRNN, FastText, TextRCNN, TextRNN_Att, DPCNN, Transformer')
parser.add_argument('--embedding', default='pre_trained', type=str, help='random or pre_trained')
parser.add_argument('--word', default=False, type=bool, help='True for word, False for char')
args = parser.parse_args()
if __name__ == '__main__':
dataset = 'THUCNews'  # dataset directory
# Sogou News: embedding_SougouNews.npz, Tencent: embedding_Tencent.npz, random initialization: 'random'
embedding = 'embedding_SougouNews.npz'
if args.embedding == 'random':
embedding = 'random'
model_name = args.model # TextCNN, TextRNN,
if model_name == 'FastText':
from utils_fasttext import build_dataset, build_iterator, get_time_dif
embedding = 'random'
else:
from utils import build_dataset, build_iterator, get_time_dif
x = import_module('models.' + model_name)  # import the chosen model module by (relative) path
config = x.Config(dataset, embedding)  # initialize that model's Config class with the dataset and embedding choice
np.random.seed(1)
torch.manual_seed(1)
torch.cuda.manual_seed_all(1)
torch.backends.cudnn.deterministic = True  # make results reproducible across runs
start_time = time.time()
print("Loading data...")
vocab, train_data, dev_data, test_data = build_dataset(config, args.word)
train_iter = build_iterator(train_data, config)
dev_iter = build_iterator(dev_data, config)
test_iter = build_iterator(test_data, config)
time_dif = get_time_dif(start_time)
print("Time usage:", time_dif)
# train
config.n_vocab = len(vocab)
model = x.Model(config).to(config.device)
writer = SummaryWriter(log_dir=config.log_path + '/' + time.strftime('%m-%d_%H.%M', time.localtime()))
if model_name != 'Transformer':
init_network(model)
print(model.parameters)
# torch.save(model, "saved\\cnn_model.pth")
# train(config, model, train_iter, dev_iter, test_iter, writer)
# test(config, model, test_iter)
test(config, model, test_iter)
train_eval.py
# coding: UTF-8
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn import metrics
import time
from utils import get_time_dif
import pickle as pkl
from tensorboardX import SummaryWriter
import csv
# custom helper function
# find the key in the dict whose value equals val, and return that key
def get_key(_dict_, val):
for key, value in _dict_.items():
if value == val:
return key
return 'Key Not Found'
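# An equivalent O(1)-per-lookup alternative to get_key (my suggestion, not in the
# original code): invert the vocabulary dict once and reuse it.
sample_vocab = {'中': 0, '国': 1}
id_to_word = {v: k for k, v in sample_vocab.items()}
print(id_to_word.get(1, 'Key Not Found'))  # 国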
# weight initialization; defaults to Xavier
def init_network(model, method='xavier', exclude='embedding', seed=123):
for name, w in model.named_parameters():
if exclude not in name:
if 'weight' in name:
if method == 'xavier':
nn.init.xavier_normal_(w)
elif method == 'kaiming':
nn.init.kaiming_normal_(w)
else:
nn.init.normal_(w)
elif 'bias' in name:
nn.init.constant_(w, 0)
else:
pass
def train(config, model, train_iter, dev_iter, test_iter, writer):
start_time = time.time()
model.train()
optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)
# exponential learning-rate decay: after each epoch, lr = gamma * lr
# scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
total_batch = 0  # number of batches processed so far
dev_best_loss = float('inf')
last_improve = 0  # batch index of the last improvement in validation loss
flag = False  # whether training has gone a long time without improvement
#writer = SummaryWriter(log_dir=config.log_path + '/' + time.strftime('%m-%d_%H.%M', time.localtime()))
for epoch in range(config.num_epochs):
print('Epoch [{}/{}]'.format(epoch + 1, config.num_epochs))
# scheduler.step()  # learning-rate decay
for i, (trains, labels) in enumerate(train_iter):
#print (trains[0].shape)
outputs = model(trains)
model.zero_grad()
loss = F.cross_entropy(outputs, labels)
loss.backward()
optimizer.step()
if total_batch % 100 == 0:
# every 100 batches, report performance on the training and validation sets
true = labels.data.cpu()
predic = torch.max(outputs.data, 1)[1].cpu()
train_acc = metrics.accuracy_score(true, predic)
dev_acc, dev_loss = evaluate(config, model, dev_iter)
if dev_loss < dev_best_loss:
dev_best_loss = dev_loss
torch.save(model.state_dict(), config.save_path)
improve = '*'
last_improve = total_batch
else:
improve = ''
time_dif = get_time_dif(start_time)
msg = 'Iter: {0:>6}, Train Loss: {1:>5.2}, Train Acc: {2:>6.2%}, Val Loss: {3:>5.2}, Val Acc: {4:>6.2%}, Time: {5} {6}'
print(msg.format(total_batch, loss.item(), train_acc, dev_loss, dev_acc, time_dif, improve))
writer.add_scalar("loss/train", loss.item(), total_batch)
writer.add_scalar("loss/dev", dev_loss, total_batch)
writer.add_scalar("acc/train", train_acc, total_batch)
writer.add_scalar("acc/dev", dev_acc, total_batch)
model.train()
total_batch += 1
if total_batch - last_improve > config.require_improvement:
# stop early if the validation loss has not improved for more than config.require_improvement batches
print("No optimization for a long time, auto-stopping...")
flag = True
break
if flag:
break
writer.close()
test(config, model, test_iter)
def test(config, model, test_iter):
# test
model.load_state_dict(torch.load(config.save_path))
model.eval()
start_time = time.time()
test_acc, test_loss, test_report, test_confusion = evaluate(config, model, test_iter, test=True)
msg = 'Test Loss: {0:>5.2}, Test Acc: {1:>6.2%}'
print(msg.format(test_loss, test_acc))
print("Precision, Recall and F1-Score...")
print(test_report)
print("Confusion Matrix...")
print(test_confusion)
time_dif = get_time_dif(start_time)
print("Time usage:", time_dif)
def evaluate(config, model, data_iter, test=False):
print(config.class_list)
model.eval()
loss_total = 0
predict_all = np.array([], dtype=int)
labels_all = np.array([], dtype=int)
vocab = pkl.load(open(config.vocab_path, 'rb'))
file = open("saved\\predict.csv", "w", newline='', encoding="utf-8-sig")
label_txt = open("THUCNews\\data\\test.txt", "r", encoding="utf-8")
lines = label_txt.readlines()
num = 0
with torch.no_grad():
for texts, labels in data_iter:
if len(texts[0]) < config.batch_size:
print("当前batch size不足,跳出")
break
# print(config.batch_size)
outputs = model(texts)
# print(outputs)
for row in range(config.batch_size):
# _str_ = ""
# print(row)
# print(config.pad_size)
# for column in range(config.pad_size):
# print(column)
# print(vocab)
# print(texts[0][row, column])
# _str_ = _str_ + get_key(vocab, texts[0][row, column])
# print(labels[row].item())
# print(outputs)
_str_ = lines[num]
num += 1
_str_ = _str_.strip('\n')
_str_ = _str_.split('\t')[0]  # keep only the text part, dropping the trailing tab and label
# print(_str_[-1])
# _str_ = _str_.replace("", "")
label = labels[row].item()
output = torch.argmax(outputs[row], -1).item()
# print(_str_)
# print(config.class_list[label])
# print(config.class_list[output])
# file.write(_str_ + "\t" + config.class_list[output] + "\t" + config.class_list[label] + "\n")
csv_file = csv.writer(file)
csv_file.writerow([_str_, config.class_list[output], config.class_list[label]])
# print("第{}行已记录".format(row))
loss = F.cross_entropy(outputs, labels)
loss_total += loss
print(labels)
labels = labels.data.cpu().numpy()
# print(labels)
predic = torch.max(outputs.data, 1)[1].cpu().numpy()
labels_all = np.append(labels_all, labels)
predict_all = np.append(predict_all, predic)
file.close()
label_txt.close()
acc = metrics.accuracy_score(labels_all, predict_all)
if test:
report = metrics.classification_report(labels_all, predict_all, target_names=config.class_list, digits=4)
confusion = metrics.confusion_matrix(labels_all, predict_all)
return acc, loss_total / len(data_iter), report, confusion
return acc, loss_total / len(data_iter)
utils.py
# coding: UTF-8
import os
import torch
import numpy as np
import pickle as pkl
from tqdm import tqdm
import time
from datetime import timedelta
MAX_VOCAB_SIZE = 10000  # vocabulary size limit
UNK, PAD = '<UNK>', '<PAD>'  # unknown token and padding token (the angle-bracket markers were stripped in the original paste)
# for Chinese:
# char-level gives a relatively small vocabulary
# word-level gives a much larger vocabulary
'''
tqdm  # Python progress-bar utility
from tqdm import tqdm
import time
d = {'loss':0.2,'learn':0.8}
for i in tqdm(range(50), desc='running', ncols=10, postfix=d):  # desc sets the label, ncols the bar width, postfix passes extra info as a dict
time.sleep(0.1)
pass
'''
def build_vocab(file_path, tokenizer, max_size, min_freq):
vocab_dic = {}
with open(file_path, 'r', encoding='UTF-8') as f:
for line in tqdm(f):
lin = line.strip()
if not lin:
continue
content = lin.split('\t')[0]
for word in tokenizer(content):
vocab_dic[word] = vocab_dic.get(word, 0) + 1  # count token frequencies to build the vocabulary
# dict.get(key, default) returns default when the key is missing
vocab_list = sorted([_ for _ in vocab_dic.items() if _[1] >= min_freq], key=lambda x: x[1], reverse=True)[:max_size]
# keep only tokens with frequency >= min_freq, sort them, and truncate to at most max_size entries
# reverse=True sorts by frequency in descending order
vocab_dic = {word_count[0]: idx for idx, word_count in enumerate(vocab_list)}
# index vocab_list and return a dict such as {'iii': 0, 'sdf': 1} (the frequency counts are dropped)
# ids are assigned in descending frequency order -- reminiscent of a Huffman tree?
'''
tinydict = {'Name': 'Runoob', 'Age': 7}
tinydict2 = {'Sex': 'female' }
tinydict.update(tinydict2)
>>
tinydict : {'Name': 'Runoob', 'Age': 7, 'Sex': 'female'}
'''
vocab_dic.update({UNK: len(vocab_dic), PAD: len(vocab_dic) + 1})
# append the UNK and PAD tokens at the end of the vocabulary,
# with ids len(vocab_dic) and len(vocab_dic) + 1 respectively
return vocab_dic
def build_dataset(config, ues_word):
if ues_word:
tokenizer = lambda x: x.split(' ')  # word-level: tokens are separated by spaces
else:
tokenizer = lambda x: [y for y in x]  # char-level: split the text into individual characters
if os.path.exists(config.vocab_path):
vocab = pkl.load(open(config.vocab_path, 'rb'))  # load the cached vocabulary (corpus word table)
else:
vocab = build_vocab(config.train_path, tokenizer=tokenizer, max_size=MAX_VOCAB_SIZE, min_freq=1)
pkl.dump(vocab, open(config.vocab_path, 'wb'))
print(f"Vocab size: {len(vocab)}")
def load_dataset(path, pad_size=32): # default, pad_size = 32
contents = []
with open(path, 'r', encoding='UTF-8') as f:
for line in tqdm(f):
lin = line.strip()
if not lin:
continue
content, label = lin.split('\t')
words_line = []
token = tokenizer(content)  # list of tokens
seq_len = len(token)
if pad_size:
if len(token) < pad_size:
token.extend([vocab.get(PAD)] * (pad_size - len(token)))
'''
In[36]: ["hi"]*3
Out[36]: ['hi', 'hi', 'hi']
'''
else:
token = token[:pad_size]  # truncate to pad_size
seq_len = pad_size
# word to id
for word in token:
words_line.append(vocab.get(word, vocab.get(UNK)))
contents.append((words_line, int(label), seq_len))
return contents # [([...], 0, seq_len), ([...], 1, seq_len), ...]
train = load_dataset(config.train_path, config.pad_size)
dev = load_dataset(config.dev_path, config.pad_size)
test = load_dataset(config.test_path, config.pad_size)
return vocab, train, dev, test
class DatasetIterater(object):
def __init__(self, batches, batch_size, device): # batches
self.batch_size = batch_size
self.batches = batches
self.n_batches = len(batches) // batch_size
self.residue = False  # True when len(batches) is not an exact multiple of batch_size (a leftover partial batch exists)
if len(batches) % batch_size != 0:
self.residue = True
self.index = 0
self.device = device
def _to_tensor(self, datas):
x = torch.LongTensor([_[0] for _ in datas]).to(self.device)
y = torch.LongTensor([_[1] for _ in datas]).to(self.device)
# original sequence length before padding (capped at pad_size)
seq_len = torch.LongTensor([_[2] for _ in datas]).to(self.device)
return (x, seq_len), y
def __next__(self):
if self.residue and self.index == self.n_batches:
batches = self.batches[self.index * self.batch_size: len(self.batches)]
self.index += 1
batches = self._to_tensor(batches)
return batches
elif self.index > self.n_batches:
self.index = 0
raise StopIteration
else:
batches = self.batches[self.index * self.batch_size: (self.index + 1) * self.batch_size]
self.index += 1
batches = self._to_tensor(batches)
return batches
def __iter__(self):
return self
def __len__(self):
if self.residue:
return self.n_batches + 1
else:
return self.n_batches
def build_iterator(dataset, config):
iter = DatasetIterater(dataset, config.batch_size, config.device)
return iter
def get_time_dif(start_time):
"""获取已使用时间"""
end_time = time.time()
time_dif = end_time - start_time
return timedelta(seconds=int(round(time_dif)))
if __name__ == "__main__":
# the code below runs only when utils.py is executed directly
# (i.e., `python utils.py`)
'''Extract the pretrained word vectors.'''
# adjust the directories and file names below as needed
train_dir = "./THUCNews/data/train.txt"
vocab_dir = "./THUCNews/data/vocab.pkl"
pretrain_dir = "./THUCNews/data/sgns.sogou.char"
emb_dim = 300
filename_trimmed_dir = "./THUCNews/data/embedding_SougouNews"
if os.path.exists(vocab_dir):
word_to_id = pkl.load(open(vocab_dir, 'rb'))
else:
# tokenizer = lambda x: x.split(' ')  # build the vocabulary at word level (words in the dataset are separated by spaces)
tokenizer = lambda x: [y for y in x]  # build the vocabulary at character level
word_to_id = build_vocab(train_dir, tokenizer=tokenizer, max_size=MAX_VOCAB_SIZE, min_freq=1)
pkl.dump(word_to_id, open(vocab_dir, 'wb'))
embeddings = np.random.rand(len(word_to_id), emb_dim)
f = open(pretrain_dir, "r", encoding='UTF-8')
for i, line in enumerate(f.readlines()):
# if i == 0:  # skip the first line if it is a header
# continue
lin = line.strip().split(" ")
if lin[0] in word_to_id:
idx = word_to_id[lin[0]]
emb = [float(x) for x in lin[1:301]]
embeddings[idx] = np.asarray(emb, dtype='float32')
f.close()
np.savez_compressed(filename_trimmed_dir, embeddings=embeddings)
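# After running `python utils.py`, the trimmed matrix can be loaded the same way the
# Config class below does (my usage note; paths as in this post):
import numpy as np
import torch

emb = np.load("./THUCNews/data/embedding_SougouNews.npz")["embeddings"].astype('float32')
weight = torch.tensor(emb)
print(weight.shape)  # (vocab_size, 300)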
models.TextCNN.py
# coding: UTF-8
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
class Config(object):
"""配置参数"""
def __init__(self, dataset, embedding):
self.model_name = 'TextCNN'
self.train_path = dataset + '/data/train.txt'  # training set
self.dev_path = dataset + '/data/dev.txt'  # validation set
self.test_path = dataset + '/data/test.txt'  # test set
self.class_list = [x.strip() for x in open(
dataset + '/data/class.txt').readlines()]  # list of class names
self.vocab_path = dataset + '/data/vocab.pkl'  # vocabulary (corpus word table)
self.save_path = dataset + '/saved_dict/' + self.model_name + '.ckpt'  # where the trained model is saved
self.log_path = dataset + '/log/' + self.model_name
self.embedding_pretrained = torch.tensor(
np.load(dataset + '/data/' + embedding)["embeddings"].astype('float32'))\
if embedding != 'random' else None  # pretrained word vectors; \ is the line-continuation character
self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')  # device
self.dropout = 0.5  # dropout rate
self.require_improvement = 1000  # stop training early if there is no improvement for this many batches
self.num_classes = len(self.class_list)  # number of classes
self.n_vocab = 0  # vocabulary size, assigned at runtime
self.num_epochs = 20  # number of epochs
self.batch_size = 128  # mini-batch size
self.pad_size = 32  # every sentence is padded or truncated to this length
self.learning_rate = 1e-3  # learning rate
self.embed = self.embedding_pretrained.size(1)\
if self.embedding_pretrained is not None else 300  # embedding (char-vector) dimension
self.filter_sizes = (2, 3, 4)  # convolution kernel sizes
self.num_filters = 256  # number of kernels (output channels)
'''Convolutional Neural Networks for Sentence Classification'''
class Model(nn.Module):
def __init__(self, config):
super(Model, self).__init__()
if config.embedding_pretrained is not None:
self.embedding = nn.Embedding.from_pretrained(config.embedding_pretrained, freeze=False)
# word/character embedding layer
else:
self.embedding = nn.Embedding(config.n_vocab, config.embed, padding_idx=config.n_vocab - 1)
self.convs = nn.ModuleList(
[nn.Conv2d(1, config.num_filters, (k, config.embed)) for k in config.filter_sizes])
self.dropout = nn.Dropout(config.dropout)
self.fc = nn.Linear(config.num_filters * len(config.filter_sizes), config.num_classes)
def conv_and_pool(self, x, conv):
x = F.relu(conv(x)).squeeze(3)
x = F.max_pool1d(x, x.size(2)).squeeze(2)
return x
def forward(self, x):
# print("输入序列:")
# print(x[0])
#print (x[0].shape)
out = self.embedding(x[0])
# print("词嵌入:")
# print(out)
out = out.unsqueeze(1)
out = torch.cat([self.conv_and_pool(out, conv) for conv in self.convs], 1)
out = self.dropout(out)
out = self.fc(out)
return out
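# A shape walkthrough of the forward pass above (toy numbers, my own sketch):
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, pad_size, embed, num_filters = 2, 32, 300, 256
x = torch.randn(batch, pad_size, embed).unsqueeze(1)  # (2, 1, 32, 300), as after embedding + unsqueeze
conv = nn.Conv2d(1, num_filters, (3, embed))          # one of the filter_sizes (2, 3, 4)
h = F.relu(conv(x)).squeeze(3)                        # (2, 256, 30): the kernel spans the full embedding dim
h = F.max_pool1d(h, h.size(2)).squeeze(2)             # (2, 256): max over the sequence dimension
print(h.shape)
# Concatenating the three kernel sizes gives (2, 256 * 3) before dropout and the final Linear layer.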
main.py
from torch.utils.data import DataLoader
from loadDatasets import *
from model import *
import torchvision
import torch
torch.set_default_tensor_type(torch.DoubleTensor)
torch.autograd.set_detect_anomaly(True)  # enable autograd anomaly detection
gpu_device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
if __name__ == '__main__':
batch_size = myModel.batch_size
# load the datasets
train_data = myDataLoader('train.csv', 'datasets', transform=torchvision.transforms.ToTensor())
print("训练集数量{}".format(len(train_data)))
dev_data = myDataLoader('dev.csv', 'datasets', transform=torchvision.transforms.ToTensor())
print("验证集数量{}".format(len(dev_data)))
valid_batch_size = len(dev_data) // (len(train_data)/batch_size)
valid_batch_size = int(valid_batch_size)
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
dev_loader = DataLoader(dev_data, batch_size=valid_batch_size, shuffle=True)
# build the model
cnn_model = myModel()
# load pretrained parameters
cnn_model.load_state_dict(torch.load("model\\cnn_model.pth"), strict=False)
cnn_model = cnn_model.to(gpu_device)
loss_fun = nn.CrossEntropyLoss()
loss_fun = loss_fun.to(gpu_device)
# training loop
epochs = 2
optimizer = torch.optim.Adam(cnn_model.parameters(),
lr=1e-3,
betas=(0.9, 0.999),
eps=1e-08,
weight_decay=0,
amsgrad=False)
total_train_step = 0
# valid_size = 0
# valid_num = 0
for epoch in range(epochs):
print("===========第{}轮训练开始===========".format(epoch + 1))
for trainData, validData in zip(train_loader, dev_loader):
train_seq, train_label = trainData
valid_seq, valid_label = validData
batch_size_train = len(train_seq)
batch_size_valid = len(valid_seq)
# print(batch_size_current)
if batch_size_train < batch_size or batch_size_valid < valid_batch_size:
print("当前不足一个batch_size,停止训练")
break
train_seq = train_seq.to(gpu_device)
train_label = train_label.to(gpu_device)
valid_seq = valid_seq.to(gpu_device)
valid_label = valid_label.to(gpu_device)
# print(train_seq)
# print(train_seq.shape)
# print(train_label)
# print("调用train model")
cnn_model.from_type = "train"
train_output = cnn_model(train_seq)
train_output = train_output.to(gpu_device)
# print("调用valid model")
cnn_model.from_type = "valid"
cnn_model.valid_batch_size = valid_batch_size
# valid_output = cnn_model(valid_seq)
# valid_output = valid_output.to(gpu_device)
# print(valid_output)
# print(valid_label)
# print(valid_output.argmax(1))
# print("训练集")
# print(train_output)
# print(train_output.argmax(1))
# print(train_label)
loss = loss_fun(train_output, train_label)
optimizer.zero_grad()
loss.backward(retain_graph=True)
optimizer.step()
total_train_step += 1
# valid_size += valid_batch_size
# valid_num += (valid_output.argmax(1) == valid_label).sum()
cnn_channel = ""
if (train_output.argmax(1) == train_label).sum() / batch_size > 0.65:
if cnn_model.channel["cnn1"]["status"]:
cnn_model.channel["cnn1"]["prob"] *= 1.0005
cnn_channel = "cnn1"
else:
cnn_model.channel["cnn2"]["prob"] *= 1.0005
cnn_channel = "cnn2"
if total_train_step % 50 == 0:
print("训练次数{},当前损失值 --------- {}".format(total_train_step, loss))
accuracy_train = (train_output.argmax(1) == train_label).sum() / batch_size
print("batch train-accuracy {}%".format(accuracy_train * 100))
# accuracy_valid = valid_num / valid_size
# print("total valid-accuracy {}%".format(accuracy_valid * 100))
print("model total_num {}".format(cnn_model.total_num))
print("当前执行的通道 {}".format(cnn_channel))
prob1 = cnn_model.channel["cnn1"]["prob"] / (cnn_model.channel["cnn1"]["prob"] + cnn_model.channel["cnn2"]["prob"])
prob2 = cnn_model.channel["cnn2"]["prob"] / (cnn_model.channel["cnn1"]["prob"] + cnn_model.channel["cnn2"]["prob"])
print("通道1概率值 {} 通道2概率值{}".format(prob1, prob2))
# save the model
torch.save(cnn_model.state_dict(), "model\\cnn_model.pth")
test.py
# total test-accuracy 91.00260416666667% ×
# total test-accuracy 79.90767045454545%
# total test-accuracy 81.09197443181819%
# total test-accuracy 81.17365056818181% # with dropout
# total test-accuracy 77.64382102272727% # with the channel-reward mechanism
# total test-accuracy 76.68185763888889% # channel reward
# total test-accuracy 76.3367259174312%
import torch
import torchvision
from torch.utils.data import DataLoader
from loadDatasets import myDataLoader
from model import myModel
import csv
torch.set_default_tensor_type(torch.DoubleTensor)
torch.autograd.set_detect_anomaly(True)  # enable autograd anomaly detection
test_batch_size = 1024
gpu_device = torch.device("cuda:0")
test_data = myDataLoader('test.csv', 'datasets', transform=torchvision.transforms.ToTensor())
print("测试集数量{}".format(len(test_data)))
test_loader = DataLoader(test_data, batch_size=test_batch_size, shuffle=True)
# load the model
cnn_model = myModel()
cnn_model.load_state_dict(torch.load("model\\cnn_model.pth"), strict=False)
cnn_model.eval()
cnn_model = cnn_model.to(gpu_device)
test_size = 0
test_num = 0
for testData in test_loader:
test_seq, test_label = testData
test_seq = test_seq.to(gpu_device)
test_label = test_label.to(gpu_device)
batch_size_test = len(test_seq)
# print(batch_size_current)
if batch_size_test < test_batch_size:
print("当前不足一个batch_size,停止训练")
break
cnn_model.from_type = "test"
# test_output = cnn_model(test_seq)
test_output = cnn_model.forward(test_seq)
test_output = test_output.to(gpu_device)
test_size += test_batch_size
test_num += (test_output.argmax(1) == test_label).sum()
result_csv = open("result\\result.csv", "a")
csv_write = csv.writer(result_csv)
csv_write.writerow(['probability distribution', 'prediction', 'label'])
for predict, label in zip(test_output, test_label):
# predict.to(torch.device("cpu"))
# label.to(torch.device("cpu"))
# print(predict)
probability_distribution = "[" + str(predict[0].to(torch.float64).item()) + "," + str(predict[1].to(torch.float64).item()) + "]"
# print(str(predict[0].to(torch.float64).item()) + " " + str(predict[1].to(torch.float64).item()))
# print(label.item())
label = label.item()
predict_cls = predict.view(-1, 2).argmax(1)
# print(predict_cls.item())
predict_cls = predict_cls.item()
csv_write.writerow([probability_distribution, predict_cls, label])
result_csv.close()
accuracy_test = test_num / test_size
print("total test-accuracy {}%".format(accuracy_test * 100))
model.py
import random
import torch.nn as nn
import torch
torch.set_default_tensor_type(torch.DoubleTensor)
class myModel(nn.Module):
batch_size = 128
def __init__(self):
super().__init__()
self.from_type = ""
self.total_num = 0
self.cnn1_num = 0
self.cnn2_num = 0
self.current_batch_size = self.batch_size
self.valid_batch_size = self.batch_size
self.channel = {
"cnn1": {
"prob": 0.5,
"status": False
},
"cnn2": {
"prob": 0.5,
"status": False
}
}
self.cnn1 = nn.Sequential(
nn.Conv2d(1, 3, (1, 2), padding="same"), # out = 10
nn.ReLU(inplace=False),
nn.Conv2d(3, 5, (1, 3)), # out = 8
nn.ReLU(inplace=False),
nn.Conv2d(5, 7, (1, 5)), # out = 4
nn.ReLU(inplace=False),
nn.Flatten(),
)
self.cnn2 = nn.Sequential(
nn.Conv2d(1, 2, (1, 2), padding="same"), # out = 10
nn.ReLU(inplace=False),
nn.Conv2d(2, 4, (1, 3)), # out = 8
nn.ReLU(inplace=False),
nn.Conv2d(4, 7, (1, 5)), # out = 4
nn.ReLU(inplace=False),
nn.Flatten(),
)
self.fc = nn.Sequential(
nn.Linear(28, 14),
nn.Dropout(0.5),
nn.Linear(14, 7),
# nn.Dropout(0.5),
nn.ReLU(inplace=False),
nn.Linear(7, 2),
nn.Softmax(dim=1)
)
# self.relu = nn.Relu()
def forward(self, x):
if self.from_type == "valid":
self.current_batch_size = self.valid_batch_size
else:
self.current_batch_size = self.batch_size
# print("model batch size = {}".format(self.current_batch_size))
# print(x)
x = x.to(torch.device("cuda:0"))
# random_num1 = random.random()
self.total_num = self.total_num + 1
self.channel["cnn1"]["status"] = False
self.channel["cnn2"]["status"] = False
self.channel["cnn1"]["prob"] = self.channel["cnn1"]["prob"] / (
self.channel["cnn1"]["prob"] + self.channel["cnn2"]["prob"])
self.channel["cnn2"]["prob"] = self.channel["cnn2"]["prob"] / (
self.channel["cnn1"]["prob"] + self.channel["cnn2"]["prob"])
# if random_num1 < 0.5:
# x1 = torch.relu(self.cnn1(x))
# else:
# x1 = torch.tanh(self.cnn1(x))
# random_num2 = random.random()
# if random_num2 < 0.5:
# x2 = torch.relu(self.cnn1(x))
# else:
# x2 = torch.sigmoid(self.cnn1(x))
x1 = torch.zeros([x.shape[0], 28])  # match the batch size of the actual input
x2 = torch.zeros([x.shape[0], 28])
x1 = x1.to(torch.device("cuda:0"))
x2 = x2.to(torch.device("cuda:0"))
random_num = random.random()
if random_num < self.channel["cnn1"]["prob"]:
x1 = torch.relu(self.cnn1(x))
# print(self.x1.clone() + torch.relu(self.cnn1(x)))
# print(x1.shape)
# print(torch.relu(self.cnn1(x)).shape)
self.channel["cnn1"]["status"] = True
else:
x2 = torch.relu(self.cnn2(x))
# print(self.x2.clone() + torch.relu(self.cnn2(x)))
# print(x2.shape)
# print(torch.relu(self.cnn2(x)).shape)
self.channel["cnn2"]["status"] = True
x = (x1 * self.channel["cnn1"]["prob"] + x2 * self.channel["cnn2"]["prob"])
# x = x.view(batch_size, -1, 28)
# print(x.shape)
x = self.fc(x)
# print(x.shape)
# print("共执行了{}次".format(self.total_num))
return x
loadDatasets.py
import os
import pandas as pd
import torch
from torch.utils.data import Dataset
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
class myDataLoader(Dataset):
def __init__(self, annotations_file, root_dir, transform=None, target_transform=None):
full_path = os.path.join(root_dir, annotations_file)
self.csv_data = pd.read_csv(full_path)
# csv_data.drop(labels=None,axis=0, index=0, columns=None, inplace=True)
del self.csv_data['Unnamed: 0']
# Step 7: class imbalance
X = self.csv_data.drop('SeriousDlqin2yrs', axis=1)
y = self.csv_data['SeriousDlqin2yrs']
# sns.countplot(x='SeriousDlqin2yrs', data=self.csv_data)
# plt.show()
# oversample with SMOTE
from imblearn.over_sampling import SMOTE  # SMOTE oversampling (imbalanced-learn)
model_smote = SMOTE()  # create the SMOTE object
X, y = model_smote.fit_resample(X, y)  # oversample the minority class
self.csv_data = pd.concat([y, X], axis=1)  # concatenate label and features column-wise
# print(smote_resampled.head(5))
# groupby_data_smote = smote_resampled.groupby('SeriousDlqin2yrs').count()  # aggregate by label
# print(groupby_data_smote)  # print the class distribution after SMOTE
# sns.countplot(x='SeriousDlqin2yrs', data=smote_resampled)
# plt.show()
# this approach drove the AUC below 0.8
self.length = len(self.csv_data)
self.transform = transform
self.target_transform = target_transform
def __len__(self):
return self.length
def __getitem__(self, idx):
seq = self.csv_data.iloc[idx, 1:]
# convert types
seq = np.array(seq)
seq = torch.tensor(seq)
seq = seq.reshape(1, 1, 10)
label = self.csv_data.iloc[idx][0]
label = torch.tensor(label).long().item()
# label = torch.Tensor(label)
return seq, label
(Web scraping, a common problem) The body text is messy: punctuation and special characters such as '\t', '\n', ' '. If they are simply deleted or replaced with '', how do we avoid breaking the relationship between paragraphs? And how can the '\n' between paragraphs be recognized at all?
I originally assumed that the values inside a word-embedding vector sum to 1, and ...
On reflection, the vector with the evenly spread probabilities is the one obtained after the MLP and the softmax classifier, not the word vector here.
Niche news
Or is a soft prompt merely a way of searching for a prompt, which, once trained, can be reused, with the BERT parameters still being fine-tuned on the downstream task? (This seems unlikely, because the introduced parameter matrix has to fit the downstream task; of course several tasks could be trained jointly, which feels hard to pull off, but is worth trying.)
After modifying the loss function, I ran into grad == None (for train_output.grad):
with grad_input = torch.autograd.grad(loss, [train_output], retain_graph=True)
the returned value does print the gradient, but this only computes the value of the intermediate variable's gradient; it plays no role in backpropagation itself:
although training in this project was fine and the final loss did decrease, another project hit a non-converging loss, so for now I cannot tell whether the modified loss function caused the divergence or whether the gradient was simply not propagated back (I currently think the former is more likely and plan to redefine the model).
Since I do not have the time to track the problem down, I am avoiding this kind of modification for now. My suggestion is to make all modifications to the model's output inside the model class's forward(), and to avoid defining them inside the loss function.
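A small reproduction of the point about intermediate gradients (my own toy example): PyTorch does not store gradients of non-leaf tensors unless asked, but backpropagation to the leaves still works.
import torch

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)
out = x * w            # a non-leaf (intermediate) tensor, like train_output
out.retain_grad()      # without this, out.grad stays None after backward()
loss = (out - 1.0) ** 2
loss.backward()
print(out.grad)        # tensor(10.)
print(w.grad)          # tensor(30.) -- gradients still reach the leaf parameters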
BatchNorm1d: num_features is the feature dimension, i.e. (N, L) -> L and (N, C, L) -> C:
class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True)
BatchNorm2d: num_features is the channel dimension, i.e. (N, C, X, Y) -> C:
class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)
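A quick shape check of the num_features convention above (my addition):
import torch
import torch.nn as nn

bn1d = nn.BatchNorm1d(16)  # (N, C, L): num_features = C = 16
bn2d = nn.BatchNorm2d(3)   # (N, C, H, W): num_features = C = 3
print(bn1d(torch.randn(8, 16, 50)).shape)     # torch.Size([8, 16, 50])
print(bn2d(torch.randn(8, 3, 32, 32)).shape)  # torch.Size([8, 3, 32, 32])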
The difference between Conv1d and Conv2d
Image data is usually three-dimensional, $W \times H \times C$, while text data is usually two-dimensional, $L \times D$;
$C$ is the number of image channels, and $D$ is the word-vector dimension.
kernel_size: the size of the convolution kernel.
In Conv2d it is a 2-tuple $w \times h$ (an integer may also be passed, meaning $w = h$);
in Conv1d it is a single integer $l$.
Conv2d:
As in the figure: for a $7 \times 7 \times 3$ input image, kernel size $3 \times 3$ and 2 kernels, the parameter count is $3 \times 3 \times 3 \times 2$.
Conv1d:
As in the figure: for a $3 \times 3$ input text, kernel size 2 and a single kernel, the parameter count is $3 \times 2 \times 1$.
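A parameter-count check of the two examples above (bias excluded, my addition):
import torch.nn as nn

conv2d = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=(3, 3), bias=False)
conv1d = nn.Conv1d(in_channels=3, out_channels=1, kernel_size=2, bias=False)
print(conv2d.weight.numel())  # 3 * 3 * 3 * 2 = 54
print(conv1d.weight.numel())  # 3 * 2 * 1 = 6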
shape
torch.tensor([1, 2, 3, 4]).shape
# torch.Size([4])
torch.tensor([[12, 45],
[33, 58],
[60, 17],
[10, 82]]).shape
# torch.Size([4, 2])
torch.tensor([[12,45]]).shape
Out[31]: torch.Size([1, 2])
torch.tensor(
[[1],
[2],
[3],
[4]]).shape
Out[32]: torch.Size([4, 1])
torch.tensor(
[[[12,45],
[33,58],
[60,17],
[10,82]]]).shape
Out[34]: torch.Size([1, 4, 2])
This effect is similar to adding a pooling layer before the convolution, but dilated (atrous) convolution keeps the output size unchanged without adding parameters.
Dilated (atrous) attention:
Atrous Self Attention is inspired by atrous (dilated) convolution. As shown in the right-hand figure, it constrains the correlations: each element is only allowed to attend to elements at relative distances k, 2k, 3k, ..., where k > 1 is a preset hyperparameter.
Complexity: $O(n^2 / k)$
Local Self Attention
Local Self Attention instead gives up global connections and reintroduces local ones: each element attends only to the k elements before and after it, plus itself, as shown in the figure:
Atrous Self Attention leaves holes, and Local Self Attention fills exactly those holes, so a simple approach is to alternate the two. Stacked together they can in principle still learn global dependencies, while also saving memory.
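A small sketch of the two sparsity patterns described above (my own illustration): Atrous keeps positions whose relative distance is a multiple of k, Local keeps a window of +/- k.
import torch

n, k = 8, 2
idx = torch.arange(n)
dist = (idx[None, :] - idx[:, None]).abs()

atrous_mask = (dist % k == 0)  # distances 0, k, 2k, ... (includes the diagonal)
local_mask = (dist <= k)       # each position plus its k neighbours on either side

print(atrous_mask.int())
print(local_mask.int())
# Alternating the two layers lets their union cover the whole sequence,
# which is why stacking them can still propagate information globally.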
Also, since this vocabulary is uncased, [mask] and [MASK], and [cls] and [CLS], should be treated the same.
Parameters? Just freeze them!!
$loss = (x \cdot w - y)^2$
$grad_w = 2 \cdot (x \cdot w - y) \cdot x$
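A numeric check of this gradient formula (my addition, toy values):
import torch

x, y = torch.tensor(3.0), torch.tensor(1.0)
w = torch.tensor(2.0, requires_grad=True)
loss = (x * w - y) ** 2
loss.backward()
print(w.grad)                        # tensor(30.)
print((2 * (x * w - y) * x).item())  # 30.0, matching grad_w = 2*(x*w - y)*x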
Advantage: few newly introduced parameters.
d is usually a perfect square (when the input image is square).
Multi-head attention in Swin?
d_model / h
$(a + b)_{min} = a_{min} + b_{min}$
Next: NLP Text Classification: Reading Notes on Top-Conference Papers (II)