East China Normal University, School of Data Science · Jianing Wang (王嘉宁)

Code Implementations of Common NLP Loss Functions: SoftMax / Contrastive / Triplet / Similarity


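The loss functions most commonly used in NLP include multi-class classification (SoftMax + CrossEntropy), contrastive learning (Contrastive Learning), triplet loss (Triplet Loss) and text similarity (Sentence Similarity). Classification and text similarity are the two most widely used of these, while contrastive and triplet losses have gained popularity in recent years as self-supervised objectives for representation learning.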

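This article does not cover the theory behind these loss functions; it simply implements the four of them so that they can be dropped into model experiments as ready-made modules. To show quickly and concretely how each loss is computed and what it returns, the demos are built on HuggingFace BERT (forward pass only, no training loop). Readers can plug the corresponding loss module directly into their own model framework.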

1. Classification Loss: SoftMax + CrossEntropy

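The classification loss takes a single sentence (or a sentence pair) as input and performs multi-class classification on it. For a sentence pair, the two sentence vectors are first combined (concatenated, subtracted, multiplied, averaged or summed) and then fed into a linear classifier. The code is as follows: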
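Written out as formulas (the notation is ours, not the original author's: u and v are the two sentence vectors, W and b the classifier weights, K the number of labels, N the batch size, y_i the gold label of example i), the classification loss of this section amounts to:

$$
\mathbf{h} \in \big\{[\mathbf{u};\mathbf{v}],\ \mathbf{u}-\mathbf{v},\ \mathbf{u}\odot\mathbf{v},\ (\mathbf{u}+\mathbf{v})/2,\ \mathbf{u}+\mathbf{v}\big\}, \qquad \mathbf{z} = W\mathbf{h} + \mathbf{b}
$$

$$
p_k = \frac{\exp(z_k)}{\sum_{j=1}^{K}\exp(z_j)}, \qquad \mathcal{L}_{\text{CE}} = -\frac{1}{N}\sum_{i=1}^{N}\log p_{i,\,y_i}
$$

For a single sentence, h is simply u. The softmax and the log are applied inside `nn.CrossEntropyLoss`, so the classifier only has to output the raw logits z.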

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 16:25
# @Author  : Jianing Wang
# @File    : SoftmaxLayerWithLoss.py

import torch
from torch import nn, Tensor
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig

class SoftmaxLayerWithLoss(nn.Module):
    """
    This loss applies a linear classifier + softmax (cross-entropy) on top of a sentence
    representation, or on top of a combination of a sentence pair's representations.

    @:param hidden_dim: The hidden dimension of each sentence representation
    @:param num_labels: The number of labels
    @:param is_sentence_pair: (bool) Whether the input is a sentence pair
    @:param combine_type: How to combine the pair (only used when is_sentence_pair=True):
    - cat: rep = torch.cat([rep_a, rep_b], -1)
    - diff: rep = rep_a - rep_b
    - mul: rep = rep_a * rep_b
    - avg: rep = (rep_a + rep_b) / 2.0
    - sum: rep = rep_a + rep_b
    """
    def __init__(self,
                 hidden_dim: int,
                 num_labels: int,
                 is_sentence_pair=False,
                 combine_type='cat', # cat / diff / mul / avg / sum
                 ):
        super(SoftmaxLayerWithLoss, self).__init__()
        self.hidden_dim = hidden_dim
        self.num_labels = num_labels
        self.is_sentence_pair = is_sentence_pair
        self.combine_type = combine_type
        assert self.combine_type in ['cat', 'diff', 'mul', 'avg', 'sum']
        # concatenating two vectors doubles the feature size fed to the classifier
        if self.is_sentence_pair and self.combine_type == 'cat':
            self.hidden_dim = self.hidden_dim * 2

        self.classifier = nn.Linear(self.hidden_dim, num_labels)
        self.loss_fct = nn.CrossEntropyLoss()

    def forward(self, rep_a, rep_b=None, label: Tensor = None):
        # rep_a: [batch_size, hidden_dim]
        # rep_b: [batch_size, hidden_dim] (required only when is_sentence_pair=True)
        # label: [batch_size] class indices (dtype long)
        if not self.is_sentence_pair:
            rep = rep_a
        elif self.combine_type == 'cat':
            rep = torch.cat([rep_a, rep_b], -1)
        elif self.combine_type == 'diff':
            rep = rep_a - rep_b
        elif self.combine_type == 'mul':
            rep = rep_a * rep_b
        elif self.combine_type == 'avg':
            rep = (rep_a + rep_b) / 2
        else:  # 'sum'
            rep = rep_a + rep_b

        output = self.classifier(rep)  # logits: [batch_size, num_labels]
        if label is not None:
            # CrossEntropyLoss applies log-softmax to the logits internally
            return self.loss_fct(output, label.view(-1))
        return rep, output


if __name__ == "__main__":
    # configuration of the HuggingFace pre-trained language model
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer of the HuggingFace pre-trained language model
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pre-trained weights (pytorch_model.bin) of the HuggingFace model
    model = BertModel.from_pretrained('bert-base-cased')
    # two batches of examples; the i-th sentences of both batches form a pair
    examples1 = ['This is the book.', 'Disney film is well seeing for us.']
    examples2 = ['I love to read it.', 'I don\'t want to have a try due to the hardness.']
    label = [1, 0]
    # convert each example into features
    # {'input_ids': xxx, 'token_type_ids': xxx, 'attention_mask': xxx}
    features1 = tokenizer(examples1, add_special_tokens=True, padding=True)
    features2 = tokenizer(examples2, add_special_tokens=True, padding=True)
    # pad to a fixed length and convert to tensor batches
    max_seq_len = 16
    features1 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features1.items()}
    features2 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features2.items()}
    label = torch.Tensor(label).long()
    # obtain sentence embeddings by average pooling over the token (sequence) dimension;
    # padding positions are included in this simple average
    rep_a = model(**features1)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_b = model(**features2)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_a = torch.mean(rep_a, 1)  # [batch_size, hidden_dim]
    rep_b = torch.mean(rep_b, 1)  # [batch_size, hidden_dim]
    # obtain the softmax classification loss
    loss_fn = SoftmaxLayerWithLoss(hidden_dim=rep_a.shape[-1], num_labels=2, is_sentence_pair=True, combine_type='cat')
    loss = loss_fn(rep_a=rep_a, rep_b=rep_b, label=label)
    print(loss) # a scalar tensor with a grad_fn; the value varies because the classifier is randomly initialized
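To see what the cross-entropy step actually computes without downloading BERT, here is a minimal, framework-free sketch of the per-example arithmetic, assuming PyTorch's default `reduction='mean'`. The helper names `softmax`, `cross_entropy` and `mean_cross_entropy` are hypothetical, written for illustration only.

```python
import math

def softmax(logits):
    # subtract the max for numerical stability; this does not change the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # negative log-probability of the gold class for one example
    return -math.log(softmax(logits)[label])

def mean_cross_entropy(batch_logits, labels):
    # average over the batch, mirroring reduction='mean'
    return sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)

# equal logits give a uniform distribution, so the loss is log(2) for two labels
print(round(cross_entropy([0.0, 0.0], 1), 4))  # 0.6931
# a confident, correct prediction gives a small loss
print(round(cross_entropy([2.0, 0.0], 0), 4))  # 0.1269
```

The first case also explains why an untrained two-way classifier typically starts with a loss close to 0.69.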

2. Text Similarity Loss

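The text similarity loss computes the cosine similarity between the representations of two sentences, treats it as the predicted similarity score (note that it lies in [-1, 1] rather than being a true probability), and regresses it toward the gold label with an MSE loss. The code is as follows: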
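In formula form (our notation: u_i and v_i are the two sentence vectors of pair i, y_i its gold score, N the batch size), the default setting of this loss, with an identity score transformation and `nn.MSELoss`, is:

$$
s_i = \cos(\mathbf{u}_i, \mathbf{v}_i) = \frac{\mathbf{u}_i^{\top}\mathbf{v}_i}{\lVert\mathbf{u}_i\rVert\,\lVert\mathbf{v}_i\rVert} \in [-1, 1], \qquad \mathcal{L}_{\text{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(s_i - y_i\right)^2
$$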

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 16:55
# @Author  : Jianing Wang
# @File    : SimilarityLoss.py

import torch
from torch import nn, Tensor
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig


class CosineSimilarityLoss(nn.Module):
    """
    CosineSimilarityLoss expects that each input example consists of two texts and a float label.

    It computes the vectors u = model(input_text[0]) and v = model(input_text[1]) and measures the cosine similarity between the two.
    By default, it minimizes the following loss: ||input_label - cos_score_transformation(cosine_sim(u,v))||_2.

    :param loss_fct: Which PyTorch loss function should be used to compare cosine_similarity(u,v) with the input_label? By default, MSE: ||input_label - cosine_sim(u,v)||_2
    :param cos_score_transformation: The cos_score_transformation function is applied on top of cosine_similarity. By default, the identity function is used (i.e. no change).

    """
    def __init__(self, loss_fct=nn.MSELoss(), cos_score_transformation=nn.Identity()):
        super(CosineSimilarityLoss, self).__init__()
        self.loss_fct = loss_fct
        self.cos_score_transformation = cos_score_transformation

    def forward(self, rep_a, rep_b, label: Tensor):
        # rep_a: [batch_size, hidden_dim]
        # rep_b: [batch_size, hidden_dim]
        # label: [batch_size] gold similarity scores
        output = self.cos_score_transformation(torch.cosine_similarity(rep_a, rep_b))  # [batch_size]
        # MSELoss expects a floating-point target with the same shape as the prediction
        return self.loss_fct(output, label.view(-1).float())

if __name__ == "__main__":
    # configuration of the HuggingFace pre-trained language model
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer of the HuggingFace pre-trained language model
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pre-trained weights (pytorch_model.bin) of the HuggingFace model
    model = BertModel.from_pretrained('bert-base-cased')
    # two batches of examples; the i-th sentences of both batches form a pair
    examples1 = ['Beijing is one of the biggest cities in China.', 'Disney film is well seeing for us.']
    examples2 = ['Shanghai is the largest city in the east of China.', 'ACL 2021 will be held online due to COVID-19.']
    label = [1, 0]
    # convert each example into features
    # {'input_ids': xxx, 'token_type_ids': xxx, 'attention_mask': xxx}
    features1 = tokenizer(examples1, add_special_tokens=True, padding=True)
    features2 = tokenizer(examples2, add_special_tokens=True, padding=True)
    # pad to a fixed length and convert to tensor batches
    max_seq_len = 24
    features1 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features1.items()}
    features2 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features2.items()}
    # the similarity labels are regression targets, so keep them as floats
    label = torch.Tensor(label)
    # obtain sentence embeddings by average pooling over the token (sequence) dimension
    rep_a = model(**features1)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_b = model(**features2)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_a = torch.mean(rep_a, 1)  # [batch_size, hidden_dim]
    rep_b = torch.mean(rep_b, 1)  # [batch_size, hidden_dim]
    # obtain the cosine similarity (MSE) loss
    loss_fn = CosineSimilarityLoss()
    loss = loss_fn(rep_a=rep_a, rep_b=rep_b, label=label)
    print(loss) # a scalar tensor with a grad_fn
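As a hand-checkable illustration of the same computation, the sketch below uses toy 2-d vectors in place of BERT embeddings and assumes `nn.MSELoss`'s default `reduction='mean'`. The function names are hypothetical, and unlike `torch.cosine_similarity` it adds no epsilon, so it assumes neither vector is all zeros.

```python
import math

def cosine_similarity(u, v):
    # assumes neither vector is all zeros (torch.cosine_similarity adds a small eps instead)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_mse_loss(pairs, labels):
    # mean squared error between predicted cosine scores and gold scores
    scores = [cosine_similarity(u, v) for u, v in pairs]
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)

# a 45-degree pair labelled 1 and an orthogonal pair labelled 0
pairs = [([1.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 1.0])]
print(round(cosine_mse_loss(pairs, [1.0, 0.0]), 4))  # 0.0429
```

Only the first pair contributes: its cosine is about 0.7071, so the squared error is (1 - 0.7071)^2, about 0.0858, which is then averaged over two pairs.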

3. Contrastive Loss

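Contrastive learning works with an anchor and a set of candidates. The anchor is a fixed feature vector, or a vector produced by a neural encoder such as BERT; the candidates include a positive (same class as the anchor) and several negatives (different class from the anchor). The goal is to make the similarity of same-class pairs as large as possible and that of different-class pairs as small as possible. The implementation here uses the pairwise (Siamese) form: each anchor is paired with one candidate and a 0/1 label; the distance of positive pairs (label 1) is minimized, while negative pairs (label 0) are pushed apart until their distance exceeds a margin. See the code and example below for details: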
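In formula form (our notation: a_i and c_i are the anchor and candidate vectors of pair i, y_i in {0, 1} its label, d the chosen distance, m the margin), the pairwise contrastive loss computed by the `ContrastiveLoss` class in this section is:

$$
\mathcal{L} = \sum_{i=1}^{N}\frac{1}{2}\Big[y_i\,d_i^{2} + (1-y_i)\,\max\!\left(0,\ m - d_i\right)^{2}\Big], \qquad d_i = d(\mathbf{a}_i, \mathbf{c}_i)
$$

With the class defaults, d is the cosine distance 1 - cos(a_i, c_i) and m = 0.5; setting `size_average=True` replaces the sum with a mean over the N pairs.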

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 14:50
# @Author  : Jianing Wang
# @File    : ContrastiveLoss.py

from enum import Enum
import torch
import torch.nn.functional as F
from torch import nn, Tensor
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig

class SiameseDistanceMetric(Enum):
    """
    The metric for the contrastive loss
    """
    EUCLIDEAN = lambda x, y: F.pairwise_distance(x, y, p=2)
    MANHATTAN = lambda x, y: F.pairwise_distance(x, y, p=1)
    COSINE_DISTANCE = lambda x, y: 1-F.cosine_similarity(x, y)


class ContrastiveLoss(nn.Module):
    """
    Contrastive loss. Expects as input two texts and a label of either 0 or 1. If the label == 1, the distance between the
    two embeddings is reduced. If the label == 0, the distance between the embeddings is increased until it exceeds the margin.

    @:param distance_metric: The distance metric function
    @:param margin: (float) The margin distance; negative pairs farther apart than this contribute no loss
    @:param size_average: (bool) Whether to average the losses over the batch (otherwise they are summed)

    Input example of forward function:
        rep_anchor: [[0.2, -0.1, ..., 0.6], [0.2, -0.1, ..., 0.6], ..., [0.2, -0.1, ..., 0.6]]
        rep_candidate: [[0.3, 0.1, ..., -0.3], [-0.8, 1.2, ..., 0.7], ..., [-0.9, 0.1, ..., 0.4]]
        label: [0, 1, ..., 1]

    Return example of forward function:
        0.015 (averaged)
        2.672 (sum)
    """

    def __init__(self, distance_metric=SiameseDistanceMetric.COSINE_DISTANCE, margin: float = 0.5, size_average: bool = False):
        super(ContrastiveLoss, self).__init__()
        self.distance_metric = distance_metric
        self.margin = margin
        self.size_average = size_average

    def forward(self, rep_anchor, rep_candidate, label: Tensor):
        # rep_anchor: [batch_size, hidden_dim] denotes the representations of anchors
        # rep_candidate: [batch_size, hidden_dim] denotes the representations of positive / negative candidates
        # label: [batch_size] denotes the 0/1 label of each anchor - candidate pair
        distances = self.distance_metric(rep_anchor, rep_candidate)  # [batch_size]
        # positive pairs: 0.5 * d^2; negative pairs: 0.5 * max(0, margin - d)^2
        losses = 0.5 * (label.float() * distances.pow(2) + (1 - label).float() * F.relu(self.margin - distances).pow(2))
        return losses.mean() if self.size_average else losses.sum()


if __name__ == "__main__":
    # configuration of the HuggingFace pre-trained language model
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer of the HuggingFace pre-trained language model
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pre-trained weights (pytorch_model.bin) of the HuggingFace model
    model = BertModel.from_pretrained('bert-base-cased')
    # two batches of examples; the i-th anchor and the i-th candidate form a pair
    examples1 = ['This is the sentence anchor 1.', 'It is the second sentence in this article named Section D.']
    examples2 = ['It is the same as anchor 1.', 'I think it is different with Section D.']
    label = [1, 0]
    # convert each example into features
    # {'input_ids': xxx, 'token_type_ids': xxx, 'attention_mask': xxx}
    features1 = tokenizer(examples1, add_special_tokens=True, padding=True)
    features2 = tokenizer(examples2, add_special_tokens=True, padding=True)
    # pad to a fixed length and convert to tensor batches
    max_seq_len = 16
    features1 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features1.items()}
    features2 = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in features2.items()}
    label = torch.Tensor(label).long()
    # obtain sentence embeddings by average pooling over the token (sequence) dimension
    rep_anchor = model(**features1)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_candidate = model(**features2)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_anchor = torch.mean(rep_anchor, 1) # [batch_size, hidden_dim]
    rep_candidate = torch.mean(rep_candidate, 1) # [batch_size, hidden_dim]
    # obtain contrastive loss
    loss_fn = ContrastiveLoss()
    loss = loss_fn(rep_anchor=rep_anchor, rep_candidate=rep_candidate, label=label)
    print(loss) # a scalar tensor with a grad_fn
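The effect of the margin is easiest to see on toy numbers. This is a minimal stdlib sketch of the same formula, assuming the class defaults (cosine distance, margin 0.5, summed losses); `cosine_distance` and `contrastive_loss` are hypothetical helper names, and the 2-d vectors stand in for real sentence embeddings.

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity; assumes neither vector is all zeros
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(anchors, candidates, labels, margin=0.5, size_average=False):
    losses = []
    for a, c, y in zip(anchors, candidates, labels):
        d = cosine_distance(a, c)
        # positive pair (y=1): penalise any distance, pulling the pair together
        # negative pair (y=0): penalise only while the distance is below the margin
        losses.append(0.5 * (y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2))
    return sum(losses) / len(losses) if size_average else sum(losses)

anchors = [[1.0, 0.0], [1.0, 0.0]]
candidates = [[0.0, 1.0], [1.0, 1.0]]
labels = [1, 0]  # an orthogonal "positive" and a close "negative": both are penalised
print(round(contrastive_loss(anchors, candidates, labels), 4))  # 0.5214
print(round(contrastive_loss(anchors, candidates, labels, size_average=True), 4))  # 0.2607
```

A negative pair whose distance already exceeds the margin (for example two orthogonal vectors, distance 1.0 > 0.5) contributes exactly zero, which is what stops the model from pushing negatives apart indefinitely.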

4. Triplet Loss

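Triplet loss is similar to contrastive learning: it aims to pull the anchor closer to the positive and push it away from the negative. The difference is that triplet loss compares the two distances directly: it requires the anchor-negative distance to exceed the anchor-positive distance by at least a margin, so the loss takes the form of a hinge (margin) loss. The code is as follows: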
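In formula form (our notation: a_i, p_i and n_i are the anchor, positive and negative vectors of triplet i, d the chosen distance, m the margin), the loss averaged over a batch of N triplets is:

$$
\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N}\max\!\big(0,\ d(\mathbf{a}_i, \mathbf{p}_i) - d(\mathbf{a}_i, \mathbf{n}_i) + m\big)
$$

A triplet stops contributing as soon as d(a_i, n_i) >= d(a_i, p_i) + m.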

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2022/03/23 15:25
# @Author  : Jianing Wang
# @File    : TripletLoss.py

from enum import Enum
import torch
from torch import nn, Tensor
import torch.nn.functional as F
from transformers.models.bert.modeling_bert import BertModel
from transformers import BertTokenizer, BertConfig

class TripletDistanceMetric(Enum):
    """
    The metric for the triplet loss
    """
    COSINE = lambda x, y: 1 - F.cosine_similarity(x, y)
    EUCLIDEAN = lambda x, y: F.pairwise_distance(x, y, p=2)
    MANHATTAN = lambda x, y: F.pairwise_distance(x, y, p=1)

class TripletLoss(nn.Module):
    """
    This class implements triplet loss. Given a triplet of (anchor, positive, negative),
    the loss minimizes the distance between anchor and positive while it maximizes the distance
    between anchor and negative. It computes the following loss function:

    loss = max(||anchor - positive|| - ||anchor - negative|| + margin, 0).

    Margin is an important hyperparameter and needs to be tuned for each task.

    @:param distance_metric: The distance metric function
    @:param triplet_margin: (float) The margin distance

    Inputs of forward function:
        rep_anchor, rep_positive, rep_negative: [batch_size, hidden_dim] tensors

    Return of forward function:
        the loss averaged over the batch (a scalar tensor)
    """
    def __init__(self, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin: float = 0.5):
        super(TripletLoss, self).__init__()
        self.distance_metric = distance_metric
        self.triplet_margin = triplet_margin

    def forward(self, rep_anchor, rep_positive, rep_negative):
        # rep_anchor: [batch_size, hidden_dim] denotes the representations of anchors
        # rep_positive: [batch_size, hidden_dim] denotes the representations of positives (e.g. a dropout / noised copy of the anchor)
        # rep_negative: [batch_size, hidden_dim] denotes the representations of negatives
        distance_pos = self.distance_metric(rep_anchor, rep_positive)  # [batch_size]
        distance_neg = self.distance_metric(rep_anchor, rep_negative)  # [batch_size]

        # hinge: zero loss once the negative is at least `triplet_margin` farther than the positive
        losses = F.relu(distance_pos - distance_neg + self.triplet_margin)
        return losses.mean()


if __name__ == "__main__":
    # configuration of the HuggingFace pre-trained language model
    config = BertConfig.from_pretrained('bert-base-cased')
    # tokenizer of the HuggingFace pre-trained language model
    tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
    # pre-trained weights (pytorch_model.bin) of the HuggingFace model
    model = BertModel.from_pretrained('bert-base-cased')
    # one anchor, plus two positives and two negatives
    anchor_example = ['I am an anchor, which is the source example sampled from corpora.'] # anchor sentence
    positive_example = [
        'I am an anchor, which is the source example.',
        'I am the source example sampled from corpora.'
    ] # positives, obtained by randomly dropping words from (or adding noise to) the anchor
    negative_example = [
        'It is different with the anchor.',
        'My name is Jianing Wang, please give me some stars, thank you!'
    ] # negatives, randomly sampled from the corpus
    # convert each example into features
    # {'input_ids': xxx, 'token_type_ids': xxx, 'attention_mask': xxx}
    anchor_feature = tokenizer(anchor_example, add_special_tokens=True, padding=True)
    positive_feature = tokenizer(positive_example, add_special_tokens=True, padding=True)
    negative_feature = tokenizer(negative_example, add_special_tokens=True, padding=True)
    # pad to a fixed length and convert to tensor batches
    max_seq_len = 24
    anchor_feature = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in anchor_feature.items()}
    positive_feature = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in positive_feature.items()}
    negative_feature = {key: torch.Tensor([value + [0] * (max_seq_len - len(value)) for value in values]).long() for key, values in negative_feature.items()}
    # encode with BERT
    rep_anchor = model(**anchor_feature)[0] # [1, max_seq_len, hidden_dim]
    rep_positive = model(**positive_feature)[0] # [batch_size, max_seq_len, hidden_dim]
    rep_negative = model(**negative_feature)[0] # [batch_size, max_seq_len, hidden_dim]
    # obtain sentence embeddings by average pooling over the token (sequence) dimension
    rep_anchor = torch.mean(rep_anchor, 1) # [1, hidden_dim]
    rep_positive = torch.mean(rep_positive, 1) # [batch_size, hidden_dim]
    rep_negative = torch.mean(rep_negative, 1) # [batch_size, hidden_dim]
    # repeat the single anchor so that it is paired with every positive / negative
    rep_anchor = rep_anchor.repeat(rep_positive.shape[0], 1) # [batch_size, hidden_dim]
    # obtain triplet loss
    loss_fn = TripletLoss()
    loss = loss_fn(rep_anchor=rep_anchor, rep_positive=rep_positive, rep_negative=rep_negative)
    print(loss) # a scalar tensor with a grad_fn
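To make the hinge behaviour concrete, here is a minimal stdlib sketch of the same computation with the class defaults (Euclidean distance, margin 0.5, mean over the batch). The helpers `euclidean` and `triplet_loss` are hypothetical, the 2-d points stand in for sentence embeddings, and the small epsilon that `F.pairwise_distance` adds internally is omitted.

```python
import math

def euclidean(u, v):
    # L2 distance between two vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchors, positives, negatives, margin=0.5):
    # hinge per triplet: max(0, d(a, p) - d(a, n) + margin), then average
    losses = [max(0.0, euclidean(a, p) - euclidean(a, n) + margin)
              for a, p, n in zip(anchors, positives, negatives)]
    return sum(losses) / len(losses)

anchors = [[0.0, 0.0], [0.0, 0.0]]
positives = [[1.0, 0.0], [0.0, 2.0]]   # distances 1 and 2
negatives = [[3.0, 4.0], [1.0, 0.0]]   # distances 5 and 1
print(triplet_loss(anchors, positives, negatives))  # 0.75
```

The first triplet is already "easy" (1 - 5 + 0.5 < 0) and contributes nothing; only the second, where the negative is closer than the positive, is penalised (2 - 1 + 0.5 = 1.5), giving a mean of 0.75.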
