风路丞

飞桨NLP学习

详细代码见：https://gitee.com/chfengyiliu/paddlenlp_learn。涉及实体提取、关系抽取、文字生成图片。

这里备注下自己做实体提取的finetune的训练笔记：

(1) input_ids中加入了提示语prompt信息；

(2) 一个样本对应的output是：当前样本中“每个词状态=是实体start位置“、“每个词状态=是实体end位置“ 的分布得分。所以每条样本的output size = (1, max_len)，batch size个样本集的outputs size =(batch_size, max_len)（见代码中举例）

(3) 损失函数是每个词的实体位置预测得分与真实值分布的交叉熵损失值。

备注：
实体提取和关系抽取的代码是一模一样的，不同的是喂入样本的prompt提示语 :
实体提取：提示语只有实体类型，如：人名、地名、公司名等；
关系抽取：提示语既有实体类型，也有关系类型，如：实体类-人名、地名，关系类-公司的高管、奶奶、孙子

（1）finetune.py

每个输入样本的编码实现：

batch的获取在 utils.py 的 convert_example(xx) 方法中实现。

该方法底层主要通过以下类实现编码encode：

(1）site-packages\paddlenlp\transformers\tokenizer_utils.py中的_batch_encode_plus(xx)方法；

(2）site-packages\paddlenlp\transformers\tokenizer_utils_base.py中的prepare_for_model(xx)方法。

# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import time
import os
from functools import partial

import paddle
from paddle.utils.download import get_path_from_url
from paddlenlp.datasets import load_dataset
from paddlenlp.transformers import AutoTokenizer
from paddlenlp.metrics import SpanEvaluator
from paddlenlp.utils.log import logger

from model import UIE
from evaluate import evaluate
from utils import set_seed, convert_example, reader, MODEL_MAP, create_data_loader



def do_train():
    paddle.set_device(args.device)
    rank = paddle.distributed.get_rank()
    if paddle.distributed.get_world_size() > 1:
        paddle.distributed.init_parallel_env()

    set_seed(args.seed)

    resource_file_urls = MODEL_MAP[args.model]['resource_file_urls']

    logger.info("Downloading resource files...")
    for key, val in resource_file_urls.items():
        file_path = os.path.join(args.model, key)
        if not os.path.exists(file_path):
            get_path_from_url(val, args.model)

    tokenizer = AutoTokenizer.from_pretrained(args.model)
    model = UIE.from_pretrained(args.model)

    train_ds = load_dataset(reader,
                            data_path=args.train_path,
                            max_seq_len=args.max_seq_len,
                            lazy=False)
    dev_ds = load_dataset(reader,
                          data_path=args.dev_path,
                          max_seq_len=args.max_seq_len,
                          lazy=False)

    trans_fn = partial(convert_example,
                       tokenizer=tokenizer,
                       max_seq_len=args.max_seq_len)

    train_data_loader = create_data_loader(train_ds,
                                           mode="train",
                                           batch_size=args.batch_size,
                                           trans_fn=trans_fn)
    dev_data_loader = create_data_loader(dev_ds,
                                         mode="dev",
                                         batch_size=args.batch_size,
                                         trans_fn=trans_fn)

    if args.init_from_ckpt and os.path.isfile(args.init_from_ckpt):
        state_dict = paddle.load(args.init_from_ckpt)
        model.set_dict(state_dict)

    if paddle.distributed.get_world_size() > 1:
        model = paddle.DataParallel(model)

    optimizer = paddle.optimizer.AdamW(learning_rate=args.learning_rate,
                                       parameters=model.parameters())

    criterion = paddle.nn.BCELoss()
    metric = SpanEvaluator()

    loss_list = []
    global_step = 0
    best_f1 = 0
    tic_train = time.time()
    for epoch in range(1, args.num_epochs + 1):
        for batch in train_data_loader:
            # todo 模型输入：
            #  batch的获取在 utils.py 的 convert_example(xx) 方法中实现。
            #  该方法底层主要通过以下类实现编码encode：
            #  （1）site-packages\paddlenlp\transformers\tokenizer_utils.py中的_batch_encode_plus(xx)方法；
            #  （2）site-packages\paddlenlp\transformers\tokenizer_utils_base.py中的prepare_for_model(xx)方法。

            #  在以下基础上根据 max_len截断或补0：
            #  input_ids = cls + prompt + sep + content + sep。 记录实体的属性值（不是实体，如张三的属性是人名，这里记录的是人名，不是张三）+文本描述
            #  token_type_ids = [0]*len(cls + prompt + sep) + [1]*len(content + sep)
            #  att_mask = [1]*len(input_ids)
            #  pos_ids = range(input_ids)

            #  todo 样本标注结构举例
            #   {
            #       "content": "网易公司首席架构设计师,丁磊1997年6月创立网易公司。",
            #       "result_list": [{"text": "网易公司", "start": 0, "end": 4}, {"text": "网易公司", "start": 20, "end": 24}],
            #       "prompt": "公司"
            #   }
            #  转换成模型输入序列 = [cls]+公司+[sep]+网,易,公,司,首,席,架,构,设,计,师,，,丁,磊,1997,年,6,月,创,立,网,易,公,司,。,+[sep]
            #
            #  start_ids = 当前标注样本结构里result_list中所有 text 的 start 处=1，其他处=0。 记录实体的开始位置。
            #  按输入序列编码后start_ids = [0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0]
            #
            #  end_ids = 当前标注样本结构里result_list中所有 text 的 end 处=1，其他处=0 。 记录实体的结束位置。
            #  按输入序列编码后遍码 = [0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0]
            input_ids, token_type_ids, att_mask, pos_ids, start_ids, end_ids = batch

  
             # todo 模型输出： 这里batch_size=8,max_len=30
            #  start_prob -- 输出每个样本里每个字作为起始位置的得分，每个样本输出shape=[1,max_len]，按batch后shape=[batch_size, max_len]
            #  end_prob -- 输出每个样本里每个字作为结束位置的概率，每个样本输出shape=[1,max_len]，按batch后shape=[batch_size, max_len]
            #  例如：Tensor(shape=[8, 30], dtype=float32, place=Place(gpu:0), stop_gradient=False,
            #        [[0.00000031, 0.00000129, 0.00000024, 0.00000031, 0.00000042, 0.00070048,
            #          0.00000877, 0.00002038, 0.00000138, 0.00001625, 0.00000029, 0.00008086,
            #          0.00000202, 0.00000990, 0.00000406, 0.00016828, 0.00001869, 0.00021678,
            #          0.00000937, 0.00046841, 0.00002430, 0.00000023, 0.00006322, 0.00000595,
            #          0.00000095, 0.00000166, 0.00000030, 0.00000033, 0.00000023, 0.00000073],
            #         [0.00000190, 0.00000188, 0.00000061, 0.00000031, 0.00000152, 0.00057717,
            #          0.00000707, 0.00004766, 0.00000537, 0.00017263, 0.00000520, 0.00002636,
            #          0.00000283, 0.00066136, 0.00001277, 0.00100498, 0.00003792, 0.00072994,
            #          0.00085996, 0.00004832, 0.00000354, 0.00000241, 0.00000104, 0.00000189,
            #          0.00000262, 0.00000088, 0.00000112, 0.00000747, 0.00000136, 0.00000190],
            #         [0.00000082, 0.00000135, 0.00000011, 0.00000184, 0.00000276, 0.01416136,
            #          0.00006575, 0.00002398, 0.00433168, 0.00007143, 0.00002933, 0.00001502,
            #          0.00000795, 0.00002079, 0.00000122, 0.00001549, 0.00000252, 0.00002578,
            #          0.00000451, 0.00000495, 0.00000058, 0.00000076, 0.00000072, 0.00000065,
            #          0.00000183, 0.00000128, 0.00000089, 0.00000143, 0.00000092, 0.00000059],
            #         [0.00000310, 0.00000056, 0.00000012, 0.00000066, 0.98270828, 0.00001563,
            #          0.00001566, 0.00013158, 0.00000047, 0.00003079, 0.00000261, 0.00002862,
            #          0.00000128, 0.00000287, 0.00000035, 0.00000161, 0.00000160, 0.00000046,
            #          0.00000067, 0.00000049, 0.00000047, 0.00000088, 0.00000070, 0.00000102,
            #          0.00000038, 0.00000045, 0.00000058, 0.00000048, 0.00000076, 0.00000086],
            #         [0.00000288, 0.00000049, 0.00000015, 0.00000016, 0.00000197, 0.00002671,
            #          0.00000357, 0.00000193, 0.00000425, 0.00000676, 0.00000180, 0.00009420,
            #          0.05136376, 0.00011208, 0.00007415, 0.02065694, 0.00003858, 0.00000352,
            #          0.00000202, 0.00004982, 0.00000858, 0.00003040, 0.00002816, 0.00000214,
            #          0.00000188, 0.00000202, 0.00000312, 0.00000156, 0.00000234, 0.00000251],
            #         [0.00000035, 0.00000123, 0.00000029, 0.00000053, 0.71992922, 0.00012994,
            #          0.00003325, 0.00004523, 0.00003941, 0.00000144, 0.00001109, 0.00000611,
            #          0.00000129, 0.00012184, 0.00004307, 0.00000224, 0.00001191, 0.00002946,
            #          0.00000537, 0.00008847, 0.00002351, 0.00000039, 0.00000065, 0.00000039,
            #          0.00000026, 0.00000050, 0.00000176, 0.00000064, 0.00000034, 0.00000079],
            #         [0.00000126, 0.00000964, 0.00000142, 0.00000091, 0.00000283, 0.01225603,
            #          0.00028963, 0.00000450, 0.00007187, 0.00000975, 0.00098875, 0.00000235,
            #          0.00004167, 0.00000309, 0.00036950, 0.00002466, 0.00126630, 0.00023501,
            #          0.00562316, 0.00004048, 0.00000703, 0.00022977, 0.00001388, 0.00000123,
            #          0.00000202, 0.00000168, 0.00000258, 0.00000113, 0.00000218, 0.00000248],
            #         [0.00000040, 0.00000045, 0.00000008, 0.00000032, 0.00022130, 0.00000234,
            #          0.00001500, 0.00006374, 0.76793706, 0.00115055, 0.00060874, 0.38059372,
            #          0.00022558, 0.00005922, 0.00001344, 0.00001371, 0.00000058, 0.00000036,
            #          0.00000137, 0.00000051, 0.00000054, 0.00000072, 0.00000063, 0.00000039,
            #          0.00000051, 0.00000096, 0.00000066, 0.00000111, 0.00000035, 0.00000046]])
            start_prob, end_prob = model(input_ids, token_type_ids, att_mask, pos_ids)
            start_ids = paddle.cast(start_ids, 'float32')
            end_ids = paddle.cast(end_ids, 'float32')
            loss_start = criterion(start_prob, start_ids)
            loss_end = criterion(end_prob, end_ids)
            loss = (loss_start + loss_end) / 2.0
            loss.backward()
            optimizer.step()
            optimizer.clear_grad()
            loss_list.append(float(loss))

            global_step += 1
            if global_step % args.logging_steps == 0 and rank == 0:
                time_diff = time.time() - tic_train
                loss_avg = sum(loss_list) / len(loss_list)
                logger.info(
                    "global step %d, epoch: %d, loss: %.5f, speed: %.2f step/s"
                    % (global_step, epoch, loss_avg,
                       args.logging_steps / time_diff))
                tic_train = time.time()

            if global_step % args.valid_steps == 0 and rank == 0:
                save_dir = os.path.join(args.save_dir, "model_%d" % global_step)
                if not os.path.exists(save_dir):
                    os.makedirs(save_dir)
                model_to_save = model._layers if isinstance(
                    model, paddle.DataParallel) else model
                model_to_save.save_pretrained(save_dir)
                logger.disable()
                tokenizer.save_pretrained(save_dir)
                logger.enable()

                precision, recall, f1 = evaluate(model, metric, dev_data_loader)
                logger.info(
                    "Evaluation precision: %.5f, recall: %.5f, F1: %.5f" %
                    (precision, recall, f1))
                if f1 > best_f1:
                    logger.info(
                        f"best F1 performence has been updated: {best_f1:.5f} --> {f1:.5f}"
                    )
                    best_f1 = f1
                    save_dir = os.path.join(args.save_dir, "model_best")
                    model_to_save = model._layers if isinstance(
                        model, paddle.DataParallel) else model
                    model_to_save.save_pretrained(save_dir)
                    logger.disable()
                    tokenizer.save_pretrained(save_dir)
                    logger.enable()
                tic_train = time.time()


if __name__ == "__main__":
    # yapf: disable
    parser = argparse.ArgumentParser()

    # todo 新增路径前缀
    # path_prefix = './self-mark'  # 自己标注的样本：包括实体和关系
    path_prefix = './relation'  # 关系抽取【目标：同时完成实体提取和关系抽取，所以标注时prompt即有实体的提示语，如“人名”，也有关系的提示语，如“小米公司的高管”】
    # path_prefix = './ner'  # 实体识别
    '''
    执行run没报错，但显示“进程已结束,退出代码-1073740791 (0xC0000409)”， 可能是因为gpu显存不足，可以尝试把batch_size缩小。
    这里batch_size从16减到2就可以正常运行了！！
    '''
    # parser.add_argument("--batch_size", default=16, type=int, help="Batch size per GPU/CPU for training.")
    # todo batch_size实体识别default=8,关系抽取default=2
    parser.add_argument("--batch_size", default=2, type=int, help="Batch size per GPU/CPU for training.")
    parser.add_argument("--learning_rate", default=1e-5, type=float, help="The initial learning rate for Adam.")
    # parser.add_argument("--train_path", default='./data/train.txt', type=str, help="The path of train set.")
    parser.add_argument("--train_path", default=path_prefix + '/data/train.txt', type=str, help="The path of train set.")
    # parser.add_argument("--dev_path", default='./data/dev.txt', type=str, help="The path of dev set.")
    parser.add_argument("--dev_path", default=path_prefix + '/data/dev.txt', type=str, help="The path of dev set.")
    # parser.add_argument("--save_dir", default='./checkpoint_gx', type=str, help="The output directory where the model checkpoints will be written.")
    parser.add_argument("--save_dir", default=path_prefix + '/checkpoint', type=str, help="The output directory where the model checkpoints will be written.")
    # todo 默认是512，debug时可以设置小点，方便看矩阵里的值，比如实体提取=30，关系抽取=100。
    parser.add_argument("--max_seq_len", default=100, type=int, help="The maximum input sequence length. "
        "Sequences longer than this will be split automatically.")
    # parser.add_argument("--num_epochs", default=100, type=int, help="Total number of training epochs to perform.")
    # todo epochs实体识别default=10,关系抽取default=10或15
    parser.add_argument("--num_epochs", default=10, type=int, help="Total number of training epochs to perform.")
    parser.add_argument("--seed", default=1000, type=int, help="Random seed for initialization")
    parser.add_argument("--logging_steps", default=10, type=int, help="The interval steps to logging.")
    # parser.add_argument("--valid_steps", default=100, type=int, help="The interval steps to evaluate model performance.")
    # todo valid_steps实体识别default=10,关系抽取default=4。
    #  默认模型保存的step间隔等于--valid_steps，即实体识别每10步保存一次模型，关系抽取则是每4步。
    parser.add_argument("--valid_steps", default=4, type=int, help="The interval steps to evaluate model performance.")
    parser.add_argument('--device', choices=['cpu', 'gpu'], default="gpu", help="Select which device to train model, defaults to gpu.")
    parser.add_argument("--model", choices=["uie-base", "uie-tiny", "uie-medium", "uie-mini", "uie-micro", "uie-nano"], default="uie-base", type=str, help="Select the pretrained model for few-shot learning.")
    parser.add_argument("--init_from_ckpt", default=None, type=str, help="The path of model parameters for initialization.")

    args = parser.parse_args()
    # yapf: enable

    do_train()

（2）utils.py

# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import re
import math
import json
import random
from tqdm import tqdm

import numpy as np
import paddle
from paddlenlp.utils.log import logger

MODEL_MAP = {
    # vocab.txt/special_tokens_map.json/tokenizer_config.json are common to the default model.
    "uie-base": {
        "resource_file_urls": {
            "model_state.pdparams":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base_v1.0/model_state.pdparams",
            "model_config.json":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/model_config.json",
            "vocab_file":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/vocab.txt",
            "special_tokens_map":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/special_tokens_map.json",
            "tokenizer_config":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/tokenizer_config.json"
        }
    },
    "uie-medium": {
        "resource_file_urls": {
            "model_state.pdparams":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_medium_v1.0/model_state.pdparams",
            "model_config.json":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_medium/model_config.json",
            "vocab_file":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/vocab.txt",
            "special_tokens_map":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/special_tokens_map.json",
            "tokenizer_config":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/tokenizer_config.json"
        }
    },
    "uie-mini": {
        "resource_file_urls": {
            "model_state.pdparams":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_mini_v1.0/model_state.pdparams",
            "model_config.json":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_mini/model_config.json",
            "vocab_file":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/vocab.txt",
            "special_tokens_map":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/special_tokens_map.json",
            "tokenizer_config":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/tokenizer_config.json"
        }
    },
    "uie-micro": {
        "resource_file_urls": {
            "model_state.pdparams":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_micro_v1.0/model_state.pdparams",
            "model_config.json":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_micro/model_config.json",
            "vocab_file":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/vocab.txt",
            "special_tokens_map":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/special_tokens_map.json",
            "tokenizer_config":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/tokenizer_config.json"
        }
    },
    "uie-nano": {
        "resource_file_urls": {
            "model_state.pdparams":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_nano_v1.0/model_state.pdparams",
            "model_config.json":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_nano/model_config.json",
            "vocab_file":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/vocab.txt",
            "special_tokens_map":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/special_tokens_map.json",
            "tokenizer_config":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_base/tokenizer_config.json"
        }
    },
    # Rename to `uie-medium` and the name of `uie-tiny` will be deprecated in future.
    "uie-tiny": {
        "resource_file_urls": {
            "model_state.pdparams":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_tiny_v0.1/model_state.pdparams",
            "model_config.json":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_tiny/model_config.json",
            "vocab_file":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_tiny/vocab.txt",
            "special_tokens_map":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_tiny/special_tokens_map.json",
            "tokenizer_config":
            "https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_tiny/tokenizer_config.json"
        }
    }
}


def set_seed(seed):
    paddle.seed(seed)
    random.seed(seed)
    np.random.seed(seed)


def create_data_loader(dataset, mode="train", batch_size=1, trans_fn=None):
    """
    Create dataloader.
    Args:
        dataset(obj:`paddle.io.Dataset`): Dataset instance.
        mode(obj:`str`, optional, defaults to obj:`train`): If mode is 'train', it will shuffle the dataset randomly.
        batch_size(obj:`int`, optional, defaults to 1): The sample number of a mini-batch.
        trans_fn(obj:`callable`, optional, defaults to `None`): function to convert a data sample to input ids, etc.
    Returns:
        dataloader(obj:`paddle.io.DataLoader`): The dataloader which generates batches.
    """
    if trans_fn:
        dataset = dataset.map(trans_fn)

    shuffle = True if mode == 'train' else False
    if mode == "train":
        sampler = paddle.io.DistributedBatchSampler(dataset=dataset,
                                                    batch_size=batch_size,
                                                    shuffle=shuffle)
    else:
        sampler = paddle.io.BatchSampler(dataset=dataset,
                                         batch_size=batch_size,
                                         shuffle=shuffle)
    dataloader = paddle.io.DataLoader(dataset,
                                      batch_sampler=sampler,
                                      return_list=True)
    return dataloader


def convert_example(example, tokenizer, max_seq_len):
    """
    example: {
        title
        prompt
        content
        result_list
    }
    """
    encoded_inputs = tokenizer(text=[example["prompt"]],
                               text_pair=[example["content"]],
                               truncation=True,
                               max_seq_len=max_seq_len,
                               pad_to_max_seq_len=True,
                               return_attention_mask=True,
                               return_position_ids=True,
                               return_dict=False,
                               return_offsets_mapping=True)
    encoded_inputs = encoded_inputs[0]
    # todo #"【cls】出,发,地【sep】深,大,到,双,龙,28,块,钱,4,月,24,号,交,通,费", 在maxlength=30时对应的offetmapping =
    #  [(0, 0), (0, 1), (1, 2), (2, 3),
    #   (0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 13), (13, 14), (14, 15), (15, 16), (16, 17),
    #   (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)]
    offset_mapping = [list(x) for x in encoded_inputs["offset_mapping"]]
    bias = 0
    for index in range(1, len(offset_mapping)):
        mapping = offset_mapping[index]
        if mapping[0] == 0 and mapping[1] == 0 and bias == 0:
            bias = offset_mapping[index - 1][1] + 1  # Includes [SEP] token
        if mapping[0] == 0 and mapping[1] == 0:
            continue
        offset_mapping[index][0] += bias
        offset_mapping[index][1] += bias
    start_ids = [0 for x in range(max_seq_len)]
    end_ids = [0 for x in range(max_seq_len)]
    for item in example["result_list"]:
        start = map_offset(item["start"] + bias, offset_mapping)
        end = map_offset(item["end"] - 1 + bias, offset_mapping)
        start_ids[start] = 1.0
        end_ids[end] = 1.0

    tokenized_output = [
        encoded_inputs["input_ids"], encoded_inputs["token_type_ids"],
        encoded_inputs["position_ids"], encoded_inputs["attention_mask"],
        start_ids, end_ids
    ]
    tokenized_output = [np.array(x, dtype="int64") for x in tokenized_output]
    return tuple(tokenized_output)


def map_offset(ori_offset, offset_mapping):
    """
    map ori offset to token offset
    """
    for index, span in enumerate(offset_mapping):
        if span[0] <= ori_offset < span[1]:
            return index
    return -1


def reader(data_path, max_seq_len=512):
    """
    read json
    """
    with open(data_path, 'r', encoding='utf-8') as f:
        for line in f:
            json_line = json.loads(line)
            content = json_line['content'].strip()
            prompt = json_line['prompt']
            # Model Input is aslike: [CLS] Prompt [SEP] Content [SEP]
            # It include three summary tokens.
            if max_seq_len <= len(prompt) + 3:
                raise ValueError(
                    "The value of max_seq_len is too small, please set a larger value"
                )
            max_content_len = max_seq_len - len(prompt) - 3
            if len(content) <= max_content_len:
                yield json_line
            else:
                if result['end'] - result['start'] > max_content_len:
                    logger.warn(
                        "result['end '] - result ['start'] exceeds max_content_len, which will result in no valid instance being returned"
                    )
                result_list = json_line['result_list']
                json_lines = []
                accumulate = 0
                while True:
                    cur_result_list = []

                    for result in result_list:
                        if result['start'] + 1 <= max_content_len < result[
                                'end'] and result['end'] - result[
                                    'start'] <= max_content_len:
                            max_content_len = result['start']
                            break

                    cur_content = content[:max_content_len]
                    res_content = content[max_content_len:]

                    while True:
                        if len(result_list) == 0:
                            break
                        elif result_list[0]['end'] <= max_content_len:
                            if result_list[0]['end'] > 0:
                                cur_result = result_list.pop(0)
                                cur_result_list.append(cur_result)
                            else:
                                cur_result_list = [
                                    result for result in result_list
                                ]
                                break
                        else:
                            break

                    json_line = {
                        'content': cur_content,
                        'result_list': cur_result_list,
                        'prompt': prompt
                    }
                    json_lines.append(json_line)

                    for result in result_list:
                        if result['end'] <= 0:
                            break
                        result['start'] -= max_content_len
                        result['end'] -= max_content_len
                    accumulate += max_content_len
                    max_content_len = max_seq_len - len(prompt) - 3
                    if len(res_content) == 0:
                        break
                    elif len(res_content) < max_content_len:
                        json_line = {
                            'content': res_content,
                            'result_list': result_list,
                            'prompt': prompt
                        }
                        json_lines.append(json_line)
                        break
                    else:
                        content = res_content

                for json_line in json_lines:
                    yield json_line


def unify_prompt_name(prompt):
    # The classification labels are shuffled during finetuning, so they need
    # to be unified during evaluation.
    if re.search(r'\[.*?\]$', prompt):
        prompt_prefix = prompt[:prompt.find("[", 1)]
        cls_options = re.search(r'\[.*?\]$', prompt).group()[1:-1].split(",")
        cls_options = sorted(list(set(cls_options)))
        cls_options = ",".join(cls_options)
        prompt = prompt_prefix + "[" + cls_options + "]"
        return prompt
    return prompt


def get_relation_type_dict(relation_data):

    def compare(a, b):
        a = a[::-1]
        b = b[::-1]
        res = ''
        for i in range(min(len(a), len(b))):
            if a[i] == b[i]:
                res += a[i]
            else:
                break
        if res == "":
            return res
        elif res[::-1][0] == "的":
            return res[::-1][1:]
        return ""

    relation_type_dict = {}
    added_list = []
    for i in range(len(relation_data)):
        added = False
        if relation_data[i][0] not in added_list:
            for j in range(i + 1, len(relation_data)):
                match = compare(relation_data[i][0], relation_data[j][0])
                if match != "":
                    match = unify_prompt_name(match)
                    if relation_data[i][0] not in added_list:
                        added_list.append(relation_data[i][0])
                        relation_type_dict.setdefault(match, []).append(
                            relation_data[i][1])
                    added_list.append(relation_data[j][0])
                    relation_type_dict.setdefault(match, []).append(
                        relation_data[j][1])
                    added = True
            if not added:
                added_list.append(relation_data[i][0])
                suffix = relation_data[i][0].rsplit("的", 1)[1]
                suffix = unify_prompt_name(suffix)
                relation_type_dict.setdefault(suffix,
                                              []).append(relation_data[i][1])
    return relation_type_dict


def add_entity_negative_example(examples, texts, prompts, label_set,
                                negative_ratio):
    negative_examples = []
    positive_examples = []
    with tqdm(total=len(prompts)) as pbar:
        for i, prompt in enumerate(prompts):
            redundants = list(set(label_set) ^ set(prompt))
            redundants.sort()

            num_positive = len(examples[i])
            if num_positive != 0:
                actual_ratio = math.ceil(len(redundants) / num_positive)
            else:
                # Set num_positive to 1 for text without positive example
                num_positive, actual_ratio = 1, 0

            if actual_ratio <= negative_ratio or negative_ratio == -1:
                idxs = [k for k in range(len(redundants))]
            else:
                idxs = random.sample(range(0, len(redundants)),
                                     negative_ratio * num_positive)

            for idx in idxs:
                negative_result = {
                    "content": texts[i],
                    "result_list": [],
                    "prompt": redundants[idx]
                }
                negative_examples.append(negative_result)
            positive_examples.extend(examples[i])
            pbar.update(1)
    return positive_examples, negative_examples


def add_relation_negative_example(redundants, text, num_positive, ratio):
    added_example = []
    rest_example = []

    if num_positive != 0:
        actual_ratio = math.ceil(len(redundants) / num_positive)
    else:
        # Set num_positive to 1 for text without positive example
        num_positive, actual_ratio = 1, 0

    all_idxs = [k for k in range(len(redundants))]
    if actual_ratio <= ratio or ratio == -1:
        idxs = all_idxs
        rest_idxs = []
    else:
        idxs = random.sample(range(0, len(redundants)), ratio * num_positive)
        rest_idxs = list(set(all_idxs) ^ set(idxs))

    for idx in idxs:
        negative_result = {
            "content": text,
            "result_list": [],
            "prompt": redundants[idx]
        }
        added_example.append(negative_result)

    for rest_idx in rest_idxs:
        negative_result = {
            "content": text,
            "result_list": [],
            "prompt": redundants[rest_idx]
        }
        rest_example.append(negative_result)

    return added_example, rest_example


def add_full_negative_example(examples, texts, relation_prompts, predicate_set,
                              subject_goldens):
    with tqdm(total=len(relation_prompts)) as pbar:
        for i, relation_prompt in enumerate(relation_prompts):
            negative_sample = []
            for subject in subject_goldens[i]:
                for predicate in predicate_set:
                    # The relation prompt is constructed as follows:
                    # subject + "的" + predicate
                    prompt = subject + "的" + predicate
                    if prompt not in relation_prompt:
                        negative_result = {
                            "content": texts[i],
                            "result_list": [],
                            "prompt": prompt
                        }
                        negative_sample.append(negative_result)
            examples[i].extend(negative_sample)
            pbar.update(1)
    return examples


def generate_cls_example(text, labels, prompt_prefix, options):
    random.shuffle(options)
    cls_options = ",".join(options)
    prompt = prompt_prefix + "[" + cls_options + "]"

    result_list = []
    example = {"content": text, "result_list": result_list, "prompt": prompt}
    for label in labels:
        start = prompt.rfind(label) - len(prompt) - 1
        end = start + len(label)
        result = {"text": label, "start": start, "end": end}
        example["result_list"].append(result)
    return example


def convert_cls_examples(raw_examples,
                         prompt_prefix="情感倾向",
                         options=["正向", "负向"]):
    """
    Convert labeled data export from doccano for classification task.
    """
    examples = []
    logger.info(f"Converting doccano data...")
    with tqdm(total=len(raw_examples)) as pbar:
        for line in raw_examples:
            items = json.loads(line)
            # Compatible with doccano >= 1.6.2
            if "data" in items.keys():
                text, labels = items["data"], items["label"]
            else:
                text, labels = items["text"], items["label"]
            example = generate_cls_example(text, labels, prompt_prefix, options)
            examples.append(example)
    return examples


def convert_ext_examples(raw_examples,
                         negative_ratio,
                         prompt_prefix="情感倾向",
                         options=["正向", "负向"],
                         separator="##",
                         is_train=True):
    """
    Convert labeled data export from doccano for relation and aspect-level classification task.
    """

    def _sep_cls_label(label, separator):
        label_list = label.split(separator)
        if len(label_list) == 1:
            return label_list[0], None
        return label_list[0], label_list[1:]

    texts = []
    entity_examples = []
    relation_examples = []
    entity_cls_examples = []
    entity_prompts = []
    relation_prompts = []
    entity_label_set = []
    entity_name_set = []
    predicate_set = []
    subject_goldens = []
    inverse_relation_list = []
    predicate_list = []

    logger.info(f"Converting doccano data...")
    with tqdm(total=len(raw_examples)) as pbar:
        for line in raw_examples:
            items = json.loads(line)
            entity_id = 0
            if "data" in items.keys():
                relation_mode = False
                if isinstance(items["label"],
                              dict) and "entities" in items["label"].keys():
                    relation_mode = True
                text = items["data"]
                entities = []
                relations = []
                if not relation_mode:
                    # Export file in JSONL format which doccano < 1.7.0
                    # e.g. {"data": "", "label": [ [0, 2, "ORG"], ... ]}
                    for item in items["label"]:
                        entity = {
                            "id": entity_id,
                            "start_offset": item[0],
                            "end_offset": item[1],
                            "label": item[2]
                        }
                        entities.append(entity)
                        entity_id += 1
                else:
                    # Export file in JSONL format for relation labeling task which doccano < 1.7.0
                    # e.g. {"data": "", "label": {"relations": [ {"id": 0, "start_offset": 0, "end_offset": 6, "label": "ORG"}, ... ], "entities": [ {"id": 0, "from_id": 0, "to_id": 1, "type": "foundedAt"}, ... ]}}
                    entities.extend(
                        [entity for entity in items["label"]["entities"]])
                    if "relations" in items["label"].keys():
                        relations.extend([
                            relation for relation in items["label"]["relations"]
                        ])
            else:
                # Export file in JSONL format which doccano >= 1.7.0
                # e.g. {"text": "", "label": [ [0, 2, "ORG"], ... ]}
                if "label" in items.keys():
                    text = items["text"]
                    entities = []
                    for item in items["label"]:
                        entity = {
                            "id": entity_id,
                            "start_offset": item[0],
                            "end_offset": item[1],
                            "label": item[2]
                        }
                        entities.append(entity)
                        entity_id += 1
                    relations = []
                else:
                    # Export file in JSONL (relation) format
                    # e.g. {"text": "", "relations": [ {"id": 0, "start_offset": 0, "end_offset": 6, "label": "ORG"}, ... ], "entities": [ {"id": 0, "from_id": 0, "to_id": 1, "type": "foundedAt"}, ... ]}
                    text, relations, entities = items["text"], items[
                        "relations"], items["entities"]
            texts.append(text)

            entity_example = []
            entity_prompt = []
            entity_example_map = {}
            entity_map = {}  # id to entity name
            for entity in entities:
                entity_name = text[entity["start_offset"]:entity["end_offset"]]
                entity_map[entity["id"]] = {
                    "name": entity_name,
                    "start": entity["start_offset"],
                    "end": entity["end_offset"]
                }

                entity_label, entity_cls_label = _sep_cls_label(
                    entity["label"], separator)

                # Define the prompt prefix for entity-level classification
                entity_cls_prompt_prefix = entity_name + "的" + prompt_prefix
                if entity_cls_label is not None:
                    entity_cls_example = generate_cls_example(
                        text, entity_cls_label, entity_cls_prompt_prefix,
                        options)

                    entity_cls_examples.append(entity_cls_example)

                result = {
                    "text": entity_name,
                    "start": entity["start_offset"],
                    "end": entity["end_offset"]
                }
                if entity_label not in entity_example_map.keys():
                    entity_example_map[entity_label] = {
                        "content": text,
                        "result_list": [result],
                        "prompt": entity_label
                    }
                else:
                    entity_example_map[entity_label]["result_list"].append(
                        result)

                if entity_label not in entity_label_set:
                    entity_label_set.append(entity_label)
                if entity_name not in entity_name_set:
                    entity_name_set.append(entity_name)
                entity_prompt.append(entity_label)

            for v in entity_example_map.values():
                entity_example.append(v)

            entity_examples.append(entity_example)
            entity_prompts.append(entity_prompt)

            subject_golden = []  # Golden entity inputs
            relation_example = []
            relation_prompt = []
            relation_example_map = {}
            inverse_relation = []
            predicates = []
            for relation in relations:
                predicate = relation["type"]
                subject_id = relation["from_id"]
                object_id = relation["to_id"]
                # The relation prompt is constructed as follows:
                # subject + "的" + predicate
                prompt = entity_map[subject_id]["name"] + "的" + predicate
                if entity_map[subject_id]["name"] not in subject_golden:
                    subject_golden.append(entity_map[subject_id]["name"])
                result = {
                    "text": entity_map[object_id]["name"],
                    "start": entity_map[object_id]["start"],
                    "end": entity_map[object_id]["end"]
                }

                inverse_negative = entity_map[object_id][
                    "name"] + "的" + predicate
                inverse_relation.append(inverse_negative)
                predicates.append(predicate)

                if prompt not in relation_example_map.keys():
                    relation_example_map[prompt] = {
                        "content": text,
                        "result_list": [result],
                        "prompt": prompt
                    }
                else:
                    relation_example_map[prompt]["result_list"].append(result)

                if predicate not in predicate_set:
                    predicate_set.append(predicate)
                relation_prompt.append(prompt)

            for v in relation_example_map.values():
                relation_example.append(v)

            relation_examples.append(relation_example)
            relation_prompts.append(relation_prompt)
            subject_goldens.append(subject_golden)
            inverse_relation_list.append(inverse_relation)
            predicate_list.append(predicates)
            pbar.update(1)

    logger.info(f"Adding negative samples for first stage prompt...")
    positive_examples, negative_examples = add_entity_negative_example(
        entity_examples, texts, entity_prompts, entity_label_set,
        negative_ratio)
    if len(positive_examples) == 0:
        all_entity_examples = []
    else:
        all_entity_examples = positive_examples + negative_examples

    all_relation_examples = []
    if len(predicate_set) != 0:
        logger.info(f"Adding negative samples for second stage prompt...")
        if is_train:

            positive_examples = []
            negative_examples = []
            per_n_ratio = negative_ratio // 3

            with tqdm(total=len(texts)) as pbar:
                for i, text in enumerate(texts):
                    negative_example = []
                    collects = []
                    num_positive = len(relation_examples[i])

                    # 1. inverse_relation_list
                    redundants1 = inverse_relation_list[i]

                    # 2. entity_name_set ^ subject_goldens[i]
                    redundants2 = []
                    if len(predicate_list[i]) != 0:
                        nonentity_list = list(
                            set(entity_name_set) ^ set(subject_goldens[i]))
                        nonentity_list.sort()

                        redundants2 = [
                            nonentity + "的" +
                            predicate_list[i][random.randrange(
                                len(predicate_list[i]))]
                            for nonentity in nonentity_list
                        ]

                    # 3. entity_label_set ^ entity_prompts[i]
                    redundants3 = []
                    if len(subject_goldens[i]) != 0:
                        non_ent_label_list = list(
                            set(entity_label_set) ^ set(entity_prompts[i]))
                        non_ent_label_list.sort()

                        redundants3 = [
                            subject_goldens[i][random.randrange(
                                len(subject_goldens[i]))] + "的" + non_ent_label
                            for non_ent_label in non_ent_label_list
                        ]

                    redundants_list = [redundants1, redundants2, redundants3]

                    for redundants in redundants_list:
                        added, rest = add_relation_negative_example(
                            redundants,
                            texts[i],
                            num_positive,
                            per_n_ratio,
                        )
                        negative_example.extend(added)
                        collects.extend(rest)

                    num_sup = num_positive * negative_ratio - len(
                        negative_example)
                    if num_sup > 0 and collects:
                        if num_sup > len(collects):
                            idxs = [k for k in range(len(collects))]
                        else:
                            idxs = random.sample(range(0, len(collects)),
                                                 num_sup)
                        for idx in idxs:
                            negative_example.append(collects[idx])

                    positive_examples.extend(relation_examples[i])
                    negative_examples.extend(negative_example)
                    pbar.update(1)
            all_relation_examples = positive_examples + negative_examples
        else:
            relation_examples = add_full_negative_example(
                relation_examples, texts, relation_prompts, predicate_set,
                subject_goldens)
            all_relation_examples = [
                r for relation_example in relation_examples
                for r in relation_example
            ]
    return all_entity_examples, all_relation_examples, entity_cls_examples

（3）evaluate.py

# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os
from functools import partial

import paddle
from paddlenlp.datasets import load_dataset, MapDataset
from paddlenlp.transformers import AutoTokenizer
from paddlenlp.metrics import SpanEvaluator
from paddlenlp.utils.log import logger

from model import UIE
from utils import convert_example, reader, unify_prompt_name, get_relation_type_dict, create_data_loader


@paddle.no_grad()
def evaluate(model, metric, data_loader):
    """
    Given a dataset, it evals model and computes the metric.
    Args:
        model(obj:`paddle.nn.Layer`): A model to classify texts.
        metric(obj:`paddle.metric.Metric`): The evaluation metric.
        data_loader(obj:`paddle.io.DataLoader`): The dataset loader which generates batches.
    """
    model.eval()
    metric.reset()
    for batch in data_loader:
        input_ids, token_type_ids, att_mask, pos_ids, start_ids, end_ids = batch
        start_prob, end_prob = model(input_ids, token_type_ids, att_mask,
                                     pos_ids)
        start_ids = paddle.cast(start_ids, 'float32')
        end_ids = paddle.cast(end_ids, 'float32')
        # todo 举例说明
        # metric = SpanEvaluator()
        # 在metric.compute(start_prob, end_prob, start_ids, end_ids)方法中，
        #  todo step1：对每条样本，筛选出其预测得分集里大于阈值的index。
        #   假设某条样本(maxlength=30)输出 “状态 = 是某个实体的start” 的预测得分集 = [
        #             0.00000031, 0.00000129, 0.00000024, 0.00000031, 0.50000042, 0.00070048,
        #             0.00000877, 0.00002038, 0.00000138, 0.00001625, 0.00000029, 0.00008086,
        #             0.00000202, 0.00000990, 0.00000406, 0.00016828, 0.00001869, 0.00021678,
        #             0.00000937, 0.00046841, 0.00002430, 0.00000023, 0.00006322, 0.00000595,
        #             0.00000095, 0.00000166, 0.00000030, 0.00000033, 0.00000023, 0.00000073]
        #   则得分大于0.5的index=[4]，即预测当前样本 index=4 处可能是一个实体的开始位置。
        #   依次对batch中每个样本的预测得分集做以上处理，得到每个样本的pre-start和pre-end。综合按batch输出，即：
        # pre-start = [[4],  [],  [14],  [4],  [8],   [],  [5, 8], [5, 8]]  # 可以看到当前批次的第7，8条样本中各有2个位置处的得分 > 阈值0.5。
        # pre_end =   [[7],  [],   [],   [7],  [9],   [],   [6],   [6, 9]]
        #
        # gold_start= [[4], [14], [14],  [4],  [8],  [13],   [5],    [8]]
        # glod_end =  [[7], [14], [14],  [7],  [8],  [14],   [6],    [9]]

        #  todo step2：沿文本序列方向组合(pre_start,pre_end)、(gold_start,gold_end)，得到
        # pre： (4,7),   (),     (),  (4,7),(8,9),   (),  (5,6), ((5,6),(8,9))
        # gold：(4,7),(14,14),(14,14),(4,7),(8,8),(13,14),(5,6),     (8,9)
        #
        #  todo step3：pre、gold对应组合求与&运算，得到：
        # result: 1,     0,      0,     1,    1,     0,     1,         1
        # pre： (4,7),   (),     (),  (4,7),(8,9),   (),  (5,6), ((5,6),(8,9))
        # gold：(4,7),(14,14),(14,14),(4,7),(8,8),(13,14),(5,6),     (8,9)

        # todo step4：统计预测组pre、真实组gold、正确组correct的数量
        # correct: 1,    0,      0,     1,    1,     0,     1,         1           --num_correct=5
        # pre： (4,7),   (),     (),  (4,7),(8,9),   (),  (5,6), ((5,6),(8,9))     --num_infer=6
        # gold：(4,7),(14,14),(14,14),(4,7),(8,8),(13,14),(5,6),     (8,9)         --num_label=8

        # todo step5：计算precious、recall、F1
        # precious = num_correct/num_infer
        # recall=num_correct/num_label
        # F1=2*precious*recall/(precious+recall)
        num_correct, num_infer, num_label = metric.compute(
            start_prob, end_prob, start_ids, end_ids)
        metric.update(num_correct, num_infer, num_label)
    precision, recall, f1 = metric.accumulate()
    model.train()
    return precision, recall, f1


def do_eval():
    tokenizer = AutoTokenizer.from_pretrained(args.model_path)
    model = UIE.from_pretrained(args.model_path)

    test_ds = load_dataset(reader,
                           data_path=args.test_path,
                           max_seq_len=args.max_seq_len,
                           lazy=False)
    class_dict = {}
    relation_data = []
    if args.debug:
        for data in test_ds:
            class_name = unify_prompt_name(data['prompt'])
            # Only positive examples are evaluated in debug mode
            if len(data['result_list']) != 0:
                if "的" not in data['prompt']:
                    class_dict.setdefault(class_name, []).append(data)
                else:
                    relation_data.append((data['prompt'], data))
        relation_type_dict = get_relation_type_dict(relation_data)
    else:
        class_dict["all_classes"] = test_ds

    trans_fn = partial(convert_example,
                       tokenizer=tokenizer,
                       max_seq_len=args.max_seq_len)

    for key in class_dict.keys():
        if args.debug:
            test_ds = MapDataset(class_dict[key])
        else:
            test_ds = class_dict[key]

        test_data_loader = create_data_loader(test_ds,
                                              mode="test.txt",
                                              batch_size=args.batch_size,
                                              trans_fn=trans_fn)

        metric = SpanEvaluator()
        precision, recall, f1 = evaluate(model, metric, test_data_loader)
        logger.info("-----------------------------")
        logger.info("Class Name: %s" % key)
        logger.info("Evaluation Precision: %.5f | Recall: %.5f | F1: %.5f" %
                    (precision, recall, f1))

    if args.debug and len(relation_type_dict.keys()) != 0:
        for key in relation_type_dict.keys():
            test_ds = MapDataset(relation_type_dict[key])

            test_data_loader = create_data_loader(test_ds,
                                                  mode="test.txt",
                                                  batch_size=args.batch_size,
                                                  trans_fn=trans_fn)

            metric = SpanEvaluator()
            precision, recall, f1 = evaluate(model, metric, test_data_loader)
            logger.info("-----------------------------")
            logger.info("Class Name: X的%s" % key)
            logger.info("Evaluation Precision: %.5f | Recall: %.5f | F1: %.5f" %
                        (precision, recall, f1))


if __name__ == "__main__":
    # yapf: disable
    parser = argparse.ArgumentParser()

    parser.add_argument("--model_path", type=str, default=None, help="The path of saved model that you want to load.")
    parser.add_argument("--test_path", type=str, default=None, help="The path of test.txt set.")
    parser.add_argument("--batch_size", type=int, default=16, help="Batch size per GPU/CPU for training.")
    # todo 默认是512，debug时可以设置小点，方便看矩阵里的值，比如30。
    # parser.add_argument("--max_seq_len", type=int, default=512, help="The maximum total input sequence length after tokenization.")
    parser.add_argument("--max_seq_len", type=int, default=30, help="The maximum total input sequence length after tokenization.")
    parser.add_argument("--debug", action='store_true', help="Precision, recall and F1 score are calculated for each class separately if this option is enabled.")

    args = parser.parse_args()
    # yapf: enable

    do_eval()

你可能感兴趣的:(算法学习,paddlepaddle,学习,人工智能)

【LLM】预训练的具体流程 FOUR_A LLM python 人工智能深度学习大模型
分词器训练预训练模型：就像你已经学会了一些基础知识的“大脑”，我们可以在这个基础上继续学习新东西。比如，有些模型已经学会了英语，但中文学得不够好。中文预训练：为了让这个“大脑”更好地理解中文，我们需要用大量的中文数据继续训练它。分词器（Tokenizer）：它的作用是把一句话拆分成一个个小单元（比如词语或字）。比如，“我喜欢学习”会被拆成“我/喜欢/学习”。这些拆分后的单元会被转换成数字，方便模型
递推和递归_一文学会递归递推 HR刀姐递推和递归
递归算法和递推算法无论是在ACM竞赛还是项目工程上都有着极为广泛的应用，但想要完全掌握两者的思想并不容易，对于刚刚接触编程的人来说更是这样，我在初次接触递归递推时就吃了很多的苦头，除了当时对编程语言不太熟悉之外，最大的原因就是难以理解其中的思想，本文将二者结合代码分别讲解，力求以"理论+实践"的方式使读者明白两种算法。一箭双雕，一文双递。一.递归和递推的区别学习递归递推的一个容易遇到的问题就是混淆
ZooKeeper学习总结（1）——ZooKeeper入门介绍一杯甜酒 ZooKeeper学习总结 Zookeeper
1.概述Zookeeper是Hadoop的一个子项目，它是分布式系统中的协调系统，可提供的服务主要有：配置服务、名字服务、分布式同步、组服务等。它有如下的一些特点：简单Zookeeper的核心是一个精简的文件系统，它支持一些简单的操作和一些抽象操作，例如，排序和通知。丰富Zookeeper的原语操作是很丰富的，可实现一些协调数据结构和协议。例如，分布式队列、分布式锁和一组同级别节点中的“领导者选举
Zookeeper+kafka学习笔记 CHR_YTU Zookeeper
Zookeeper是Apache的一个java项目，属于Hadoop系统，扮演管理员的角色。配置管理分布式系统都有好多机器，比如我在搭建hadoop的HDFS的时候，需要在一个主机器上（Master节点）配置好HDFS需要的各种配置文件，然后通过scp命令把这些配置文件拷贝到其他节点上，这样各个机器拿到的配置信息是一致的，才能成功运行起来HDFS服务。Zookeeper提供了这样的一种服务：一种集
autoMate - AI实现电脑任务自动化的本地工具小众AI AI开源人工智能自动化运维
GitHub：https://github.com/yuruotong1/autoMate更多AI开源软件：发现分享好用的AI工具、AI开源软件、AI模型、AI变现-小众AIautoMate是一款由开源开发的本地自动化工具，以AI+RPA（人工智能+机器人流程自动化）为核心特色。它将大型语言模型的智能理解与RPA的流程执行能力结合，用户只需用自然语言描述任务，如“整理桌面文件”或“生成周报”，即可
Zookeeper【概念（集中式到分布式、什么是分布式、CAP定理、什么是Zookeeper、应用场景、为什么选择Zookeeper 、基本概念）】(一)-全面详解（学习总结---从入门到深化）童小纯中间件大全---全面详解 zookeeper 分布式
作者简介：大家好，我是小童，Java开发工程师，CSDN博客博主，Java领域新星创作者系列专栏：前端、Java、Java中间件大全、微信小程序、微信支付、若依框架、Spring全家桶如果文章知识点有错误的地方，请指正！和大家一起学习，一起进步如果感觉博主的文章还不错的话，请三连支持一下博主哦博主正在努力完成2023计划中：以梦为马，扬帆起航，2023追梦人目录Zookeeper概念_集中式到分布
深度学习：马氏距离壹十壹深度学习深度学习人工智能
马氏距离（MahalanobisDistance）是一种用于计算不同维度数据点之间距离的度量方法。它考虑了数据的协方差结构，因此在处理具有相关性的多维数据时更加有效。与欧氏距离不同，马氏距离不仅考虑了各个变量的量纲，还考虑了它们之间的相关性。公式马氏距离计算两个向量(x)和(y)之间的距离，定义为：DM(x,y)=(x−y)TS−1(x−y)\D_M(x,y)=\sqrt{(x-y)^TS^{-1
深度学习：CPU和GPU算力壹十壹深度学习深度学习 gpu算力人工智能
一、算力“算力”（ComputingPower）通常是指计算机或计算系统执行计算任务的能力。它是衡量系统处理数据、运行算法以及执行计算任务效率的重要指标。根据上下文，算力可以在以下几种场景中具体化：1.单机算力CPU算力：中央处理器的计算能力，通常用核心数量（cores）、时钟频率（GHz）、以及每秒浮点运算次数（FLOPS）等指标衡量。GPU算力：图形处理单元用于并行处理的能力，尤其是在深度学习
深度学习：偏差和方差壹十壹深度学习深度学习人工智能 python 机器学习
偏差（Bias）偏差衡量了模型预测值的平均值与真实值之间的差距。换句话说，偏差描述了模型预测的准确度。一个高偏差的模型容易出现欠拟合，即模型无法捕捉数据中的真实关系，因为它对数据的特征做出了错误的假设。特征：高偏差的模型通常是过于简单的模型，无法对数据中的复杂关系进行准确建模。高偏差模型的训练误差和测试误差可能都较高。解决方法：增加模型复杂度：例如增加多项式的阶数、增加神经网络的层数等。使用更多的
HarmonyNext实战案例：基于ArkTS的高性能音视频处理应用开发 harmonyos-next
HarmonyNext实战案例：基于ArkTS的高性能音视频处理应用开发引言在HarmonyNext生态系统中，ArkTS作为新一代的编程语言，为开发者提供了强大的工具来构建高性能、跨平台的应用。本文将深入探讨如何使用ArkTS12+语法开发一个高性能的音视频处理应用，涵盖从基础概念到高级技巧的全面讲解。通过本案例，您将学习到如何利用HarmonyNext的特性，结合ArkTS的强大功能，实现复杂
Solana中的程序派生地址（PDAs）：是什么，为什么，以及如何？ GTokenTool发币平台区块链
程序派生地址(PDA)在Solana中的应用：什么、为什么和如何？在学习Solana时，你会经常听到关于程序派生地址(PDAs)的讨论。它们就像这样——强大、多功能，而且最重要的是，稍微被误解。如果你是一个开发者，试图理解它们，不用担心。我们将在本文中一起揭开PDAs的面纱。在本文中，我将从基础开始解释PDAs，假设你刚刚开始接触Solana。因此，不需要任何先前的知识——让我们开始吧。什么是PD
从零开始构建大模型(LLM)应用和老莫一起学AI 人工智能 ai 大模型语言模型 llm 自然语言处理学习
大模型（LLM）已经成为当前人工智能的重要部分。但是，在这个领域还没有固定的操作标准，开发者们往往没有明确的指导，需要不断尝试和摸索。在过去两年中，我帮助了许多公司利用LLM来开发了很多创新的应用产品。基于这些经验，我形成了一套实用的方法，并准备在这篇文章中与大家分享。这套方法将提供一些步骤，帮助需要的小伙伴在LLM应用开发的复杂环境中找到方向。从最初的构思到PoC、评估再到产品化，了解如何将创意
Zookeeper与Kafka学习笔记上海研博数据 zookeeper kafka 学习
一、Zookeeper核心要点1.核心特性分布式协调服务，用于维护配置/命名/同步等元数据采用层次化数据模型（Znode树结构），每个节点可存储<1MB数据典型应用场景：HadoopNameNode高可用HBase元数据管理Kafka集群选举与状态管理2.设计限制内存型存储，不适合大数据量场景数据变更通过版本号（Version）控制，实现乐观锁机制采用ZAB协议保证数据一致性二、Kafka核心架构
Zookeeper学习种豆走天下 zookeeper 学习分布式
Zookeeper是一个开源的分布式协调框架，它主要用于处理分布式系统中的一些常见问题，如同步、配置管理、命名服务和集群管理等。Zookeeper是由Apache提供的，并且广泛应用于各种分布式应用中，特别是在高可用、高可靠性和高性能的系统中。Zookeeper的主要功能分布式协调：Zookeeper提供了协调多个节点（服务器）间行为的机制。例如，分布式锁、选举、配置管理等。命名服务：Zookee
机器学习之线性代数珠峰日记 AI理论与实践机器学习线性代数人工智能
文章目录一、引言：线性代数为何是AI的基石二、向量：AI世界的基本构建块（一）向量的定义（二）向量基础操作（三）重要概念三、矩阵：AI数据的强大容器（一）矩阵的定义（二）矩阵运算（三）矩阵特性（四）矩阵分解（五）Python示例（使用NumPy库）四、线性代数在AI中的应用（一）数据表示（二）降维：PCA（三）线性回归（四）计算机视觉（五）自然语言处理一、引言：线性代数为何是AI的基石在人工智能领
GO语言学习笔记螺旋式上升abc golang 学习笔记
一、viper笔记【七米】https://liwenzhou.com/posts/Go/viper/二、优雅关机和平滑重启https://liwenzhou.com/posts/Go/graceful-shutdown/三、gin使用zaphttps://liwenzhou.com/posts/Go/zap-in-gin/四、flag用于命令行传参https://liwenzhou.com/pos
《Quick Start Kubernetes》读后感 python
一、为什么选择这本书？面试的时候经常被问到kubernetes(下称k8s)，所以打算学习k8s。看到《QuickStartKubernetes》的作者对自己所写的书持续地更新，被这种认真打动了，外加这本书只有100多页，所以选择了这本书作为入门k8s的教材。二、这本书写了什么？这本书介绍了什么是k8s,k8s的组成结构(controlplanenode,workernode)，演示了在Windo
有趣的学习Python-第十篇：Python的“魔法宝库”：标准库之旅王盼达有趣的学习Python 学习 python 开发语言
Python不仅是一门强大的编程语言，更像是一座充满宝藏的“魔法宝库”，里面装满了各种各样的“魔法工具”（标准库）。这些“魔法工具”可以帮助你轻松地完成各种任务，从文件操作到网络编程，从数据处理到性能优化。接下来，让我们一起探索Python的“魔法宝库”，看看这些“魔法工具”到底有多神奇！10.1操作系统接口：与“魔法世界”互动os模块就像是一个“魔法接口”，可以帮助你与操作系统进行互动。你可以用
有趣的学习Python-第八篇：Python的“魔法盾牌”：错误与异常处理王盼达有趣的学习Python 学习 python 开发语言
在Python的魔法世界里，即使是经验丰富的魔法师也可能遇到一些“魔法失误”。这些失误分为两种：语法错误和异常。别担心，Python为你准备了一面强大的“魔法盾牌”，帮助你应对这些挑战。8.1语法错误：魔法咒语写错了语法错误就像是你在念魔法咒语时，不小心说错了单词。这是学习Python过程中最常见的问题。比如，你可能忘记在while循环后面加上冒号：whileTrueprint('Hellowor
基于transformer实现机器翻译(日译中) 小白_laughter 课程学习 transformer 机器翻译深度学习
文章目录一、引言二、使用编码器—解码器和注意力机制来实现机器翻译模型2.0含注意力机制的编码器—解码器2.1读取和预处理数据2.2含注意力机制的编码器—解码器2.3训练模型2.4预测不定长的序列2.5评价翻译结果三、使用Transformer架构和PyTorch深度学习库来实现的日中机器翻译模型3.1、导入必要的库3.2、数据集准备3.3、准备分词器3.4、构建TorchText词汇表对象，并将句
【NLP 39、激活函数 ⑤ Swish激活函数】 L_cl NLP 自然语言处理人工智能
我的孤独原本是座荒岛，直到你称成潮汐，原来爱是让个体失序的永恒运动——25.2.25Swish激活函数是一种近年来在深度学习中广泛应用的激活函数，由GoogleBrain团队在2017年提出。其核心设计结合了Sigmoid门控机制和线性输入的乘积，通过引入平滑性和非单调性来提升模型性能。一、数学定义与变体1.基础形式Swish的标准表达式为：Swish(x)=x⋅σ(βx)其中：σ(x)是Sigm
机器学习(Machine Learning) 七指琴魔御清绝大数据学习
原文链接：http://blog.csdn.net/zhoubl668/article/details/42921187希望转载的朋友，你可以不用联系我．但是一定要保留原文链接，因为这个项目还在继续也在不定期更新．希望看到文章的朋友能够学到更多．《BriefHistoryofMachineLearning》介绍:这是一篇介绍机器学习历史的文章，介绍很全面，从感知机、神经网络、决策树、SVM、Ada
web前端期末大作业：婚纱网页主题网站设计——唯一旅拍婚纱公司网站HTML+CSS+JavaScript IT-司马青衫前端课程设计 html
‍静态网站的编写主要是用HTMLDⅣV+CSSJS等来完成页面的排版设计‍，一般的网页作业需要融入以下知识点：div布局、浮动定位、高级css、表格、表单及验证、js轮播图、音频视频Fash的应用、uli、下拉导航栏、鼠标划过效果等知识点，学生网页作业源码，制作水平和原创度都适合学习或交作业用，记得点赞。精彩专栏推荐【作者主页——获取更多优质源码】【web前端期末大作业——毕设项目精品实战案例(1
【Go语言圣经1.1】 Pyroyster golang 开发语言后端
目标学习Go的编译方式、包的组织方式以及工具链的统一调用方式概念与定义packageGo语言通过包来组织代码。包类似于其它语言的库librarries或模块modules，每个包通常对应一个目录，目录中的所有.go文件都属于同一个包。特殊的main包:当代码使用packagemain声明时，表示这是一个可独立执行的程序而非一个库。程序的执行入口就是main函数import通过import语句，编译
大语言模型(LLM)入门学习路线图_llm教程，从零基础到精通，理论与实践结合的最佳路径！ AGI学习社语言模型学习人工智能 LLM 大模型大数据自然语言处理
Github项目上有一个大语言模型学习路线笔记，它全面涵盖了大语言模型的所需的基础知识学习，LLM前沿算法和架构，以及如何将大语言模型进行工程化实践。这份资料是初学者或有一定基础的开发/算法人员入门活深入大型语言模型学习的优秀参考。这份资料重点介绍了我们应该掌握哪些核心知识，并推荐了一系列优质的学习视频和博客，旨在帮助大家系统性地掌握大型语言模型的相关技术。大语言模型（LargeLanguageM
机器学习实战——音乐流派分类（主页有源码）喵了个AI 机器学习实战机器学习分类人工智能
✨个人主页欢迎您的访问✨期待您的三连✨✨个人主页欢迎您的访问✨期待您的三连✨✨个人主页欢迎您的访问✨期待您的三连✨1.简介音乐流派分类是音乐信息检索（MusicInformationRetrieval,MIR）中的一个重要任务，旨在通过分析音频信号的特征，将音乐自动分类到不同的流派（如古典、摇滚、爵士、流行等）。随着数字音乐平台的普及，音乐流派分类技术被广泛应用于音乐推荐、自动标签生成和音乐库管理
Flutter中使用NetworkImage加载网络图片缓存问题学习实践云水-禅心 flutter 缓存
Flutter中默认的NetworkImage会有缓存机制，如果图片的url不变化，但是url的图片已经发生变化，NetworkImage不会下载新的图片deepseek是这么解决问题的，但是在鸿蒙上禁用缓存无效在Flutter中，NetworkImage默认会使用缓存机制来优化性能。如果你想禁用缓存，可以通过以下几种方式实现：1.使用NetworkImage的headers参数你可以通过设置he
什么是XSS 藤原千花的败北 web漏洞 xss 前端 web安全网络安全
文章目录前言1.前端知识2.什么是XSS3.漏洞挖掘4.参考前言之前对XSS的理解就是停留在弹窗，认为XSS这种漏洞真的是漏洞吗？安全学习了蛮久了，也应该对XSS有更进一步的认识了。1.前端知识现代浏览器是一个高度复杂的软件系统，由多个核心组件协同工作，旨在高效、安全地呈现网页内容并执行交互逻辑。对一般用户来讲，其主要功能就是向服务器发出请求，在窗口中展示用户所选择的网络资源。这里所说的资源一般是
Oracle创建表空间、删除、状态、重命名、修改、增加、移动水煮白菜王 Oracle oracle 数据库
目录Oracle基本学习笔记创建表空间1.表空间创建格式3.表空间状态属性4.重命名表空间5.修改表空间数据文件的大小6.删除表空间的数据文件7.修改表空间中数据文件的状态8.表空间中数据文件的移动Oracle基本学习笔记创建表空间需要使用CREATETABLESPACE语句。其基本语法如下:CREATE[TEMPORARYIUNDO]TABLESPACEtablespacename[DATAFI
python catia catalog文件_Python封装的获取文件目录的函数卢新生 python catia catalog文件
获取指定文件夹中文件的函数，网上学习时东拼西凑的结果。注意，其中文件名如1.txt，文件路径如D:\文件夹\1.txt；direct为第一层子级importos#filePath输入文件夹全路径#mode#1递归获取所有文件名;#2递归获取所有文件路径;#3获取direct文件名;#4获取direct文件路径;#5获取direct文件名和direct子文件夹名;#6获取direct文件路径和dir
jvm调优总结（从基本概念到深度优化） oloz java jvm jdk 虚拟机应用服务器
JVM参数详解：http://www.cnblogs.com/redcreen/archive/2011/05/04/2037057.html Java虚拟机中，数据类型可以分为两类：基本类型和引用类型。基本类型的变量保存原始值，即：他代表的值就是数值本身；而引用类型的变量保存引用值。“引用值”代表了某个对象的引用，而不是对象本身，对象本身存放在这个引用值所表示的地址的位置。
【Scala十六】Scala核心十：柯里化函数 bit1129 scala
本篇文章重点说明什么是函数柯里化，这个语法现象的背后动机是什么，有什么样的应用场景，以及与部分应用函数(Partial Applied Function)之间的联系 1. 什么是柯里化函数 A way to write functions with multiple parameter lists. For instance def f(x: Int)(y: Int) is a
HashMap dalan_123 java
HashMap在java中对很多人来说都是熟的；基于hash表的map接口的非同步实现。允许使用null和null键；同时不能保证元素的顺序；也就是从来都不保证其中的元素的顺序恒久不变。 1、数据结构在java中，最基本的数据结构无外乎：数组和引用（指针），所有的数据结构都可以用这两个来构造，HashMap也不例外，归根到底HashMap就是一个链表散列的数据
Java Swing如何实时刷新JTextArea，以显示刚才加append的内容周凡杨 java 更新 swing JTextArea
在代码中执行完textArea.append("message")后，如果你想让这个更新立刻显示在界面上而不是等swing的主线程返回后刷新，我们一般会在该语句后调用textArea.invalidate()和textArea.repaint()。问题是这个方法并不能有任何效果，textArea的内容没有任何变化，这或许是swing的一个bug，有一个笨拙的办法可以实现
servlet或struts的Action处理ajax请求 g21121 servlet
其实处理ajax的请求非常简单，直接看代码就行了： //如果用的是struts //HttpServletResponse response = ServletActionContext.getResponse(); // 设置输出为文字流 response.setContentType("text/plain"); // 设置字符集 res
FineReport的公式编辑框的语法简介老A不折腾 finereport 公式总结
FINEREPORT用到公式的地方非常多，单元格（以=开头的便被解析为公式），条件显示，数据字典，报表填报属性值定义，图表标题，轴定义，页眉页脚，甚至单元格的其他属性中的鼠标悬浮提示内容都可以写公式。简单的说下自己感觉的公式要注意的几个地方： 1.if语句语法刚接触感觉比较奇怪，if(条件式子,值1,值2)，if可以嵌套，if(条件式子1，值1，if(条件式子2，值2，值3)
linux mysql 数据库乱码的解决办法墙头上一根草 linux mysql 数据库乱码
linux 上mysql数据库区分大小写的配置 lower_case_table_names=1 1-不区分大小写 0-区分大小写修改/etc/my.cnf 具体的修改内容如下: [client] default-character-set=utf8 [mysqld] datadir=/var/lib/mysql socket=/va
我的spring学习笔记6-ApplicationContext实例化的参数兼容思想 aijuans Spring 3
ApplicationContext能读取多个Bean定义文件，方法是： ApplicationContext appContext = new ClassPathXmlApplicationContext（ new String[]｛“bean-config1.xml”，“bean-config2.xml”，“bean-config3.xml”，“bean-config4.xml
mysql 基准测试之sysbench annan211 基准测试 mysql基准测试 MySQL测试 sysbench
1 执行如下命令，安装sysbench-0.5： tar xzvf sysbench-0.5.tar.gz cd sysbench-0.5 chmod +x autogen.sh ./autogen.sh ./configure --with-mysql --with-mysql-includes=/usr/local/mysql
sql的复杂查询使用案列与技巧百合不是茶 oracle sql 函数数据分页合并查询
本片博客使用的数据库表是oracle中的scott用户表; ------------------- 自然连接查询查询 smith 的上司(两种方法) &
深入学习Thread类 bijian1013 java thread 多线程 java多线程
一．线程的名字下面来看一下Thread类的name属性，它的类型是String。它其实就是线程的名字。在Thread类中，有String getName()和void setName(String)两个方法用来设置和获取这个属性的值。同时，Thr
JSON串转换成Map以及如何转换到对应的数据类型 bijian1013 java fastjson net.sf.json
在实际开发中，难免会碰到JSON串转换成Map的情况，下面来看看这方面的实例。另外，由于fastjson只支持JDK1.5及以上版本，因此在JDK1.4的项目中可以采用net.sf.json来处理。一.fastjson实例 JsonUtil.java package com.study; impor
【RPC框架HttpInvoker一】HttpInvoker：Spring自带RPC框架 bit1129 spring
HttpInvoker是Spring原生的RPC调用框架，HttpInvoker同Burlap和Hessian一样，提供了一致的服务Exporter以及客户端的服务代理工厂Bean，这篇文章主要是复制粘贴了Hessian与Spring集成一文，【RPC框架Hessian四】Hessian与Spring集成在【RPC框架Hessian二】Hessian 对象序列化和反序列化一文中
【Mahout二】基于Mahout CBayes算法的20newsgroup的脚本分析 bit1129 Mahout
#!/bin/bash # # Licensed to the Apache Software Foundation (ASF) under one or more # contributor license agreements. See the NOTICE file distributed with # this work for additional information re
nginx三种获取用户真实ip的方法 ronin47
随着nginx的迅速崛起，越来越多公司将apache更换成nginx. 同时也越来越多人使用nginx作为负载均衡, 并且代理前面可能还加上了CDN加速，但是随之也遇到一个问题：nginx如何获取用户的真实IP地址,如果后端是apache,请跳转到<apache获取用户真实IP地址>，如果是后端真实服务器是nginx，那么继续往下看。实例环境：用户IP 120.22.11.11
java-判断二叉树是不是平衡 bylijinnan java
参考了 http://zhedahht.blog.163.com/blog/static/25411174201142733927831/ 但是用java来实现有一个问题。由于Java无法像C那样“传递参数的地址，函数返回时能得到参数的值”，唯有新建一个辅助类：AuxClass import ljn.help.*; public class BalancedBTree {
BeanUtils.copyProperties VS PropertyUtils.copyProperties 诸葛不亮 PropertyUtils BeanUtils
BeanUtils.copyProperties VS PropertyUtils.copyProperties 作为两个bean属性copy的工具类，他们被广泛使用，同时也很容易误用，给人造成困然；比如：昨天发现同事在使用BeanUtils.copyProperties copy有integer类型属性的bean时，没有考虑到会将null转换为0，而后面的业
[金融与信息安全]最简单的数据结构最安全 comsci 数据结构
现在最流行的数据库的数据存储文件都具有复杂的文件头格式，用操作系统的记事本软件是无法正常浏览的，这样的情况会有什么问题呢？从信息安全的角度来看，如果我们数据库系统仅仅把这种格式的数据文件做异地备份，如果相同版本的所有数据库管理系统都同时被攻击，那么
vi区段删除 Cwind linux vi 区段删除
区段删除是编辑和分析一些冗长的配置文件或日志文件时比较常用的操作。简记下vi区段删除要点备忘。 vi概述引文中并未将末行模式单独列为一种模式。单不单列并不重要，能区分命令模式与末行模式即可。 vi区段删除步骤： 1. 在末行模式下使用:set nu显示行号非必须，随光标移动vi右下角也会显示行号，能够正确找到并记录删除开始行
清除tomcat缓存的方法总结 dashuaifu tomcat 缓存
用tomcat容器，大家可能会发现这样的问题，修改jsp文件后，但用IE打开依然是以前的Jsp的页面。出现这种现象的原因主要是tomcat缓存的原因。解决办法如下: 在jsp文件头加上 <meta http-equiv="Expires" content="0"> <meta http-equiv="kiben&qu
不要盲目的在项目中使用LESS CSS dcj3sjt126com Web less
　如果你还不知道LESS CSS是什么东西，可以看一下这篇文章，是我一朋友写给新人看的《CSS——LESS》　　不可否认，LESS CSS是个强大的工具，它弥补了css没有变量、无法运算等一些“先天缺陷”，但它似乎给我一种错觉，就是为了功能而实现功能。　　比如它的引用功能 ? .rounded_corners{
[入门]更上一层楼 dcj3sjt126com PHP yii2
更上一层楼通篇阅读完整个“入门”部分，你就完成了一个完整 Yii 应用的创建。在此过程中你学到了如何实现一些常用功能，例如通过 HTML 表单从用户那获取数据，从数据库中获取数据并以分页形式显示。你还学到了如何通过 Gii 去自动生成代码。使用 Gii 生成代码把 Web 开发中多数繁杂的过程转化为仅仅填写几个表单就行。本章将介绍一些有助于更好使用 Yii 的资源：
Apache HttpClient使用详解 eksliang httpclient http协议
Http协议的重要性相信不用我多说了，HttpClient相比传统JDK自带的URLConnection，增加了易用性和灵活性（具体区别，日后我们再讨论），它不仅是客户端发送Http请求变得容易，而且也方便了开发人员测试接口（基于Http协议的），即提高了开发的效率，也方便提高代码的健壮性。因此熟练掌握HttpClient是很重要的必修内容，掌握HttpClient后，相信对于Http协议的了解会
zxing二维码扫描功能 gundumw100 android zxing
经常要用到二维码扫描功能现给出示例代码 import com.google.zxing.WriterException; import com.zxing.activity.CaptureActivity; import com.zxing.encoding.EncodingHandler; import android.app.Activity; import an
纯HTML+CSS带说明的黄色导航菜单 ini html Web html5 css hovertree
HoverTree带说明的CSS菜单:纯HTML+CSS结构链接带说明的黄色导航在线体验效果：http://hovertree.com/texiao/css/1.htm代码如下,保存到HTML文件可以看到效果： <!DOCTYPE html > <html > <head> <title>HoverTree
fastjson初始化对性能的影响 kane_xie fastjson 序列化
之前在项目中序列化是用thrift，性能一般，而且需要用编译器生成新的类，在序列化和反序列化的时候感觉很繁琐，因此想转到json阵营。对比了jackson，gson等框架之后，决定用fastjson，为什么呢，因为看名字感觉很快。。。网上的说法： fastjson 是一个性能很好的 Java 语言实现的 JSON 解析器和生成器，来自阿里巴巴的工程师开发。
基于Mybatis封装的增删改查实现通用自动化sql mengqingyu DAO
1.基于map或javaBean的增删改查可实现不写dao接口和实现类以及xml，有效的提高开发速度。 2.支持自定义注解包括主键生成、列重复验证、列名、表名等 3.支持批量插入、批量更新、批量删除 <bean id="dynamicSqlSessionTemplate" class="com.mqy.mybatis.support.Dynamic
js控制input输入框的方法封装(数字，中文，字母，浮点数等) qifeifei javascript js
在项目开发的时候，经常有一些输入框，控制输入的格式，而不是等输入好了再去检查格式，格式错了就报错，体验不好。 /** 数字，中文，字母,浮点数(+/-/.) 类型输入限制，只要在input标签上加上 jInput="number,chinese,alphabet,floating" 备注：floating属性只能单独用*/ funct
java 计时器应用 tangqi609567707 java timer
mport java.util.TimerTask; import java.util.Calendar; public class MyTask extends TimerTask { private static final int
erlang输出调用栈信息 wudixiaotie erlang
在erlang otp的开发中，如果调用第三方的应用，会有有些错误会不打印栈信息，因为有可能第三方应用会catch然后输出自己的错误信息，所以对排查bug有很大的阻碍，这样就要求我们自己打印调用的栈信息。用这个函数：erlang:process_display (self (), backtrace).需要注意这个函数只会输出到标准错误输出。也可以用这个函数：erlang:get_s