qq_19840551

基于Seq2Seq的问答摘要与推理问题方案

1.项目背景：

主题为汽车大师问答摘要与推理。要求使用汽车大师提供的11万条 技师与用户的多轮对话与诊断建议报告 数据建立模型，模型需基于对话文本、用户问题、车型与车系，输出包含摘要与推断的报告文本，综合考验模型的归纳总结与推断能力。

汽车大师是一款通过在线咨询问答为车主解决用车问题的APP，致力于做车主身边靠谱的用车顾问，车主用语音、文字或图片发布汽车问题，系统为其匹配专业技师提供及时有效的咨询服务。由于平台用户基数众多，问题重合度较高，大部分问题在平台上曾得到过解答。重复回答和持续时间长的多轮问询不仅会花去汽修技师大量时间，也使用户获取解决方案的时间变长，对双方来说都存在资源浪费的情况。

为了节省更多人工时间，提高用户获取回答和解决方案的效率，汽车大师希望通过利用机器学习对平台积累的大量历史问答数据进行模型训练，基于汽车大师平台提供的历史多轮问答文本，输出完整的建议报告和回答，让用户在线通过人工智能语义识别即时获得全套解决方案。

2.数据预处理

  train_df = pd.read_csv(train_data_path)
  test_df = pd.read_csv(test_data_path)
  print('train data size {},test data size {}'.format(len(train_df), len(test_df)))
  print(train_df.iloc[:1])

train data size 82943,test data size 20000

QID Brand  Model             Question  \
0  Q1    奔驰  奔驰GL级  方向机重，助力泵，方向机都换了还是一样   

                                            Dialogue Report  
0  技师说：[语音]|车主说：新的都换了|车主说：助力泵，方向机|技师说：[语音]|车主说：换了...   随时联系

2.1空值填充

train_df.dropna(subset=['Question', 'Dialogue', 'Report'], how='any', inplace=True)
test_df.dropna(subset=['Question', 'Dialogue'], how='any', inplace=True)

2.2多线程批量处理数据

from multi_proc_utils import parallelize
train_df = parallelize(train_df, sentences_proc)
test_df = parallelize(test_df, sentences_proc)

清除无用词

def clean_sentence(sentence):
    '''
    特殊符号去除
    :param sentence: 待处理的字符串
    :return: 过滤特殊字符后的字符串
    '''
    if isinstance(sentence, str):
        return re.sub(
            # r'[\s+\-\!\/\[\]\{\}_,.$%^*(+\"\')]+|[:：+——()?【】“”！，。？、~@#￥%……&*（）]+|车主说|技师说|语音|图片|你好|您好',
            r'[\s+\-\/\[\]\{\}_$%^*(+\"\')]+|[+——()【】“”~@#￥%……&*（）]+|你好|您好',
            ' ', sentence)
    else:
        return ' '

过滤停用词

def filter_words(sentence):
    '''
    过滤停用词
    :param seg_list: 切好词的列表 [word1 ,word2 .......]
    :return: 过滤后的停用词
    '''
    words = sentence.split(' ')
    # 去掉多余空字符
    words = [word for word in words if word]
    # 去掉停用词 包括一下标点符号也会去掉
    words = [word for word in words if word not in stop_words]

    return words

2.3合并训练测试集合

将训练数据中['Question', 'Dialogue', 'Report']列和测试数据中['Question', 'Dialogue']合并，并保存

train_df['merged'] = train_df[['Question', 'Dialogue', 'Report']].apply(lambda x: ' '.join(x), axis=1)
test_df['merged'] = test_df[['Question', 'Dialogue']].apply(lambda x: ' '.join(x), axis=1)
print(train_df['merged'].shape)
print(test_df['merged'].shape)
merged_df = pd.concat([train_df[['merged']], test_df[['merged']]], axis=0)
merged_df.to_csv(merger_seg_path, index=None, header=False)

3.生成词向量

使用gensim工具生成初始词向量

from gensim.models.word2vec import LineSentence, Word2Vec

 wv_model = Word2Vec(LineSentence(merger_seg_path),
                        size=embedding_dim,
                        sg=1,
                        workers=8,
                        iter=wv_train_epochs,
                        window=5,
                        min_count=5)

分离数据和标签：['Question', 'Dialogue']当做数据，['Report']当做标签

train_df['X'] = train_df[['Question', 'Dialogue']].apply(lambda x: ' '.join(x), axis=1)
test_df['X'] = test_df[['Question', 'Dialogue']].apply(lambda x: ' '.join(x), axis=1)

填充开始结束符号,未知词填充 oov, 长度填充

def pad_proc(sentence, max_len, vocab):
    '''
    # 填充字段
    < start > < end > < pad > < unk > max_lens
    '''
    # 0.按空格统计切分出词
    words = sentence.strip().split(' ')
    # 1. 截取规定长度的词数
    words = words[:max_len]
    # 2. 填充< unk > ,判断是否在vocab中, 不在填充 < unk >
    sentence = [word if word in vocab else '' for word in words]
    # 3. 填充< start > < end >
    sentence = [''] + sentence + ['']
    # 4. 判断长度，填充　< pad >
    sentence = sentence + [''] * (max_len - len(words))
    return ' '.join(sentence)

   
# 使用GenSim训练得出的vocab
vocab = wv_model.wv.vocab

# 获取适当的最大长度
train_x_max_len = get_max_len(train_df['X'])
test_X_max_len = get_max_len(test_df['X'])
X_max_len = max(train_x_max_len, test_X_max_len)
train_df['X'] = train_df['X'].apply(lambda x: pad_proc(x, X_max_len, vocab))

# 获取适当的最大长度
test_df['X'] = test_df['X'].apply(lambda x: pad_proc(x, X_max_len, vocab))

# 获取适当的最大长度
train_y_max_len = get_max_len(train_df['Report'])
train_df['Y'] = train_df['Report'].apply(lambda x: pad_proc(x, train_y_max_len, vocab))

保存pad oov处理后的,数据和标签

train_df['X'].to_csv(train_x_pad_path, index=None, header=False)
train_df['Y'].to_csv(train_y_pad_path, index=None, header=False)
test_df['X'].to_csv(test_x_pad_path, index=None, header=False)

再次训练词向量，因为新加的符号不在词表 和 词向量矩阵

wv_model.build_vocab(LineSentence(train_x_pad_path), update=True)
wv_model.train(LineSentence(train_x_pad_path), epochs=1, total_examples=wv_model.corpus_count)
print('1/3')
wv_model.build_vocab(LineSentence(train_y_pad_path), update=True)
wv_model.train(LineSentence(train_y_pad_path), epochs=1, total_examples=wv_model.corpus_count)
print('2/3')
wv_model.build_vocab(LineSentence(test_x_pad_path), update=True)
wv_model.train(LineSentence(test_x_pad_path), epochs=1, total_examples=wv_model.corpus_count)

保存词向量模型和、字典和词向量矩阵

    # 保存词向量模型
    wv_model.save(save_wv_model_path)
    print('finish retrain w2v model')
    print('final w2v_model has vocabulary of ', len(wv_model.wv.vocab))

    # 更新vocab
    vocab = {word: index for index, word in enumerate(wv_model.wv.index2word)}
    reverse_vocab = {index: word for index, word in enumerate(wv_model.wv.index2word)}

    # 保存字典
    save_dict(vocab_path, vocab)
    save_dict(reverse_vocab_path, reverse_vocab)

    # 保存词向量矩阵
    embedding_matrix = wv_model.wv.vectors
    np.save(embedding_matrix_path, embedding_matrix)

4.保存训练测试数据

    # 数据集转换 将词转换成索引  [ 方向机 重 ...] -> [32800, 403, 986, 246, 231
    train_ids_x = train_df['X'].apply(lambda x: transform_data(x, vocab))
    train_ids_y = train_df['Y'].apply(lambda x: transform_data(x, vocab))
    test_ids_x = test_df['X'].apply(lambda x: transform_data(x, vocab))

    # 数据转换成numpy数组
    # 将索引列表转换成矩阵 [32800, 403, 986, 246, 231] --> array([[32800,   403,   986 ]]
    train_X = np.array(train_ids_x.tolist())
    train_Y = np.array(train_ids_y.tolist())
    test_X = np.array(test_ids_x.tolist())

    # 保存数据
    np.save(train_x_path, train_X)
    np.save(train_y_path, train_Y)
    np.save(test_x_path, test_X)

5.模型训练

输入通过一个编码器模型，该模型给出了形状的编码器输出（batch_size, max_length, hidden_size）和编码器隐藏形状的状态（batch_size, hidden_size）

FC=全连接层，EO=编码器输出，H=隐藏层状态，X=解码器输入，模型计算过程如下表示：

score = FC(tanh(FC(EO) + FC(H)))
attention weights = softmax(score, axis = 1)
context vector = sum(attention weights * EO, axis = 1)
embedding output=解码器输入X，输入词嵌入层
merged vector=concat(embedding output, context vector)
将merged vector输入到GRU

5.1基本参数设置

# 训练集的长度
BUFFER_SIZE = len(train_X)

# 输入的长度
max_length_inp=train_X.shape[1]
# 输出的长度
max_length_targ=train_Y.shape[1]

BATCH_SIZE = 64

# 训练一轮需要迭代多少步
steps_per_epoch = len(train_X)//BATCH_SIZE

# 词向量维度
embedding_dim = 300
# 隐藏层单元数
units = 1024

# 词表大小
vocab_size = len(vocab)

# 构建训练集
dataset = tf.data.Dataset.from_tensor_slices((train_X, train_Y)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)

5.2构建Encoder

class Encoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim ,embedding_matrix , enc_units, batch_sz):
        super(Encoder, self).__init__()
        self.batch_sz = batch_sz
        self.enc_units = enc_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim,weights=[embedding_matrix],trainable=False)
        self.gru = tf.keras.layers.GRU(self.enc_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')

    def call(self, x, hidden):
        x = self.embedding(x)
        output, state = self.gru(x, initial_state = hidden)
        return output, state

    def initialize_hidden_state(self):
        return tf.zeros((self.batch_sz, self.enc_units))

测试Encoder：

encoder = Encoder(vocab_size, embedding_dim,embedding_matrix, units, BATCH_SIZE)
# example_input
example_input_batch = tf.ones(shape=(BATCH_SIZE,max_length_inp), dtype=tf.int32)
# sample input
sample_hidden = encoder.initialize_hidden_state()
sample_output, sample_hidden = encoder(example_input_batch, sample_hidden)
print ('Encoder output shape: (batch size, sequence length, units) {}'.format(sample_output.shape))
print ('Encoder Hidden state shape: (batch size, units) {}'.format(sample_hidden.shape))

Encoder output shape: (batch size, sequence length, units) (64, 261, 1024)
Encoder Hidden state shape: (batch size, units) (64, 1024)

5.3构建Attention

定一组向量集合values，以及查询向量query，我们根据query向量去计算values加权和，即成为attention机制。

attention的重点即为求这个集合values中每个value的权值。

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        
        # query为上次的GRU隐藏层
        # values为编码器的编码结果enc_output
        # 在seq2seq模型中，St是后面的query向量，而编码过程的隐藏状态hi是values。
        hidden_with_time_axis = tf.expand_dims(query, 1)

        
        # 计算注意力权重值
        score = self.V(tf.nn.tanh(
            self.W1(values) + self.W2(hidden_with_time_axis)))

        # attention_weights shape == (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)
        
        # # 使用注意力权重*编码器输出作为返回值，将来会作为解码器的输入
        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * values
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights

测试Attention：

attention_layer = BahdanauAttention(10)
attention_result, attention_weights = attention_layer(sample_hidden, sample_output)

print("Attention result shape: (batch size, units) {}".format(attention_result.shape))
print("Attention weights shape: (batch_size, sequence_length, 1) {}".format(attention_weights.shape))

Attention result shape: (batch size, units) (64, 1024)
Attention weights shape: (batch_size, sequence_length, 1) (64, 261, 1)

5.4构建Decoder

class Decoder(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim,embedding_matrix, dec_units, batch_sz):
        super(Decoder, self).__init__()
        self.batch_sz = batch_sz
        self.dec_units = dec_units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim,weights=[embedding_matrix],trainable=False)
        self.gru = tf.keras.layers.GRU(self.dec_units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc = tf.keras.layers.Dense(vocab_size)

        # used for attention
        self.attention = BahdanauAttention(self.dec_units)

    def call(self, x, hidden, enc_output):
        # 使用上次的隐藏层（第一次使用编码器隐藏层）、编码器输出计算注意力权重
        # enc_output shape == (batch_size, max_length, hidden_size)
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
        
        # 将上一循环的预测结果跟注意力权重值结合在一起作为本次的GRU网络输入
        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # output shape == (batch_size * 1, hidden_size)
        output = tf.reshape(output, (-1, output.shape[2]))

        # output shape == (batch_size, vocab)
        x = self.fc(output)

        return x, state, attention_weights

测试Decoder:

decoder = Decoder(vocab_size, embedding_dim,embedding_matrix, units, BATCH_SIZE)

sample_decoder_output, _, _ = decoder(tf.random.uniform((64, 1)),
                                      sample_hidden, sample_output)

print ('Decoder output shape: (batch_size, vocab size) {}'.format(sample_decoder_output.shape))

Decoder output shape: (batch_size, vocab size) (64, 32804)

5.5定义优化函数

optimizer = tf.keras.optimizers.Adam()
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

pad_index=vocab['']

def loss_function(real, pred):
    #找到非对应的位置
    mask = tf.math.logical_not(tf.math.equal(real, pad_index))
    
    loss_ = loss_object(real, pred)

    mask = tf.cast(mask, dtype=loss_.dtype)
    #排除的loss
    loss_ *= mask

    return tf.reduce_mean(loss_)

5.6保存点设置

checkpoint_dir = 'data/checkpoints/training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 encoder=encoder,
                                 decoder=decoder)

6.训练

用到teacher forcing，Decoder阶段的target作为输入,使用先验单词提高收敛速度

@tf.function
def train_step(inp, targ, enc_hidden):
    loss = 0

    with tf.GradientTape() as tape:
        # 1. 构建encoder
        enc_output, enc_hidden = encoder(inp, enc_hidden)
        # 2. 复制
        dec_hidden = enc_hidden
        # 3.  * BATCH_SIZE 
        dec_input = tf.expand_dims([vocab['']] * BATCH_SIZE, 1)

        # Teacher forcing - feeding the target as the next input
        for t in range(1, targ.shape[1]):
            # decoder(x, hidden, enc_output)
            predictions, dec_hidden, _ = decoder(dec_input, dec_hidden, enc_output)
            
            loss += loss_function(targ[:, t], predictions)

            # using teacher forcing
            dec_input = tf.expand_dims(targ[:, t], 1)

        batch_loss = (loss / int(targ.shape[1]))

        variables = encoder.trainable_variables + decoder.trainable_variables

        gradients = tape.gradient(loss, variables)

        optimizer.apply_gradients(zip(gradients, variables))

        return batch_loss

EPOCHS = 10

for epoch in range(EPOCHS):
    start = time.time()
    
    # 初始化隐藏层
    enc_hidden = encoder.initialize_hidden_state()
    total_loss = 0

    for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
        # 
        batch_loss = train_step(inp, targ, enc_hidden)
        total_loss += batch_loss

        if batch % 1 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch + 1,
                                                         batch,
                                                         batch_loss.numpy()))
    # saving (checkpoint) the model every 2 epochs
    if (epoch + 1) % 2 == 0:
        checkpoint.save(file_prefix = checkpoint_prefix)

    print('Epoch {} Loss {:.4f}'.format(epoch + 1,
                                      total_loss / steps_per_epoch))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Epoch 1 Batch 0 Loss 2.6071
Epoch 1 Batch 1 Loss 2.4794
Epoch 1 Batch 2 Loss 2.6189
Epoch 1 Batch 3 Loss 2.3431
Epoch 1 Batch 4 Loss 3.1416
Epoch 1 Batch 5 Loss 2.5989
Epoch 1 Batch 6 Loss 2.8121
Epoch 1 Batch 7 Loss 2.6711
Epoch 1 Batch 8 Loss 2.3373
Epoch 1 Batch 9 Loss 2.2395
Epoch 1 Loss 2.5849
Time taken for 1 epoch 122.1180808544159 sec

Epoch 2 Batch 0 Loss 2.3729
Epoch 2 Batch 1 Loss 2.6925
Epoch 2 Batch 2 Loss 2.6551
Epoch 2 Batch 3 Loss 2.5503
Epoch 2 Batch 4 Loss 2.5938
Epoch 2 Batch 5 Loss 2.2217
Epoch 2 Batch 6 Loss 2.3711
Epoch 2 Batch 7 Loss 2.4233
Epoch 2 Batch 8 Loss 2.3542
Epoch 2 Batch 9 Loss 2.4876
Epoch 2 Loss 2.4722
Time taken for 1 epoch 122.43454432487488 sec

Epoch 3 Batch 0 Loss 2.2104
Epoch 3 Batch 1 Loss 2.3985
Epoch 3 Batch 2 Loss 2.3234
Epoch 3 Batch 3 Loss 2.2387
Epoch 3 Batch 4 Loss 2.2565
Epoch 3 Batch 5 Loss 2.4140
Epoch 3 Batch 6 Loss 1.8723
Epoch 3 Batch 7 Loss 2.1348
Epoch 3 Batch 8 Loss 2.7321
Epoch 3 Batch 9 Loss 2.6459
Epoch 3 Loss 2.3227
Time taken for 1 epoch 122.04101395606995 sec

Epoch 4 Batch 0 Loss 2.1034
Epoch 4 Batch 1 Loss 2.4706
Epoch 4 Batch 2 Loss 2.0830
Epoch 4 Batch 3 Loss 2.2704
Epoch 4 Batch 4 Loss 1.8460
Epoch 4 Batch 5 Loss 2.2392
Epoch 4 Batch 6 Loss 2.1061
Epoch 4 Batch 7 Loss 1.7663
Epoch 4 Batch 8 Loss 2.3976
Epoch 4 Batch 9 Loss 2.1924
Epoch 4 Loss 2.1475
Time taken for 1 epoch 122.41424536705017 sec

Epoch 5 Batch 0 Loss 2.1025
Epoch 5 Batch 1 Loss 1.8814
Epoch 5 Batch 2 Loss 1.8141
Epoch 5 Batch 3 Loss 2.2037
Epoch 5 Batch 4 Loss 1.8383
Epoch 5 Batch 5 Loss 2.0435
Epoch 5 Batch 6 Loss 2.0946
Epoch 5 Batch 7 Loss 1.8200
Epoch 5 Batch 8 Loss 2.0206
Epoch 5 Batch 9 Loss 1.9670
Epoch 5 Loss 1.9786
Time taken for 1 epoch 122.04332089424133 sec

Epoch 6 Batch 0 Loss 1.7780
Epoch 6 Batch 1 Loss 2.0647
Epoch 6 Batch 2 Loss 1.8531
Epoch 6 Batch 3 Loss 1.7665
Epoch 6 Batch 4 Loss 1.7081
Epoch 6 Batch 5 Loss 1.8388
Epoch 6 Batch 6 Loss 1.7397
Epoch 6 Batch 7 Loss 1.8886
Epoch 6 Batch 8 Loss 1.9132
Epoch 6 Batch 9 Loss 1.7469
Epoch 6 Loss 1.8298
Time taken for 1 epoch 122.37464952468872 sec

Epoch 7 Batch 0 Loss 1.5922
Epoch 7 Batch 1 Loss 1.6886
Epoch 7 Batch 2 Loss 1.6815
Epoch 7 Batch 3 Loss 1.7721
Epoch 7 Batch 4 Loss 1.5859
Epoch 7 Batch 5 Loss 1.5939
Epoch 7 Batch 6 Loss 1.6164
Epoch 7 Batch 7 Loss 1.8224
Epoch 7 Batch 8 Loss 1.8248
Epoch 7 Batch 9 Loss 1.7638
Epoch 7 Loss 1.6942
Time taken for 1 epoch 121.93808913230896 sec

Epoch 8 Batch 0 Loss 1.6082
Epoch 8 Batch 1 Loss 1.6250
Epoch 8 Batch 2 Loss 1.4295
Epoch 8 Batch 3 Loss 1.8100
Epoch 8 Batch 4 Loss 1.5442
Epoch 8 Batch 5 Loss 1.4713
Epoch 8 Batch 6 Loss 1.6918
Epoch 8 Batch 7 Loss 1.6335
Epoch 8 Batch 8 Loss 1.6160
Epoch 8 Batch 9 Loss 1.3867
Epoch 8 Loss 1.5816
Time taken for 1 epoch 122.38609957695007 sec

Epoch 9 Batch 0 Loss 1.6677
Epoch 9 Batch 1 Loss 1.4375
Epoch 9 Batch 2 Loss 1.3425
Epoch 9 Batch 3 Loss 1.4567
Epoch 9 Batch 4 Loss 1.3913
Epoch 9 Batch 5 Loss 1.4414
Epoch 9 Batch 6 Loss 1.4123
Epoch 9 Batch 7 Loss 1.4082
Epoch 9 Batch 8 Loss 1.6062
Epoch 9 Batch 9 Loss 1.6140
Epoch 9 Loss 1.4778
Time taken for 1 epoch 121.9051079750061 sec

Epoch 10 Batch 0 Loss 1.1795
Epoch 10 Batch 1 Loss 1.2038
Epoch 10 Batch 2 Loss 1.4458
Epoch 10 Batch 3 Loss 1.4928
Epoch 10 Batch 4 Loss 1.4383
Epoch 10 Batch 5 Loss 1.2833
Epoch 10 Batch 6 Loss 1.6028
Epoch 10 Batch 7 Loss 1.4659
Epoch 10 Batch 8 Loss 1.3429
Epoch 10 Batch 9 Loss 1.4269
Epoch 10 Loss 1.3882
Time taken for 1 epoch 122.23126912117004 sec

7.预测评估

def evaluate(sentence):
    attention_plot = np.zeros((max_length_targ, max_length_inp+2))

    inputs = preprocess_sentence(sentence,max_length_inp,vocab)

    inputs = tf.convert_to_tensor(inputs)

    result = ''
    
    hidden = [tf.zeros((1, units))]
    enc_out, enc_hidden = encoder(inputs, hidden)

    dec_hidden = enc_hidden
    
    dec_input = tf.expand_dims([vocab['']], 0)

    for t in range(max_length_targ):
        predictions, dec_hidden, attention_weights = decoder(dec_input,
                                                             dec_hidden,
                                                             enc_out)
        
        # storing the attention weights to plot later on
        attention_weights = tf.reshape(attention_weights, (-1, ))
        
        attention_plot[t] = attention_weights.numpy()
        predicted_id = tf.argmax(predictions[0]).numpy()

        result += reverse_vocab[predicted_id] + ' '
        if reverse_vocab[predicted_id] == '':
            return result, sentence, attention_plot

        # the predicted ID is fed back into the model
        dec_input = tf.expand_dims([predicted_id], 0)

    return result, sentence, attention_plot

def translate(sentence):
    result, sentence, attention_plot = evaluate(sentence)

    print('Input: %s' % (sentence))
    print('Predicted translation: {}'.format(result))

    attention_plot = attention_plot[:len(result.split(' ')), :len(sentence.split(' '))]
    plot_attention(attention_plot, sentence.split(' '), result.split(' '))

sentence='漏机油具体部位发动机变速器正中间位置拍中间上面上已经看见'

translate(sentence)

Input: 漏机油 具体 部位 发动机 变速器 正中间 位置 拍 中间 上面 上 已经 看见
Predicted translation: 这种 情况 需要 更换

根据attention weights作图：

8.将Encoder-attention-Decoder构建成Baseline Seq2Seq结构


class Seq2Seq(tf.keras.Model):
    def __init__(self, params):
        super(Seq2Seq, self).__init__()
        self.embedding_matrix = load_embedding_matrix()
        self.params = params
        self.encoder = Encoder(params["vocab_size"],
                               params["embed_size"],
                               self.embedding_matrix,
                               params["enc_units"],
                               params["batch_size"])

        self.attention = BahdanauAttention(params["attn_units"])

        self.decoder = Decoder(params["vocab_size"],
                               params["embed_size"],
                               self.embedding_matrix,
                               params["dec_units"],
                               params["batch_size"])

 
  
    def call(self, dec_input, dec_hidden, enc_output, dec_target):
        predictions = []
        attentions = []

        context_vector, _ = self.attention(dec_hidden, enc_output)

        for t in range(1, dec_target.shape[1]):
            _, pred, dec_hidden = self.decoder(dec_input,
                                               dec_hidden,
                                               enc_output,
                                               context_vector)

            context_vector, attn = self.attention(dec_hidden, enc_output)
            # using teacher forcing
            dec_input = tf.expand_dims(dec_target[:, t], 1)

            predictions.append(pred)
            attentions.append(attn)

        return tf.stack(predictions, 1), dec_hidden

9.存在的问题和改进思路：

存在的问题：

1.存在UNK未登录词，导致出现OOV问题；

2.生成的句子存在重复文本的现象

改进思路：

参考论文指针生成网络的pointer generater 和coverage机制

1.pointer generater

Seq2Seq+point

混合了 Baseline seq2seq和PointerNetwork的网络，它具有Baseline seq2seq的生成能力和PointerNetwork的Copy能力。如何权衡一个词应该是生成的还是复制的？原文中引入了一个权重pgen

在Seq2Seq baseline基础上，加上Pointer指针部分记录pgen

class Pointer(tf.keras.layers.Layer):

    def __init__(self):
        super(Pointer, self).__init__()
        self.w_s_reduce = tf.keras.layers.Dense(1)
        self.w_i_reduce = tf.keras.layers.Dense(1)
        self.w_c_reduce = tf.keras.layers.Dense(1)

    def __call__(self, context_vector, dec_hidden, dec_inp):
        return tf.nn.sigmoid(self.w_s_reduce(dec_hidden) + self.w_c_reduce(context_vector) + self.w_i_reduce(dec_inp))

从Baseline seq2seq的模型结构中得到了St和ℎ∗ht∗，和解码器输入 xt 一起来计算 pgen ：

context vector ℎ∗ht∗
decoder input xt
the decoder state

Final_dists

这时，会扩充单词表形成一个更大的单词表--扩充单词表(将原文当中的单词也加入到其中)，该时间步的预测词概率为：

其中 ait 表示的是原文档中的词。我们可以看到解码器一个词的输出概率有其是否拷贝是否生成的概率和决定。当一个词不出现在常规的单词表上时()Pvocab(w) 为0.

这里值得注意的是，如果w是词汇表外（oov）单词，则 Pvocab(w) 为零；。产生OOV单词的能力是指针生成器模型的主要优势之一；通过对比模型，如baseline仅限于其预设词汇，而指针网络则有复制能力，不再局限于预设词汇表 Pvocab.

2.coverage attention

将先前时间步的注意力权重加到一起得到所谓的覆盖向量 ct (coverage vector)，用先前的注意力权重决策来影响当前注意力权重的决策，这样就避免在同一位置重复，从而避免重复生成文本。计算上，先计算coverage vector ct

ct就是一个长度为输入长度的向量
第一项是之前时刻输入第一个词attention权重的叠加和
加这个参数的目的是为了给attention之前生成词的信息，如果之前生成过这些词那么后续要抑制。抑制通过loss函数加惩罚项实现.

Ct 是对源文档单词的分布，它表示到目前为止这些单词从关注机制接收到的覆盖程度。注意，c0 是一个零向量，因为在第一个时间步骤中，没有覆盖任何源文档。覆盖向量用作注意机制，将方程式（1）改为：

改造BahdanauAttention:

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super(BahdanauAttention, self).__init__()
        self.W_s = tf.keras.layers.Dense(units)
        self.W_h = tf.keras.layers.Dense(units)
        self.W_c = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, dec_hidden, enc_output, enc_pad_mask, use_coverage, prev_coverage):
        # query为上次的GRU隐藏层
        # values为编码器的编码结果enc_output
        # 在seq2seq模型中，St是后面的query向量，而编码过程的隐藏状态hi是values。

        # hidden shape == (batch_size, hidden size)
        # hidden_with_time_axis shape == (batch_size, 1, hidden size)
        # we are doing this to perform addition to calculate the score
        hidden_with_time_axis = tf.expand_dims(dec_hidden, 1)

        if use_coverage and prev_coverage is not None:
            # self.W_s(values) [batch_sz, max_len, units] self.W_h(hidden_with_time_axis) [batch_sz, 1, units]
            # self.W_c(prev_coverage) [batch_sz, max_len, units]  score [batch_sz, max_len, 1]
            score = self.V(tf.nn.tanh(self.W_s(enc_output) + self.W_h(hidden_with_time_axis) + self.W_c(prev_coverage)))
            # attention_weights shape (batch_size, max_len, 1)

            mask = tf.cast(enc_pad_mask, dtype=score.dtype)
            masked_score = tf.squeeze(score, axis=-1) * mask
            masked_score = tf.expand_dims(masked_score, axis=2)

            attention_weights = tf.nn.softmax(masked_score, axis=1)
            coverage = attention_weights + prev_coverage
        else:
            # score shape == (batch_size, max_length, 1)
            # we get 1 at the last axis because we are applying score to self.V
            # the shape of the tensor before applying self.V is (batch_size, max_length, units)
            # 计算注意力权重值
            score = self.V(tf.nn.tanh(
                self.W_s(enc_output) + self.W_h(hidden_with_time_axis)))

            mask = tf.cast(enc_pad_mask, dtype=score.dtype)
            masked_score = tf.squeeze(score, axis=-1) * mask
            masked_score = tf.expand_dims(masked_score, axis=2)

            attention_weights = tf.nn.softmax(masked_score, axis=1)
            # attention_weights = masked_attention(attention_weights)
            if use_coverage:
                coverage = attention_weights

        # attention_weights sha== (batch_size, max_length, 1)
        attention_weights = tf.nn.softmax(score, axis=1)

        # # 使用注意力权重*编码器输出作为返回值，将来会作为解码器的输入
        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * enc_output
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector,attention_weights, coverage

其中 Wc 是和v长度一样的学习参数。这确保注意力机制当前的决定（选择下一个注意点）通过提醒前一个决定而得到通知。即在下一次选择之前，应用了以前注意力的信息，这应该使注意力机制更容易避免重复关注同一地点，从而避免生成重复文本。定义一个coverage loss 来对同一地点的重复惩罚:

def mask_coverage_loss(attn_dists, coverages, padding_mask):
    """
    Calculates the coverage loss from the attention distributions.
      Args:
        attn_dists coverages: [max_len_y, batch_sz, max_len_x, 1]
        padding_mask: shape (batch_size, max_len_y).
      Returns:
        coverage_loss: scalar
    """
    cover_losses = []
    # transfer attn_dists coverages to [max_len_y, batch_sz, max_len_x]
    attn_dists = tf.squeeze(attn_dists, axis=3)
    coverages = tf.squeeze(coverages, axis=3)

    
    for t in range(attn_dists.shape[0]):
        cover_loss_ = tf.reduce_sum(tf.minimum(attn_dists[t, :, :], coverages[t, :, :]), axis=-1)  # max_len_x wise
        cover_losses.append(cover_loss_)
    
    # change from[max_len_y, batch_sz] to [batch_sz, max_len_y]
    cover_losses = tf.stack(cover_losses, 1)

    # cover_loss_ [batch_sz, max_len_y]
    mask = tf.cast(padding_mask, dtype=cover_loss_.dtype)
    cover_losses *= mask
    
    # mean loss of each time step and then sum up
    loss = tf.reduce_sum(tf.reduce_mean(cover_losses, axis=0))  
    tf.print('coverage loss(batch sum):', loss)
    return loss

coverage loss是有边界的,我们假设应该有一个大致一对一的转换率；因此，如果最终覆盖向量大于或小于1，则会受到惩罚。我们的损失函数更灵活：因为总结不需要统一的覆盖，我们只惩罚到目前为止每个注意力分布和覆盖之间的重叠-防止重复的注意力。最后，在主损失函数中加入由超参数λ重新加权的coverage loss，得到一个新的复合损失函数：

你可能感兴趣的:(自然语言处理,tensorflow)

深度学习模型开发文档 Ares代码行者深度学习
深度学习模型开发文档1.简介2.深度学习模型开发流程3.数据准备3.1数据加载3.2数据可视化4.构建卷积神经网络(CNN)5.模型训练5.1定义损失函数和优化器5.2训练过程6.模型评估与优化6.1模型评估6.2超参数调优7.模型部署8.总结参考资料1.简介深度学习是人工智能的一个分支，利用多层神经网络从数据中提取特征并进行学习。它被广泛应用于图像识别、自然语言处理、语音识别等领域。本文将以构建
多头潜在注意力（MLA）是怎么来的，什么原理，能用简单的示例解释么百态老人学习
多头潜在注意力（Multi-HeadLatentAttention，简称MLA）是一种改进的注意力机制，旨在提高自然语言处理（NLP）模型的推理效率和性能。其核心思想是通过低秩联合压缩键（Key）和值（Value），减少推理过程中所需的内存和计算资源，从而实现更高效的处理。MLA的原理在传统的多头注意力机制（MHA）中，每个输入token的键和值需要被缓存，这导致了巨大的内存开销。具体来说，对于每
AI人工智能深度学习算法：高并发场景下深度学习代理的性能调优 AI天才研究院计算 AI大模型企业级应用开发实战 ChatGPT 计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
1.背景介绍1.1深度学习代理的兴起近年来，随着人工智能技术的飞速发展，深度学习在各个领域都取得了显著的成果。特别是在自然语言处理、图像识别、语音识别等领域，深度学习模型的性能已经超越了传统方法。为了更好地将深度学习技术应用于实际场景，深度学习代理应运而生。深度学习代理是一种将深度学习模型封装起来，并提供对外接口的服务。它可以接收来自客户端的请求，将请求数据输入到深度学习模型中进行推理，并将推理结
【深度学习基础】线性神经网络 | softmax回归的简洁实现 Francek Chen PyTorch深度学习深度学习神经网络回归 softmax 人工智能
【作者主页】FrancekChen【专栏介绍】⌈⌈⌈PyTorch深度学习⌋⌋⌋深度学习(DL,DeepLearning)特指基于深层神经网络模型和方法的机器学习。它是在统计机器学习、人工神经网络等算法模型基础上，结合当代大数据和大算力的发展而发展出来的。深度学习最重要的技术特征是具有自动提取特征的能力。神经网络算法、算力和数据是开展深度学习的三要素。深度学习在计算机视觉、自然语言处理、多模态数据
【人工智能】Python常用库-Keras：高阶深度学习 API IT古董深度学习人工智能 Python 人工智能 python 深度学习
Keras：高阶深度学习APIKeras是一个高效、用户友好的深度学习框架，作为TensorFlow的高级API，支持快速构建和训练深度学习模型。它以模块化、简单和灵活著称，适合研究和生产环境。Keras的发音为[ˈkerəs]，类似于“凯拉斯”或“克拉斯”。这个名字来源于希腊语κέρας(kéras)，意思是“角”或“角质物”。这个词与深度学习的灵感来源——大脑的神经网络结构有一定联系。Kera
深度学习从入门到精通：全面指南 AI天才研究院计算大数据AI人工智能 AI大模型企业级应用开发实战 java python javascript kotlin golang 架构人工智能大厂程序员硅基计算碳基计算认知计算生物计算深度学习神经网络大数据 AIGC AGI LLM 系统架构设计软件哲学 Agent 程序员实现财富自由
《深度学习从入门到精通：全面指南》文章目录《深度学习从入门到精通：全面指南》文章关键词文章摘要引言第一部分：深度学习基础入门第1章：深度学习概述1.1深度学习的基本概念1.2深度学习的发展历程1.3深度学习的基本原理神经网络前向传播反向传播第2章：深度学习框架入门2.1TensorFlow入门TensorFlow环境搭建TensorFlow基本数据结构2.2PyTorch入门PyTorch环境搭建
【Python】已解决ModuleNotFoundError: No module named ‘tensorflow‘ 屿小夏 python tensorflow neo4j
个人简介：某不知名博主，致力于全栈领域的优质博客分享|用最优质的内容带来最舒适的阅读体验！文末获取免费IT学习资料！文末获取更多信息精彩专栏推荐订阅收藏专栏系列直达链接相关介绍书籍分享点我跳转书籍作为获取知识的重要途径，对于IT从业者来说更是不可或缺的资源。不定期更新IT图书，并在评论区抽取随机粉丝，书籍免费包邮到家AI前沿点我跳转探讨人工智能技术领域的最新发展和创新，涵盖机器学习、深度学习、自然
【Python】已解决：ModuleNotFoundError: No module named ‘tensorflow‘ 屿小夏 python tensorflow neo4j
个人简介：某不知名博主，致力于全栈领域的优质博客分享|用最优质的内容带来最舒适的阅读体验！文末获取免费IT学习资料！文末获取更多信息精彩专栏推荐订阅收藏专栏系列直达链接相关介绍书籍分享点我跳转书籍作为获取知识的重要途径，对于IT从业者来说更是不可或缺的资源。不定期更新IT图书，并在评论区抽取随机粉丝，书籍免费包邮到家AI前沿点我跳转探讨人工智能技术领域的最新发展和创新，涵盖机器学习、深度学习、自然
Transformer模型全面解析：工作原理、应用与未来展望* 泰山AI AI大模型应用开发 transformer
概述：深入探讨Transformer模型的工作原理，分析其在NLP领域的应用场景，并展望其未来发展趋势。本文为您提供关于Transformer模型的全面指南。正文Transformer模型全面解析：工作原理、应用与未来展望在人工智能的浪潮中，Transformer模型以其强大的性能和广泛的应用场景，成为了自然语言处理（NLP）领域的一颗璀璨明星。本文将对Transformer模型进行深入剖析，从工
一切皆是映射：Transformer架构全面解析 AI天才研究院计算大数据AI人工智能计算科学神经计算深度学习神经网络大数据人工智能大型语言模型 AI AGI LLM Java Python 架构设计 Agent RPA
背景介绍自2017年，Transformer（自注意力机制）架构的问世以来，它已经成为自然语言处理（NLP）领域的主流技术之一。Transformer架构的出现，使得自然语言处理的任务变得更加简单、高效，同时也为许多其他领域提供了灵感。通过深入剖析Transformer，我们可以更好地理解其核心概念、原理和实际应用场景。这篇文章将全面解析Transformer架构，从核心概念到实际应用，帮助读者深
大语言模型原理基础与前沿指令生成 AI大模型应用之禅计算机软件编程原理与应用实践 java python javascript kotlin golang 架构人工智能
大语言模型、指令生成、Transformer、BERT、GPT、LLaMA、Fine-tuning、PromptEngineering1.背景介绍近年来，大语言模型（LargeLanguageModels，LLMs）在自然语言处理领域取得了令人瞩目的成就。从文本生成、翻译到问答和代码编写，LLMs展现出强大的能力，深刻地改变了我们与语言交互的方式。指令生成作为LLMs应用的重要方向之一，旨在通过明
使用ChatLlamaCpp和Llama CPP Python进行聊天模型集成 AWsggdrg llama python 开发语言
在这篇文章中，我们将探讨如何使用ChatLlamaCpp和LlamaCPPPython库来搭建一个强大的聊天模型。我们将详细讲解如何进行模型集成，并提供完整的代码示例以帮助您快速上手。技术背景介绍随着自然语言处理技术的不断发展，越来越多的应用需要集成复杂的聊天模型以提高交互能力。ChatLlamaCpp是一个基于LlamaCPPPython库构建的聊天模型，支持多种功能调用和结构化输出，非常适合用
2.6 聚焦：Word Embedding 少林码僧 AI大模型应用实战专栏 word embedding
聚焦：WordEmbeddingWordEmbedding（词嵌入）是一种将词语转化为低维向量表示的技术，使得词语在数学空间中具有语义上的相似性。它是自然语言处理（NLP）中不可或缺的一部分，为文本数据提供了强大的表示能力。与传统的基于词频的词袋模型（Bag-of-Words）相比，WordEmbedding能够捕捉到词语之间更深层的语义和上下文信息。1.词嵌入的定义与作用WordEmbeddin
Transformer入门（1）transformer及其编码器-解码器通信仿真实验室 Google BERT 构建和训练NLP模型 bert transformer 人工智能 NLP 自然语言处理
文章目录1.Transformer简介2.Transformer的编码器-解码器架构3.transformer的编码器1.Transformer简介Transformer模型是一种用于自然语言处理的机器学习模型，它在2017年由Google的研究者提出，并在论文《AttentionisAllYouNeed》中详细描述。Transformer模型的核心创新在于其采用了自注意力（self-attent
Transformer架构原理详解：编码器（Encoder）和解码器（Decoder） AI大模型应用之禅 AI大模型与大数据 java python javascript kotlin golang 架构人工智能
Transformer,编码器,解码器,自注意力机制,多头注意力,位置编码,序列到序列,自然语言处理1.背景介绍近年来，深度学习在自然语言处理（NLP）领域取得了显著进展，其中Transformer架构扮演着至关重要的角色。自2017年谷歌发布了基于Transformer的机器翻译模型BERT以来，Transformer及其变体在各种NLP任务上取得了突破性的成果，例如文本分类、问答系统、文本摘要
深入解析如何进行TensorFlow框架下的算子开发与适配插件开发：基于昇腾AI的完整流程快撑死的鱼华为昇腾 Ascend C的算子开发系统学习人工智能 tensorflow python
深入解析如何进行TensorFlow框架下的算子开发与适配插件开发：基于昇腾AI的完整流程在人工智能领域中，算子（Operator）作为深度学习模型的基础执行单元，决定了整个模型的计算性能和结果准确性。随着硬件平台的多样化，如何将第三方深度学习框架中的算子适配到特定的硬件平台变得至关重要。本文将深入探讨如何在TensorFlow框架下开发适配昇腾AI处理器的算子插件，通过解析算子属性映射、数据排布
深入解析框架适配开发：基于CANN平台的自定义算子开发与第三方框架适配全流程详解快撑死的鱼华为昇腾 Ascend C的算子开发系统学习人工智能
深入解析框架适配开发：基于CANN平台的自定义算子开发与第三方框架适配全流程详解随着深度学习的发展，不同的深度学习框架如TensorFlow、PyTorch、ONNX等在AI开发者社区中占据了重要地位。然而，针对某些硬件平台（如华为昇腾AI处理器），算子库中的算子并非都已经适配了所有主流框架。为了解决这一问题，框架适配开发应运而生，它允许开发者将已存在于算子库中的算子适配到其他未支持的第三方框架上
有趣的python代码实例_Python之路：200个Python有趣的小例子一网打尽 weixin_39845406 有趣的python代码实例
概述博主最近在学习python，看完了一整套学习视频，然后呃呃呃，还是用不太流畅。碰巧在全球最大的同性交友论坛GayHub(呸！是开源代码托管平台Github)上面发现了一个项目，该项目列举了200多个Python小例子，Python基础、Python坑点、Python字符串和正则、Python绘图、Python日期和文件、Web开发、数据科学、机器学习、深度学习、TensorFlow、Pytor
使用Amazon Bedrock API调用Anthropic的Claude模型 dwa46a56w4d easyui 前端 javascript python
在AI模型应用领域，亚马逊的BedrockAPI提供了便捷的方式来访问诸如Anthropic的Claude这样强大的模型。通过AmazonBedrock，开发者可以在云端直接调用Claude进行各种自然语言处理任务。本指南将引导您如何在Python中通过Bedrock来使用Claude模型。技术背景介绍Claude是由Anthropic开发的AI模型，提供强大的自然语言处理能力。通过AmazonB
PyTorch 基础数据集：从理论到实践的深度学习基石那年一路北 Pytorch理论+实践深度学习 pytorch 人工智能
一、引言深度学习作为当今人工智能领域的核心技术，在图像识别、自然语言处理、语音识别等众多领域取得了令人瞩目的成果。而在深度学习的体系中，数据扮演着举足轻重的角色，它是模型训练的基础，如同建筑的基石，决定了模型的性能和泛化能力。PyTorch作为当下最流行的深度学习框架之一，为开发者提供了丰富且强大的工具来处理数据集。本文将深入探讨PyTorch中的基础数据集，从深度学习中数据的重要性出发，详细介绍
matlab程序代编程写做代码图像处理BP神经网络机器深度学习python matlabgoodboy 深度学习 matlab 图像处理
1.安装必要的库首先，确保你已经安装了必要的Python库。如果没有安装，请运行以下命令：bash复制代码pipinstallnumpymatplotlibtensorflowopencv-python2.图像预处理我们将使用OpenCV来加载和预处理图像数据。假设你有一个图像数据集，每个类别的图像存放在单独的文件夹中。python复制代码importosimportcv2importnumpya
使用 LangChain 构建多PDF文档聊天应用 jkgSFS langchain pdf easyui python
随着大型语言模型（LLMs）的普及，如何将它们应用于文档处理成为了热门话题之一。本文将通过一个教程，展示如何使用LangChain构建一个能够处理多个PDF文档并与之对话的应用。技术背景介绍LangChain是一个广受欢迎的库，能够帮助开发者轻松地与LLMs和不同的嵌入技术进行整合。它提供了方便的接口和工具，使得复杂的自然语言处理任务变得简单高效。核心原理解析我们将利用LangChain来读取多个
深度解析：Python与TensorFlow在日平均气温预测中的应用——LSTM神经网络实战 AI_DL_CODE python 神经网络 tensorflow LSTM 气温预测 RNN
文章目录1.引言1.1研究背景与意义1.2研究目标与问题定义2.概念解析2.1Python语言简介2.2TensorFlow框架概述2.3LSTM神经网络原理3.原理详解3.1时间序列分析基础3.1.1时间序列的组成3.1.2时间序列分析方法3.2LSTM在时间序列分析中的应用3.2.1LSTM的优势3.2.2LSTM的结构3.3日平均气温预测的数学模型3.3.1ARIMA模型3.3.2LSTM模
大模型的RAG微调与Agent：提升智能代理的效率与效果 WeeJot 人工智能人工智能
目录编辑引言RAG模型概述检索阶段生成阶段RAG模型的微调数据集选择损失函数设计微调策略超参数调整RAG模型在智能代理中的应用客户服务信息检索内容创作决策支持：结论引言在人工智能的快速发展中，大型预训练模型（LLMs）已经成为推动技术进步的关键力量。这些模型通过在海量数据上的预训练，掌握了丰富的语言知识和模式识别能力，从而在多种自然语言处理任务上展现出卓越的性能。然而，预训练模型的通用性也意味着它
ChatGPT 绘图的工作原理
ChatGPT的绘图功能结合了自然语言处理（NLP）和图像生成的技术，这种综合能力依赖于预训练模型（如GPT-4）和图像生成模型（如DALL-E）之间的紧密协作。ChatGPT本质上是一个大规模的语言模型，但通过与图像生成模型集成，它得以执行基于描述生成图像的任务。接下来，我们将从模型架构、训练方法、推理机制和一些技术挑战等方面，详细讨论ChatGPT进行绘图的工作原理。
深度解析智能问答系统：如何打造精准、高效的AI对话架构？和老莫一起学AI 人工智能架构自然语言处理产品经理语言模型学习 ai
在人工智能的飞速发展中，智能问答系统（QA系统）逐渐成为了企业内部管理、客户服务、搜索引擎等多个领域中的关键技术。今天，我们将深入探讨一个基于大模型、自然语言处理、知识检索的智能问答系统的架构，详细介绍其技术原理、流程以及未来应用前景。一、系统整体概览在这个智能问答系统中，整个流程可以大致划分为两大部分：前端问答生成与后端离线数据处理。前端部分是用户交互的核心，通过用户的输入、关键词提取、检索和问
AI行业高压与人才健康：纪念Felix Hill，并探讨AI代码生成工具的价值前端
今天，我们怀着沉痛的心情悼念GoogleDeepMind研究科学家FelixHill，这位杰出的AI学者在41岁的年纪离开了我们。他的离世引发了我们对AI行业高压环境与人才健康问题的深刻反思。Felix生前曾公开表达AI行业前所未有的压力，这促使我们思考如何利用技术，例如AI代码生成器，来改善开发者的工作环境，提升效率，守护人才健康。FelixHill在自然语言处理和人工智能领域取得了令人瞩目的成
AI代码生成工具的未来：杨立昆的洞见与AI革命前端
近年来，人工智能（AI）领域取得了令人瞩目的进展，特别是以大型语言模型为代表的AI技术，在自然语言处理、图像生成等领域展现出强大的能力。然而，深度学习先驱杨立昆（YannLeCun）却对现有的AI系统提出了尖锐的批评，他认为目前的AI系统“理解能力远不如猫”，缺乏对真实世界的理解和常识。这引发了人们对AI未来发展方向的思考，也为我们探讨AI代码生成工具，以及AI技术对人类社会的影响提供了新的视角。
未来教育：AI知识库如何重塑学习体验知识管理知识库知识库软件
在科技日新月异的今天，教育领域正经历着前所未有的变革。人工智能（AI）技术的快速发展，特别是AI知识库的广泛应用，正在重塑我们的学习体验，使之变得更加高效、个性化和智能化。本文将深入探讨AI知识库如何影响未来教育，以及它如何为学习者提供前所未有的学习体验。一、AI知识库：教育领域的智能助手AI知识库，作为结合了人工智能技术的知识管理系统，不仅能够存储和处理海量信息，还能通过自然语言处理、机器学习等
2024 年技术盘点与展望：从 AI 辅助到个人成长的多元探索 109702008 杂谈人工智能
一、引言2024年，技术领域的发展日新月异，我在这片汹涌的浪潮中不断探索与成长。这一年，我不仅见证了人工智能技术的飞速发展，还通过AI辅助创作、AI赋能编程以及参与各类竞赛与课程，实现了个人技术的显著提升与视野的拓展。本文将从总结盘点的角度，回顾我在技术领域的成长历程，并对未来进行展望。二、AI辅助创作：提升写作效率与质量在自然语言处理技术（NLP）的推动下，AI写作工具成为了我的得力助手。这些工
ViewController添加button按钮解析。（翻译）张亚雄 c
<div class="it610-blog-content-contain" style="font-size: 14px"></div>// ViewController.m // Reservation software // // Created by 张亚雄 on 15/6/2.
mongoDB 简单的增删改查开窍的石头 mongodb
在上一篇文章中我们已经讲了mongodb怎么安装和数据库/表的创建。在这里我们讲mongoDB的数据库操作在mongo中对于不存在的表当你用db.表名他会自动统计下边用到的user是表明，db代表的是数据库添加(insert):
log4j配置 0624chenhong log4j
1) 新建java项目 2) 导入jar包，项目右击，properties—java build path—libraries—Add External jar，加入log4j.jar包。 3) 新建一个类com.hand.Log4jTest package com.hand; import org.apache.log4j.Logger; public class
多点触摸(图片缩放为例) 不懂事的小屁孩多点触摸
多点触摸的事件跟单点是大同小异的，上个图片缩放的代码，供大家参考一下 import android.app.Activity; import android.os.Bundle; import android.view.MotionEvent; import android.view.View; import android.view.View.OnTouchListener
有关浏览器窗口宽度高度几个值的解析换个号韩国红果果 JavaScript html
1 元素的 offsetWidth 包括border padding content 整体的宽度。 clientWidth 只包括内容区 padding 不包括border。 clientLeft = offsetWidth -clientWidth 即这个元素border的值 offsetLeft 若无已定位的包裹元素
数据库产品巡礼：IBM DB2概览蓝儿唯美 db2
IBM DB2是一个支持了NoSQL功能的关系数据库管理系统，其包含了对XML，图像存储和Java脚本对象表示（JSON）的支持。DB2可被各种类型的企业使用，它提供了一个数据平台，同时支持事务和分析操作，通过提供持续的数据流来保持事务工作流和分析操作的高效性。 DB2支持的操作系统 DB2可应用于以下三个主要的平台: 工作站，DB2可在Linus、Unix、Windo
java笔记5 a-john java
控制执行流程： 1，true和false 利用条件表达式的真或假来决定执行路径。例：（a==b）。它利用条件操作符“==”来判断a值是否等于b值，返回true或false。java不允许我们将一个数字作为布尔值使用，虽然这在C和C++里是允许的。如果想在布尔测试中使用一个非布尔值，那么首先必须用一个条件表达式将其转化成布尔值，例如if(a!=0)。 2，if-els
Web开发常用手册汇总 aijuans PHP
一门技术，如果没有好的参考手册指导,很难普及大众。这其实就是为什么很多技术，非常好，却得不到普遍运用的原因。正如我们学习一门技术，过程大概是这个样子： ①我们日常工作中，遇到了问题，困难。寻找解决方案，即寻找新的技术； ②为什么要学习这门技术？这门技术是不是很好的解决了我们遇到的难题，困惑。这个问题，非常重要，我们不是为了学习技术而学习技术，而是为了更好的处理我们遇到的问题，才需要学习新的
今天帮助人解决的一个sql问题 asialee sql
今天有个人问了一个问题，如下： type AD value A
意图对象传递数据百合不是茶 android 意图Intent Bundle对象数据的传递
学习意图将数据传递给目标活动; 初学者需要好好研究的 1,将下面的代码添加到main.xml中 <?xml version="1.0" encoding="utf-8"?> <LinearLayout xmlns:android="http:/
oracle查询锁表解锁语句 bijian1013 oracle object session kill
一.查询锁定的表如下语句，都可以查询锁定的表语句一： select a.sid, a.serial#, p.spid, c.object_name, b.session_id, b.oracle_username, b.os_user_name from v$process p, v$s
mac osx 10.10 下安装 mysql 5.6 二进制文件［tar.gz］征客丶 mysql osx
场景：在 mac osx 10.10 下安装 mysql 5.6 的二进制文件。环境：mac osx 10.10、mysql 5.6 的二进制文件步骤：[所有目录请从根“/”目录开始取，以免层级弄错导致找不到目录] 1、下载 mysql 5.6 的二进制文件，下载目录下面称之为 mysql5.6SourceDir；下载地址：http://dev.mysql.com/downl
分布式系统与框架 bit1129 分布式
RPC框架 Dubbo 什么是Dubbo Dubbo是一个分布式服务框架，致力于提供高性能和透明化的RPC远程服务调用方案，以及SOA服务治理方案。其核心部分包含: 远程通讯: 提供对多种基于长连接的NIO框架抽象封装，包括多种线程模型，序列化，以及“请求-响应”模式的信息交换方式。集群容错: 提供基于接
那些令人蛋痛的专业术语白糖_ spring Web SSO IOC
spring 【控制反转(IOC)/依赖注入(DI)】：由容器控制程序之间的关系，而非传统实现中，由程序代码直接操控。这也就是所谓“控制反转”的概念所在：控制权由应用代码中转到了外部容器，控制权的转移，是所谓反转。简单的说：对象的创建又容器(比如spring容器)来执行，程序里不直接new对象。 Web 【单点登录(SSO)】：SSO的定义是在多个应用系统中，用户
《给大忙人看的java8》摘抄 braveCS java8
函数式接口：只包含一个抽象方法的接口 lambda表达式：是一段可以传递的代码你最好将一个lambda表达式想象成一个函数，而不是一个对象，并记住它可以被转换为一个函数式接口。事实上，函数式接口的转换是你在Java中使用lambda表达式能做的唯一一件事。方法引用：又是要传递给其他代码的操作已经有实现的方法了，这时可以使
编程之美-计算字符串的相似度 bylijinnan java 算法编程之美
public class StringDistance { /** * 编程之美计算字符串的相似度 * 我们定义一套操作方法来把两个不相同的字符串变得相同，具体的操作方法为： * 1.修改一个字符（如把“a”替换为“b”）; * 2.增加一个字符（如把“abdd”变为“aebdd”）; * 3.删除一个字符（如把“travelling”变为“trav
上传、下载压缩图片 chengxuyuancsdn 下载
/** * * @param uploadImage --本地路径(tomacat路径) * @param serverDir --服务器路径 * @param imageType --文件或图片类型 * 此方法可以上传文件或图片.txt,.jpg,.gif等 */ public void upload(String uploadImage,Str
bellman-ford(贝尔曼-福特)算法 comsci 算法 F#
Bellman-Ford算法(根据发明者 Richard Bellman 和 Lester Ford 命名)是求解单源最短路径问题的一种算法。单源点的最短路径问题是指：给定一个加权有向图G和源点s，对于图G中的任意一点v，求从s到v的最短路径。有时候这种算法也被称为 Moore-Bellman-Ford 算法，因为 Edward F. Moore zu 也为这个算法的发展做出了贡献。与迪科
oracle ASM中ASM_POWER_LIMIT参数 daizj ASM oracle ASM_POWER_LIMIT 磁盘平衡
ASM_POWER_LIMIT 该初始化参数用于指定ASM例程平衡磁盘所用的最大权值，其数值范围为0~11，默认值为1。该初始化参数是动态参数，可以使用ALTER SESSION或ALTER SYSTEM命令进行修改。示例如下： SQL>ALTER SESSION SET Asm_power_limit=2;
高级排序:快速排序 dieslrae 快速排序
public void quickSort(int[] array){ this.quickSort(array, 0, array.length - 1); } public void quickSort(int[] array,int left,int right){ if(right - left <= 0
C语言学习六指针_何谓变量的地址一个指针变量到底占几个字节 dcj3sjt126com C语言
# include <stdio.h> int main(void) { /* 1、一个变量的地址只用第一个字节表示 2、虽然他只使用了第一个字节表示，但是他本身指针变量类型就可以确定出他指向的指针变量占几个字节了 3、他都只存了第一个字节地址，为什么只需要存一个字节的地址，却占了4个字节，虽然只有一个字节，但是这些字节比较多，所以编号就比较大，
phpize使用方法 dcj3sjt126com PHP
phpize是用来扩展php扩展模块的，通过phpize可以建立php的外挂模块,下面介绍一个它的使用方法,需要的朋友可以参考下安装（fastcgi模式）的时候，常常有这样一句命令：代码如下: /usr/local/webserver/php/bin/phpize 一、phpize是干嘛的？ phpize是什么？ phpize是用来扩展php扩展模块的，通过phpi
Java虚拟机学习 - 对象引用强度 shuizhaosi888 JAVA虚拟机
本文原文链接：http://blog.csdn.net/java2000_wl/article/details/8090276 转载请注明出处！无论是通过计数算法判断对象的引用数量，还是通过根搜索算法判断对象引用链是否可达，判定对象是否存活都与“引用”相关。引用主要分为：强引用(Strong Reference)、软引用(Soft Reference)、弱引用(Wea
.NET Framework 3.5 Service Pack 1（完整软件包）下载地址 happyqing .net 下载 framework
Microsoft .NET Framework 3.5 Service Pack 1（完整软件包） http://www.microsoft.com/zh-cn/download/details.aspx?id=25150 Microsoft .NET Framework 3.5 Service Pack 1 是一个累积更新，包含很多基于 .NET Framewo
JAVA定时器的使用 jingjing0907 java timer 线程定时器
1、在应用开发中，经常需要一些周期性的操作，比如每5分钟执行某一操作等。对于这样的操作最方便、高效的实现方式就是使用java.util.Timer工具类。 privatejava.util.Timer timer; timer = newTimer(true); timer.schedule( newjava.util.TimerTask() { public void run()
Webbench 流浪鱼 webbench
首页下载地址 http://home.tiscali.cz/~cz210552/webbench.html Webbench是知名的网站压力测试工具，它是由Lionbridge公司（http://www.lionbridge.com）开发。 Webbench能测试处在相同硬件上，不同服务的性能以及不同硬件上同一个服务的运行状况。webbench的标准测试可以向我们展示服务器的两项内容：每秒钟相
第11章动画效果（中） onestopweb 动画
index.html <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/
windows下制作bat启动脚本. sanyecao2314 java cmd 脚本 bat
java -classpath C:\dwjj\commons-dbcp.jar;C:\dwjj\commons-pool.jar;C:\dwjj\log4j-1.2.16.jar;C:\dwjj\poi-3.9-20121203.jar;C:\dwjj\sqljdbc4.jar;C:\dwjj\voucherimp.jar com.citsamex.core.startup.MainStart
Java进行RSA加解密的例子 tomcat_oracle java
加密是保证数据安全的手段之一。加密是将纯文本数据转换为难以理解的密文；解密是将密文转换回纯文本。　　数据的加解密属于密码学的范畴。通常，加密和解密都需要使用一些秘密信息，这些秘密信息叫做密钥，将纯文本转为密文或者转回的时候都要用到这些密钥。　　对称加密指的是发送者和接收者共用同一个密钥的加解密方法。　　非对称加密(又称公钥加密)指的是需要一个私有密钥一个公开密钥，两个不同的密钥的
Android_ViewStub 阿尔萨斯 ViewStub
public final class ViewStub extends View java.lang.Object android.view.View android.view.ViewStub 类摘要： ViewStub 是一个隐藏的，不占用内存空间的视图对象，它可以在运行时延迟加载布局资源文件。当 ViewSt