Scofield_Phil

TensorFlow RNN Deep Learning: BiLSTM+CRF Source Code for Sequence Labeling

Implementing sequence labeling with BiLSTM+CRF in TensorFlow

Bidirectional LSTM + CRF for the sequence labeling problem

Source code

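Around the end of last year I was working on an NLP task, a sequence labeling problem. Sequence labeling is one of the classic NLP problems. My first thought was HMM, or rather, I settled on CRF for the baseline; by the way, I used CRF++.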

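I won't rehash CRF theory here; it is covered everywhere. In passing: as a discriminative model that can use arbitrary overlapping features, CRF usually does better than HMM on labeling tasks, both in theory and in practice. What I want to point out is that CRF still did not do well on my task. Precision was around 0.6 and recall was absurdly low, so F1 was poor. My mentor put it down to insufficient features; a senior labmate said the task itself is hard and a low F1 is to be expected.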

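After finishing the CRF baseline I kept working toward running sequence labeling with BiLSTM+CRF, but with so many projects on my plate I had no spare energy to follow the original schedule. Eventually, bit by bit, following the steps of more experienced people and referring to existing code, I got a BiLSTM+CRF implementation working. The results turned out to be not that great either... maybe the task really is that nasty, or maybe the model still needs strengthening.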

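Here is a comparison of CRF and LSTM. Start with the RNN, which is generally a more natural fit for sequence problems than a CNN. Part of the input to the RNN's hidden layer at the current time step is the hidden layer's output from the previous step, so through this recurrent connection the network can see earlier information: context from the earlier part of the sequence is captured and takes part in the current computation. The RNN also has nonlinear fitting capacity. These are things a CRF cannot match. The LSTM cell in turn largely alleviates the RNN's vanishing-gradient problem. In effect it tells its gates: buddy, forget the unimportant information when you should, so it doesn't take up resources now. A bidirectional LSTM goes further: it sees not only the past but also the upcoming part of the sequence, so context on both sides is fully used. A CRF, unlike an LSTM, does not model long-range context; it mostly works with a linear weighted combination of local features (feature templates scanned across the whole sentence). Its distinctive point is that it scores the label sequence as a whole, with the probability normalized over all possible label sequences for the sentence, so it optimizes the entire sequence rather than stitching together the best choice at each time step. Putting BiLSTM and CRF together therefore makes a fairly good combination, and it is currently a popular approach in academia.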
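
To make the last point concrete, here is a minimal pure-Python sketch (not taken from the post's code) that compares picking the best tag at each step with decoding the whole sequence by Viterbi. The tag set, emission scores and transition scores are all made up for illustration.

```python
# Toy comparison: per-step argmax vs. whole-sequence (Viterbi) decoding.
# All scores below are invented for illustration only.

def sequence_score(emissions, transitions, path):
    """Sum of per-position scores plus tag-to-tag transition scores."""
    score = sum(emissions[t][tag] for t, tag in enumerate(path))
    score += sum(transitions[a][b] for a, b in zip(path, path[1:]))
    return score

def greedy_decode(emissions):
    """Pick the best tag at each position independently."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in emissions]

def viterbi_decode(emissions, transitions):
    """Return the tag sequence with the highest total score."""
    num_tags = len(emissions[0])
    best = list(emissions[0])  # best[j]: best score of any path ending in tag j
    backpointers = []
    for row in emissions[1:]:
        new_best, pointers = [], []
        for j in range(num_tags):
            i_star = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(i_star)
            new_best.append(best[i_star] + transitions[i_star][j] + row[j])
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(range(num_tags), key=lambda j: best[j])
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]

# Tags: 0 = O, 1 = B, 2 = E. The transition O -> E is heavily penalised.
emissions = [[2.0, 1.5, 0.0],
             [0.0, 0.5, 2.0]]
transitions = [[0.0, 0.0, -5.0],   # from O
               [0.0, -5.0, 1.0],   # from B
               [0.0, 0.0, -5.0]]   # from E

greedy_path = greedy_decode(emissions)                  # O, E
viterbi_path = viterbi_decode(emissions, transitions)   # B, E
```

The per-step choice gives O followed by E, which the transition scores penalise, while the whole-sequence decode prefers B followed by E.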

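A few possible improvements on the current working version:

1. +CNN: use convolution to extract character-level detail from English words.

2. +char representation: similar purpose, capturing finer-grained detail.

3. More joint models to try.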

Fine, enough talk. Code time:

For the full code and the preprocessed data, see github: scofiled's github/bilstm+crf

requirements:

ubuntu14

python2.7

tensorflow 0.8

numpy

pandas0.15

BILSTM_CRF.py

import math
import helper
import numpy as np
import tensorflow as tf
from tensorflow.models.rnn import rnn, rnn_cell

class BILSTM_CRF(object):
    
    def __init__(self, num_chars, num_classes, num_steps=200, num_epochs=100, embedding_matrix=None, is_training=True, is_crf=True, weight=False):
        # Parameter
        self.max_f1 = 0
        self.learning_rate = 0.002
        self.dropout_rate = 0.5
        self.batch_size = 128
        self.num_layers = 1   
        self.emb_dim = 100
        self.hidden_dim = 100
        self.num_epochs = num_epochs
        self.num_steps = num_steps
        self.num_chars = num_chars
        self.num_classes = num_classes
        
        # placeholder of x, y and weight
        self.inputs = tf.placeholder(tf.int32, [None, self.num_steps])
        self.targets = tf.placeholder(tf.int32, [None, self.num_steps])
        self.targets_weight = tf.placeholder(tf.float32, [None, self.num_steps])
        self.targets_transition = tf.placeholder(tf.int32, [None])
        
        # char embedding
        # use an identity test: "!= None" on a numpy array compares element-wise
        if embedding_matrix is not None:
            self.embedding = tf.Variable(embedding_matrix, trainable=False, name="emb", dtype=tf.float32)
        else:
            self.embedding = tf.get_variable("emb", [self.num_chars, self.emb_dim])
        self.inputs_emb = tf.nn.embedding_lookup(self.embedding, self.inputs)
        self.inputs_emb = tf.transpose(self.inputs_emb, [1, 0, 2])
        self.inputs_emb = tf.reshape(self.inputs_emb, [-1, self.emb_dim])
        self.inputs_emb = tf.split(0, self.num_steps, self.inputs_emb)

        # lstm cell
        lstm_cell_fw = tf.nn.rnn_cell.BasicLSTMCell(self.hidden_dim)
        lstm_cell_bw = tf.nn.rnn_cell.BasicLSTMCell(self.hidden_dim)

        # dropout
        if is_training:
            lstm_cell_fw = tf.nn.rnn_cell.DropoutWrapper(lstm_cell_fw, output_keep_prob=(1 - self.dropout_rate))
            lstm_cell_bw = tf.nn.rnn_cell.DropoutWrapper(lstm_cell_bw, output_keep_prob=(1 - self.dropout_rate))

        lstm_cell_fw = tf.nn.rnn_cell.MultiRNNCell([lstm_cell_fw] * self.num_layers)
        lstm_cell_bw = tf.nn.rnn_cell.MultiRNNCell([lstm_cell_bw] * self.num_layers)

        # get the length of each sample
        self.length = tf.reduce_sum(tf.sign(self.inputs), reduction_indices=1)
        self.length = tf.cast(self.length, tf.int32)  
        
        # forward and backward
        self.outputs, _, _ = rnn.bidirectional_rnn(
            lstm_cell_fw, 
            lstm_cell_bw,
            self.inputs_emb, 
            dtype=tf.float32,
            sequence_length=self.length
        )
        
        # softmax
        self.outputs = tf.reshape(tf.concat(1, self.outputs), [-1, self.hidden_dim * 2])
        self.softmax_w = tf.get_variable("softmax_w", [self.hidden_dim * 2, self.num_classes])
        self.softmax_b = tf.get_variable("softmax_b", [self.num_classes])
        self.logits = tf.matmul(self.outputs, self.softmax_w) + self.softmax_b

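        if not is_crf:
            # the plain softmax branch is not implemented; with "pass" here self.loss would never be defined
            raise NotImplementedError("only the CRF output layer is implemented")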
        else:
            self.tags_scores = tf.reshape(self.logits, [self.batch_size, self.num_steps, self.num_classes])
            self.transitions = tf.get_variable("transitions", [self.num_classes + 1, self.num_classes + 1])
            
            dummy_val = -1000
            class_pad = tf.Variable(dummy_val * np.ones((self.batch_size, self.num_steps, 1)), dtype=tf.float32)
            self.observations = tf.concat(2, [self.tags_scores, class_pad])

            begin_vec = tf.Variable(np.array([[dummy_val] * self.num_classes + [0] for _ in range(self.batch_size)]), trainable=False, dtype=tf.float32)
            end_vec = tf.Variable(np.array([[0] + [dummy_val] * self.num_classes for _ in range(self.batch_size)]), trainable=False, dtype=tf.float32) 
            begin_vec = tf.reshape(begin_vec, [self.batch_size, 1, self.num_classes + 1])
            end_vec = tf.reshape(end_vec, [self.batch_size, 1, self.num_classes + 1])

            self.observations = tf.concat(1, [begin_vec, self.observations, end_vec])

            self.mask = tf.cast(tf.reshape(tf.sign(self.targets),[self.batch_size * self.num_steps]), tf.float32)
            
            # point score
            self.point_score = tf.gather(tf.reshape(self.tags_scores, [-1]), tf.range(0, self.batch_size * self.num_steps) * self.num_classes + tf.reshape(self.targets,[self.batch_size * self.num_steps]))
            self.point_score *= self.mask
            
            # transition score
            self.trans_score = tf.gather(tf.reshape(self.transitions, [-1]), self.targets_transition)
            
            # real score
            self.target_path_score = tf.reduce_sum(self.point_score) + tf.reduce_sum(self.trans_score)
            
            # all path score
            self.total_path_score, self.max_scores, self.max_scores_pre  = self.forward(self.observations, self.transitions, self.length)
            
            # loss
            self.loss = - (self.target_path_score - self.total_path_score)
        
        # summary
        self.train_summary = tf.scalar_summary("loss", self.loss)
        self.val_summary = tf.scalar_summary("loss", self.loss)        
        
        self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss) 

    def logsumexp(self, x, axis=None):
        x_max = tf.reduce_max(x, reduction_indices=axis, keep_dims=True)
        x_max_ = tf.reduce_max(x, reduction_indices=axis)
        return x_max_ + tf.log(tf.reduce_sum(tf.exp(x - x_max), reduction_indices=axis))

    def forward(self, observations, transitions, length, is_viterbi=True, return_best_seq=True):
        # number of tags including the extra begin/end pseudo tag (previously hard-coded as 6)
        num_tags = self.num_classes + 1
        length = tf.reshape(length, [self.batch_size])
        transitions = tf.reshape(tf.concat(0, [transitions] * self.batch_size), [self.batch_size, num_tags, num_tags])
        observations = tf.reshape(observations, [self.batch_size, self.num_steps + 2, num_tags, 1])
        observations = tf.transpose(observations, [1, 0, 2, 3])
        previous = observations[0, :, :, :]
        max_scores = []
        max_scores_pre = []
        alphas = [previous]
        for t in range(1, self.num_steps + 2):
            # alpha_t[b, i, j] = score so far ending in tag i + emission of tag j + transition i -> j
            previous = tf.reshape(previous, [self.batch_size, num_tags, 1])
            current = tf.reshape(observations[t, :, :, :], [self.batch_size, 1, num_tags])
            alpha_t = previous + current + transitions
            if is_viterbi:
                # best previous tag for every current tag, kept for back-tracking in viterbi()
                max_scores.append(tf.reduce_max(alpha_t, reduction_indices=1))
                max_scores_pre.append(tf.argmax(alpha_t, dimension=1))
            alpha_t = tf.reshape(self.logsumexp(alpha_t, axis=1), [self.batch_size, num_tags, 1])
            alphas.append(alpha_t)
            previous = alpha_t

        alphas = tf.reshape(tf.concat(0, alphas), [self.num_steps + 2, self.batch_size, num_tags, 1])
        alphas = tf.transpose(alphas, [1, 0, 2, 3])
        alphas = tf.reshape(alphas, [self.batch_size * (self.num_steps + 2), num_tags, 1])

        # pick the alpha at each sample's true length
        last_alphas = tf.gather(alphas, tf.range(0, self.batch_size) * (self.num_steps + 2) + length)
        last_alphas = tf.reshape(last_alphas, [self.batch_size, num_tags, 1])

        max_scores = tf.reshape(tf.concat(0, max_scores), (self.num_steps + 1, self.batch_size, num_tags))
        max_scores_pre = tf.reshape(tf.concat(0, max_scores_pre), (self.num_steps + 1, self.batch_size, num_tags))
        max_scores = tf.transpose(max_scores, [1, 0, 2])
        max_scores_pre = tf.transpose(max_scores_pre, [1, 0, 2])

        return tf.reduce_sum(self.logsumexp(last_alphas, axis=1)), max_scores, max_scores_pre

    def train(self, sess, save_file, X_train, y_train, X_val, y_val):
        saver = tf.train.Saver()

        char2id, id2char = helper.loadMap("char2id")
        label2id, id2label = helper.loadMap("label2id")

        merged = tf.merge_all_summaries()
        summary_writer_train = tf.train.SummaryWriter('loss_log/train_loss', sess.graph)  
        summary_writer_val = tf.train.SummaryWriter('loss_log/val_loss', sess.graph)     
        
        num_iterations = int(math.ceil(1.0 * len(X_train) / self.batch_size))

        cnt = 0
        for epoch in range(self.num_epochs):
            # shuffle train in each epoch
            sh_index = np.arange(len(X_train))
            np.random.shuffle(sh_index)
            X_train = X_train[sh_index]
            y_train = y_train[sh_index]
            print "current epoch: %d" % (epoch)
            for iteration in range(num_iterations):
                # train
                X_train_batch, y_train_batch = helper.nextBatch(X_train, y_train, start_index=iteration * self.batch_size, batch_size=self.batch_size)
                y_train_weight_batch = 1 + np.array((y_train_batch == label2id['B']) | (y_train_batch == label2id['E']), float)
                transition_batch = helper.getTransition(y_train_batch)
                
                _, loss_train, max_scores, max_scores_pre, length, train_summary =\
                    sess.run([
                        self.optimizer, 
                        self.loss, 
                        self.max_scores, 
                        self.max_scores_pre, 
                        self.length,
                        self.train_summary
                    ], 
                    feed_dict={
                        self.targets_transition:transition_batch, 
                        self.inputs:X_train_batch, 
                        self.targets:y_train_batch, 
                        self.targets_weight:y_train_weight_batch
                    })

                predicts_train = self.viterbi(max_scores, max_scores_pre, length, predict_size=self.batch_size)
                if iteration % 10 == 0:
                    cnt += 1
                    precision_train, recall_train, f1_train = self.evaluate(X_train_batch, y_train_batch, predicts_train, id2char, id2label)
                    summary_writer_train.add_summary(train_summary, cnt)
                    print "iteration: %5d, train loss: %5d, train precision: %.5f, train recall: %.5f, train f1: %.5f" % (iteration, loss_train, precision_train, recall_train, f1_train)  
                    
                # validation
                if iteration % 100 == 0:
                    X_val_batch, y_val_batch = helper.nextRandomBatch(X_val, y_val, batch_size=self.batch_size)
                    y_val_weight_batch = 1 + np.array((y_val_batch == label2id['B']) | (y_val_batch == label2id['E']), float)
                    transition_batch = helper.getTransition(y_val_batch)
                    
                    loss_val, max_scores, max_scores_pre, length, val_summary =\
                        sess.run([
                            self.loss, 
                            self.max_scores, 
                            self.max_scores_pre, 
                            self.length,
                            self.val_summary
                        ], 
                        feed_dict={
                            self.targets_transition:transition_batch, 
                            self.inputs:X_val_batch, 
                            self.targets:y_val_batch, 
                            self.targets_weight:y_val_weight_batch
                        })
                    
                    predicts_val = self.viterbi(max_scores, max_scores_pre, length, predict_size=self.batch_size)
                    precision_val, recall_val, f1_val = self.evaluate(X_val_batch, y_val_batch, predicts_val, id2char, id2label)
                    summary_writer_val.add_summary(val_summary, cnt)
                    print "iteration: %5d, valid loss: %5d, valid precision: %.5f, valid recall: %.5f, valid f1: %.5f" % (iteration, loss_val, precision_val, recall_val, f1_val)

                    if f1_val > self.max_f1:
                        self.max_f1 = f1_val
                        save_path = saver.save(sess, save_file)
                        print "saved the best model with f1: %.5f" % (self.max_f1)

    def test(self, sess, X_test, X_test_str, output_path):
        char2id, id2char = helper.loadMap("char2id")
        label2id, id2label = helper.loadMap("label2id")
        num_iterations = int(math.ceil(1.0 * len(X_test) / self.batch_size))
        print "number of iteration: " + str(num_iterations)
        with open(output_path, "wb") as outfile:
            for i in range(num_iterations):
                print "iteration: " + str(i + 1)
                results = []
                X_test_batch = X_test[i * self.batch_size : (i + 1) * self.batch_size]
                X_test_str_batch = X_test_str[i * self.batch_size : (i + 1) * self.batch_size]
                if i == num_iterations - 1 and len(X_test_batch) < self.batch_size:
                    X_test_batch = list(X_test_batch)
                    X_test_str_batch = list(X_test_str_batch)
                    last_size = len(X_test_batch)
                    X_test_batch += [[0] * self.num_steps for _ in range(self.batch_size - last_size)]
                    X_test_str_batch += [['x'] * self.num_steps for _ in range(self.batch_size - last_size)]
                    X_test_batch = np.array(X_test_batch)
                    X_test_str_batch = np.array(X_test_str_batch)
                    results = self.predictBatch(sess, X_test_batch, X_test_str_batch, id2label)
                    results = results[:last_size]
                else:
                    X_test_batch = np.array(X_test_batch)
                    results = self.predictBatch(sess, X_test_batch, X_test_str_batch, id2label)
                
                for i in range(len(results)):
                    doc = ''.join(X_test_str_batch[i])
                    outfile.write(doc + "<@>" +" ".join(results[i]).encode("utf-8") + "\n")

    def viterbi(self, max_scores, max_scores_pre, length, predict_size=128):
        best_paths = []
        for m in range(predict_size):
            path = []
            last_max_node = np.argmax(max_scores[m][length[m]])
            # last_max_node = 0
            for t in range(1, length[m] + 1)[::-1]:
                last_max_node = max_scores_pre[m][t][last_max_node]
                path.append(last_max_node)
            path = path[::-1]
            best_paths.append(path)
        return best_paths

    def predictBatch(self, sess, X, X_str, id2label):
        results = []
        length, max_scores, max_scores_pre = sess.run([self.length, self.max_scores, self.max_scores_pre], feed_dict={self.inputs:X})
        predicts = self.viterbi(max_scores, max_scores_pre, length, self.batch_size)
        for i in range(len(predicts)):
            x = ''.join(X_str[i]).decode("utf-8")
            y_pred = ''.join([id2label[val] for val in predicts[i] if val != 5 and val != 0])
            entitys = helper.extractEntity(x, y_pred)
            results.append(entitys)
        return results

    def evaluate(self, X, y_true, y_pred, id2char, id2label):
        precision = -1.0
        recall = -1.0
        f1 = -1.0
        hit_num = 0
        pred_num = 0
        true_num = 0
        for i in range(len(y_true)):
            x = ''.join([str(id2char[val].encode("utf-8")) for val in X[i]])
            y = ''.join([str(id2label[val].encode("utf-8")) for val in y_true[i]])
            y_hat = ''.join([id2label[val] for val in y_pred[i]  if val != 5])
            true_labels = helper.extractEntity(x, y)
            pred_labels = helper.extractEntity(x, y_hat)
            hit_num += len(set(true_labels) & set(pred_labels))
            pred_num += len(set(pred_labels))
            true_num += len(set(true_labels))
        if pred_num != 0:
            precision = 1.0 * hit_num / pred_num
        if true_num != 0:
            recall = 1.0 * hit_num / true_num
        if precision > 0 and recall > 0:
            f1 = 2.0 * (precision * recall) / (precision + recall)
        return precision, recall, f1
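
The loss in the model above is the log of the summed score of all tag paths minus the score of the gold path. As a rough illustration of what `logsumexp` and `forward` compute, here is a small pure-Python sketch for a single sentence. It is a simplification under stated assumptions: it leaves out the begin/end pseudo tag, the batching and the masking used in the class, and the scores are invented. It checks the forward recursion against brute-force enumeration of every path.

```python
import itertools
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def path_score(emissions, transitions, path):
    """Emission scores along the path plus the transition scores between neighbours."""
    score = sum(emissions[t][tag] for t, tag in enumerate(path))
    score += sum(transitions[a][b] for a, b in zip(path, path[1:]))
    return score

def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp-scores of all tag paths."""
    num_tags = len(emissions[0])
    alpha = list(emissions[0])
    for row in emissions[1:]:
        alpha = [row[j] + logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
                 for j in range(num_tags)]
    return logsumexp(alpha)

def crf_loss(emissions, transitions, gold_path):
    """Negative log-likelihood of the gold path under the CRF."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, gold_path)

# Invented scores: 3 time steps, 2 tags.
emissions = [[1.0, 0.2], [0.3, 0.8], [0.5, 0.5]]
transitions = [[0.1, -0.4], [0.6, 0.0]]

all_paths = list(itertools.product(range(2), repeat=3))
brute_force = logsumexp([path_score(emissions, transitions, p) for p in all_paths])
forward_value = log_partition(emissions, transitions)
loss = crf_loss(emissions, transitions, [0, 1, 0])
```

The recursion costs time linear in the sentence length, whereas the enumeration grows exponentially; the two values should agree up to floating-point error.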

helper.py

#encoding:utf-8
import re
import os
import csv
import time
import pickle
import numpy as np
import pandas as pd

def getEmbedding(infile_path="embedding"):
	char2id, id_char = loadMap("char2id")
	row_index = 0
	with open(infile_path, "rb") as infile:
		for row in infile:
			row = row.strip()
			row_index += 1
			if row_index == 1:
				num_chars = int(row.split()[0])
				emb_dim = int(row.split()[1])
				emb_matrix = np.zeros((len(char2id.keys()), emb_dim))
				continue
			items = row.split()
			# char2id keys are unicode (see loadMap), so decode before the lookup
			char = items[0].decode("utf-8")
			emb_vec = [float(val) for val in items[1:]]
			if char in char2id:
				emb_matrix[char2id[char]] = emb_vec
	return emb_matrix

def nextBatch(X, y, start_index, batch_size=128):
    last_index = start_index + batch_size
    X_batch = list(X[start_index:min(last_index, len(X))])
    y_batch = list(y[start_index:min(last_index, len(X))])
    if last_index > len(X):
        left_size = last_index - (len(X))
        for i in range(left_size):
            index = np.random.randint(len(X))
            X_batch.append(X[index])
            y_batch.append(y[index])
    X_batch = np.array(X_batch)
    y_batch = np.array(y_batch)
    return X_batch, y_batch

def nextRandomBatch(X, y, batch_size=128):
    X_batch = []
    y_batch = []
    for i in range(batch_size):
        index = np.random.randint(len(X))
        X_batch.append(X[index])
        y_batch.append(y[index])
    X_batch = np.array(X_batch)
    y_batch = np.array(y_batch)
    return X_batch, y_batch

# pad each sentence with 0 up to seq_max_len
def padding(sample, seq_max_len):
	for i in range(len(sample)):
		if len(sample[i]) < seq_max_len:
			sample[i] += [0 for _ in range(seq_max_len - len(sample[i]))]
	return sample

def prepare(chars, labels, seq_max_len, is_padding=True):
	X = []
	y = []
	tmp_x = []
	tmp_y = []

	for record in zip(chars, labels):
		c = record[0]
		l = record[1]
		# empty line
		if c == -1:
			if len(tmp_x) <= seq_max_len:
				X.append(tmp_x)
				y.append(tmp_y)
			tmp_x = []
			tmp_y = []
		else:
			tmp_x.append(c)
			tmp_y.append(l)	
	if is_padding:
		X = np.array(padding(X, seq_max_len))
	else:
		X = np.array(X)
	y = np.array(padding(y, seq_max_len))

	return X, y

def extractEntity(sentence, labels):
    entitys = []
    re_entity = re.compile(r'BM*E')
    m = re_entity.search(labels)
    while m:
        entity_labels = m.group()
        start_index = labels.find(entity_labels)
        entity = sentence[start_index:start_index + len(entity_labels)]
        labels = list(labels)
        # replace the "BM*E" with "OO*O"
        labels[start_index: start_index + len(entity_labels)] = ['O' for i in range(len(entity_labels))] 
        entitys.append(entity)
        labels = ''.join(labels)
        m = re_entity.search(labels)
    return entitys

def loadMap(token2id_filepath):
	if not os.path.isfile(token2id_filepath):
		print "file not exist, building map"
		buildMap()

	token2id = {}
	id2token = {}
	with open(token2id_filepath) as infile:
		for row in infile:
			row = row.rstrip().decode("utf-8")
			token = row.split('\t')[0]
			token_id = int(row.split('\t')[1])
			token2id[token] = token_id
			id2token[token_id] = token
	return token2id, id2token

def saveMap(id2char, id2label):
	with open("char2id", "wb") as outfile:
		for idx in id2char:
			outfile.write(id2char[idx] + "\t" + str(idx)  + "\r\n")
	with open("label2id", "wb") as outfile:
		for idx in id2label:
			outfile.write(id2label[idx] + "\t" + str(idx) + "\r\n")
	print "saved map between token and id"

def buildMap(train_path="train.in"):
	df_train = pd.read_csv(train_path, delimiter='\t', quoting=csv.QUOTE_NONE, skip_blank_lines=False, header=None, names=["char", "label"])
	chars = list(set(df_train["char"][df_train["char"].notnull()]))
	labels = list(set(df_train["label"][df_train["label"].notnull()]))
	char2id = dict(zip(chars, range(1, len(chars) + 1)))
	label2id = dict(zip(labels, range(1, len(labels) + 1)))
	id2char = dict(zip(range(1, len(chars) + 1), chars))
	id2label =  dict(zip(range(1, len(labels) + 1), labels))
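	# id 0 is reserved for padding; the last id (len(chars) + 1) is used for characters not seen in training.
	# Both tokens are the empty string in this listing (their names may have been lost when the post was published),
	# so char2id[""] is overwritten below and ends up pointing at the unseen-character id.
	id2char[0] = ""
	id2label[0] = ""
	char2id[""] = 0
	label2id[""] = 0
	id2char[len(chars) + 1] = ""
	char2id[""] = len(chars) + 1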

	saveMap(id2char, id2label)
	
	return char2id, id2char, label2id, id2label

def getTrain(train_path, val_path, train_val_ratio=0.99, use_custom_val=False, seq_max_len=200):
	char2id, id2char, label2id, id2label = buildMap(train_path)
	df_train = pd.read_csv(train_path, delimiter='\t', quoting=csv.QUOTE_NONE, skip_blank_lines=False, header=None, names=["char", "label"])

	# map the char and label into id
	df_train["char_id"] = df_train.char.map(lambda x : -1 if str(x) == str(np.nan) else char2id[x])
	df_train["label_id"] = df_train.label.map(lambda x : -1 if str(x) == str(np.nan) else label2id[x])
	
	# convert the data into matrices
	X, y = prepare(df_train["char_id"], df_train["label_id"], seq_max_len)

	# shuffle the samples
	num_samples = len(X)
	indexs = np.arange(num_samples)
	np.random.shuffle(indexs)
	X = X[indexs]
	y = y[indexs]
	
	if val_path != None:
		X_train = X
		y_train = y	
		X_val, y_val = getTest(val_path, is_validation=True, seq_max_len=seq_max_len)
	else:
		# split the data into train and validation set
		X_train = X[:int(num_samples * train_val_ratio)]
		y_train = y[:int(num_samples * train_val_ratio)]
		X_val = X[int(num_samples * train_val_ratio):]
		y_val = y[int(num_samples * train_val_ratio):]

	print "train size: %d, validation size: %d" %(len(X_train), len(y_val))

	return X_train, y_train, X_val, y_val

def getTest(test_path="test.in", is_validation=False, seq_max_len=200):
	char2id, id2char = loadMap("char2id")
	label2id, id2label = loadMap("label2id")

	df_test = pd.read_csv(test_path, delimiter='\t', quoting=csv.QUOTE_NONE, skip_blank_lines=False, header=None, names=["char", "label"])
	
	def mapFunc(x, char2id):
		if str(x) == str(np.nan):
			return -1
		elif x.decode("utf-8") not in char2id:
			return char2id[""]
		else:
			return char2id[x.decode("utf-8")]

	df_test["char_id"] = df_test.char.map(lambda x:mapFunc(x, char2id))
	df_test["label_id"] = df_test.label.map(lambda x : -1 if str(x) == str(np.nan) else label2id[x])
	
	if is_validation:
		X_test, y_test = prepare(df_test["char_id"], df_test["label_id"], seq_max_len)
		return X_test, y_test
	else:
		df_test["char"] = df_test.char.map(lambda x : -1 if str(x) == str(np.nan) else x)
		X_test, _ = prepare(df_test["char_id"], df_test["char_id"], seq_max_len)
		X_test_str, _ = prepare(df_test["char"], df_test["char_id"], seq_max_len, is_padding=False)
		print "test size: %d" %(len(X_test))
		return X_test, X_test_str

def getTransition(y_train_batch):
	transition_batch = []
	for m in range(len(y_train_batch)):
		y = [5] + list(y_train_batch[m]) + [0]
		for t in range(len(y)):
			if t + 1 == len(y):
				continue
			i = y[t]
			j = y[t + 1]
			if i == 0:
				break
			transition_batch.append(i * 6 + j)
	transition_batch = np.array(transition_batch)
	return transition_batch
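
As a compact illustration of how `extractEntity` and the entity-level scores in `evaluate` fit together, here is a standalone sketch. It is an assumption-laden rewrite, not the author's code: it uses `re.finditer` instead of the search-and-replace loop, the names `extract_entities` and `precision_recall_f1` are hypothetical, and it returns 0.0 rather than -1.0 when a score is undefined.

```python
import re

# One entity is a B label, any number of M labels, then an E label.
ENTITY_PATTERN = re.compile(r"BM*E")

def extract_entities(sentence, labels):
    """Return the substrings of `sentence` whose labels match B M* E."""
    return [sentence[m.start():m.end()] for m in ENTITY_PATTERN.finditer(labels)]

def precision_recall_f1(true_entities, pred_entities):
    """Entity-level scores computed from two collections of entity strings."""
    true_set, pred_set = set(true_entities), set(pred_entities)
    hits = len(true_set & pred_set)
    precision = float(hits) / len(pred_set) if pred_set else 0.0
    recall = float(hits) / len(true_set) if true_set else 0.0
    f1 = 2.0 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

sentence = "abcdefgh"
gold = extract_entities(sentence, "BMEOOBEO")   # entities at positions 0-2 and 5-6
pred = extract_entities(sentence, "BMEOOOBE")   # entities at positions 0-2 and 6-7
scores = precision_recall_f1(gold, pred)
```

Because entities are compared as sets of strings, a predicted span that is off by one character counts as both a false positive and a false negative.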

train.py

import time
import helper
import argparse
import numpy as np
import pandas as pd
import tensorflow as tf
from BILSTM_CRF import BILSTM_CRF

# python train.py train.in model -v validation.in -c char_emb -e 10 -g 2

parser = argparse.ArgumentParser()
parser.add_argument("train_path", help="the path of the train file")
parser.add_argument("save_path", help="the path of the saved model")
parser.add_argument("-v","--val_path", help="the path of the validation file", default=None)
parser.add_argument("-e","--epoch", help="the number of epoch", default=100, type=int)
parser.add_argument("-c","--char_emb", help="the char embedding file", default=None)
parser.add_argument("-g","--gpu", help="the id of gpu, the default is 0", default=0, type=int)

args = parser.parse_args()

train_path = args.train_path
save_path = args.save_path
val_path = args.val_path
num_epochs = args.epoch
emb_path = args.char_emb
gpu_config = "/cpu:0"
#gpu_config = "/gpu:"+str(args.gpu)
num_steps = 200 # must be consistent with test.py

start_time = time.time()
print "preparing train and validation data"
X_train, y_train, X_val, y_val = helper.getTrain(train_path=train_path, val_path=val_path, seq_max_len=num_steps)
char2id, id2char = helper.loadMap("char2id")
label2id, id2label = helper.loadMap("label2id")
num_chars = len(id2char.keys())
num_classes = len(id2label.keys())
if emb_path != None:
	embedding_matrix = helper.getEmbedding(emb_path)
else:
	embedding_matrix = None

print "building model"
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
	with tf.device(gpu_config):
		initializer = tf.random_uniform_initializer(-0.1, 0.1)
		with tf.variable_scope("model", reuse=None, initializer=initializer):
			model = BILSTM_CRF(num_chars=num_chars, num_classes=num_classes, num_steps=num_steps, num_epochs=num_epochs, embedding_matrix=embedding_matrix, is_training=True)

		print "training model"
		tf.initialize_all_variables().run()
		model.train(sess, save_path, X_train, y_train, X_val, y_val)

		print "final best f1 is: %f" % (model.max_f1)

		end_time = time.time()
		print "time used %f(hour)" % ((end_time - start_time) / 3600)
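
The scripts fix `num_steps` at 200 because every sentence is padded with id 0 to that length, and the model recovers each true length by counting non-zero ids (`tf.sign` followed by `tf.reduce_sum`). A minimal pure-Python sketch of that convention, assuming real character ids start at 1 as in `buildMap`; the function names are hypothetical:

```python
def pad_sequences(sequences, max_len, pad_id=0):
    """Right-pad each id list with pad_id up to max_len; longer inputs are dropped."""
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences if len(seq) <= max_len]

def sequence_lengths(padded):
    """Count the non-padding ids in each row, assuming real ids are >= 1."""
    return [sum(1 for token_id in row if token_id > 0) for row in padded]

# The third sentence is longer than max_len and is therefore skipped.
batch = pad_sequences([[4, 7, 2], [9], [1, 2, 3, 4, 5, 6]], max_len=5)
lengths = sequence_lengths(batch)
```

This only works if no real token is ever mapped to id 0, which is why the maps reserve 0 for padding.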

test.py

import time
import helper
import argparse
import numpy as np
import pandas as pd
import tensorflow as tf
from BILSTM_CRF import BILSTM_CRF

# python test.py model test.in test.out -c char_emb -g 2

parser = argparse.ArgumentParser()
parser.add_argument("model_path", help="the path of model file")
parser.add_argument("test_path", help="the path of test file")
parser.add_argument("output_path", help="the path of output file")
parser.add_argument("-c","--char_emb", help="the char embedding file", default=None)
parser.add_argument("-g","--gpu", help="the id of gpu, the default is 0", default=0, type=int)
args = parser.parse_args()

model_path = args.model_path
test_path = args.test_path
output_path = args.output_path
gpu_config = "/cpu:0"
emb_path = args.char_emb
num_steps = 200 # must be consistent with train.py

start_time = time.time()

print "preparing test data"
X_test, X_test_str = helper.getTest(test_path=test_path, seq_max_len=num_steps)
char2id, id2char = helper.loadMap("char2id")
label2id, id2label = helper.loadMap("label2id")
num_chars = len(id2char.keys())
num_classes = len(id2label.keys())
if emb_path != None:
	embedding_matrix = helper.getEmbedding(emb_path)
else:
	embedding_matrix = None

print "building model"
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
	with tf.device(gpu_config):
		initializer = tf.random_uniform_initializer(-0.1, 0.1)
		with tf.variable_scope("model", reuse=None, initializer=initializer):
			model = BILSTM_CRF(num_chars=num_chars, num_classes=num_classes, num_steps=num_steps, embedding_matrix=embedding_matrix, is_training=False)

		print "loading model parameter"
		saver = tf.train.Saver()
		saver.restore(sess, model_path)

		print "testing"
		model.test(sess, X_test, X_test_str, output_path)

		end_time = time.time()
		print "time used %f(hour)" % ((end_time - start_time) / 3600)
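
Because the graph is built with a fixed `batch_size`, `test` pads the final partial batch with dummy rows and then keeps only the first `last_size` results. The same idea as a standalone sketch; `iter_fixed_batches`, `predict_all` and `fake_predict` are hypothetical names, and `fake_predict` merely stands in for the model:

```python
def iter_fixed_batches(rows, batch_size, pad_row):
    """Yield (batch, real_count) pairs where every batch has exactly batch_size rows."""
    for start in range(0, len(rows), batch_size):
        batch = list(rows[start:start + batch_size])
        real_count = len(batch)
        batch += [pad_row] * (batch_size - real_count)
        yield batch, real_count

def predict_all(rows, batch_size, pad_row, predict_batch):
    """Run predict_batch on fixed-size batches and drop the outputs for padded rows."""
    results = []
    for batch, real_count in iter_fixed_batches(rows, batch_size, pad_row):
        results.extend(predict_batch(batch)[:real_count])
    return results

def fake_predict(batch):
    """Stand-in for the model: returns the number of non-zero ids in each row."""
    return [sum(1 for v in row if v) for row in batch]

rows = [[1, 2, 0], [3, 0, 0], [4, 5, 6], [7, 0, 0], [8, 9, 0]]
outputs = predict_all(rows, batch_size=2, pad_row=[0, 0, 0], predict_batch=fake_predict)
```

Five rows with a batch size of 2 give three batches, the last of which contains one real row and one padded row.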

For the preprocessed data, see github: scofiled's github/bilstm+crf

欢迎关注哈希大数据微信公众号【哈希大数据】1KNN算法基本介绍K-NearestNeighbor(k最邻近分类算法)，简称KNN，是最简单的一种有监督的机器学习算法。也是一种懒惰学习算法，即开始训练仅仅是保存所有样本集的信息，直到测试样本到达才开始进行分类决策。KNN算法的核心思想：要想确定测试样本属于哪一类，就先寻找所有训练样本中与该测试样本“距离”最近的前K个样本，然后判断这K个样本中大部分所
实时监控网页变化，并增加多种提示信息安替-AnTi 自动化工具 linux 运维服务器监控网页变化
文章目录python代码实现优势手动部署下载源码安装依赖初次登录设置Docker部署设置监控chromeJS插件实现插件1背景介绍使用方法插件2参考文献通过订阅本篇文章，您可以实现在任意打开网页情况下，监控网页内指定内容或者全部内容的变化，变化的内容、时间点可以通过邮箱、微信等方式进行提醒。使用场景可以用来监控足球比赛的赔率、京东商品库存、价格等因素，并且可以为订阅用户添加各种定制化的服务。如在订
用python监控网页某个位置的值的变化老光私享 python 开发语言爬虫
可以使用Python的第三方库来监控网页上某个位置的值的变化。一种方法是使用BeautifulSoup库来爬取网页并解析HTML/XML。然后，您可以使用正则表达式或其他方法来提取所需信息。另一种方法是使用Selenium库来模拟浏览器行为，并使用JavaScript来获取网页上的信息。下面是一个使用BeautifulSoup的例子：importrequestsfrombs4importBeaut
python向pdf添加水印 ChenWenKen Python应用 python 前端
fromtypingimportUnion,Tuplefromreportlab.libimportunitsfromreportlab.pdfgenimportcanvasfromreportlab.pdfbaseimportpdfmetricsfromreportlab.pdfbase.ttfontsimportTTFontpdfmetrics.registerFont(TTFont('msy
python笔记（3）(re库和pandas库) Techer_Y 笔记
参考链接：Python正则表达式|菜鸟教程(runoob.com)1、re库，python正则表达式正则表达式是一个特殊的字符序列它能帮助你检查一个字符串是否与某种模式匹配。re模块使python语言拥有全部的正则表达式功能。re.match尝试从字符串起始位置匹配一个模式，如果不是起始位置匹配成功的话，match()就返回none。re.match(pattern,string,flags=0)
Python PDF添加水印 lxccc9 python 笔记
PDF添加水印加载模块：fromPyPDF2importPdfFileReader,PdfFileWriterimportosPDF添加水印：watermark_pdf=PdfFileReader('./tests/watermark.pdf')#读取第一页watermark=watermark_pdf.getPage(0)#读取需要加水印的pdf文件input_pdf=PdfFileReader
用Python写前端 eternity_ld 前端 python 开发语言
分享一个让开发交互式Webapp超级简单的工具。不会HTML，CSS，JAVASCRIPT也没事。交互式Webapp非常实用，比如说做一个问卷调查页面、一个投票系统、一个信息收集表单，上传文件等等，因为网页是可视化的，因此还可以作为一个没有服务端的图片界面应用程序而使用。如果你有这样的开发需求，那用Python真的是太简单了。借助于PyWebIO（pipinstallpywebio），你可以分分钟
使用python做出一只懒羊羊大G哥 python 开发语言
今天使用Python的Turtle库做出一只懒羊羊PythonTurtle库功能与用途一、绘图基础功能Turtle库提供了一种简单易用的方式来进行图形绘制。通过控制屏幕上的海龟指针移动来完成线条和形状的创建。可以设置画笔的颜色、大小以及方向等属性，从而实现多样化的视觉效果。importturtlet=turtle.Turtle()t.forward(100)#向前走100像素距离t.right(9
Dom 周华华 JavaScript html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml&q
【Spark九十六】RDD API之combineByKey bit1129 spark
1. combineByKey函数的运行机制 RDD提供了很多针对元素类型为(K,V)的API，这些API封装在PairRDDFunctions类中，通过Scala隐式转换使用。这些API实现上是借助于combineByKey实现的。combineByKey函数本身也是RDD开放给Spark开发人员使用的API之一首先看一下combineByKey的方法说明：
msyql设置密码报错：ERROR 1372 (HY000): 解决方法详解 daizj mysql 设置密码
MySql给用户设置权限同时指定访问密码时，会提示如下错误： ERROR 1372 (HY000): Password hash should be a 41-digit hexadecimal number；问题原因：你输入的密码是明文。不允许这么输入。解决办法：用select password('你想输入的密码');查询出你的密码对应的字符串，然后
路漫漫其修远兮吾将上下而求索周凡杨学习思索
王国维在他的《人间词话》中曾经概括了为学的三种境界古今之成大事业、大学问者，罔不经过三种之境界。“昨夜西风凋碧树。独上高楼，望尽天涯路。”此第一境界也。“衣带渐宽终不悔，为伊消得人憔悴。”此第二境界也。“众里寻他千百度，蓦然回首，那人却在灯火阑珊处。”此第三境界也。学习技术，这也是你必须经历的三种境界。第一层境界是说，学习的路是漫漫的，你必须做好充分的思想准备，如果半途而废还不如不要开始。这里，注
Hadoop(二)对话单的操作朱辉辉33 hadoop
Debug： 1、 A = LOAD '/user/hue/task.txt' USING PigStorage(' ') AS (col1,col2,col3); DUMP A; //输出结果前几行示例： (>ggsnPDPRecord(21),,) (-->recordType(0),,) (-->networkInitiation(1),,)
web报表工具FineReport常用函数的用法总结（日期和时间函数）老A不折腾 finereport 报表工具 web开发
web报表工具FineReport常用函数的用法总结（日期和时间函数）说明：凡函数中以日期作为参数因子的，其中日期的形式都必须是yy/mm/dd。而且必须用英文环境下双引号(" ")引用。 DATE DATE(year,month,day):返回一个表示某一特定日期的系列数。 Year:代表年，可为一到四位数。 Month:代表月份。
c++ 宏定义中的##操作符墙头上一根草 C++
#与##在宏定义中的--宏展开 #include <stdio.h> #define f(a,b) a##b #define g(a) #a #define h(a) g(a) int main() { &nbs
分析Spring源代码之，DI的实现 aijuans spring DI 现源代码
(转) 分析Spring源代码之，DI的实现 2012/1/3 by tony 接着上次的讲，以下这个sample [java] view plain copy print
for循环的进化 alxw4616 JavaScript
// for循环的进化 // 菜鸟 for (var i = 0; i < Things.length ; i++) { // Things[i] } // 老鸟 for (var i = 0, len = Things.length; i < len; i++) { // Things[i] } // 大师 for (var i = Things.le
网络编程Socket和ServerSocket简单的使用百合不是茶网络编程基础 IP地址端口
网络编程;TCP/IP协议网络:实现计算机之间的信息共享,数据资源的交换协议:数据交换需要遵守的一种协议,按照约定的数据格式等写出去端口:用于计算机之间的通信每运行一个程序，系统会分配一个编号给该程序，作为和外界交换数据的唯一标识 0~65535 查看被使用的
JDK1.5 生产消费者 bijian1013 java thread 生产消费者 java多线程
ArrayBlockingQueue：一个由数组支持的有界阻塞队列。此队列按 FIFO（先进先出）原则对元素进行排序。队列的头部是在队列中存在时间最长的元素。队列的尾部是在队列中存在时间最短的元素。新元素插入到队列的尾部，队列检索操作则是从队列头部开始获得元素。 ArrayBlockingQueue的常用方法：
JAVA版身份证获取性别、出生日期及年龄 bijian1013 java 性别出生日期年龄
工作中需要根据身份证获取性别、出生日期及年龄，且要还要支持15位长度的身份证号码，网上搜索了一下，经过测试好像多少存在点问题，干脆自已写一个。 CertificateNo.java package com.bijian.study; import java.util.Calendar; import
【Java范型六】范型与枚举 bit1129 java
首先，枚举类型的定义不能带有类型参数，所以，不能把枚举类型定义为范型枚举类，例如下面的枚举类定义是有编译错的 public enum EnumGenerics<T> { //编译错，提示枚举不能带有范型参数 OK, ERROR; public <T> T get(T type) { return null;
【Nginx五】Nginx常用日志格式含义 bit1129 nginx
1. log_format 1.1 log_format指令用于指定日志的格式，格式： log_format name(格式名称) type(格式样式) 1.2 如下是一个常用的Nginx日志格式： log_format main '[$time_local]|$request_time|$status|$body_bytes
Lua 语言 15 分钟快速入门 ronin47 lua 基础
- - 单行注释 - - [[ [多行注释] - - ]] - - - - - - - - - - - 1. 变量 & 控制流 - - - - - - - - - - num = 23 - - 数字都是双精度 str = 'aspythonstring'
java-35.求一个矩阵中最大的二维矩阵 ( 元素和最大 ) bylijinnan java
the idea is from: http://blog.csdn.net/zhanxinhang/article/details/6731134 public class MaxSubMatrix { /**see http://blog.csdn.net/zhanxinhang/article/details/6731134 * Q35 求一个矩阵中最大的二维
mongoDB文档型数据库特点开窍的石头 mongoDB文档型数据库特点
MongoDD: 文档型数据库存储的是Bson文档-->json的二进制特点：内部是执行引擎是js解释器，把文档转成Bson结构，在查询时转换成js对象。 mongoDB传统型数据库对比传统类型数据库：结构化数据，定好了表结构后每一个内容符合表结构的。也就是说每一行每一列的数据都是一样的文档型数据库：不用定好数据结构，
[毕业季节]欢迎广大毕业生加入JAVA程序员的行列 comsci java
一年一度的毕业季来临了。。。。。。。。正在投简历的学弟学妹们。。。如果觉得学校推荐的单位和公司不适合自己的兴趣和专业，可以考虑来我们软件行业，做一名职业程序员。。。软件行业的开发工具中，对初学者最友好的就是JAVA语言了，网络上不仅仅有大量的
PHP操作Excel – PHPExcel 基本用法详解 cuiyadll PHP Excel
导出excel属性设置//Include classrequire_once('Classes/PHPExcel.php');require_once('Classes/PHPExcel/Writer/Excel2007.php');$objPHPExcel = new PHPExcel();//Set properties 设置文件属性$objPHPExcel->getProperties
IBM Webshpere MQ Client User Issue (MCAUSER) darrenzhu IBM jms user MQ MCAUSER
IBM MQ JMS Client去连接远端MQ Server的时候，需要提供User和Password吗？答案是根据情况而定，取决于所定义的Channel里面的属性Message channel agent user identifier (MCAUSER)的设置。 http://stackoverflow.com/questions/20209429/how-mca-user-i
网线的接法 dcj3sjt126com
一、PC连HUB (直连线)A端：（标准568B）：白橙，橙，白绿，蓝，白蓝，绿，白棕，棕。 B端：（标准568B）：白橙，橙，白绿，蓝，白蓝，绿，白棕，棕。二、PC连PC （交叉线）A端：(568A)：白绿，绿，白橙，蓝，白蓝，橙，白棕，棕； B端：（标准568B）：白橙，橙，白绿，蓝，白蓝，绿，白棕，棕。三、HUB连HUB&nb
Vimium插件让键盘党像操作Vim一样操作Chrome dcj3sjt126com chrome vim
什么是键盘党？键盘党是指尽可能将所有电脑操作用键盘来完成，而不去动鼠标的人。鼠标应该说是新手们的最爱，很直观，指哪点哪，很听话！不过常常使用电脑的人，如果一直使用鼠标的话，手会发酸，因为操作鼠标的时候，手臂不是在一个自然的状态，臂肌会处于绷紧状态。而使用键盘则双手是放松状态，只有手指在动。而且尽量少的从鼠标移动到键盘来回操作，也省不少事。在chrome里安装 vimium 插件
MongoDB查询（2）——数组查询[六] eksliang mongodb MongoDB查询数组
MongoDB查询数组转载请出自出处：http://eksliang.iteye.com/blog/2177292 一、概述 MongoDB查询数组与查询标量值是一样的，例如，有一个水果列表，如下所示： > db.food.find() { "_id" : "001", "fruits" : [ "苹
cordova读写文件（1） gundumw100 JavaScript Cordova
使用cordova可以很方便的在手机sdcard中读写文件。首先需要安装cordova插件：file 命令为： cordova plugin add org.apache.cordova.file 然后就可以读写文件了，这里我先是写入一个文件，具体的JS代码为： var datas=null;//datas need write var directory=&
HTML5 FormData 进行文件jquery ajax 上传到又拍云 ileson jquery Ajax html5 FormData
html5 新东西：FormData 可以提交二进制数据。页面test.html <!DOCTYPE> <html> <head> <title> formdata file jquery ajax upload</title> </head> <body> <
swift appearanceWhenContainedIn:(version1.2 xcode6.4) 啸笑天 version
swift1.2中没有oc中对应的方法： + (instancetype)appearanceWhenContainedIn:(Class <UIAppearanceContainer>)ContainerClass, ... NS_REQUIRES_NIL_TERMINATION; 解决方法：在swift项目中新建oc类如下： #import &
java实现SMTP邮件服务器 macroli java 编程
电子邮件传递可以由多种协议来实现。目前，在Internet 网上最流行的三种电子邮件协议是SMTP、POP3 和 IMAP，下面分别简单介绍。　　◆ SMTP 协议　　简单邮件传输协议(Simple Mail Transfer Protocol,SMTP)是一个运行在TCP/IP之上的协议，用它发送和接收电子邮件。SMTP 服务器在默认端口25上监听。SMTP客户使用一组简单的、基于文本的
mongodb group by having where 查询sql qiaolevip 每天进步一点点学习永无止境 mongo 纵观千象
SELECT cust_id, SUM(price) as total FROM orders WHERE status = 'A' GROUP BY cust_id HAVING total > 250 db.orders.aggregate( [ { $match: { status: 'A' } }, { $group: {
Struts2 Pojo（六） Luob. POJO strust2
注意：附件中有完整案例 1.采用POJO对象的方法进行赋值和传值 2.web配置 <?xml version="1.0" encoding="UTF-8"?> <web-app version="2.5" xmlns="http://java.sun.com/xml/ns/javaee&q
struts2步骤 wuai struts
1、添加jar包 2、在web.xml中配置过滤器 <filter> <filter-name>struts2</filter-name> <filter-class>org.apache.st
