20. RNN Model: Movie Review Sentiment Analysis

  • Problem description
    Predict whether a user's movie review is positive or negative.
  • Workflow
    1. Tokenize the text data: convert each word into an integer index
%%time
from tensorflow.keras.preprocessing.text import Tokenizer

num_words = 10000  # keep only the 10,000 most frequent words
tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(data_text)  # build the vocabulary from the full text corpus
x_train_tokens = tokenizer.texts_to_sequences(x_train_text)
x_test_tokens = tokenizer.texts_to_sequences(x_test_text)

The %%time cell magic reports how long the cell took to run.
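To sanity-check the mapping, it helps to print a few vocabulary entries and compare one review with its token sequence. A quick sketch; the sample index 1 is arbitrary, and x_train_text/x_train_tokens come from the cell above:

# The learned vocabulary: word -> integer index; the most frequent words get the smallest indices
print(list(tokenizer.word_index.items())[:5])

# A raw review and its tokenized form
print(x_train_text[1])
print(x_train_tokens[1])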
2. Padding and truncating
Convert texts of different lengths into fixed-length inputs, and build an idx2word (index-to-word) dictionary

from tensorflow.keras.preprocessing.sequence import pad_sequences

pad = "pre"  # pad/truncate at the beginning of each sequence
# max_tokens is the fixed sequence length chosen beforehand (long enough to cover most reviews)
x_train_pad = pad_sequences(x_train_tokens, maxlen=max_tokens,
                            padding=pad, truncating=pad)
x_test_pad = pad_sequences(x_test_tokens, maxlen=max_tokens,
                           padding=pad, truncating=pad)
idx = tokenizer.word_index
inverse_map = dict(zip(idx.values(), idx.keys()))  # idx2word: integer index -> word
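The idx2word dictionary is what lets us go back from token indices to text. A small helper sketch (not in the original cells) that reconstructs a review from its tokens, useful for verifying that padding and truncating keep the part of the review we expect:

def tokens_to_string(tokens):
    # Skip the padding value 0, map each index back to its word, and join into a string
    words = [inverse_map[token] for token in tokens if token != 0]
    return " ".join(words)

print(tokens_to_string(x_train_tokens[1]))
print(tokens_to_string(x_train_pad[1]))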

3. Build the RNN model
The first layer is an Embedding layer, followed by three stacked GRU layers and a final fully connected (Dense) output layer

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense
from tensorflow.keras.optimizers import Adam

model = Sequential()
embedding_size = 8  # each word index is mapped to an 8-dimensional dense vector
model.add(Embedding(input_dim=num_words,
                    output_dim=embedding_size,
                    input_length=max_tokens,
                    name="layer_embedding"))
# Three stacked GRU layers: the first two return full sequences so the next GRU
# receives one vector per time step; the last returns only its final state
model.add(GRU(units=16, return_sequences=True))
model.add(GRU(units=8, return_sequences=True))
model.add(GRU(units=4))
model.add(Dense(1, activation="sigmoid"))  # sigmoid output: probability of a positive review

optimizer = Adam(learning_rate=1e-3)
model.compile(loss="binary_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])
model.fit(x_train_pad, y_train,
          validation_split=0.05,  # hold out 5% of the training data for validation
          epochs=1,
          batch_size=64)
# GRU parameters:
# units: positive integer, dimensionality of the output space.
# return_sequences: Boolean. Whether to return the full output sequence or only the last output.
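The trained model can also be checked against the held-out test set. A minimal sketch, assuming y_test holds the 0/1 sentiment labels corresponding to x_test_pad:

result = model.evaluate(x_test_pad, y_test)
print("Test loss: {:.4f}".format(result[0]))
print("Test accuracy: {:.2%}".format(result[1]))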

4. Predict on new data

text1 = "This movie is fantastic! I really like it because it is so good!"
text2 = "Good movie!"
text3 = "Maybe I like this movie."
text4 = "Meh ..."
text5 = "If I were a drunk teenager then this movie might be good."
text6 = "Bad movie!"
text7 = "Not a good movie!"
text8 = "This movie really sucks! Can I get my money back please?"
texts = [text1, text2, text3, text4, text5, text6, text7, text8]

tokens = tokenizer.texts_to_sequences(texts)
tokens_pad = pad_sequences(tokens, maxlen=max_tokens,
                          padding=pad,
                          truncating=pad)

model.predict(tokens_pad)
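model.predict returns one sigmoid value per review, i.e. the estimated probability that the review is positive. A short sketch that turns these probabilities into labels (the 0.5 threshold is a conventional choice, not specified in the original):

pred = model.predict(tokens_pad)
for text, p in zip(texts, pred[:, 0]):
    label = "positive" if p > 0.5 else "negative"
    print("{:.3f}  {:<8}  {}".format(p, label, text))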

5. Inspect the weights of the embedding layer

layer_embedding = model.get_layer("layer_embedding")
weights_embedding = layer_embedding.get_weights()[0]  # shape: (num_words, embedding_size)
token_good = tokenizer.word_index["good"]
token_great = tokenizer.word_index["great"]
print(weights_embedding[token_good])   # learned 8-dimensional vector for "good"
print(weights_embedding[token_great])  # learned 8-dimensional vector for "great"
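If the embedding has picked up sentiment, vectors of words with similar sentiment should point in similar directions. A sketch using cosine similarity (the contrast with "bad" is an assumption for illustration; after only one training epoch the effect may be weak):

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

token_bad = tokenizer.word_index["bad"]  # assumed to fall within the num_words most frequent words
print(cosine_similarity(weights_embedding[token_good], weights_embedding[token_great]))
print(cosine_similarity(weights_embedding[token_good], weights_embedding[token_bad]))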
