Omni-Space

Deep Learning Techniques for Text Classification

原文链接： https://medium.com/datadriveninvestor/deep-learning-techniques-for-text-classification-9392ca9492c7

The exponential growth in the number of complex datasets every year requires more enhancement in machine learning methods to provide robust and accurate data classification. Lately, deep learning approaches are achieving better results compared to previous machine learning algorithms on tasks like image classification, natural language processing, face recognition, etc. The success of these deep learning algorithms relies on their capacity to model complex and non-linear relationships within the data.

However, finding suitable structures for these models has been a challenge for researchers which are addressed leveraging. Transfer Learning techniques where instead of training a model from scratch, we reuse a pre-trained model and then fine-tune it for another related task which have resulted in a massive leap in state-of-the-art results for many of the NLP tasks through various approaches such as ULMFiT, the OpenAI Transformer, ELMo and Google AI’s BERT.

Topics you’ll find on this post include the following:

Deep Neural Networks
Recurrent Neural Networks (RNN)
Gated Recurrent Unit (GRU)
Long Short-Term Memory (LSTM)
Convolutional Neural Networks (CNN)
Hierarchical Attention Networks
Recurrent Convolutional Neural Networks (RCNN)
Random Multimodel Deep Learning (RMDL)
Hierarchical Deep Learning for Text (HDLTex)

Deep Neural Networks

Deep Neural Networks architectures are designed to learn through multiple connections of layers where every single layer only receives a connection from previous and provides connections only to the next layer in the hidden part. The input layer is embedding vectors as shown in Figure below. The output layer neurons equal to the number of classes for multi-class classification and only one neuron for binary classification. Here, we have multi-class DNNs where the number of nodes in each layer as well as the number of layers are randomly assigned. The implementation of Deep Neural Network (DNN) is basically a discriminatively trained model that uses the standard back-propagation algorithm and sigmoid or ReLU as activation functions. The output layer for multi-class classification should use Softmax.

import packages:

from sklearn.datasets import fetch_20newsgroups
from keras.layers import  Dropout, Dense
from keras.models import Sequential
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn import metrics

convert text to TF-IDF:

def TFIDF(X_train, X_test,MAX_NB_WORDS=75000):
    vectorizer_x = TfidfVectorizer(max_features=MAX_NB_WORDS)
    X_train = vectorizer_x.fit_transform(X_train).toarray()
    X_test = vectorizer_x.transform(X_test).toarray()
    print("tf-idf with",str(np.array(X_train).shape[1]),"features")
    return (X_train,X_test)

Build a DNN Model for Text:

def Build_Model_DNN_Text(shape, nClasses, dropout=0.5):
    """
    buildModel_DNN_Tex(shape, nClasses,dropout)
    Build Deep neural networks Model for text classification
    Shape is input feature space
    nClasses is number of classes
    """
    model = Sequential()
    node = 512 # number of nodes
    nLayers = 4 # number of  hidden layer

    model.add(Dense(node,input_dim=shape,activation='relu'))
    model.add(Dropout(dropout))
    for i in range(0,nLayers):
        model.add(Dense(node,input_dim=node,activation='relu'))
        model.add(Dropout(dropout))
    model.add(Dense(nClasses, activation='softmax'))

    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    return model

Load text dataset (20newsgroups):

newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')
X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

run DNN and see result:

X_train_tfidf,X_test_tfidf = TFIDF(X_train,X_test)
model_DNN = Build_Model_DNN_Text(X_train_tfidf.shape[1], 20)
model_DNN.fit(X_train_tfidf, y_train,
                              validation_data=(X_test_tfidf, y_test),
                              epochs=10,
                              batch_size=128,
                              verbose=2)

predicted = model_DNN.predict(X_test_tfidf)

print(metrics.classification_report(y_test, predicted))

Model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 512)               38400512
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_2 (Dense)              (None, 512)               262656
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_3 (Dense)              (None, 512)               262656
_________________________________________________________________
dropout_3 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_4 (Dense)              (None, 512)               262656
_________________________________________________________________
dropout_4 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_5 (Dense)              (None, 512)               262656
_________________________________________________________________
dropout_5 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_6 (Dense)              (None, 20)                10260
=================================================================
Total params: 39,461,396
Trainable params: 39,461,396
Non-trainable params: 0
_________________________________________________________________

Output:

Train on 11314 samples, validate on 7532 samples
Epoch 1/10
 - 16s - loss: 2.7553 - acc: 0.1090 - val_loss: 1.9330 - val_acc: 0.3184
Epoch 2/10
 - 15s - loss: 1.5330 - acc: 0.4222 - val_loss: 1.1546 - val_acc: 0.6204
Epoch 3/10
 - 15s - loss: 0.7438 - acc: 0.7257 - val_loss: 0.8405 - val_acc: 0.7499
Epoch 4/10
 - 15s - loss: 0.2967 - acc: 0.9020 - val_loss: 0.9214 - val_acc: 0.7767
Epoch 5/10
 - 15s - loss: 0.1557 - acc: 0.9543 - val_loss: 0.8965 - val_acc: 0.7917
Epoch 6/10
 - 15s - loss: 0.1015 - acc: 0.9705 - val_loss: 0.9427 - val_acc: 0.7949
Epoch 7/10
 - 15s - loss: 0.0595 - acc: 0.9835 - val_loss: 0.9893 - val_acc: 0.7995
Epoch 8/10
 - 15s - loss: 0.0495 - acc: 0.9866 - val_loss: 0.9512 - val_acc: 0.8079
Epoch 9/10
 - 15s - loss: 0.0437 - acc: 0.9867 - val_loss: 0.9690 - val_acc: 0.8117
Epoch 10/10
 - 15s - loss: 0.0443 - acc: 0.9880 - val_loss: 1.0004 - val_acc: 0.8070


               precision    recall  f1-score   support

          0       0.76      0.78      0.77       319
          1       0.67      0.80      0.73       389
          2       0.82      0.63      0.71       394
          3       0.76      0.69      0.72       392
          4       0.65      0.86      0.74       385
          5       0.84      0.75      0.79       395
          6       0.82      0.87      0.84       390
          7       0.86      0.90      0.88       396
          8       0.95      0.91      0.93       398
          9       0.91      0.92      0.92       397
         10       0.98      0.92      0.95       399
         11       0.96      0.85      0.90       396
         12       0.71      0.69      0.70       393
         13       0.95      0.70      0.81       396
         14       0.86      0.91      0.88       394
         15       0.85      0.90      0.87       398
         16       0.79      0.84      0.81       364
         17       0.99      0.77      0.87       376
         18       0.58      0.75      0.65       310
         19       0.52      0.60      0.55       251

avg / total       0.82      0.81      0.81      7532

Recurrent Neural Networks (RNN)

Recurrent Neural Networks (RNN) is another neural network architecture that is addressed by the researchers for text mining and classification. RNN assigns more weights to the previous data points of sequence. Therefore, this technique is a powerful method for text, string, and sequential data classification. In RNN, the neural net considers the information of previous nodes in a very sophisticated method which allows for better semantic analysis of the structures in the dataset.

Gated Recurrent Unit (GRU)

Gated Recurrent Unit (GRU) is a gating mechanism for RNN which was introduced by J. Chung et al. and K.Cho et al.. GRU is a simplified variant of the LSTM architecture, but there are differences as follows: GRU contains two gates and does not possess any internal memory (as shown in Figure; and finally, a second non-linearity is not applied (tanh in Figure).

Long Short-Term Memory (LSTM)

Long Short-Term Memory~(LSTM) was introduced by S. Hochreiter and J. Schmidhuber and developed by many research scientists.

To deal with these problems Long Short-Term Memory (LSTM) is a special type of RNN that preserves long term dependency in a more effective way compared to the basic RNNs. This is particularly useful to overcome vanishing gradient problem as LSTM uses multiple gates to carefully regulate the amount of information that will be allowed into each node state. The figure shows the basic cell of a LSTM model.

import packages:

from keras.layers import Dropout, Dense, GRU, Embedding
from keras.models import Sequential
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn import metrics
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.datasets import fetch_20newsgroups

convert text to word embedding (Using GloVe):

def loadData_Tokenizer(X_train, X_test,MAX_NB_WORDS=75000,MAX_SEQUENCE_LENGTH=500):
    np.random.seed(7)
    text = np.concatenate((X_train, X_test), axis=0)
    text = np.array(text)
    tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
    tokenizer.fit_on_texts(text)
    sequences = tokenizer.texts_to_sequences(text)
    word_index = tokenizer.word_index
    text = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    print('Found %s unique tokens.' % len(word_index))
    indices = np.arange(text.shape[0])
    # np.random.shuffle(indices)
    text = text[indices]
    print(text.shape)
    X_train = text[0:len(X_train), ]
    X_test = text[len(X_train):, ]
    embeddings_index = {}
    f = open("C:\\Users\\swayam\\Documents\\Glove\\glove.6B.50d.txt", encoding="utf8")
    for line in f:

        values = line.split()
        word = values[0]
        try:
            coefs = np.asarray(values[1:], dtype='float32')
        except:
            pass
        embeddings_index[word] = coefs
    f.close()
    print('Total %s word vectors.' % len(embeddings_index))
    return (X_train, X_test, word_index,embeddings_index)

Build a RNN Model for Text:

def Build_Model_RNN_Text(word_index, embeddings_index, nclasses,  MAX_SEQUENCE_LENGTH=500, EMBEDDING_DIM=50, dropout=0.5):
    """
    def buildModel_RNN(word_index, embeddings_index, nclasses,  MAX_SEQUENCE_LENGTH=500, EMBEDDING_DIM=50, dropout=0.5):
    word_index in word index ,
    embeddings_index is embeddings index, look at data_helper.py
    nClasses is number of classes,
    MAX_SEQUENCE_LENGTH is maximum lenght of text sequences
    """

    model = Sequential()
    hidden_layer = 3
    gru_node = 32

    embedding_matrix = np.random.random((len(word_index) + 1, EMBEDDING_DIM))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            # words not found in embedding index will be all-zeros.
            if len(embedding_matrix[i]) != len(embedding_vector):
                print("could not broadcast input array from shape", str(len(embedding_matrix[i])),
                      "into shape", str(len(embedding_vector)), " Please make sure your"
                                                                " EMBEDDING_DIM is equal to embedding_vector file ,GloVe,")
                exit(1)
            embedding_matrix[i] = embedding_vector
    model.add(Embedding(len(word_index) + 1,
                                EMBEDDING_DIM,
                                weights=[embedding_matrix],
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=True))


    print(gru_node)
    for i in range(0,hidden_layer):
        model.add(GRU(gru_node,return_sequences=True, recurrent_dropout=0.2))
        model.add(Dropout(dropout))
    model.add(GRU(gru_node, recurrent_dropout=0.2))
    model.add(Dropout(dropout))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(nclasses, activation='softmax'))


    model.compile(loss='sparse_categorical_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])
    return model

run RNN and see result:

newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')
X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

X_train_Glove,X_test_Glove, word_index,embeddings_index = loadData_Tokenizer(X_train,X_test)


model_RNN = Build_Model_RNN_Text(word_index,embeddings_index, 20)

model_RNN.fit(X_train_Glove, y_train,
                              validation_data=(X_test_Glove, y_test),
                              epochs=10,
                              batch_size=128,
                              verbose=2)

predicted = Build_Model_RNN_Text.predict_classes(X_test_Glove)

print(metrics.classification_report(y_test, predicted))

Model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 500, 50)           8960500
_________________________________________________________________
gru_1 (GRU)                  (None, 500, 256)          235776
_________________________________________________________________
dropout_1 (Dropout)          (None, 500, 256)          0
_________________________________________________________________
gru_2 (GRU)                  (None, 500, 256)          393984
_________________________________________________________________
dropout_2 (Dropout)          (None, 500, 256)          0
_________________________________________________________________
gru_3 (GRU)                  (None, 500, 256)          393984
_________________________________________________________________
dropout_3 (Dropout)          (None, 500, 256)          0
_________________________________________________________________
gru_4 (GRU)                  (None, 256)               393984
_________________________________________________________________
dense_1 (Dense)              (None, 20)                5140
=================================================================
Total params: 10,383,368
Trainable params: 10,383,368
Non-trainable params: 0
_________________________________________________________________

Output:

Train on 11314 samples, validate on 7532 samples
Epoch 1/20
 - 268s - loss: 2.5347 - acc: 0.1792 - val_loss: 2.2857 - val_acc: 0.2460
Epoch 2/20
 - 271s - loss: 1.6751 - acc: 0.3999 - val_loss: 1.4972 - val_acc: 0.4660
Epoch 3/20
 - 270s - loss: 1.0945 - acc: 0.6072 - val_loss: 1.3232 - val_acc: 0.5483
Epoch 4/20
 - 269s - loss: 0.7761 - acc: 0.7312 - val_loss: 1.1009 - val_acc: 0.6452
Epoch 5/20
 - 269s - loss: 0.5513 - acc: 0.8112 - val_loss: 1.0395 - val_acc: 0.6832
Epoch 6/20
 - 269s - loss: 0.3765 - acc: 0.8754 - val_loss: 0.9977 - val_acc: 0.7086
Epoch 7/20
 - 270s - loss: 0.2481 - acc: 0.9202 - val_loss: 1.0485 - val_acc: 0.7270
Epoch 8/20
 - 269s - loss: 0.1717 - acc: 0.9463 - val_loss: 1.0269 - val_acc: 0.7394
Epoch 9/20
 - 269s - loss: 0.1130 - acc: 0.9644 - val_loss: 1.1498 - val_acc: 0.7369
Epoch 10/20
 - 269s - loss: 0.0640 - acc: 0.9808 - val_loss: 1.1442 - val_acc: 0.7508
Epoch 11/20
 - 269s - loss: 0.0567 - acc: 0.9828 - val_loss: 1.2318 - val_acc: 0.7414
Epoch 12/20
 - 268s - loss: 0.0472 - acc: 0.9858 - val_loss: 1.2204 - val_acc: 0.7496
Epoch 13/20
 - 269s - loss: 0.0319 - acc: 0.9910 - val_loss: 1.1895 - val_acc: 0.7657
Epoch 14/20
 - 268s - loss: 0.0466 - acc: 0.9853 - val_loss: 1.2821 - val_acc: 0.7517
Epoch 15/20
 - 271s - loss: 0.0269 - acc: 0.9917 - val_loss: 1.2869 - val_acc: 0.7557
Epoch 16/20
 - 271s - loss: 0.0187 - acc: 0.9950 - val_loss: 1.3037 - val_acc: 0.7598
Epoch 17/20
 - 268s - loss: 0.0157 - acc: 0.9959 - val_loss: 1.2974 - val_acc: 0.7638
Epoch 18/20
 - 270s - loss: 0.0121 - acc: 0.9966 - val_loss: 1.3526 - val_acc: 0.7602
Epoch 19/20
 - 269s - loss: 0.0262 - acc: 0.9926 - val_loss: 1.4182 - val_acc: 0.7517
Epoch 20/20
 - 269s - loss: 0.0249 - acc: 0.9918 - val_loss: 1.3453 - val_acc: 0.7638


               precision    recall  f1-score   support

          0       0.71      0.71      0.71       319
          1       0.72      0.68      0.70       389
          2       0.76      0.62      0.69       394
          3       0.67      0.58      0.62       392
          4       0.68      0.67      0.68       385
          5       0.75      0.73      0.74       395
          6       0.82      0.74      0.78       390
          7       0.83      0.83      0.83       396
          8       0.81      0.90      0.86       398
          9       0.92      0.90      0.91       397
         10       0.91      0.94      0.93       399
         11       0.87      0.76      0.81       396
         12       0.57      0.70      0.63       393
         13       0.81      0.85      0.83       396
         14       0.74      0.93      0.82       394
         15       0.82      0.83      0.83       398
         16       0.74      0.78      0.76       364
         17       0.96      0.83      0.89       376
         18       0.64      0.60      0.62       310
         19       0.48      0.56      0.52       251

avg / total       0.77      0.76      0.76      7532

Convolutional Neural Networks (CNN)

Convolutional Neural Networks (CNN) is Another deep learning architecture that is employed for hierarchical document classification. Although originally built for image processing with architecture similar to the visual cortex, CNNs have also been effectively used for text classification. In a basic CNN for image processing, an image tensor is convolved with a set of kernels of size d by d. These convolution layers are called feature maps and can be stacked to provide multiple filters on the input. To reduce the computational complexity, CNNs use pooling which reduces the size of the output from one layer to the next in the network. Different pooling techniques are used to reduce outputs while preserving important features.

The most common pooling method is max pooling where the maximum element is selected from the pooling window. In order to feed the pooled output from stacked featured maps to the next layer, the maps are flattened into one column. The final layers in a CNN are typically fully connected dense layers. In general, during the back-propagation step of a convolutional neural network not only the weights are adjusted but also the feature detector filters. A potential problem of CNN used for text is the number of ‘channels’, Sigma (size of the feature space). This might be very large (e.g. 50K), for text but for images this is less of a problem (e.g. only 3 channels of RGB). This means the dimensionality of the CNN for text is very high.

import packages:

from keras.layers import Dropout, Dense,Input,Embedding,Flatten, MaxPooling1D, Conv1D
from keras.models import Sequential,Model
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn import metrics
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from sklearn.datasets import fetch_20newsgroups
from keras.layers.merge import Concatenate

convert text to word embedding (Using GloVe):

def loadData_Tokenizer(X_train, X_test,MAX_NB_WORDS=75000,MAX_SEQUENCE_LENGTH=500):
    np.random.seed(7)
    text = np.concatenate((X_train, X_test), axis=0)
    text = np.array(text)
    tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
    tokenizer.fit_on_texts(text)
    sequences = tokenizer.texts_to_sequences(text)
    word_index = tokenizer.word_index
    text = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    print('Found %s unique tokens.' % len(word_index))
    indices = np.arange(text.shape[0])
    # np.random.shuffle(indices)
    text = text[indices]
    print(text.shape)
    X_train = text[0:len(X_train), ]
    X_test = text[len(X_train):, ]
    embeddings_index = {}
    f = open("C:\\Users\\swayam\\Documents\\Glove\\glove.6B.50d.txt", encoding="utf8")
    for line in f:
        values = line.split()
        word = values[0]
        try:
            coefs = np.asarray(values[1:], dtype='float32')
        except:
            pass
        embeddings_index[word] = coefs
    f.close()
    print('Total %s word vectors.' % len(embeddings_index))
    return (X_train, X_test, word_index,embeddings_index)

Build a CNN Model for Text:

def Build_Model_CNN_Text(word_index, embeddings_index, nclasses, MAX_SEQUENCE_LENGTH=500, EMBEDDING_DIM=50, dropout=0.5):

    """
        def buildModel_CNN(word_index, embeddings_index, nclasses, MAX_SEQUENCE_LENGTH=500, EMBEDDING_DIM=50, dropout=0.5):
        word_index in word index ,
        embeddings_index is embeddings index, look at data_helper.py
        nClasses is number of classes,
        MAX_SEQUENCE_LENGTH is maximum lenght of text sequences,
        EMBEDDING_DIM is an int value for dimention of word embedding look at data_helper.py
    """

    model = Sequential()
    embedding_matrix = np.random.random((len(word_index) + 1, EMBEDDING_DIM))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            # words not found in embedding index will be all-zeros.
            if len(embedding_matrix[i]) !=len(embedding_vector):
                print("could not broadcast input array from shape",str(len(embedding_matrix[i])),
                                 "into shape",str(len(embedding_vector))," Please make sure your"
                                 " EMBEDDING_DIM is equal to embedding_vector file ,GloVe,")
                exit(1)

            embedding_matrix[i] = embedding_vector

    embedding_layer = Embedding(len(word_index) + 1,
                                EMBEDDING_DIM,
                                weights=[embedding_matrix],
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=True)

    # applying a more complex convolutional approach
    convs = []
    filter_sizes = []
    layer = 5
    print("Filter  ",layer)
    for fl in range(0,layer):
        filter_sizes.append((fl+2))

    node = 128
    sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
    embedded_sequences = embedding_layer(sequence_input)

    for fsz in filter_sizes:
        l_conv = Conv1D(node, kernel_size=fsz, activation='relu')(embedded_sequences)
        l_pool = MaxPooling1D(5)(l_conv)
        #l_pool = Dropout(0.25)(l_pool)
        convs.append(l_pool)

    l_merge = Concatenate(axis=1)(convs)
    l_cov1 = Conv1D(node, 5, activation='relu')(l_merge)
    l_cov1 = Dropout(dropout)(l_cov1)
    l_pool1 = MaxPooling1D(5)(l_cov1)
    l_cov2 = Conv1D(node, 5, activation='relu')(l_pool1)
    l_cov2 = Dropout(dropout)(l_cov2)
    l_pool2 = MaxPooling1D(30)(l_cov2)
    l_flat = Flatten()(l_pool2)
    l_dense = Dense(1024, activation='relu')(l_flat)
    l_dense = Dropout(dropout)(l_dense)
    l_dense = Dense(512, activation='relu')(l_dense)
    l_dense = Dropout(dropout)(l_dense)
    preds = Dense(nclasses, activation='softmax')(l_dense)
    model = Model(sequence_input, preds)

    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    return model

run CNN and see result:

newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')
X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

X_train_Glove,X_test_Glove, word_index,embeddings_index = loadData_Tokenizer(X_train,X_test)


model_CNN = Build_Model_CNN_Text(word_index,embeddings_index, 20)


model_CNN.summary()

model_CNN.fit(X_train_Glove, y_train,
                              validation_data=(X_test_Glove, y_test),
                              epochs=15,
                              batch_size=128,
                              verbose=2)

predicted = model_CNN.predict(X_test_Glove)

predicted = np.argmax(predicted, axis=1)


print(metrics.classification_report(y_test, predicted))

Model:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 500)          0
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 500, 50)      8960500     input_1[0][0]
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 499, 128)     12928       embedding_1[0][0]
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 498, 128)     19328       embedding_1[0][0]
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               (None, 497, 128)     25728       embedding_1[0][0]
__________________________________________________________________________________________________
conv1d_4 (Conv1D)               (None, 496, 128)     32128       embedding_1[0][0]
__________________________________________________________________________________________________
conv1d_5 (Conv1D)               (None, 495, 128)     38528       embedding_1[0][0]
__________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)  (None, 99, 128)      0           conv1d_1[0][0]
__________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D)  (None, 99, 128)      0           conv1d_2[0][0]
__________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D)  (None, 99, 128)      0           conv1d_3[0][0]
__________________________________________________________________________________________________
max_pooling1d_4 (MaxPooling1D)  (None, 99, 128)      0           conv1d_4[0][0]
__________________________________________________________________________________________________
max_pooling1d_5 (MaxPooling1D)  (None, 99, 128)      0           conv1d_5[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 495, 128)     0           max_pooling1d_1[0][0]
                                                                 max_pooling1d_2[0][0]
                                                                 max_pooling1d_3[0][0]
                                                                 max_pooling1d_4[0][0]
                                                                 max_pooling1d_5[0][0]
__________________________________________________________________________________________________
conv1d_6 (Conv1D)               (None, 491, 128)     82048       concatenate_1[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 491, 128)     0           conv1d_6[0][0]
__________________________________________________________________________________________________
max_pooling1d_6 (MaxPooling1D)  (None, 98, 128)      0           dropout_1[0][0]
__________________________________________________________________________________________________
conv1d_7 (Conv1D)               (None, 94, 128)      82048       max_pooling1d_6[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 94, 128)      0           conv1d_7[0][0]
__________________________________________________________________________________________________
max_pooling1d_7 (MaxPooling1D)  (None, 3, 128)       0           dropout_2[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 384)          0           max_pooling1d_7[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1024)         394240      flatten_1[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 1024)         0           dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 512)          524800      dropout_3[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 512)          0           dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 20)           10260       dropout_4[0][0]
==================================================================================================
Total params: 10,182,536
Trainable params: 10,182,536
Non-trainable params: 0
__________________________________________________________________________________________________

Output:

Train on 11314 samples, validate on 7532 samples
Epoch 1/15
 - 6s - loss: 2.9329 - acc: 0.0783 - val_loss: 2.7628 - val_acc: 0.1403
Epoch 2/15
 - 4s - loss: 2.2534 - acc: 0.2249 - val_loss: 2.1715 - val_acc: 0.4007
Epoch 3/15
 - 4s - loss: 1.5643 - acc: 0.4326 - val_loss: 1.7846 - val_acc: 0.5052
Epoch 4/15
 - 4s - loss: 1.1771 - acc: 0.5662 - val_loss: 1.4949 - val_acc: 0.6131
Epoch 5/15
 - 4s - loss: 0.8880 - acc: 0.6797 - val_loss: 1.3629 - val_acc: 0.6256
Epoch 6/15
 - 4s - loss: 0.6990 - acc: 0.7569 - val_loss: 1.2013 - val_acc: 0.6624
Epoch 7/15
 - 4s - loss: 0.5037 - acc: 0.8200 - val_loss: 1.0674 - val_acc: 0.6807
Epoch 8/15
 - 4s - loss: 0.4050 - acc: 0.8626 - val_loss: 1.0223 - val_acc: 0.6863
Epoch 9/15
 - 4s - loss: 0.2952 - acc: 0.8968 - val_loss: 0.9045 - val_acc: 0.7120
Epoch 10/15
 - 4s - loss: 0.2314 - acc: 0.9217 - val_loss: 0.8574 - val_acc: 0.7326
Epoch 11/15
 - 4s - loss: 0.1778 - acc: 0.9436 - val_loss: 0.8752 - val_acc: 0.7270
Epoch 12/15
 - 4s - loss: 0.1475 - acc: 0.9524 - val_loss: 0.8299 - val_acc: 0.7355
Epoch 13/15
 - 4s - loss: 0.1089 - acc: 0.9657 - val_loss: 0.8034 - val_acc: 0.7491
Epoch 14/15
 - 4s - loss: 0.1047 - acc: 0.9666 - val_loss: 0.8172 - val_acc: 0.7463
Epoch 15/15
 - 4s - loss: 0.0749 - acc: 0.9774 - val_loss: 0.8511 - val_acc: 0.7313


               precision    recall  f1-score   support

          0       0.75      0.61      0.67       319
          1       0.63      0.74      0.68       389
          2       0.74      0.54      0.62       394
          3       0.49      0.76      0.60       392
          4       0.60      0.70      0.64       385
          5       0.79      0.57      0.66       395
          6       0.73      0.76      0.74       390
          7       0.83      0.74      0.78       396
          8       0.86      0.88      0.87       398
          9       0.95      0.78      0.86       397
         10       0.93      0.93      0.93       399
         11       0.92      0.77      0.84       396
         12       0.55      0.72      0.62       393
         13       0.76      0.85      0.80       396
         14       0.86      0.83      0.84       394
         15       0.91      0.73      0.81       398
         16       0.75      0.65      0.70       364
         17       0.95      0.86      0.90       376
         18       0.60      0.49      0.54       310
         19       0.37      0.60      0.46       251

avg / total       0.76      0.73      0.74      7532

Hierarchical Attention Networks

Recurrent Convolutional Neural Networks (RCNN)

Recurrent Convolutional Neural Networks (RCNN) is also used for text classification. The main idea of this technique is capturing contextual information with the recurrent structure and constructing the representation of text using a convolutional neural network. This architecture is a combination of RNN and CNN to use the advantages of both technique in a model.

import packages:

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import GRU
from keras.layers import Conv1D, MaxPooling1D
from keras.datasets import imdb
from sklearn.datasets import fetch_20newsgroups
import numpy as np
from sklearn import metrics
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

Convert text to word embedding (Using GloVe):

def loadData_Tokenizer(X_train, X_test,MAX_NB_WORDS=75000,MAX_SEQUENCE_LENGTH=500):
    np.random.seed(7)
    text = np.concatenate((X_train, X_test), axis=0)
    text = np.array(text)
    tokenizer = Tokenizer(num_words=MAX_NB_WORDS)
    tokenizer.fit_on_texts(text)
    sequences = tokenizer.texts_to_sequences(text)
    word_index = tokenizer.word_index
    text = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    print('Found %s unique tokens.' % len(word_index))
    indices = np.arange(text.shape[0])
    # np.random.shuffle(indices)
    text = text[indices]
    print(text.shape)
    X_train = text[0:len(X_train), ]
    X_test = text[len(X_train):, ]
    embeddings_index = {}
    f = open("C:\\Users\\swayam\\Documents\\Glove\\glove.6B.50d.txt", encoding="utf8")
    for line in f:
        values = line.split()
        word = values[0]
        try:
            coefs = np.asarray(values[1:], dtype='float32')
        except:
            pass
        embeddings_index[word] = coefs
    f.close()
    print('Total %s word vectors.' % len(embeddings_index))
    return (X_train, X_test, word_index,embeddings_index)

def Build_Model_RCNN_Text(word_index, embeddings_index, nclasses, MAX_SEQUENCE_LENGTH=500, EMBEDDING_DIM=50):

    kernel_size = 2
    filters = 256
    pool_size = 2
    gru_node = 256

    embedding_matrix = np.random.random((len(word_index) + 1, EMBEDDING_DIM))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            # words not found in embedding index will be all-zeros.
            if len(embedding_matrix[i]) !=len(embedding_vector):
                print("could not broadcast input array from shape",str(len(embedding_matrix[i])),
                                 "into shape",str(len(embedding_vector))," Please make sure your"
                                 " EMBEDDING_DIM is equal to embedding_vector file ,GloVe,")
                exit(1)

            embedding_matrix[i] = embedding_vector

    model = Sequential()
    model.add(Embedding(len(word_index) + 1,
                                EMBEDDING_DIM,
                                weights=[embedding_matrix],
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=True))
    model.add(Dropout(0.25))
    model.add(Conv1D(filters, kernel_size, activation='relu'))
    model.add(MaxPooling1D(pool_size=pool_size))
    model.add(Conv1D(filters, kernel_size, activation='relu'))
    model.add(MaxPooling1D(pool_size=pool_size))
    model.add(Conv1D(filters, kernel_size, activation='relu'))
    model.add(MaxPooling1D(pool_size=pool_size))
    model.add(Conv1D(filters, kernel_size, activation='relu'))
    model.add(MaxPooling1D(pool_size=pool_size))
    model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
    model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
    model.add(LSTM(gru_node, return_sequences=True, recurrent_dropout=0.2))
    model.add(LSTM(gru_node, recurrent_dropout=0.2))
    model.add(Dense(1024,activation='relu'))
    model.add(Dense(nclasses))
    model.add(Activation('softmax'))

    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    return model

newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')
X_train = newsgroups_train.data
X_test = newsgroups_test.data
y_train = newsgroups_train.target
y_test = newsgroups_test.target

X_train_Glove,X_test_Glove, word_index,embeddings_index = loadData_Tokenizer(X_train,X_test)

Run RCNN :

model_RCNN = Build_Model_CNN_Text(word_index,embeddings_index, 20)


model_RCNN.summary()

model_RCNN.fit(X_train_Glove, y_train,
                              validation_data=(X_test_Glove, y_test),
                              epochs=15,
                              batch_size=128,
                              verbose=2)

predicted = model_RCNN.predict(X_test_Glove)

predicted = np.argmax(predicted, axis=1)
print(metrics.classification_report(y_test, predicted))

summary of the model:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_1 (Embedding)      (None, 500, 50)           8960500
_________________________________________________________________
dropout_1 (Dropout)          (None, 500, 50)           0
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 499, 256)          25856
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 249, 256)          0
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 248, 256)          131328
_________________________________________________________________
max_pooling1d_2 (MaxPooling1 (None, 124, 256)          0
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 123, 256)          131328
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 61, 256)           0
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 60, 256)           131328
_________________________________________________________________
max_pooling1d_4 (MaxPooling1 (None, 30, 256)           0
_________________________________________________________________
lstm_1 (LSTM)                (None, 30, 256)           525312
_________________________________________________________________
lstm_2 (LSTM)                (None, 30, 256)           525312
_________________________________________________________________
lstm_3 (LSTM)                (None, 30, 256)           525312
_________________________________________________________________
lstm_4 (LSTM)                (None, 256)               525312
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              263168
_________________________________________________________________
dense_2 (Dense)              (None, 20)                20500
_________________________________________________________________
activation_1 (Activation)    (None, 20)                0
=================================================================
Total params: 11,765,256
Trainable params: 11,765,256
Non-trainable params: 0
_________________________________________________________________

Output:

Train on 11314 samples, validate on 7532 samples
Epoch 1/15
 - 28s - loss: 2.6624 - acc: 0.1081 - val_loss: 2.3012 - val_acc: 0.1753
Epoch 2/15
 - 22s - loss: 2.1142 - acc: 0.2224 - val_loss: 1.9168 - val_acc: 0.2669
Epoch 3/15
 - 22s - loss: 1.7465 - acc: 0.3290 - val_loss: 1.8257 - val_acc: 0.3412
Epoch 4/15
 - 22s - loss: 1.4730 - acc: 0.4356 - val_loss: 1.5433 - val_acc: 0.4436
Epoch 5/15
 - 22s - loss: 1.1800 - acc: 0.5556 - val_loss: 1.2973 - val_acc: 0.5467
Epoch 6/15
 - 22s - loss: 0.9910 - acc: 0.6281 - val_loss: 1.2530 - val_acc: 0.5797
Epoch 7/15
 - 22s - loss: 0.8581 - acc: 0.6854 - val_loss: 1.1522 - val_acc: 0.6281
Epoch 8/15
 - 22s - loss: 0.7058 - acc: 0.7428 - val_loss: 1.2385 - val_acc: 0.6033
Epoch 9/15
 - 22s - loss: 0.6792 - acc: 0.7515 - val_loss: 1.0200 - val_acc: 0.6775
Epoch 10/15
 - 22s - loss: 0.5782 - acc: 0.7948 - val_loss: 1.0961 - val_acc: 0.6577
Epoch 11/15
 - 23s - loss: 0.4674 - acc: 0.8341 - val_loss: 1.0866 - val_acc: 0.6924
Epoch 12/15
 - 23s - loss: 0.4284 - acc: 0.8512 - val_loss: 0.9880 - val_acc: 0.7096
Epoch 13/15
 - 22s - loss: 0.3883 - acc: 0.8670 - val_loss: 1.0190 - val_acc: 0.7151
Epoch 14/15
 - 22s - loss: 0.3334 - acc: 0.8874 - val_loss: 1.0025 - val_acc: 0.7232
Epoch 15/15
 - 22s - loss: 0.2857 - acc: 0.9038 - val_loss: 1.0123 - val_acc: 0.7331


             precision    recall  f1-score   support

          0       0.64      0.73      0.68       319
          1       0.45      0.83      0.58       389
          2       0.81      0.64      0.71       394
          3       0.64      0.57      0.61       392
          4       0.55      0.78      0.64       385
          5       0.77      0.52      0.62       395
          6       0.84      0.77      0.80       390
          7       0.87      0.79      0.83       396
          8       0.85      0.90      0.87       398
          9       0.98      0.84      0.90       397
         10       0.93      0.96      0.95       399
         11       0.92      0.79      0.85       396
         12       0.59      0.53      0.56       393
         13       0.82      0.82      0.82       396
         14       0.84      0.84      0.84       394
         15       0.83      0.89      0.86       398
         16       0.68      0.86      0.76       364
         17       0.97      0.86      0.91       376
         18       0.66      0.50      0.57       310
         19       0.53      0.31      0.40       251

avg / total       0.77      0.75      0.75      7532

Random Multimodel Deep Learning (RMDL)

A new ensemble, deep learning approach for classification. Deep learning models have achieved state-of-the-art results across many domains. RMDL solves the problem of finding the best deep learning structure and architecture while simultaneously improving robustness and accuracy through ensembles of different deep learning architectures. RDMLs can accept a variety of data as input including text, video, images, and symbols.

Random Multimodel Deep Learning (RDML) architecture for classification. RMDL includes 3 Random models, oneDNN classifier at left, one Deep CNN classifier at middle, and one Deep RNN classifier at right (each unit could be LSTMor GRU).

Installation

There are pip and git for RMDL installation:

Using pip

pip install RMDL

Using git

git clone --recursive [https://github.com/kk7nc/RMDL.git](https://github.com/kk7nc/RMDL.git)

The primary requirements for this package are Python 3 with Tensorflow. The requirements.txt file contains a listing of the required Python packages; to install all requirements, run the following:

pip -r install requirements.txt

pip3  install -r requirements.txt

Or:

conda install --file requirements.txt

Hierarchical Deep Learning for Text (HDLTex)

Increasingly large document collections require improved information processing methods for searching, retrieving, and organizing text documents. Central to these information processing methods is document classification, which has become an important task supervised learning aims to solve. Recently, the performance of traditional supervised classifiers has degraded as the number of documents has increased. This exponential growth of document volume has also increased the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead, we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide a hierarchical understanding of the documents.

Conclusion

This should give you a good idea of how you can leverage deep learning for text classification task! The list of techniques here are not exhaustive but definitely cover some of the most popular and widely used methods to train neural network models for the text classification task. I recommend you to try these out with your own data.

你可能感兴趣的:(Text,Mininng)

android系统selinux中添加新属性property 辉色投像
1.定位/android/system/sepolicy/private/property_contexts声明属性开头：persist.charge声明属性类型：u:object_r:system_prop:s0图12.定位到android/system/sepolicy/public/domain.te删除neverallow{domain-init}default_prop:property
swagger访问路径 igotyback swagger
Swagger2.x版本访问地址：http://{ip}:{port}/{context-path}/swagger-ui.html{ip}是你的服务器IP地址。{port}是你的应用服务端口，通常为8080。{context-path}是你的应用上下文路径，如果应用部署在根路径下，则为空。Swagger3.x版本对于Swagger3.x版本（也称为OpenAPI3）访问地址：http://{ip
C#中使用split分割字符串互联网打工人no1 c#
1、用字符串分隔：usingSystem.Text.RegularExpressions;stringstr="aaajsbbbjsccc";string[]sArray=Regex.Split(str,"js",RegexOptions.IgnoreCase);foreach(stringiinsArray)Response.Write(i.ToString()+"");输出结果：aaabbbc
LLM 词汇表落难Coder LLMs NLP 大语言模型大模型 llama 人工智能
Contextwindow“上下文窗口”是指语言模型在生成新文本时能够回溯和参考的文本量。这不同于语言模型训练时所使用的大量数据集，而是代表了模型的“工作记忆”。较大的上下文窗口可以让模型理解和响应更复杂和更长的提示，而较小的上下文窗口可能会限制模型处理较长提示或在长时间对话中保持连贯性的能力。Fine-tuning微调是使用额外的数据进一步训练预训练语言模型的过程。这使得模型开始表示和模仿微调数
Xinference如何注册自定义模型玩人工智能的辣条哥人工智能 AI 大模型 Xinference
环境：Xinference问题描述：Xinference如何注册自定义模型解决方案：1.写个model_config.json，内容如下{"version":1,"context_length":2048,"model_name":"custom-llama-3","model_lang":["en","ch"],"model_ability":["generate","chat"],"model
关于Mysql 中 Row size too large (＞ 8126) 错误的解决和理解秋刀prince mysql mysql 数据库
提示：啰嗦一嘴，数据库的任何操作和验证前，一定要记得先备份！！！不会有错；文章目录问题发现一、问题导致的可能原因1、页大小2、行格式2.1compact格式2.2Redundant格式2.3Dynamic格式2.4Compressed格式3、BLOB和TEXT列二、解决办法1、修改页大小（不推荐）2、修改行格式3、修改数据类型为BLOB和TEXT列4、其他优化方式（可以参考使用）4.1合理设置数据
TextFiled 中输入金额宁梓茞
要求:输入的金额不能超过六位,小数点后面只能输入两位小数如果textFIled中第一位输入的是0,后面必须输入小数点,否则禁止输入用到textfiled代理方法#pragmamark----textFiledDelegate-----(BOOL)textField:(UITextField*)textFieldshouldChangeCharactersInRange:(NSRange)range
JAVA·一个简单的登录窗口 MortalTom java 开发语言学习
文章目录概要整体架构流程技术名词解释技术细节资源概要JavaSwing是Java基础类库的一部分，主要用于开发图形用户界面（GUI）程序整体架构流程新建项目，导入sql.jar包（链接放在了文末），编译项目并运行技术名词解释一、特点丰富的组件提供了多种可视化组件，如按钮（JButton）、文本框（JTextField）、标签（JLabel）、下拉列表（JComboBox）等，可以满足不同的界面设计
2019-07-09 AutoCompleteTextView 问题皮皮铭
实现自定义Adapter要实现Filterable接口，不然会报错重写getFilter()方法performFiltering()方法实现过滤数据的操作publishResults()用来接收performFiltering()的返回值，发布。
HarmonyOS开发实战（ Beta5.0）搜索框热搜词自动切换让开，我要吃人了 OpenHarmony HarmonyOS 鸿蒙开发 harmonyos 华为鸿蒙移动开发鸿蒙系统前端开发语言
鸿蒙HarmonyOS开发往期必看：HarmonyOSNEXT应用开发性能实践总结最新版！“非常详细的”鸿蒙HarmonyOSNext应用开发学习路线！（从零基础入门到精通）介绍本示例介绍使用TextInput组件与Swiper组件实现搜索框内热搜词自动切换。效果图预览使用说明页面顶部搜索框内热搜词条自动切换，编辑搜索框时自动隐藏。实现思路使用TextInput实现搜索框TextInput({te
uniapp使用内置地图选择插件，实现地址选择并在地图上标点神夜大侠 Uniapp vue.js uniapp
uniapp使用内置地图选择插件，实现地址选择并在地图上标点代码如下：page{background:#F4F5F6;}::-webkit-scrollbar{width:0;height:0;color:transparent;}page{height:100%;width:100%;font-size:24rpx;}image,view,input,textarea,label,text,na
HarmonyOS Next鸿蒙扫一扫功能实现 JohnLiu_ HarmonyOS Next harmonyos 华为扫一扫鸿蒙
直接使用的是华为官方提供的api，封装成一个工具类方便调用。import{common}from'@kit.AbilityKit';import{scanBarcode,scanCore}from'@kit.ScanKit';exportnamespaceScanUtil{exportasyncfunctionstartScan(context:common.Context):Promise{if
golang 实现文件上传下载 wangwei830 go
Gin框架上传下载上传（支持批量上传）httpRouter.POST("/upload",func(ctx*gin.Context){forms,err:=ctx.MultipartForm()iferr!=nil{fmt.Println("error",err)}files:=forms.File["fileName"]for_,v:=rangefiles{iferr:=ctx.SaveUplo
linux 安装Sublime Text 3 hhyiyuanyu Python学习 linux sublime text
方法/步骤打开官网http://www.sublimetext.com/3，选择64位进行下载执行命令wgethttps://download.sublimetext.com/sublime_text_3_build_3126_x64.tar.bz2进行下载3、下载完成进行解压,执行tar-xvvfsublime_text_3_build_3126_x64.tar.bz解压4、解压完成以后，移动到
sublime个人设置 bawangtianzun sublime text 编辑器
如何拥有jiangly蒋老师同款编译器(sublimec++配置竞赛向）_哔哩哔哩_bilibiliSublimeText4的安装教程（新手竞赛向）-知乎(zhihu.com)创建文件自动保存为c++打开SublimeText软件。转到"Tools"（工具）>"Developer"（开发者）>"NewPlugin"（新建插件）。在打开的新文件中，粘贴以下代码：importsublimeimport
html+css网页设计旅游网站首页1个页面 html+css+js网页设计 html css 旅游
html+css网页设计旅游网站首页1个页面网页作品代码简单，可使用任意HTML辑软件（如：Dreamweaver、HBuilder、Vscode、Sublime、Webstorm、Text、Notepad++等任意html编辑软件进行运行及修改编辑等操作）。获取源码1，访问该网站https://download.csdn.net/download/qq_42431718/897527112，点击
spring mvc @RequestBody String类型参数 zoyation spring-mvc spring mvc
通过如下配置：text/html;charset=UTF-8application/json;charset=UTF-8在springmvc的Controller层使用@RequestBody接收Content-Type为application/json的数据时，默认支持Map方式和对象方式参数@RequestMapping(value="/{code}/saveUser",method=Requ
处理标签包裹的字符串，并取出前250字符周bro 前端 javascript 开发语言
//假设这是你的HTML字符串varhtmlString=`这是一个段落。这是一个标题这是另一个段落，包含一些链接。`;//解析HTML字符串并提取文本functionextractTextFromHTML(html){varparser=newDOMParser();vardoc=parser.parseFromString(html,"text/html");vartextContent=do
css设置当字数超过限制后以省略号（...）显示周bro css 前端 vue css3 html 经验分享
1、文字超出一行，省略超出部分，显示’…’用text-overflow:ellipsis属性来，当然还需要加宽度width属来兼容部分浏览。overflow:hidden;text-overflow:ellipsis;white-space:nowrap;2、多行文本溢出显示省略号display:-webkit-box;-webkit-box-orient:vertical;-webkit-lin
360前端星计划-动画可以这么玩马小蜗
动画的基本原理定时器改变对象的属性根据新的属性重新渲染动画functionupdate(context){//更新属性}constticker=newTicker();ticker.tick(update,context);动画的种类1、JavaScript动画操作DOMCanvas2、CSS动画transitionanimation3、SVG动画SMILJS动画的优缺点优点：灵活度、可控性、性能
Kubernetes部署MySQL数据持久化沫殇-MS Kubernetes MySQL数据库 kubernetes mysql 容器
一、安装配置NFS服务端1、安装nfs-kernel-server：sudoapt-yinstallnfs-kernel-server2、服务端创建共享目录#列出所有可用块设备的信息lsblk#格式化磁盘sudomkfs-text4/dev/sdb#创建一个目录：sudomkdir-p/data/nfs/mysql#更改目录权限：sudochown-Rnobody:nogroup/data/nfs
JavaScript中秋快乐！ Q_w7742 javascript 开发语言 ecmascript
我们来实现一个简单的祝福网页~主要的难度在于使用canvas绘图当点击canvas时候，跳出“中秋节快乐”字样，需要注册鼠标单击事件和计时器。首先定义主要函数：初始化当点击canvas之后转到onCanvasClick函数，绘图生成灯笼。functiononCanvasClick(){//事件处理函数context.clearRect(0,0,canvas1.width,canvas1.heigh
k8s证书过期问题处理 olina_qin kubernetes 容器云原生
k8s证书过期问题处理opensslx509-in/etc/kubernetes/pki/apiserver.crt-noout-dateskubeadmcertsrenewallsystemctlrestartkubeleopensslx509-in/etc/kubernetes/pki/apiserver.crt-noout-text|grep"NotAfter"cp/etc/kubernet
FlagEmbedding 吉小雨 python库 python
FlagEmbedding教程FlagEmbedding是一个用于生成文本嵌入（textembeddings）的库，适合处理自然语言处理（NLP）中的各种任务。嵌入（embeddings）是将文本表示为连续向量，能够捕捉语义上的相似性，常用于文本分类、聚类、信息检索等场景。官方文档链接：FlagEmbedding官方GitHub一、FlagEmbedding库概述1.1什么是FlagEmbeddi
golang学习笔记--MPG模型 xxzed golang #学习笔记学习笔记 golang
MPG模式：M（Machine）：操作系统的主线程P（Processor）：协程执行需要的资源（上下文context），可以看作一个局部的调度器，使go代码在一个线程上跑，他是实现从N：1到N：M映射的关键G（Goroutine）：协程，有自己的栈。包含指令指针（instructionpointer）和其它信息（正在等待的channel等等），用于调度。一个P下面可以有多个G1、当前程序有三个M,
matlab设置图像窗口大小,matlab 图形窗口大小的设置 weixin_39534002 matlab设置图像窗口大小
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%常用选项和小技巧%%%%%%画等值线[cchh]=contour(peaks(30),'LINESPEC','b-')clabel(cc,hh,'manual')%写文本text(5,10,'\bfmath\slmath\itmath\rmmath\alpha','color',[0.10.10.9],'fonts
由于直接在一个回答中提供完整且多语言的游戏商城代码是不现实的（因为每种语言都有其独特的语法和库），我将为你概述一个游戏商城的核心概念，并提供几种不同编程语言的基本框架或示例代码段。 uthRaman 游戏 python 开发语言
商城系统概述hailiangwang.com游戏商城系统通常包含以下部分：用户系统（登录、注册、用户信息）商品列表（游戏、DLC、虚拟货币等）购物车系统支付系统订单系统2.示例框架（伪代码）首先，我们给出一个伪代码框架，描述商城的核心逻辑。plaintextclassUser:deflogin(username,password):#验证用户登录passdefregister(username,p
初识HTTP（1） S1mple_easy 计算机网络学习笔记 http
HTTP基本概念HTTP是超⽂本传输协议，也就是HyperTextTransferProtocol超文本传输协议：HTTP是一个在计算机世界⾥专⻔在两点之间传输⽂字、图⽚、⾳频、视频等超⽂本数据的约定和规范。HTTP常见的状态码具体含义常见状态码1xx提示信息，表示目前是协议处理状态，还需后续操作2xx成功，报文已经收到并被正确处理200/204/2063xx重定向，资源位置发生变动，需要客户端重
python的request请求401_Python模拟HTTPS请求返回HTTP 401 unauthorized错误 weixin_39599372
Python模拟HTTPS请求返回HTTP401unauthorized错误开始是使用的httplib模块，代码如下：header={"Content-type":"application/json","Accept":"*/*"}params={‘source‘:‘en‘,‘target‘:‘es‘,‘text‘:match.group(1)}data=urllib.urlencode(para
php状态监控源码,PHP服务器状态监控实现程序江子星 php状态监控源码
*/header('Content-type:text/html;charset=utf-8');include'./smtp/class.smtp.php';include'./smtp/class.phpmailer.php';functionsendmail($subject='',$body=''){date_default_timezone_set('Asia/Shanghai');//
JVM StackMapTable 属性的作用及理解 lijingyao8206 jvm 字节码 Class文件 StackMapTable
在Java 6版本之后JVM引入了栈图(Stack Map Table)概念。为了提高验证过程的效率，在字节码规范中添加了Stack Map Table属性，以下简称栈图，其方法的code属性中存储了局部变量和操作数的类型验证以及字节码的偏移量。也就是一个method需要且仅对应一个Stack Map Table。在Java 7版
回调函数调用方法百合不是茶 java
最近在看大神写的代码时,.发现其中使用了很多的回调 ,以前只是在学习的时候经常用到 ,现在写个笔记记录一下代码很简单: MainDemo :调用方法得到方法的返回结果
[时间机器]制造时间机器需要一些材料 comsci 制造
根据我的计算和推测,要完全实现制造一台时间机器,需要某些我们这个世界不存在的物质和材料... 甚至可以这样说,这种材料和物质,我们在反应堆中也无法获得......
开口埋怨不如闭口做事邓集海邓集海做人做事工作
“开口埋怨，不如闭口做事。”不是名人名言，而是一个普通父亲对儿子的训导。但是，因为这句训导，这位普通父亲却造就了一个名人儿子。这位普通父亲造就的名人儿子，叫张明正。　　　　张明正出身贫寒，读书时成绩差，常挨老师批评。高中毕业，张明正连普通大学的分数线都没上。高考成绩出来后，平时开口怨这怨那的张明正，不从自身找原因，而是不停地埋怨自己家庭条件不好、埋怨父母没有给他创造良好的学习环境。　　　　
jQuery插件开发全解析，类级别与对象级别开发 IT独行者 jquery 开发插件　函数
jQuery插件的开发包括两种：一种是类级别的插件开发，即给 jQuery添加新的全局函数，相当于给 jQuery类本身添加方法。 jQuery的全局函数就是属于 jQuery命名空间的函数，另一种是对象级别的插件开发，即给 jQuery对象添加方法。下面就两种函数的开发做详细的说明。 1 、类级别的插件开发类级别的插件开发最直接的理解就是给jQuer
Rome解析Rss 413277409 Rome解析Rss
import java.net.URL; import java.util.List; import org.junit.Test; import com.sun.syndication.feed.synd.SyndCategory; import com.sun.syndication.feed.synd.S
RSA加密解密无量加密解密 rsa
RSA加密解密代码代码有待整理 package com.tongbanjie.commons.util; import java.security.Key; import java.security.KeyFactory; import java.security.KeyPair; import java.security.KeyPairGenerat
linux 软件安装遇到的问题 aichenglong linux 遇到的问题 ftp
1 ftp配置中遇到的问题 500 OOPS: cannot change directory 出现该问题的原因:是SELinux安装机制的问题.只要disable SELinux就可以了修改方法:1 修改/etc/selinux/config 中SELINUX=disabled 2 source /etc
面试心得 alafqq 面试
最近面试了好几家公司。记录下；支付宝，面试我的人胖胖的，看着人挺好的；博彦外包的职位，面试失败；阿里金融，面试官人也挺和善，只不过我让他吐血了。。。由于印象比较深，记录下； 1，自我介绍 2，说下八种基本类型；（算上string。楼主才答了3种，哈哈，string其实不是基本类型，是引用类型） 3，什么是包装类，包装类的优点； 4，平时看过什么书？NND，什么书都没看过。。照样
java的多态性探讨百合不是茶 java
java的多态性是指main方法在调用属性的时候类可以对这一属性做出反应的情况 //package 1; class A{ public void test(){ System.out.println("A"); } } class D extends A{ public void test(){ S
网络编程基础篇之JavaScript-学习笔记 bijian1013 JavaScript
1.documentWrite <html> <head> <script language="JavaScript"> document.write("这是电脑网络学校"); document.close(); </script> </h
探索JUnit4扩展：深入Rule bijian1013 JUnit Rule 单元测试
本文将进一步探究Rule的应用，展示如何使用Rule来替代@BeforeClass，@AfterClass，@Before和@After的功能。在上一篇中提到，可以使用Rule替代现有的大部分Runner扩展，而且也不提倡对Runner中的withBefores()，withAfte
[CSS]CSS浮动十五条规则 bit1129 css
这些浮动规则，主要是参考CSS权威指南关于浮动规则的总结，然后添加一些简单的例子以验证和理解这些规则。 1. 所有的页面元素都可以浮动 2. 一个元素浮动后，会成为块级元素，比如<span>,a, strong等都会变成块级元素 3.一个元素左浮动，会向最近的块级父元素的左上角移动，直到浮动元素的左外边界碰到块级父元素的左内边界；如果这个块级父元素已经有浮动元素停靠了
【Kafka六】Kafka Producer和Consumer多Broker、多Partition场景 bit1129 partition
0.Kafka服务器配置 3个broker 1个topic，6个partition，副本因子是2 2个consumer，每个consumer三个线程并发读取 1. Producer package kafka.examples.multibrokers.producers; import java.util.Properties; import java.util.
zabbix_agentd.conf配置文件详解 ronin47 zabbix 配置文件
Aliaskey的别名，例如 Alias=ttlsa.userid:vfs.file.regexp[/etc/passwd,^ttlsa:.:([0-9]+),,,,\1]，或者ttlsa的用户ID。你可以使用key：vfs.file.regexp[/etc/passwd,^ttlsa:.: ([0-9]+),,,,\1]，也可以使用ttlsa.userid。备注: 别名不能重复，但是可以有多个
java--19.用矩阵求Fibonacci数列的第N项 bylijinnan fibonacci
参考了网上的思路，写了个Java版的： public class Fibonacci { final static int[] A={1,1,1,0}; public static void main(String[] args) { int n=7; for(int i=0;i<=n;i++){ int f=fibonac
Netty源码学习-LengthFieldBasedFrameDecoder bylijinnan java netty
先看看LengthFieldBasedFrameDecoder的官方API http://docs.jboss.org/netty/3.1/api/org/jboss/netty/handler/codec/frame/LengthFieldBasedFrameDecoder.html API举例说明了LengthFieldBasedFrameDecoder的解析机制，如下：实
AES加密解密 chicony 加密解密
AES加解密算法，使用Base64做转码以及辅助加密： package com.wintv.common; import javax.crypto.Cipher; import javax.crypto.spec.IvParameterSpec; import javax.crypto.spec.SecretKeySpec; import sun.misc.BASE64Decod
文件编码格式转换 ctrain 编码格式
package com.test; import java.io.File; import java.io.FileInputStream; import java.io.FileOutputStream; import java.io.IOException; import java.io.InputStream; import java.io.OutputStream;
mysql 在linux客户端插入数据中文乱码 daizj mysql 中文乱码
1、查看系统客户端，数据库，连接层的编码查看方法： http://daizj.iteye.com/blog/2174993 进入mysql，通过如下命令查看数据库编码方式： mysql> show variables like 'character_set_%'; +--------------------------+------
好代码是廉价的代码 dcj3sjt126com 程序员读书
长久以来我一直主张：好代码是廉价的代码。当我跟做开发的同事说出这话时，他们的第一反应是一种惊愕，然后是将近一个星期的嘲笑，把它当作一个笑话来讲。当他们走近看我的表情、知道我是认真的时，才收敛一点。当最初的惊愕消退后，他们会用一些这样的话来反驳： “好代码不廉价，好代码是采用经过数十年计算机科学研究和积累得出的最佳实践设计模式和方法论建立起来的精心制作的程序代码。” 我只
Android网络请求库——android-async-http dcj3sjt126com android
在iOS开发中有大名鼎鼎的ASIHttpRequest库，用来处理网络请求操作，今天要介绍的是一个在Android上同样强大的网络请求库android-async-http，目前非常火的应用Instagram和Pinterest的Android版就是用的这个网络请求库。这个网络请求库是基于Apache HttpClient库之上的一个异步网络请求处理库，网络处理均基于Android的非UI线程，通
ORACLE 复习笔记之SQL语句的优化 eksliang SQL优化 Oracle sql语句优化 SQL语句的优化
转载请出自出处：http://eksliang.iteye.com/blog/2097999 SQL语句的优化总结如下 sql语句的优化可以按照如下六个步骤进行：合理使用索引避免或者简化排序消除对大表的扫描避免复杂的通配符匹配调整子查询的性能 EXISTS和IN运算符下面我就按照上面这六个步骤分别进行总结：
浅析：Android 嵌套滑动机制（NestedScrolling） gg163 android 移动开发滑动机制嵌套
谷歌在发布安卓 Lollipop版本之后，为了更好的用户体验，Google为Android的滑动机制提供了NestedScrolling特性 NestedScrolling的特性可以体现在哪里呢？ 比如你使用了Toolbar，下面一个ScrollView，向上滚
使用hovertree菜单作为后台导航 hvt JavaScript jquery .net hovertree asp.net
hovertree是一个jquery菜单插件，官方网址：http://keleyi.com/jq/hovertree/ ，可以登录该网址体验效果。 0.1.3版本：http://keleyi.com/jq/hovertree/demo/demo.0.1.3.htm hovertree插件包含文件： http://keleyi.com/jq/hovertree/css
SVG 教程（二）矩形天梯梦 svg
SVG <rect> SVG Shapes SVG有一些预定义的形状元素，可被开发者使用和操作：矩形 <rect> 圆形 <circle> 椭圆 <ellipse> 线 <line> 折线 <polyline> 多边形 <polygon> 路径 <path>
一个简单的队列 luyulong java 数据结构队列
public class MyQueue { private long[] arr; private int front; private int end; // 有效数据的大小 private int elements; public MyQueue() { arr = new long[10]; elements = 0; front
基础数据结构和算法九：Binary Search Tree sunwinner Algorithm
A binary search tree (BST) is a binary tree where each node has a Comparable key (and an associated value) and satisfies the restriction that the key in any node is larger than the keys in all
项目出现的一些问题和体会 Steven-Walker DAO Web servlet
第一篇博客不知道要写点什么，就先来点近阶段的感悟吧。这几天学了servlet和数据库等知识，就参照老方的视频写了一个简单的增删改查的，完成了最简单的一些功能，使用了三层架构。 dao层完成的是对数据库具体的功能实现，service层调用了dao层的实现方法，具体对servlet提供支持。 &
高手问答：Java老A带你全面提升Java单兵作战能力！ ITeye管理员 java
本期特邀《Java特种兵》作者：谢宇，CSDN论坛ID: xieyuooo 针对JAVA问题给予大家解答，欢迎网友积极提问，与专家一起讨论! 作者简介：淘宝网资深Java工程师，CSDN超人气博主，人称“胖哥”。 CSDN博客地址： http://blog.csdn.net/xieyuooo 作者在进入大学前是一个不折不扣的计算机白痴，曾经被人笑话过不懂鼠标是什么，