The Reuters dataset
This news dataset poses a multi-class (single-label) classification problem.
Dataset characteristics:
- Text classification dataset
- Contains 46 different topics (a quick check is shown in the snippet below)
- Each topic has at least 10 samples in the training set
- The dataset ships with Keras and can be loaded directly
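As a quick, purely illustrative check of the figures above, the snippet below simply counts the distinct label values (it uses the same load_data call that appears again in listing 3-12):

```python
import numpy as np
from keras.datasets import reuters

# Same loading call as listing 3-12 below; here we only count the distinct topics
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
print(len(np.unique(train_labels)))  # 46 topics
```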
The difference between a multi-class problem and a binary (0-1, single-label) problem, by example:
- Binary classification: is this object a person? A: yes, B: no
- Multi-class classification: which of the following categories does this object belong to? A: person, B: car, C: plane, D: 🐟 …
3-12 Loading the dataset
from keras.datasets import reuters

(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words = 10000)

# Check that the data loaded successfully
print(len(train_data))
print(len(test_data))
# print(train_data[10])
3-13 Decoding the indices back into news text
word_index = reuters.get_word_index()

# Reverse the dictionary: map integer indices back to words
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

# Decode the text; the indices are offset by 3 because 0, 1 and 2 are
# reserved for "padding", "start of sequence" and "unknown"
decoded_newswire = ' '.join(reverse_word_index.get(i - 3, '?') for i in train_data[0])
print(decoded_newswire)
print(train_labels[10])
? ? ? said as a result of its december acquisition of space co it expects earnings per share in 1987 of 1 15 to 1 30 dlrs per share up from 70 cts in 1986 the company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986 and rental operation revenues to 19 to 22 mln dlrs from 12 5 mln dlrs it said cash flow per share this year should be 2 50 to three dlrs reuter 3 3
- Paraphrase (the decoded text is a little garbled grammatically, but it is clearly recognizable as a short news item):
? ? ? said that as a result of its December acquisition of Space Co. it expects earnings per share of 1.15 to 1.30 dlrs in 1987, up from 70 cts in 1986. The company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986, and rental operation revenues to 19 to 22 mln dlrs from 12.5 mln dlrs. It said cash flow per share this year should be 2.50 to three dlrs.
3-14 Encoding the data
Data Vectorization
import numpy as np

# Define the vectorization function
def vectorize_sequence(sequences, dimension = 10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

# Vectorize the samples
x_train = vectorize_sequence(train_data)  # Vectorized training data
x_test = vectorize_sequence(test_data)    # Vectorized test data
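A quick, purely illustrative check of what the vectorization produces (the 8,982 figure is the size of the Reuters training set):

```python
# Every newswire is now a 10,000-dimensional vector of 0s and 1s
print(x_train.shape)                       # (8982, 10000)
print(x_train[0].min(), x_train[0].max())  # 0.0 1.0
```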
Label Vectorization
After vectorizing the data, we use one-hot encoding to vectorize the labels.
def to_one_hot(labels, dimension = 46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results

# Vectorize the labels
one_hot_train_labels = to_one_hot(train_labels)  # Training set
one_hot_test_labels = to_one_hot(test_labels)    # Test set
# Vectorization using Keras's built-in method
from keras.utils.np_utils import to_categorical

one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)
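The two approaches are equivalent; a minimal check, assuming both cells above have been run:

```python
# The handwritten encoder and Keras's to_categorical produce the same one-hot matrix
print(np.array_equal(to_one_hot(train_labels), to_categorical(train_labels)))  # True
```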
3-15 Defining the multi-class model
In this problem each newswire in the sample can belong to one of 46 output categories. The 16-dimensional hidden layers used for the previous dataset may be too small to separate that many classes; a layer that small can become an intermediate bottleneck for the information flowing through the network. For this reason, we use wider hidden layers in this example.
# Model definition
from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(64, activation = 'relu', input_shape = (10000, )))
model.add(layers.Dense(64, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))  # The final output has 46 possibilities
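To inspect the network just defined, Keras's model.summary() prints each layer's output shape and parameter count (optional):

```python
# Optional: print the layer output shapes and parameter counts of the model above
model.summary()
```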
3-16 Compiling the model
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])
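categorical_crossentropy expects one-hot encoded labels. If you prefer to keep the labels as plain integers, an equivalent alternative is sparse_categorical_crossentropy; a minimal sketch (mathematically the same loss, only the label format differs — shown for comparison only, the rest of this article keeps the one-hot setup):

```python
# Equivalent compilation using integer labels instead of one-hot vectors
model.compile(optimizer = 'rmsprop',
              loss = 'sparse_categorical_crossentropy',
              metrics = ['accuracy'])
# With this loss you would fit/evaluate on train_labels / test_labels directly,
# without calling to_one_hot or to_categorical.
```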
3-17 Setting aside a validation set
# 3-17-1 Set aside 1,000 samples from the training data as a validation set
# (plain Python slicing is enough here)
x_val = x_train[:1000]            # The first 1,000 samples form the validation set
partial_x_train = x_train[1000:]  # The samples from index 1,000 onwards are our training data
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]
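An illustrative check of the split sizes (the 7,982 figure also appears as "Train on 7982 samples" in the log below):

```python
# 1,000 validation samples are held out; 7,982 training samples remain
print(partial_x_train.shape)  # (7982, 10000)
print(x_val.shape)            # (1000, 10000)
```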
# 3-17-2 Training the model
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs = 20,
                    batch_size = 512,
                    validation_data = (x_val, y_val))
Train on 7982 samples, validate on 1000 samples
Epoch 1/20
7982/7982 [==============================] - 1s 182us/step - loss: 2.5544 - accuracy: 0.5014 - val_loss: 1.7032 - val_accuracy: 0.6170
Epoch 2/20
7982/7982 [==============================] - 1s 100us/step - loss: 1.4136 - accuracy: 0.6994 - val_loss: 1.2892 - val_accuracy: 0.7080
Epoch 3/20
7982/7982 [==============================] - 1s 107us/step - loss: 1.0453 - accuracy: 0.7741 - val_loss: 1.1258 - val_accuracy: 0.7470
Epoch 4/20
7982/7982 [==============================] - 1s 95us/step - loss: 0.8166 - accuracy: 0.8254 - val_loss: 1.0081 - val_accuracy: 0.7840
Epoch 5/20
7982/7982 [==============================] - 1s 94us/step - loss: 0.6438 - accuracy: 0.8631 - val_loss: 0.9416 - val_accuracy: 0.8130
Epoch 6/20
7982/7982 [==============================] - 1s 91us/step - loss: 0.5099 - accuracy: 0.8968 - val_loss: 0.8941 - val_accuracy: 0.8200
Epoch 7/20
7982/7982 [==============================] - 1s 104us/step - loss: 0.4163 - accuracy: 0.9136 - val_loss: 0.8934 - val_accuracy: 0.8080
Epoch 8/20
7982/7982 [==============================] - 1s 117us/step - loss: 0.3332 - accuracy: 0.9290 - val_loss: 0.8881 - val_accuracy: 0.8150
Epoch 9/20
7982/7982 [==============================] - 1s 93us/step - loss: 0.2763 - accuracy: 0.9392 - val_loss: 0.8839 - val_accuracy: 0.8200
Epoch 10/20
7982/7982 [==============================] - 1s 85us/step - loss: 0.2371 - accuracy: 0.9464 - val_loss: 0.8970 - val_accuracy: 0.8140
Epoch 11/20
7982/7982 [==============================] - 1s 96us/step - loss: 0.2001 - accuracy: 0.9505 - val_loss: 0.9158 - val_accuracy: 0.8110
Epoch 12/20
7982/7982 [==============================] - 1s 88us/step - loss: 0.1777 - accuracy: 0.9518 - val_loss: 0.9198 - val_accuracy: 0.8110
Epoch 13/20
7982/7982 [==============================] - 1s 81us/step - loss: 0.1604 - accuracy: 0.9543 - val_loss: 0.9159 - val_accuracy: 0.8190
Epoch 14/20
7982/7982 [==============================] - 1s 92us/step - loss: 0.1455 - accuracy: 0.9569 - val_loss: 0.9516 - val_accuracy: 0.8130
Epoch 15/20
7982/7982 [==============================] - 1s 86us/step - loss: 0.1388 - accuracy: 0.9559 - val_loss: 0.9443 - val_accuracy: 0.8190
Epoch 16/20
7982/7982 [==============================] - 1s 80us/step - loss: 0.1306 - accuracy: 0.9546 - val_loss: 1.0283 - val_accuracy: 0.7990
Epoch 17/20
7982/7982 [==============================] - 1s 96us/step - loss: 0.1217 - accuracy: 0.9587 - val_loss: 1.0271 - val_accuracy: 0.8100
Epoch 18/20
7982/7982 [==============================] - 1s 88us/step - loss: 0.1173 - accuracy: 0.9578 - val_loss: 1.0426 - val_accuracy: 0.8070
Epoch 19/20
7982/7982 [==============================] - 1s 86us/step - loss: 0.1151 - accuracy: 0.9563 - val_loss: 1.0390 - val_accuracy: 0.8090
Epoch 20/20
7982/7982 [==============================] - 1s 142us/step - loss: 0.1093 - accuracy: 0.9583 - val_loss: 1.0477 - val_accuracy: 0.8090
# Plotting
import matplotlib.pyplot as plt

loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)

plt.plot(epochs, loss, 'bo', label = 'Training loss')       # 'bo' means blue dots
plt.plot(epochs, val_loss, 'b', label = 'Validation loss')  # 'b' means a solid blue line
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
[Figure: training and validation loss (output_17_0.png)]
plt.clf()  # Clear the figure

# Note: the book uses the keys below
# acc = history.history['acc']
# val_acc = history.history['val_acc']
# In my version of Keras, 'acc' is replaced by 'accuracy'; adjust as needed
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

plt.plot(epochs, acc, 'bo', label = 'Training acc')
plt.plot(epochs, val_acc, 'b', label = 'Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
[Figure: training and validation accuracy (output_18_0.png)]
3-21 Retraining a model from scratch with fewer epochs
model_2 = models.Sequential()
model_2.add(layers.Dense(64, activation = 'relu', input_shape = (10000, )))
model_2.add(layers.Dense(64, activation = 'relu'))
model_2.add(layers.Dense(46, activation = 'softmax'))

model_2.compile(optimizer = 'rmsprop',
                loss = 'categorical_crossentropy',
                metrics = ['accuracy'])

model_2.fit(partial_x_train,
            partial_y_train,
            epochs = 9,
            batch_size = 512,
            validation_data = (x_val, y_val))

# Evaluate the retrained model on the test set
results = model_2.evaluate(x_test, one_hot_test_labels)
print(results)
Train on 7982 samples, validate on 1000 samples
Epoch 1/9
7982/7982 [==============================] - 1s 97us/step - loss: 2.4856 - accuracy: 0.5234 - val_loss: 1.6474 - val_accuracy: 0.6420
Epoch 2/9
7982/7982 [==============================] - 1s 95us/step - loss: 1.3877 - accuracy: 0.6994 - val_loss: 1.2912 - val_accuracy: 0.7030
Epoch 3/9
7982/7982 [==============================] - 1s 92us/step - loss: 1.0489 - accuracy: 0.7699 - val_loss: 1.1425 - val_accuracy: 0.7600
Epoch 4/9
7982/7982 [==============================] - 1s 85us/step - loss: 0.8293 - accuracy: 0.8182 - val_loss: 1.0284 - val_accuracy: 0.7830
Epoch 5/9
7982/7982 [==============================] - 1s 99us/step - loss: 0.6605 - accuracy: 0.8543 - val_loss: 0.9781 - val_accuracy: 0.7820
Epoch 6/9
7982/7982 [==============================] - 1s 83us/step - loss: 0.5260 - accuracy: 0.8887 - val_loss: 0.9659 - val_accuracy: 0.7810
Epoch 7/9
7982/7982 [==============================] - 1s 84us/step - loss: 0.4247 - accuracy: 0.9138 - val_loss: 0.9053 - val_accuracy: 0.8040
Epoch 8/9
7982/7982 [==============================] - 1s 102us/step - loss: 0.3496 - accuracy: 0.9253 - val_loss: 0.8785 - val_accuracy: 0.8150
Epoch 9/9
7982/7982 [==============================] - 1s 100us/step - loss: 0.2851 - accuracy: 0.9394 - val_loss: 0.8921 - val_accuracy: 0.8210
2246/2246 [==============================] - 0s 157us/step
[1.251534348179587, 0.7853962779045105]
Compare this result with that of a completely random classifier:
import copy

# Randomly shuffled copy of the test labels
test_label_copy = copy.copy(test_labels)
np.random.shuffle(test_label_copy)

hits_array = np.array(test_labels) == np.array(test_label_copy)
float(np.sum(hits_array)) / len(test_labels)
0.18655387355298308
3-22 Generating predictions on new data
prediction = model.predict(x_test)

# Each element of `prediction` is a vector of length 46
print(prediction[0].shape)
# Its entries sum to 1 (the probabilities of the 46 different classes)
print(np.sum(prediction[0]))
# Print the class with the largest predicted probability
print(np.argmax(prediction[0]))
(46,)
1.0000001
3
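As an extra illustrative step (not in the original listing), taking the argmax over every test prediction and comparing it with the true labels recomputes the test accuracy by hand; it should match the accuracy reported by model.evaluate:

```python
# Turn the 46-way probability vectors into hard class predictions
predicted_classes = np.argmax(prediction, axis=1)

# Fraction of test newswires whose predicted topic equals the true topic
manual_accuracy = np.mean(predicted_classes == np.array(test_labels))
print(manual_accuracy)
```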
3-23 A model with an information bottleneck
Here, to verify that making the intermediate fully connected layer too small turns it into an obstacle to the flow of information, we deliberately shrink that layer's dimension and then compare the resulting prediction accuracy.
# Build the model. The book shrinks the middle fully connected layer to 4 units;
# the value below can be changed (4, 8, 32, 64, 128) to see the impact
model_3 = models.Sequential()
model_3.add(layers.Dense(64, activation = 'relu', input_shape = (10000,)))
# --------------------------------------------------
model_3.add(layers.Dense(128, activation = 'relu'))
# --------------------------------------------------
model_3.add(layers.Dense(46, activation = 'softmax'))
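For reference, here is a minimal sketch of the bottleneck variant the paragraph above describes, with the middle layer shrunk to 4 units as in the book; model_bottleneck is just an illustrative name:

```python
# Deliberately undersized middle layer: 4 units have to carry information about 46 classes
model_bottleneck = models.Sequential()
model_bottleneck.add(layers.Dense(64, activation = 'relu', input_shape = (10000,)))
model_bottleneck.add(layers.Dense(4, activation = 'relu'))   # the information bottleneck
model_bottleneck.add(layers.Dense(46, activation = 'softmax'))
```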
# Model training
model_3.compile(optimizer = 'rmsprop',
                loss = 'categorical_crossentropy',
                metrics = ['accuracy'])
model_3.fit(partial_x_train,
            partial_y_train,
            epochs = 20,
            batch_size = 128,
            validation_data = (x_val, y_val))
Train on 7982 samples, validate on 1000 samples
Epoch 1/20
7982/7982 [==============================] - 1s 176us/step - loss: 0.0738 - accuracy: 0.9588 - val_loss: 2.5090 - val_accuracy: 0.7760
Epoch 2/20
7982/7982 [==============================] - 1s 133us/step - loss: 0.0698 - accuracy: 0.9597 - val_loss: 2.6685 - val_accuracy: 0.7700
Epoch 3/20
7982/7982 [==============================] - 1s 130us/step - loss: 0.0676 - accuracy: 0.9607 - val_loss: 2.7785 - val_accuracy: 0.7700
Epoch 4/20
7982/7982 [==============================] - 1s 129us/step - loss: 0.0675 - accuracy: 0.9584 - val_loss: 2.9254 - val_accuracy: 0.7700
Epoch 5/20
7982/7982 [==============================] - 1s 133us/step - loss: 0.0670 - accuracy: 0.9588 - val_loss: 3.0094 - val_accuracy: 0.7710
Epoch 6/20
7982/7982 [==============================] - 1s 129us/step - loss: 0.0666 - accuracy: 0.9578 - val_loss: 3.0232 - val_accuracy: 0.7680
Epoch 7/20
7982/7982 [==============================] - 1s 168us/step - loss: 0.0665 - accuracy: 0.9590 - val_loss: 3.0974 - val_accuracy: 0.7730
Epoch 8/20
7982/7982 [==============================] - 1s 122us/step - loss: 0.0659 - accuracy: 0.9587 - val_loss: 3.2057 - val_accuracy: 0.7670
Epoch 9/20
7982/7982 [==============================] - 1s 133us/step - loss: 0.0648 - accuracy: 0.9599 - val_loss: 3.2828 - val_accuracy: 0.7650
Epoch 10/20
7982/7982 [==============================] - 1s 124us/step - loss: 0.0644 - accuracy: 0.9602 - val_loss: 3.1684 - val_accuracy: 0.7700
Epoch 11/20
7982/7982 [==============================] - 1s 131us/step - loss: 0.0647 - accuracy: 0.9577 - val_loss: 3.2552 - val_accuracy: 0.7650
Epoch 12/20
7982/7982 [==============================] - 1s 138us/step - loss: 0.0627 - accuracy: 0.9578 - val_loss: 3.4422 - val_accuracy: 0.7710
Epoch 13/20
7982/7982 [==============================] - 1s 135us/step - loss: 0.0631 - accuracy: 0.9594 - val_loss: 3.3429 - val_accuracy: 0.7610
Epoch 14/20
7982/7982 [==============================] - 1s 148us/step - loss: 0.0636 - accuracy: 0.9585 - val_loss: 3.6921 - val_accuracy: 0.7660
Epoch 15/20
7982/7982 [==============================] - 1s 160us/step - loss: 0.0632 - accuracy: 0.9597 - val_loss: 3.4518 - val_accuracy: 0.7640
Epoch 16/20
7982/7982 [==============================] - 1s 186us/step - loss: 0.0616 - accuracy: 0.9590 - val_loss: 3.7733 - val_accuracy: 0.7620
Epoch 17/20
7982/7982 [==============================] - 1s 139us/step - loss: 0.0621 - accuracy: 0.9590 - val_loss: 3.7500 - val_accuracy: 0.7610
Epoch 18/20
7982/7982 [==============================] - 1s 132us/step - loss: 0.0610 - accuracy: 0.9589 - val_loss: 3.9891 - val_accuracy: 0.7540
Epoch 19/20
7982/7982 [==============================] - 1s 150us/step - loss: 0.0622 - accuracy: 0.9588 - val_loss: 3.8385 - val_accuracy: 0.7500
Epoch 20/20
7982/7982 [==============================] - 1s 134us/step - loss: 0.0602 - accuracy: 0.9599 - val_loss: 4.1126 - val_accuracy: 0.7570
<keras.callbacks.callbacks.History at 0x1b52af48be0>
Comparison conclusion
# Print the results
results = model_3.evaluate(x_test, one_hot_test_labels)
print(results)
2246/2246 [==============================] - 0s 89us/step
[4.692387377058727, 0.75200355052948]
- If the samples are divided into N categories, the last layer should be a fully connected (Dense) layer of size N
- For single-label, multi-class problems, the last layer should use softmax as its activation
- The labels should be one-hot encoded, and categorical_crossentropy should be used as the loss function
Closing remarks
Note: the code in this article comes from Deep Learning with Python and is shared as electronic notes for study reference only. The author has run it all successfully; if anything has been left out, please contact the author of this article.
If you have read this far, please give the blogger a like. Your support is the author's greatest motivation to keep writing (^-^)
My knowledge is limited; if there are any mistakes, please point them out.
This article is for learning and communication only, not for any commercial purpose. If any copyright issues are involved, please contact the author as soon as possible.