Practical test of video high level feature classification

Keywords: github Docker network

Previous article: https://blog.csdn.net/SPESEG/article/details/103875916

Let's test on Douyin (TikTok) video data and see how the pipeline performs on our videos.

Strategy: randomly select 20-40 frames per video, stack the selected frames by index, feed the stack through an InceptionV3 model as a batch to extract features, then pass the features to the classification model. Nothing technically fancy here.

The output probability can be interpreted as a similarity score, but I doubt it will work very well.

Considering runtime, it would help to have a fast way to get a video's total frame count.

First, 30 frames are selected at random: the total frame count is obtained with the ffmpeg library, then the corresponding frames are read by index with cv2 (or another decoder).
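A minimal sketch of that sampling step, assuming `ffprobe` is on the PATH and OpenCV (`cv2`) is installed; the helper names and the use of `nb_frames` metadata (fast, but absent in some containers) are my own choices, not from the original code:

```python
import json
import random
import subprocess

def total_frames(path):
    """Read the frame count from stream metadata via ffprobe (fast, no decode).

    Note: nb_frames can be missing for some container formats; then you would
    need the slower ffprobe -count_frames option or decode the whole file.
    """
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=nb_frames", "-of", "json", path],
        capture_output=True, text=True, check=True)
    return int(json.loads(out.stdout)["streams"][0]["nb_frames"])

def sample_frame_indices(n_total, n_sample=30, seed=None):
    """Pick n_sample distinct frame indices, sorted for sequential seeking."""
    rng = random.Random(seed)
    n_sample = min(n_sample, n_total)
    return sorted(rng.sample(range(n_total), n_sample))

def read_frames(path, indices):
    """Grab the selected frames with cv2 by seeking to each index."""
    import cv2  # imported here so the sampling helpers stay dependency-free
    cap = cv2.VideoCapture(path)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```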

1 - feature similarity first

2 - then a binary classification on the [high-level] features

 

1.1 I basically extracted the avg-pooled features, but to compute similarity (or any distance measure) efficiently, you need a tree structure. Yes, the same kind of tree you meet in data structures courses.

There are many kinds of such trees; for now we can try the basic k-d tree, which turns a linear scan into O(log n) lookups. sklearn ships a ready-made API, so no extra effort is needed.
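For example, using sklearn's ready-made `KDTree` (shown on toy 2-D points; in practice the tree would index the avg-pooled video features):

```python
import numpy as np
from sklearn.neighbors import KDTree

# Toy feature vectors; in practice these are the avg-pooled video features.
features = np.array([[0.0, 0.0],
                     [1.0, 1.0],
                     [5.0, 5.0],
                     [5.0, 6.0]])

tree = KDTree(features)                    # build once
dist, ind = tree.query([[0.9, 1.1]], k=2)  # k nearest neighbors per query row

print(ind[0])  # indices of the 2 nearest stored vectors
```

One caveat: k-d trees degrade toward brute-force speed in high dimensions, so for 2048-d InceptionV3 features a ball tree or plain brute-force search may end up just as fast.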

But the GitHub project I found is packaged with Docker, which I know nothing about, so it's inconvenient to pull out the parts I want, even though it's also written in Python.

Many projects are structured like this, which really slows things down: to understand the code you have to work through every script.

I wanted a sanity test on MNIST, but I couldn't get it working with that k-d tree project. Honestly, frustrating.

The similarity search is done with KNN; the results were mentioned in a previous blog post. I can only say that the first 50 neighbors may still be correct (or perhaps only the first 30); beyond that, probably not.

1.2 InceptionV3 feature similarity over 30 random frames of our company's videos was mentioned in an earlier article, but no actual query was run there, only a t-SNE plot, from which we already know the effect will not be great.

Below are the query results. They actually don't look too bad [note: labels 0, 1, 2 form one category group, 3 is another]. Of course there are also bad cases, as follows:

#bad case
query data id 410, label 1
result:
 {10754, 1540, 7684, 6151, 3080, 5129, 1554, 3093, 1558, 22, 3614, 8737, 8226, 11301, 3625, 6701, 9264, 7223, 573, 62, 3138, 77, 10318, 79, 1106, 595, 2134, 6232, 92, 11360, 1126, 3688, 3189, 5755, 7294, 4740, 3214, 1177, 11941, 1190, 3752, 6648, 7340, 6841, 5313, 196, 3783, 7369, 6865, 9432, 5861, 9966, 11510, 3845, 5415, 8495, 9519, 6963, 3388, 1865, 1867, 332, 3919, 7509, 347, 6493, 8035, 5489, 10610, 3441, 3955, 2933, 381, 7550, 6549, 410, 932, 9128, 3496, 10666, 427, 3500, 429, 1456, 1969, 6579, 7607, 955, 446, 2499, 9172, 7125, 985, 6106, 2019, 8677, 10218, 7661, 7150, 7160}
results' label :
 [3 2 3 3 3 3 2 3 2 0 3 3 3 3 3 3 3 3 1 0 3 0 3 0 2 1 3 3 0 3 2 3 3 3 3 3 3
 2 3 2 3 3 3 3 3 0 3 3 3 3 3 3 3 3 3 3 3 3 3 2 2 1 3 3 1 3 3 3 3 3 3 3 1 3
 3 1 1 3 3 3 1 3 1 2 2 3 3 1 1 3 3 3 2 3 2 3 3 3 3 3]

query data id 1989, label 2
result:
 {6658, 7184, 8720, 3091, 1563, 31, 544, 10278, 1575, 1576, 5673, 11307, 2091, 2095, 561, 1589, 5688, 1081, 8762, 5696, 67, 1604, 4166, 3661, 8781, 1617, 594, 8278, 2144, 7778, 10342, 1127, 10857, 118, 10360, 4234, 2188, 11405, 1678, 1691, 1186, 7342, 8879, 7351, 9915, 5822, 7881, 3804, 221, 733, 8926, 1250, 1769, 7404, 11503, 6899, 1278, 1797, 2312, 6409, 6413, 6925, 11030, 1302, 2849, 291, 804, 8486, 7476, 6456, 10563, 327, 340, 9049, 1898, 10630, 4487, 2958, 398, 2449, 1433, 1435, 9627, 926, 1441, 1955, 422, 1453, 941, 1458, 11187, 4026, 1989, 5080, 1497, 2008, 4063, 10217, 8172, 7164}
results' label :
 [3 3 3 3 2 0 1 3 2 2 3 3 2 3 1 2 3 2 3 3 0 2 3 3 3 2 1 3 3 3 3 2 3 0 3 3 3
 3 2 2 2 3 3 3 3 3 3 3 0 1 3 2 2 3 3 3 2 2 3 3 3 3 3 2 3 1 1 3 3 3 3 1 1 3
 2 3 3 3 1 3 2 2 3 1 2 2 1 2 1 2 3 3 2 3 2 2 3 3 3 3]

query data id 861, label 1
result:
 {1, 10246, 10256, 7699, 5665, 9771, 3629, 4653, 5681, 54, 11840, 7238, 2649, 9819, 98, 4724, 8823, 8839, 7310, 8336, 2718, 9896, 6824, 1192, 9395, 7860, 8896, 3780, 10960, 4309, 8418, 9443, 2792, 8426, 5869, 753, 2290, 3324, 9209, 3837, 2310, 7946, 267, 6416, 11025, 6937, 8988, 6432, 5414, 6447, 8506, 316, 9533, 6975, 10053, 8519, 6475, 4429, 2894, 10063, 6480, 5464, 861, 10594, 1380, 5996, 9582, 11120, 7027, 2941, 11138, 4996, 7046, 11655, 2450, 10649, 1949, 3998, 9629, 10657, 5031, 10151, 11178, 6066, 8115, 9153, 3010, 7630, 9166, 4565, 8155, 4064, 7654, 7656, 9194, 11248, 7154, 7667, 7161, 4605}
results' label :
 [0 3 3 3 3 3 3 3 3 0 3 3 3 3 0 3 3 3 3 3 3 3 3 2 3 3 3 3 3 3 3 3 3 3 3 1 3
 3 3 3 3 3 1 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 1 3 2 3 3 3 3 3 3 3 3 3
 3 3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3]

query data id 677, label 1
result:
 {5640, 1033, 3087, 4627, 8224, 10787, 10280, 9258, 3627, 11820, 3630, 10800, 4156, 8765, 4157, 6716, 10812, 8770, 10818, 11335, 1095, 11849, 4168, 6220, 10831, 6738, 95, 96, 8290, 8294, 6248, 2664, 5226, 10861, 9336, 1667, 1670, 8864, 677, 2728, 691, 6845, 11970, 1218, 6862, 3793, 4306, 1749, 1751, 10968, 11487, 5855, 737, 3310, 3314, 3319, 276, 11542, 9495, 6424, 3360, 6436, 8998, 4922, 1343, 2377, 333, 11086, 4438, 1885, 7012, 8565, 7546, 902, 7049, 6548, 3989, 3988, 2979, 1444, 2986, 5554, 435, 1973, 11707, 6590, 10181, 10183, 8137, 8144, 11217, 10704, 2007, 8160, 8161, 9696, 1510, 8167, 11251, 5622}
results' label :
 [3 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 2 3 3 3 3 3 0 0 3 3 3 3 3 3 3 2 2
 3 1 3 1 3 3 2 3 3 3 2 2 3 3 3 1 3 3 3 1 3 3 3 3 3 3 3 2 3 1 3 3 2 3 3 3 1
 3 3 3 3 3 2 3 3 1 2 3 3 3 3 3 3 3 3 2 3 3 3 2 3 3 3]

 

2. As for the classification step, let's retest with our own data.

The model structure is very simple: no complex layers, just the basics. There are three Dense layers, which is also the most basic setup in recommendation systems.

Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 128)               262272    
_________________________________________________________________
activation_1 (Activation)    (None, 128)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 32)                4128      
_________________________________________________________________
dropout_2 (Dropout)          (None, 32)                0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 32)                128       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 66        
_________________________________________________________________
activation_2 (Activation)    (None, 2)                 0         
=================================================================
Total params: 266,594
Trainable params: 266,530
Non-trainable params: 64
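A Keras sketch that reproduces the summary above. The 262,272 parameters of dense_1 imply a 2048-d input (2048 × 128 + 128), i.e. InceptionV3 avg-pool features; the activations, dropout rates, and optimizer are my assumptions where the summary doesn't name them:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Dense,
                                     Dropout)

def build_model(input_dim=2048):
    """Binary classifier over video features, matching the printed summary."""
    model = Sequential([
        Dense(128, input_dim=input_dim),  # 2048*128 + 128 = 262,272 params
        Activation("relu"),               # activation is an assumption
        Dropout(0.5),                     # rate is an assumption
        Dense(32),                        # 128*32 + 32 = 4,128 params
        Dropout(0.5),
        BatchNormalization(),             # 4*32 = 128 params, 64 non-trainable
        Dense(2),                         # 32*2 + 2 = 66 params
        Activation("softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The second summary further down is the same architecture with the first layer widened to 256 units (2048 × 256 + 256 = 524,544 parameters).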

Our company's data is not cleanly labeled. For example, some videos labeled 1 are not really in category 1; they might belong to a borderline category yet were still labeled 1. Overall, the binary classification model reaches above 75% accuracy, 80% at best. The best checkpoint, dense5_14_0.9838.hdf5, is only 6 MB, and presumably even smaller after compression, so a recommendation-style network really is much lighter than an image network.

Model: "sequential_1" ACC=0.8
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 256)               524544    
_________________________________________________________________
activation_1 (Activation)    (None, 256)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 32)                8224      
_________________________________________________________________
dropout_2 (Dropout)          (None, 32)                0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 32)                128       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 66        
_________________________________________________________________
activation_2 (Activation)    (None, 2)                 0         
=================================================================
Total params: 532,962
Trainable params: 532,898
Non-trainable params: 64

Next steps:

Review our videos one by one, delete the unsuitable ones, then extract features and retrain.

 

In addition, you can join our QQ group to discuss related problems; there is no WeChat group.

QQ group: 868373192

Voice / Image / Video deep learning group


Posted by Ben5on on Sun, 12 Jan 2020 23:44:17 -0800