Author: Zhi Guangda

When deploying a machine learning system, robustness is one of the most important concerns. We want the system not only to work well in most cases but also to be truly reliable, for example by withstanding attacks that try to deceive the classification model. For this reason, adversarial robustness has attracted considerable attention in recent years. To improve a model, we first have to understand where it fails. Today we will see how a model can be deceived.

## Load model and sample pictures

The beauty of deep learning is that it is easy to get started and see real results on your own data. Let's build our first adversarial example.

Before we start, we use PyTorch to load a pre-trained ResNet50 model and a picture of a pig to test it.

We resize the picture to 224 pixels (`transforms.Resize(224)` scales the shorter side to 224 while keeping the aspect ratio) and convert it to a tensor:

```python
from PIL import Image
from torchvision import transforms
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

# read the image, resize to 224 and convert to a PyTorch tensor
pig_img = Image.open("pig.jpg")
preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
])
pig_tensor = preprocess(pig_img)[None,:,:,:]

# plot the image (NumPy uses HWC whereas PyTorch uses CHW, so we need to convert)
plt.imshow(pig_tensor[0].numpy().transpose(1,2,0))
```


Next we load a ResNet50 model pre-trained on the ImageNet dataset and feed it the picture to see the result. PyTorch expects inputs in the uniform format batch_size x num_channels x height x width, which is why the image was given a leading batch dimension above.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# simple Module to normalize an image
class Normalize(nn.Module):
    def __init__(self, mean, std):
        super(Normalize, self).__init__()
        self.mean = torch.Tensor(mean)
        self.std = torch.Tensor(std)
    def forward(self, x):
        return (x - self.mean.type_as(x)[None,:,None,None]) / self.std.type_as(x)[None,:,None,None]

# values are standard normalization for ImageNet images,
# from https://github.com/pytorch/examples/blob/master/imagenet/main.py
norm = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

# load pre-trained ResNet50, and put it into evaluation mode
# (necessary so that e.g. batchnorm uses its stored running statistics)
# note: newer torchvision releases replace pretrained=True with a weights= argument
model = resnet50(pretrained=True)
model.eval();
```

```python
# form predictions
pred = model(norm(pig_tensor))
```
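To make the normalization step concrete, here is a minimal pure-Python sketch of what `Normalize.forward` computes for a single pixel. The pixel value and the helper name `normalize_pixel` are made up for illustration; the mean and standard deviation are the ImageNet values used above.

```python
# Per-channel normalization of one RGB pixel, mirroring (x - mean) / std.
# The pixel is a hypothetical value; mean/std are the ImageNet constants above.
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Return the normalized channels of one pixel with values in [0, 1]."""
    return [(c - m) / s for c, m, s in zip(rgb, mean, std)]

pixel = [0.485, 0.680, 0.181]   # hypothetical pixel after ToTensor()
normalized = normalize_pixel(pixel)
print(normalized)
```

Each channel is expressed in units of its standard deviation around its mean, so the three channels of this pixel come out close to 0, 1, and -1.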

The model output `pred` is now a 1000-dimensional vector of scores, one for each of the 1000 ImageNet classes. To find the predicted class, the easiest way is to take the index of the largest value in the vector and look up the corresponding class name:

```python
import json
with open("imagenet_class_index.json") as f:
    imagenet_classes = {int(i):x[1] for i,x in json.load(f).items()}
print(imagenet_classes[pred.max(dim=1)[1].item()])
```

`hog`

The prediction is correct! (The label for pigs in the ImageNet dataset is `hog`.)
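The "largest value wins" rule can be illustrated on a small scale. The sketch below uses four made-up logits in place of the 1000-dimensional ImageNet output; the helper names `softmax` and `argmax` are illustrative and not part of the original code.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest entry, like pred.max(dim=1)[1] for a single sample."""
    return max(range(len(values)), key=values.__getitem__)

# Hypothetical 4-class logits standing in for the 1000 ImageNet scores.
logits = [1.2, -0.3, 4.1, 0.5]
probs = softmax(logits)
predicted = argmax(logits)
print(predicted, probs[predicted])
```

Because softmax is monotonic, the class with the largest raw score is also the class with the largest probability, so the argmax can be taken directly on the model output.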

### Some basic concepts

To explain how to deceive a model, we first need to introduce some basic concepts.

Step 1: We define the model as a function:

$$ h_{\theta}: \mathcal{X} \rightarrow \mathbb{R}^{k} $$

This is a function that maps the input space to a $k$-dimensional output space, where $k$ is the number of classes and $\theta$ represents all the trainable parameters of the model, so $h_{\theta}$ represents our model.

Step 2: We define the loss function $\ell\left(h_{\theta}(x), y\right)$, where $x$ is the input sample and $y$ is the correct label. Specifically, we use the cross-entropy loss:

$$ \ell\left(h_{\theta}(x), y\right)=\log \left(\sum_{j=1}^{k} \exp \left(h_{\theta}(x)_{j}\right)\right)-h_{\theta}(x)_{y} $$

Here $h_{\theta}(x)_{j}$ denotes the $j$th element of $h_{\theta}(x)$.

```python
# 341 is the class index corresponding to "hog"
print(nn.CrossEntropyLoss()(model(norm(pig_tensor)),torch.LongTensor([341])).item())
```

0.003882253309711814

A loss of 0.0039 is already very small: the model assigns this picture the class hog with probability $\exp(-0.0039) \approx 0.996$.
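The link between the loss and the probability of the true class can be checked by hand. The following sketch implements the cross-entropy formula above in plain Python on three made-up logits (class 0 plays the role of "hog") and then applies the same relation to the loss value reported above. The function name `cross_entropy` is illustrative.

```python
import math

def cross_entropy(logits, label):
    """log(sum_j exp(h_j)) - h_label, with the max subtracted for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

# Hypothetical 3-class logits; class 0 stands in for the true class.
logits = [8.0, 2.0, 1.0]
loss = cross_entropy(logits, 0)
prob_true = math.exp(-loss)      # softmax probability of the true class
print(loss, prob_true)

# The same relation applied to the loss printed by the model above.
reported_loss = 0.003882253309711814
print(math.exp(-reported_loss))
```

Since the loss is the negative log of the softmax probability of the correct class, exponentiating its negative recovers that probability.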

### Create an adversarial example

So how do we alter the image so that it deceives the model into predicting something else? Before answering this question, let's first look at how the model is trained. A common way to train a classifier is to optimize the parameters $\theta$ to minimize the average loss over a training set $\left\{x_{i} \in \mathcal{X}, y_{i} \in \mathbb{Z}\right\}, i=1, \ldots, m$, which we write as the optimization problem

$$ \underset{\theta}{\operatorname{minimize}} \frac{1}{m} \sum_{i=1}^{m} \ell\left(h_{\theta}\left(x_{i}\right), y_{i}\right) $$

We usually solve this optimization problem by (stochastic) gradient descent:

$$ \theta:=\theta-\frac{\alpha}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \nabla_{\theta} \ell\left(h_{\theta}\left(x_{i}\right), y_{i}\right) $$

Here $\alpha$ is the step size and $\mathcal{B}$ is a mini-batch. For deep neural networks, this gradient can be computed efficiently by backpropagation. Another advantage of backpropagation is that it can differentiate the loss not only with respect to $\theta$ but also with respect to the input itself! This is exactly what we need to generate adversarial examples: we adjust the image by backpropagation to maximize the loss, which means solving the following optimization problem

$$ \underset{\hat{x}}{\operatorname{maximize}} \ell\left(h_{\theta}(\hat{x}), y\right) $$
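As a minimal sketch of this idea, the following pure-Python example differentiates the cross-entropy loss with respect to the input of a tiny linear classifier and takes one ascent step. The weights, bias, input, and step size are made-up values, and the linear model only stands in for the network; for a real network, backpropagation computes the same kind of gradient automatically.

```python
import math

# Toy linear "network": h(x) = W x + b with 3 classes and 2 input features.
# All numbers are hypothetical and chosen only for illustration.
W = [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.3]]
b = [0.1, -0.2, 0.0]

def logits(x):
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def softmax(h):
    m = max(h)
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]

def loss(x, y):
    """Cross-entropy loss of the toy model for input x and label y."""
    h = logits(x)
    m = max(h)
    return m + math.log(sum(math.exp(v - m) for v in h)) - h[y]

def grad_wrt_input(x, y):
    """Analytic gradient of the loss w.r.t. x: W^T (softmax(h) - onehot(y))."""
    p = softmax(logits(x))
    residual = [pj - (1.0 if j == y else 0.0) for j, pj in enumerate(p)]
    return [sum(W[j][i] * residual[j] for j in range(len(W))) for i in range(len(x))]

x, y = [0.8, -0.4], 0
g = grad_wrt_input(x, y)

# One small step along the gradient (gradient ascent on the input).
step = 0.1
x_adv = [xi + step * gi for xi, gi in zip(x, g)]
print(loss(x, y), loss(x_adv, y))
```

The parameters stay fixed throughout; only the input moves, and it moves in the direction that makes the model's loss on the correct label larger.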

Here $\hat{x}$ denotes our adversarial example, whose purpose is to maximize the loss. Of course, we cannot let $\hat{x}$ be optimized arbitrarily. (After all, some images really are not pigs: if we change the image completely into a dog, it is normal for the classifier to say it is not a pig.) So we need to make sure that $\hat{x}$ stays close to our original input $x$. We therefore write the optimization problem as:

$$ \underset{\delta \in \Delta}{\operatorname{maximize}} \ell\left(h_{\theta}(x+\delta), y\right) $$

Here $\Delta$ is the set of allowed perturbations of the image. In theory, we want $\Delta$ to include any change that people would visually regard as leaving the picture the same as the original input. This may include adding a small amount of noise, rotating, translating, or scaling the image, performing some 3D transformation of the underlying scene, or even photographing the pig from another angle. Mathematically, however, it is impossible to give such a set a strict definition. So we settle for a simple set of perturbations that are small enough not to change the meaning of the picture, the $\ell_{\infty}$ ball:

$$ \Delta=\left\{\delta:\|\delta\|_{\infty} \leq \epsilon\right\} $$

Here $\|\delta\|_{\infty}$ is defined as

$$ \|\delta\|_{\infty}=\max_{i}\left|\delta_{i}\right| $$
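In other words, every pixel may change by at most $\epsilon$. A small sketch of the norm and of the clipping that keeps a perturbation inside $\Delta$, which is the role played by `delta.data.clamp_(-epsilon, epsilon)` in the attack code that follows. The helper names and the perturbation values are made up for illustration.

```python
def linf_norm(delta):
    """Largest absolute entry of the perturbation."""
    return max(abs(d) for d in delta)

def project_linf(delta, epsilon):
    """Clip every entry into [-epsilon, epsilon]."""
    return [min(max(d, -epsilon), epsilon) for d in delta]

epsilon = 2.0 / 255                      # the budget used in the attack code
delta = [0.02, -0.001, 0.0, -0.05]       # hypothetical perturbation entries
projected = project_linf(delta, epsilon)
print(linf_norm(delta), linf_norm(projected))
```

Entries that are already within the budget are left untouched; only the ones that exceed it are pulled back to $\pm\epsilon$.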

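Now let's see how well this method works. The following example uses PyTorch's SGD optimizer to adjust the perturbation so that the loss is maximized. In the code, we minimize the negative loss and clamp `delta` back into $[-\epsilon, \epsilon]$ after every step.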

```python
import torch.optim as optim
epsilon = 2./255

delta = torch.zeros_like(pig_tensor, requires_grad=True)
opt = optim.SGD([delta], lr=1e-1)

for t in range(30):
    pred = model(norm(pig_tensor + delta))
    loss = -nn.CrossEntropyLoss()(pred, torch.LongTensor([341]))
    if t % 5 == 0:
        print(t, loss.item())

    opt.zero_grad()
    loss.backward()
    opt.step()
    delta.data.clamp_(-epsilon, epsilon)

print("True class probability:", nn.Softmax(dim=1)(pred)[0,341].item())
```

```
0 -0.003882253309711814
5 -0.0069345044903457165
10 -0.01582527346909046
15 -0.08056001365184784
20 -11.751323699951172
25 -16.78317642211914
True class probability: 1.3113177601553616e-06
```

After 30 gradient steps, our ResNet50 assigns a probability of only about $1.3 \times 10^{-6}$ to the picture being a pig. Now let's see what the model thinks the picture shows instead.

```python
max_class = pred.max(dim=1)[1].item()
print("Predicted class: ", imagenet_classes[max_class])
print("Predicted probability:", nn.Softmax(dim=1)(pred)[0,max_class].item())
```

```
Predicted class: wombat
Predicted probability: 0.9999175071716309
```

The model now considers our input to be a wombat, which is interesting! Let's see what the picture we actually fed in looks like:

```python
plt.imshow((pig_tensor + delta)[0].detach().numpy().transpose(1,2,0))
```


In fact, we cannot see any change in the picture with the naked eye. Now let's magnify our delta by a factor of 50 and see what we changed:

```python
plt.imshow((50*delta+0.5)[0].detach().numpy().transpose(1,2,0))
```


Therefore, by adding a small multiple of this seemingly random noise, we can create an image that looks the same as the original image but is misclassified.
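The factor 50 and the offset 0.5 in the plotting call above appear to be chosen so that the magnified noise fits into the displayable range [0, 1]. A small arithmetic check, assuming epsilon = 2/255 as in the attack (the variable names are illustrative):

```python
epsilon = 2.0 / 255          # perturbation budget from the attack
scale, offset = 50, 0.5      # the factors used in the plotting call

# Every entry of delta lies in [-epsilon, epsilon], so the displayed values lie in:
low = scale * (-epsilon) + offset
high = scale * epsilon + offset
print(low, high)

# Without magnification, the largest change per pixel corresponds to
# 2 of the 255 intensity steps of an 8-bit image.
print(epsilon * 255)
```

This also explains why the perturbed picture looks unchanged: a shift of at most two intensity levels per channel is far below what the eye notices.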

### Targeted attacks

Using the same principle, we can also choose the target of the deception. For example, let's make the model think the pig is an airplane.

Unlike the method above, we not only maximize the loss with respect to the correct class but also minimize the loss with respect to our target class:

$$ \underset{\delta \in \Delta}{\operatorname{maximize}}\left(\ell\left(h_{\theta}(x+\delta), y\right)-\ell\left(h_{\theta}(x+\delta), y_{\mathrm{target}}\right)\right) \equiv \underset{\delta \in \Delta}{\operatorname{maximize}}\left(h_{\theta}(x+\delta)_{y_{\mathrm{target}}}-h_{\theta}(x+\delta)_{y}\right) $$
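The equivalence holds because both cross-entropy terms contain the same log-sum-exp term, which cancels in the difference and leaves only the two logits. A quick numerical check on made-up logits (the function name and all values are illustrative):

```python
import math

def cross_entropy(logits, label):
    """log(sum_j exp(h_j)) - h_label, with the max subtracted for stability."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits)) - logits[label]

# Hypothetical 4-class logits; y is the true class, y_target the class we aim for.
logits = [3.0, -1.0, 0.5, 2.0]
y, y_target = 0, 3

lhs = cross_entropy(logits, y) - cross_entropy(logits, y_target)
rhs = logits[y_target] - logits[y]
print(lhs, rhs)
```

Both sides agree up to floating-point rounding, so maximizing the loss difference is the same as pushing the target logit above the true-class logit.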

```python
delta = torch.zeros_like(pig_tensor, requires_grad=True)
opt = optim.SGD([delta], lr=5e-3)

for t in range(100):
    pred = model(norm(pig_tensor + delta))
    loss = (-nn.CrossEntropyLoss()(pred, torch.LongTensor([341])) +
            nn.CrossEntropyLoss()(pred, torch.LongTensor([404])))
    if t % 10 == 0:
        print(t, loss.item())

    opt.zero_grad()
    loss.backward()
    opt.step()
    delta.data.clamp_(-epsilon, epsilon)
```

```
0 24.006044387817383
10 -0.24818801879882812
20 -8.039923667907715
30 -15.460402488708496
40 -21.939563751220703
50 -26.95309066772461
60 -31.754430770874023
70 -33.13744354248047
80 -37.07537841796875
90 -34.388519287109375
```

```python
max_class = pred.max(dim=1)[1].item()
print("Predicted class: ", imagenet_classes[max_class])
print("Predicted probability:", nn.Softmax(dim=1)(pred)[0,max_class].item())
```

```
Predicted class: airliner
Predicted probability: 0.8934944868087769
```

Now our model thinks the pig is an airplane! As before, our airplane pig still looks the same as the original picture:

```python
plt.imshow((pig_tensor + delta)[0].detach().numpy().transpose(1,2,0))
```


Here is the noise we added, again magnified by a factor of 50:

```python
plt.imshow((50*delta+0.5)[0].detach().numpy().transpose(1,2,0))
```


This is just an interesting example. If you are interested in adversarial training, you can look up the relevant papers for further study.

Project Address: https://momodel.cn/explore/5eba60c9d99e51afef3bfebd?type=app (Recommended to use Google Chrome browser to open on your computer)

