# Introduction to adversarial training: trying to fool a model

Author: Zhi Guangda

When we actually deploy a machine learning system, one of the most important considerations is its robustness. We want the system not only to work well in most cases but also to be truly reliable, for example by withstanding attacks in which someone deliberately tries to fool your classification model. For this reason, adversarial robustness has attracted considerable attention in recent years. To improve a model, we first have to know what is wrong with it. Today we will see how a model can be fooled.

## Load model and sample pictures

The beauty of deep learning is that you can get started easily and see real results on your own data. Let's build our first adversarial example.

Before we start, we use PyTorch to load a pretrained ResNet50 model and a picture of a pig to test it on.

We resize the picture to 224 pixels (`transforms.Resize(224)` scales the shorter side to 224, so the result is 224×224 when the picture is square) and convert it to a tensor:

    from PIL import Image
    from torchvision import transforms
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib

    %matplotlib inline
    %config InlineBackend.figure_format = 'svg'

    # read the image, resize to 224 and convert to a PyTorch tensor
    pig_img = Image.open("pig.jpg")
    preprocess = transforms.Compose([
        transforms.Resize(224),
        transforms.ToTensor(),
    ])
    # [None,:,:,:] adds a leading batch dimension
    pig_tensor = preprocess(pig_img)[None,:,:,:]

    # plot the image (matplotlib expects HWC whereas PyTorch uses CHW, so we transpose)
    plt.imshow(pig_tensor[0].numpy().transpose(1,2,0))

<matplotlib.image.AxesImage at 0x7f14d3fd3550>


Next we load a ResNet50 model pretrained on the ImageNet dataset and feed it the picture to see the result. Note that the picture is arranged as batch_size x num_channels x height x width, which is PyTorch's standard input format; this is why the preprocessing code adds a leading batch dimension with `[None,:,:,:]`.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    # simple Module to normalize an image
    class Normalize(nn.Module):
        def __init__(self, mean, std):
            super(Normalize, self).__init__()
            self.mean = torch.Tensor(mean)
            self.std = torch.Tensor(std)

        def forward(self, x):
            # reshape mean/std to (1, C, 1, 1) so they broadcast over batch, height and width
            return (x - self.mean.type_as(x)[None,:,None,None]) / self.std.type_as(x)[None,:,None,None]

    # values are standard normalization for ImageNet images,
    # from https://github.com/pytorch/examples/blob/master/imagenet/main.py
    norm = Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

    # load pre-trained ResNet50 and put it into evaluation mode
    # (necessary so that e.g. batchnorm uses its stored running statistics)
    # note: recent torchvision versions deprecate pretrained=True in favor of the weights= argument
    model = resnet50(pretrained=True)
    model.eval();

    # form predictions
    pred = model(norm(pig_tensor))


The model output `pred` is now a 1000-dimensional vector of scores (logits), one for each of the 1000 ImageNet classes. The easiest way to find the predicted class is to take the index of the largest entry and look up the corresponding class name:

    import json
    with open("imagenet_class_index.json") as f:
        # the JSON maps "index" -> [WordNet id, class name]; keep only the name
        imagenet_classes = {int(i):x[1] for i,x in json.load(f).items()}
    print(imagenet_classes[pred.max(dim=1)[1].item()])

hog


The prediction is correct! (In the ImageNet dataset, the label for a pig is "hog".)
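To make the "largest entry wins" rule concrete without loading the pretrained model, here is a minimal sketch in plain Python. The four-class label map and the logit values are made up for illustration and are not ImageNet's; the `softmax` helper is likewise a hypothetical stand-in for `nn.Softmax`.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-class label map and logits, invented for this example.
toy_classes = {0: "cat", 1: "dog", 2: "hog", 3: "airliner"}
toy_logits = [1.0, 0.5, 6.0, -2.0]

# The predicted class is the index of the largest logit.
predicted = max(range(len(toy_logits)), key=lambda j: toy_logits[j])
probs = softmax(toy_logits)
print(toy_classes[predicted], round(probs[predicted], 3))
```

Because softmax is monotonic, the class with the largest logit is also the class with the largest probability, so taking the maximum of the raw output is enough to read off the prediction.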

### Some basic concepts

To explain how to fool a model, we first need to introduce some basic concepts.

Step 1: We define the model as a function:

$$h_{\theta}: \mathcal{X} \rightarrow \mathbb{R}^{k}$$

This is a function that maps the input space to a $k$-dimensional output space, where $k$ is the number of classes and $\theta$ represents all the trainable parameters of the model, so $h_{\theta}$ denotes our model.

Step 2: We define the loss function $\ell\left(h_{\theta}(x), y\right)$, where $x$ is the input sample and $y$ is the correct label. Specifically, we use the cross-entropy loss:

$$\ell\left(h_{\theta}(x), y\right)=\log \left(\sum_{j=1}^{k} \exp \left(h_{\theta}(x)_{j}\right)\right)-h_{\theta}(x)_{y}$$

Here $h_{\theta}(x)_{j}$ denotes the $j$-th element of $h_{\theta}(x)$.
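The cross-entropy formula can be checked by hand on a small example. The sketch below evaluates it in plain Python for a made-up three-class logit vector (the values and the helper name `cross_entropy` are assumptions for illustration) and confirms that $\exp(-\ell)$ is the softmax probability the model assigns to the true class.

```python
import math

def cross_entropy(logits, y):
    """log(sum_j exp(h_j)) - h_y, computed with a max shift for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[y]

toy_logits = [2.0, -1.0, 0.5]  # hypothetical h_theta(x) for k = 3 classes
loss = cross_entropy(toy_logits, y=0)

# exp(-loss) is exactly the softmax probability of the true class.
prob_true = math.exp(toy_logits[0]) / sum(math.exp(v) for v in toy_logits)
print(loss, math.exp(-loss), prob_true)
```

This is why a small loss means high confidence: the loss is the negative log of the probability assigned to the correct label.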

    # 341 is the class index corresponding to "hog"
    print(nn.CrossEntropyLoss()(model(norm(pig_tensor)),torch.LongTensor([341])).item())

0.003882253309711814


The loss of 0.0039 is already very small: the model believes this picture is a pig with probability $\exp(-0.0039) \approx 0.996$.

### Creating an adversarial example

So how do we modify the image to fool the model into thinking it is something else? Before answering this question, let's first recall how the model is trained. A common way to train a classifier is to optimize the parameters $\theta$ so as to minimize the average loss over a training set $\left\{x_{i} \in \mathcal{X}, y_{i} \in \mathbb{Z}\right\}, i=1, \ldots, m$, which we write as the optimization problem

$$\underset{\theta}{\operatorname{minimize}} \frac{1}{m} \sum_{i=1}^{m} \ell\left(h_{\theta}\left(x_{i}\right), y_{i}\right)$$

We usually solve this optimization problem by stochastic gradient descent:

$$\theta:=\theta-\frac{\alpha}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \nabla_{\theta} \ell\left(h_{\theta}\left(x_{i}\right), y_{i}\right)$$

Here $\alpha$ is the step size and $\mathcal{B}$ is a minibatch. For deep neural networks, this gradient can be computed efficiently by backpropagation. But backpropagation has another advantage: we can differentiate the loss not only with respect to $\theta$ but also with respect to the input itself. That is how we generate adversarial examples. We use backpropagation to adjust the image so as to maximize the loss, which means solving the optimization problem

$$\underset{\hat{x}}{\operatorname{maximize}} \ell\left(h_{\theta}(\hat{x}), y\right)$$

Here $\hat{x}$ denotes our adversarial example, whose purpose is to maximize the loss. Of course, we cannot let $\hat{x}$ be arbitrary. (After all, some images really are not pigs: if we change the image completely into a dog, it is perfectly normal for the classifier to say it is not a pig.) So we need to make sure that $\hat{x}$ stays close to our original input $x$, and we write the optimization problem as

$$\underset{\delta \in \Delta}{\operatorname{maximize}} \ell\left(h_{\theta}(x+\delta), y\right)$$

Here $\Delta$ is the set of allowed changes to the image. In theory, we want $\Delta$ to include every change after which a person would still consider the altered picture the same as the original input. This may include adding a small amount of noise, rotating, translating, or scaling the image, performing some 3D transformation of the underlying scene, or even photographing the pig from a different angle. Mathematically, however, it is impossible to give such a set a rigorous definition. So we settle for a simple set of perturbations that are small enough not to change the meaning of the picture:

$$\Delta=\left\{\delta:\|\delta\|_{\infty} \leq \epsilon\right\}$$

Here $\|\delta\|_{\infty}$ is defined as

$$\|\delta\|_{\infty}=\max_{i}\left|\delta_{i}\right|$$

In other words, every pixel value may change by at most $\epsilon$.
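As a small illustration of the set $\Delta$, the sketch below computes the infinity norm of a perturbation and projects it back into the allowed range by clipping each entry. The helper names `linf_norm` and `project_linf` and the sample values are hypothetical; the clipping plays the same role as a `clamp_(-epsilon, epsilon)` call on a tensor.

```python
def linf_norm(delta):
    """Largest absolute entry of the perturbation."""
    return max(abs(d) for d in delta)

def project_linf(delta, epsilon):
    """Clip every entry into [-epsilon, epsilon], i.e. project onto Delta."""
    return [max(-epsilon, min(epsilon, d)) for d in delta]

epsilon = 2.0 / 255
raw_delta = [0.02, -0.001, 0.0, -0.5]  # hypothetical perturbation values
clipped = project_linf(raw_delta, epsilon)

print(linf_norm(raw_delta), linf_norm(clipped))
```

Entries that are already within the bound are left untouched, so the projection changes the perturbation as little as possible while making it a member of $\Delta$.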

Now let's see how well this method works. The following example uses PyTorch's SGD optimizer to adjust the perturbation $\delta$ so as to maximize the loss, which is implemented by minimizing the negative loss.

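    import torch.optim as optim
    epsilon = 2./255

    # the perturbation starts at zero and is the only variable being optimized
    delta = torch.zeros_like(pig_tensor, requires_grad=True)
    opt = optim.SGD([delta], lr=1e-1)

    for t in range(30):
        pred = model(norm(pig_tensor + delta))
        # negative loss: minimizing it maximizes the cross-entropy of the true class
        loss = -nn.CrossEntropyLoss()(pred, torch.LongTensor([341]))
        if t % 5 == 0:
            print(t, loss.item())

        opt.zero_grad()  # clear gradients accumulated in the previous step
        loss.backward()
        opt.step()
        # project delta back onto the allowed set (infinity norm at most epsilon)
        delta.data.clamp_(-epsilon, epsilon)

    print("True class probability:", nn.Softmax(dim=1)(pred)[0,341].item())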

0 -0.003882253309711814
5 -0.0069345044903457165
10 -0.01582527346909046
15 -0.08056001365184784
20 -11.751323699951172
25 -16.78317642211914
True class probability: 1.3113177601553616e-06


After 30 gradient steps, our ResNet50 assigns the picture a probability of only about $1.3 \times 10^{-6}$ of being a pig. Now let's see what the model thinks this picture is instead.

    max_class = pred.max(dim=1)[1].item()
    print("Predicted class: ", imagenet_classes[max_class])
    print("Predicted probability:", nn.Softmax(dim=1)(pred)[0,max_class].item())

Predicted class:  wombat
Predicted probability: 0.9999175071716309


The model now classifies our input as a wombat with near certainty, which is interesting! Let's see what the picture we actually fed in looks like:

    plt.imshow((pig_tensor + delta)[0].detach().numpy().transpose(1,2,0))

<matplotlib.image.AxesImage at 0x7f14d0dcb358>


Unfortunately, the change in the picture is completely invisible to the naked eye. Let's magnify delta by a factor of 50 (and shift it by 0.5 so that zero maps to mid-gray) to see what we actually changed:

    plt.imshow((50*delta+0.5)[0].detach().numpy().transpose(1,2,0))

<matplotlib.image.AxesImage at 0x7f14cc22a908>


Therefore, by adding a small multiple of this seemingly random noise, we can create an image that looks the same as the original image but is misclassified.
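The whole attack loop can also be reproduced on a model small enough to differentiate by hand. The following is a minimal sketch, not the ResNet50 experiment: it assumes a hypothetical two-class linear model $h(x) = Wx$ with made-up weights and a three-"pixel" input, uses the closed-form gradient of the cross-entropy with respect to the input, and performs gradient ascent on the loss directly instead of minimizing the negative loss with an optimizer.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    """Linear model h(x) = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def loss_grad_wrt_input(W, x, y):
    """Gradient of cross-entropy w.r.t. x for h(x) = W x: W^T (softmax - onehot)."""
    p = softmax(logits(W, x))
    p[y] -= 1.0
    return [sum(W[j][i] * p[j] for j in range(len(W))) for i in range(len(x))]

# Hypothetical weights and input, invented for this example.
W = [[20.0, -20.0, 10.0], [-20.0, 20.0, -10.0]]
x = [0.55, 0.50, 0.05]
y = 0                      # true class
epsilon, lr = 0.05, 0.01
delta = [0.0] * len(x)

p_before = softmax(logits(W, x))[y]
for _ in range(50):
    x_adv = [xi + di for xi, di in zip(x, delta)]
    g = loss_grad_wrt_input(W, x_adv, y)
    delta = [di + lr * gi for di, gi in zip(delta, g)]         # ascent step on the loss
    delta = [max(-epsilon, min(epsilon, di)) for di in delta]  # project onto Delta
p_after = softmax(logits(W, [xi + di for xi, di in zip(x, delta)]))[y]

print(p_before, p_after)
```

Even though no entry of the input moves by more than `epsilon`, the probability of the true class drops from above 0.9 to below 0.5, so the toy model's prediction flips. The large weights stand in for the effect of high dimensionality in a real image classifier, where many tiny per-pixel changes add up.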

### Targeted deception

Using the same principle, we can go further and choose the class that the model is fooled into predicting. For example, let's make the model think the pig is an airplane.

Unlike the approach above, we now not only maximize the loss with respect to the correct class but also minimize the loss with respect to our target class:

$$\underset{\delta \in \Delta}{\operatorname{maximize}}\left(\ell\left(h_{\theta}(x+\delta), y\right)-\ell\left(h_{\theta}(x+\delta), y_{\mathrm{target}}\right)\right) \equiv \underset{\delta \in \Delta}{\operatorname{maximize}}\left(h_{\theta}(x+\delta)_{y_{\mathrm{target}}}-h_{\theta}(x+\delta)_{y}\right)$$
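The equivalence holds because both cross-entropy terms contain the same log-sum-exp of the logits, which cancels in the difference and leaves only the target logit minus the true-class logit. A quick numeric check in plain Python, using made-up logits and a hypothetical `cross_entropy` helper:

```python
import math

def cross_entropy(logits, y):
    """log(sum_j exp(h_j)) - h_y, computed with a max shift for stability."""
    m = max(logits)
    return m + math.log(sum(math.exp(v - m) for v in logits)) - logits[y]

toy_logits = [3.0, -1.0, 0.5, 2.0]  # hypothetical h_theta(x + delta)
y, y_target = 0, 3

loss_difference = cross_entropy(toy_logits, y) - cross_entropy(toy_logits, y_target)
logit_difference = toy_logits[y_target] - toy_logits[y]
print(loss_difference, logit_difference)
```

The two printed values agree up to floating-point rounding, so maximizing the loss difference is the same as pushing the target logit up and the true-class logit down.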

    delta = torch.zeros_like(pig_tensor, requires_grad=True)
    opt = optim.SGD([delta], lr=5e-3)

    for t in range(100):
        pred = model(norm(pig_tensor + delta))
        # 341 is "hog" (true class), 404 is "airliner" (target class)
        loss = (-nn.CrossEntropyLoss()(pred, torch.LongTensor([341])) +
                nn.CrossEntropyLoss()(pred, torch.LongTensor([404])))
        if t % 10 == 0:
            print(t, loss.item())

        opt.zero_grad()  # clear gradients accumulated in the previous step
        loss.backward()
        opt.step()
        # keep the perturbation inside the allowed range
        delta.data.clamp_(-epsilon, epsilon)

0 24.006044387817383
10 -0.24818801879882812
20 -8.039923667907715
30 -15.460402488708496
40 -21.939563751220703
50 -26.95309066772461
60 -31.754430770874023
70 -33.13744354248047
80 -37.07537841796875
90 -34.388519287109375

    max_class = pred.max(dim=1)[1].item()
    print("Predicted class: ", imagenet_classes[max_class])
    print("Predicted probability:", nn.Softmax(dim=1)(pred)[0,max_class].item())

Predicted class:  airliner
Predicted probability: 0.8934944868087769


The model now thinks the pig is an airliner! And, as before, our "airliner pig" still looks the same as the original picture:

    plt.imshow((pig_tensor + delta)[0].detach().numpy().transpose(1,2,0))

<matplotlib.image.AxesImage at 0x7f14cc1ccd68>


    plt.imshow((50*delta+0.5)[0].detach().numpy().transpose(1,2,0))

<matplotlib.image.AxesImage at 0x7f14cc1f37b8>


This is just an illustrative example. If you are interested in adversarial training, you can look up the relevant papers for further study.