[deep learning] village to network -- on the new features of the TensorFlow Eager Execution mechanism

Keywords: Lambda Python network Session

Preface

This article is a follow-up to "[deep learning] village to network -- on the difference between static graph and dynamic graph of the TensorFlow Eager Execution mechanism (1)". If you haven't read the previous article, read it first and then come back to this one.
The previous article introduced several features but gave few concrete examples. This article focuses on examples, so that you can get a deeper understanding of the convenience that the Eager Execution mechanism brings.
All of the following examples rely on this setup code:

import tensorflow as tf
import numpy as np
import tensorflow.contrib.eager as tfe

# Enable dynamic graph mechanism
tfe.enable_eager_execution()

Running a convolution operation directly

# Randomly generate two pictures, i.e. batch_size=2, size 28x28, three channels
images=np.random.randn(2,28,28,3).astype(np.float32)

# Convolution kernel parameter
filter=tf.get_variable("conv_w0",shape=[5,5,3,20],initializer=tf.truncated_normal_initializer)

# The convolution runs on the generated batch immediately and returns its result right away
conv=tf.nn.conv2d(input=images,filter=filter,strides=[1,2,2,1],padding="VALID")

# print(conv)

# shape of the numpy array showing the result
print(conv.numpy().shape)

The operation results are as follows:

(2, 12, 12, 20)

The shape of the convolution result is obtained successfully.
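
The spatial size 12 follows from the input size, kernel size, stride, and padding mode. As a minimal sketch (the helper name conv_output_size is an assumption made for illustration, not a TensorFlow function), the usual output-size rules for "VALID" and "SAME" padding can be written in plain Python:

```python
import math

def conv_output_size(input_size, kernel_size, stride, padding="VALID"):
    """Output size of a convolution along one spatial dimension (illustrative helper)."""
    if padding == "VALID":
        # No padding: only positions where the kernel fits entirely are used.
        return (input_size - kernel_size) // stride + 1
    if padding == "SAME":
        # The input is padded so the output size depends only on the stride.
        return math.ceil(input_size / stride)
    raise ValueError("unknown padding: %r" % padding)

# Values taken from the example: 28x28 input, 5x5 kernel, stride 2, 20 filters.
height = conv_output_size(28, 5, 2, "VALID")
width = conv_output_size(28, 5, 2, "VALID")
print((2, height, width, 20))  # (2, 12, 12, 20)
```

With padding="SAME" the same helper would give 14 instead of 12, because the input is padded before the kernel slides over it.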

Automatically calculate gradient (derivative)

After enabling eager mode, forward propagation is intuitive and easy to understand, but how do we compute gradients? tfe provides four functions that directly serve back propagation:

  • tfe.gradients_function
  • tfe.value_and_gradients_function
  • tfe.implicit_gradients
  • tfe.implicit_value_and_gradients
The first two compute the gradient of the input function with respect to all of its parameters, that is, the derivative with respect to every input argument. The last two compute the derivative with respect to all variables used during the computation. All four take a function and return a new function, so they can be used as Python decorators:
@tfe.gradients_function
def f(x, y):
    return x ** 2 + y ** 3
# Calculate the derivative of f(x,y) at x=1,y=2
f(1., 2.)

You can also skip the decorator and pass the function in as an argument:

def f(x, y):
    return x ** 2 + y ** 3
g = tfe.gradients_function(f)
# The derivative of f(x,y) at x=1,y=2 can also be calculated
g(1., 2.)
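
For f(x, y) = x ** 2 + y ** 3 the analytic partial derivatives are 2*x and 3*y**2, so at x=1, y=2 the gradients should be 2 and 12. The sketch below checks this without TensorFlow. It uses central finite differences, which is a different technique from the automatic differentiation that tfe performs, and the helper name numerical_gradients is made up for this illustration:

```python
def numerical_gradients(func, *args, eps=1e-6):
    """Approximate the derivative of func with respect to each positional argument."""
    grads = []
    for i in range(len(args)):
        plus = list(args)
        minus = list(args)
        plus[i] += eps
        minus[i] -= eps
        # Central difference: (f(a + eps) - f(a - eps)) / (2 * eps)
        grads.append((func(*plus) - func(*minus)) / (2 * eps))
    return grads

def f(x, y):
    return x ** 2 + y ** 3

dx, dy = numerical_gradients(f, 1.0, 2.0)
print(round(dx, 4), round(dy, 4))  # 2.0 12.0
```

The returned list has one entry per argument, which mirrors the list format that tfe.gradients_function is described as returning.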

Calculate the gradient of all parameters

tfe.gradients_function and tfe.value_and_gradients_function differentiate with respect to a function's input parameters: they take a function as input and return a function that computes the gradient of the input function with respect to all of its parameters.
The difference is the return format: tfe.gradients_function returns [parameter 1 gradient, parameter 2 gradient, ...], while tfe.value_and_gradients_function returns (function value, [parameter 1 gradient, parameter 2 gradient, ...]). The latter additionally returns the computed function value.

def func(x):
	return x*x+2*x+4.0

# Pass in function name as parameter as callback function
grad1=tfe.gradients_function(func)
print(grad1(2.0))

grad2=tfe.value_and_gradients_function(func)
print(grad2(2.0))

Output:

[<tf.Tensor: id=13, shape=(), dtype=float32, numpy=6.0>]
(<tf.Tensor: id=19, shape=(), dtype=float32, numpy=12.0>, [<tf.Tensor: id=24, shape=(), dtype=float32, numpy=6.0>])

Substituting x=2 into func gives the function value 2*2 + 2*2 + 4 = 12. The derivative of func is 2*x+2, which equals 6 at x=2.

Calculate the gradient of all variables

In practice, we want to differentiate with respect to the variables in the model, because those variables are what gradient descent needs to optimize.
tfe.implicit_gradients and tfe.implicit_value_and_gradients differentiate with respect to "all variables used in the calculation process". The difference between them: tfe.implicit_gradients returns [(gradient 1, variable 1), (gradient 2, variable 2), ...], while tfe.implicit_value_and_gradients returns (function value, [(gradient 1, variable 1), (gradient 2, variable 2), ...]). As before, the latter additionally returns the computed function value.

x=tfe.Variable(initial_value=1.0,name="x")
y=tfe.Variable(initial_value=1.0,name="y")

def func(a):
	return x+2*a*x*y

grad1=tfe.implicit_gradients(func)
print(grad1(5.0))

grad2=tfe.implicit_value_and_gradients(func)
print(grad2(5.0))

Output:

[(<tf.Tensor: id=26, shape=(), dtype=float32, numpy=11.0>, <tf.Variable 'x:0' shape=() dtype=float32, numpy=1.0>), (<tf.Tensor: id=23, shape=(), dtype=float32, numpy=10.0>, <tf.Variable 'y:0' shape=() dtype=float32, numpy=1.0>)]
(<tf.Tensor: id=40, shape=(), dtype=float32, numpy=11.0>, [(<tf.Tensor: id=45, shape=(), dtype=float32, numpy=11.0>, <tf.Variable 'x:0' shape=() dtype=float32, numpy=1.0>), (<tf.Tensor: id=42, shape=(), dtype=float32, numpy=10.0>, <tf.Variable 'y:0' shape=() dtype=float32, numpy=1.0>)])
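
These numbers can be checked by hand: with a=5 and x=y=1, the function value is 1 + 2*5*1*1 = 11, the derivative with respect to x is 1 + 2*a*y = 11, and the derivative with respect to y is 2*a*x = 10. The following sketch reproduces the idea of differentiating with respect to variables that are not function arguments. It is only an illustration: the variables dict is a hypothetical stand-in for tfe.Variable objects, and finite differences replace automatic differentiation.

```python
# Hypothetical stand-in for tfe.Variable objects: a dict of named floats.
variables = {"x": 1.0, "y": 1.0}

def func(a):
    # Same expression as in the article; x and y are read from outside the arguments.
    return variables["x"] + 2 * a * variables["x"] * variables["y"]

def numerical_implicit_gradients(f, *args, eps=1e-6):
    """Return [(gradient, variable name), ...] for every entry in `variables`."""
    pairs = []
    for name in variables:
        original = variables[name]
        variables[name] = original + eps
        plus = f(*args)
        variables[name] = original - eps
        minus = f(*args)
        variables[name] = original  # restore the variable before moving on
        pairs.append(((plus - minus) / (2 * eps), name))
    return pairs

value = func(5.0)
pairs = numerical_implicit_gradients(func, 5.0)
print(value, [(round(g, 4), name) for g, name in pairs])
# 11.0 [(11.0, 'x'), (10.0, 'y')]
```

Note that the argument a is not differentiated at all; only the captured variables are, which is the distinction between the "implicit" functions and tfe.gradients_function.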

Reference: https://blog.csdn.net/guangcheng0312q/article/details/100117550

Using Python control flow to control the model flow

As mentioned earlier, a static graph built with placeholders and feed_dict cannot use if/else, while, and similar Python statements. No data is available while the graph is being constructed, so Python cannot tell which branch to take, and an error is raised during construction. A static graph has to use the tf.cond() function to control the structure of the model. For example:

def relu(x):
	# Use if else conditional control block
	if(x>=0):
		return x
	else:
		return tf.zeros(x.shape,dtype=tf.float64)

def true_relu(x):
	# Using tf.cond instead of if else to realize conditional control
	cond=tf.greater_equal(x,tf.constant(0.0,dtype=tf.float64))
	zeros=tf.zeros(x.shape,dtype=tf.float64)

	y=tf.cond(cond[0][0],lambda:x,lambda: zeros)
	return y

# Create a graph manually
g2 = tf.Graph()
with g2.as_default():
	# Placeholders are needed for input layers that do not enter specific values in static graphs
	x1=tf.placeholder(tf.float64,shape=(1,1))
	x2=tf.placeholder(tf.float64,shape=(1,1))

	result1=true_relu(x1)

	# Using result2=x2*relu(x2) here would raise an error
	result2=true_relu(x2)

with tf.Session(graph=g2) as sess:
	print(sess.run(fetches=[result1,result2],feed_dict={x1:np.asarray([[5.0]]),x2:np.asarray([[-5.0]])}))

Output:

[array([[5.]]), array([[-0.]])]

Suppose you insist on controlling the flow with an if/else statement anyway. When result2=x2*relu(x2) is executed, an error is reported:

... (beginning of the message omitted) use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.

The message suggests using TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor, so if/else is not an option in a static graph. Now let's look at the dynamic graph:

def relu1(x):
	# Use if else conditional control block
	if(x>=0):
		return x
	else:
		return tf.zeros(x.shape,dtype=tf.float64)

def relu2(x):
	# Using tf.cond instead of if else to realize conditional control
	cond=tf.greater_equal(x,tf.constant(0.0,dtype=tf.float64))
	zeros=tf.zeros(x.shape,dtype=tf.float64)

	y=tf.cond(cond[0][0],lambda:x,lambda: zeros)
	return y

# Input numpy array directly in dynamic graph
x1=np.asarray([[-5.0]])
x2=np.asarray([[5.]])


result1=relu1(x1)
result2=relu1(x2)

print("result1:{},result2:{}".format(result1,result2))

result3=relu2(x1)
result4=relu2(x2)

print("result3:{},result4:{}".format(result3,result4))

Output:

result1:[[-0.]],result2:[[5.]]
result3:[[0.]],result4:[[5.]]

The program can execute normally.
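
To make the difference concrete, here is a toy sketch in plain Python. The SymbolicTensor class is entirely hypothetical and only imitates the relevant behaviour of a graph-mode tensor: it has no value while the graph is being built, so Python cannot convert it to True or False. A concrete value, as in eager mode, passes through the same if statement without trouble.

```python
class SymbolicTensor:
    """Hypothetical stand-in for a graph-mode tensor that has no value yet."""

    def __ge__(self, other):
        # Comparing a symbolic tensor yields another symbolic tensor, not a bool.
        return SymbolicTensor()

    def __bool__(self):
        # Python needs a concrete True/False to choose a branch, but none exists.
        raise TypeError("a symbolic tensor has no value during graph construction")

def relu(x):
    # Ordinary Python control flow, as in relu1 above.
    if x >= 0:
        return x
    return 0.0

# Eager-style: concrete values, so the branch can be chosen immediately.
print(relu(5.0), relu(-5.0))  # 5.0 0.0

# Graph-style: the condition has no value, so the if statement fails.
try:
    relu(SymbolicTensor())
    failed = False
except TypeError as err:
    failed = True
    print("graph construction failed:", err)
```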

Automatic optimization

With the dynamic graph mechanism, we can pass the loss function directly to the optimizer:

w=tf.get_variable("w",initializer=3.0)

def loss(x):
	return w*x

# Create optimizer
optimizer=tf.train.GradientDescentOptimizer(learning_rate=0.1)

for i in range(5):
	optimizer.minimize(lambda:loss(5.0))
	print(w.numpy())

Output:

2.5
2.0
1.5
1.0
0.5

As you can see, the value of variable w is updated at every step to reduce the value of loss.
The optimizer's minimize() function minimizes loss by updating the variables in var_list.
It is a simple combination of the compute_gradients() and apply_gradients() functions. In graph mode it returns an operation that updates the variables in var_list; if global_step is not None, that operation also increments global_step.

  • compute_gradients() function: computes the gradient of loss with respect to the variables in var_list. It is the first part of minimize() and returns a list of (gradient, variable) tuples.
  • apply_gradients() function: applies the computed gradients to the variables, modifying them. For plain gradient descent, it computes x1 - learning_rate * gradient and assigns the result back to x1, which completes the update of x1. It is the second part of minimize(). If there is more than one variable, all of them are updated, and global_step is incremented if it was provided. Note that this step differs between optimizers: the Adam optimizer and the momentum optimizer, for example, compute their updates differently, but the goal is the same, namely to optimize the parameters.

The gradients can be processed between the compute_gradients() call and the apply_gradients() call, which turns the original optimizer into a custom one. In fact, we could also define a new optimizer from start to finish using the gradient functions introduced above.
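
The two-phase structure can be sketched in plain Python. Everything below is an illustration under stated assumptions, not the TensorFlow implementation: the functions compute_gradients and apply_gradients only borrow the names of the optimizer methods, the variables dict stands in for TensorFlow variables, gradients come from finite differences, and clipping to [-2, 2] is just one example of a step that could sit between the two phases.

```python
def compute_gradients(loss_fn, variables, eps=1e-6):
    """Phase 1: return [(gradient, variable name), ...] via central differences."""
    grads_and_vars = []
    for name in variables:
        original = variables[name]
        variables[name] = original + eps
        plus = loss_fn()
        variables[name] = original - eps
        minus = loss_fn()
        variables[name] = original  # restore the variable
        grads_and_vars.append(((plus - minus) / (2 * eps), name))
    return grads_and_vars

def apply_gradients(grads_and_vars, variables, learning_rate):
    """Phase 2: plain gradient descent update, var = var - learning_rate * grad."""
    for grad, name in grads_and_vars:
        variables[name] -= learning_rate * grad

def train(steps=5, clip_to=None):
    variables = {"w": 3.0}           # same starting value as the example above
    loss = lambda: variables["w"] * 5.0  # loss(5.0) = w * 5.0
    history = []
    for _ in range(steps):
        grads_and_vars = compute_gradients(loss, variables)
        if clip_to is not None:
            # Custom processing between the two phases: clip every gradient.
            grads_and_vars = [(max(-clip_to, min(clip_to, g)), n)
                              for g, n in grads_and_vars]
        apply_gradients(grads_and_vars, variables, learning_rate=0.1)
        history.append(round(variables["w"], 4))
    return history

print(train())             # [2.5, 2.0, 1.5, 1.0, 0.5]
print(train(clip_to=2.0))  # [2.8, 2.6, 2.4, 2.2, 2.0]
```

Without clipping, the sketch reproduces the values printed by the optimizer example; with clipping, each step moves w by only 0.1 * 2 = 0.2, which shows how processing the gradients changes the optimizer's behaviour.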

Posted by bubatalazi on Thu, 13 Feb 2020 02:09:50 -0800