Reinforcement Learning Practice | Customizing a Tic-Tac-Toe Gym Environment

Keywords: Python

In the previous article, Reinforcement Learning Practice | Customize the Gym Environment, we learned how to define a simple environment and display it with print. In this article we customize a slightly more complex environment, Tic-Tac-Toe. Recall the rules of the game:

  • This is a two-player, turn-based game. The players use different marks (circles/crosses), so the action design must distinguish the players.
  • The final reward differs between the players: the winner gets +1 and the loser gets -1 (a draw gives both 0), so the reward design must also distinguish the players.
  • The game ends when any row, column, or diagonal is filled with the same mark, or when the board has no empty cell left.
  • From the perspective of a single player, the new state s_after reached by taking action a in the current state s is not the successor state but an intermediate state waiting for the opponent's move. The real successor state is the state s' reached after the opponent has moved (unless the game ends immediately after action a), as shown in the following figure:
  • [Figure: the transition s -> s_after -> s' from a single player's perspective]
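The last point can be made concrete with a small sketch. It assumes the 1/-1 encoding for blue/red that the environment below uses; the names is_over, agent_transition and random_opponent are hypothetical helpers introduced only for this illustration and are not part of the environment.

```python
import random
import numpy as np

BLUE, RED = 1, -1  # same encoding the environment below uses for the two marks


def is_over(board):
    """True if a row, column or diagonal holds three equal marks, or no cell is empty."""
    lines = list(board.sum(axis=0)) + list(board.sum(axis=1))
    lines += [np.trace(board), np.trace(np.fliplr(board))]
    return any(abs(v) == 3 for v in lines) or not (board == 0).any()


def agent_transition(s, a, opponent_policy):
    """Return (s_after, s_next) as seen by the blue player.

    s_after is the board right after blue plays at cell `a`.
    s_next is the board blue faces at its next turn, i.e. after red has replied.
    If blue's move ends the game, s_next is simply s_after.
    """
    s_after = s.copy()
    s_after[a] = BLUE
    if is_over(s_after):
        return s_after, s_after
    s_next = s_after.copy()
    s_next[opponent_policy(s_after)] = RED
    return s_after, s_next


def random_opponent(board):
    """Hypothetical opponent: pick any empty cell uniformly at random."""
    empty = [tuple(int(v) for v in idx) for idx in np.argwhere(board == 0)]
    return random.choice(empty)


s = np.zeros((3, 3))
s_after, s_next = agent_transition(s, (1, 1), random_opponent)
print(int((s_after != 0).sum()), int((s_next != 0).sum()))  # prints "1 2"
```

A learning agent that plays blue would store (s, a, r, s_next) rather than (s, a, r, s_after), because s_next is the state in which it chooses its next action.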

Besides the mechanics of the game itself, we want to stay compatible with gym's API. It is convenient to control the flow of the game from an external loop, so the env itself does not need code that drives the game or switches players. We also want a more vivid presentation of the environment than print. Let's get started!

Step 1: Create a new file

Go to the directory D:\Anaconda\envs\pytorch1.1\Lib\site-packages\gym\envs\user and create the files __init__.py and TicTacToe_env.py. (Remember? user is the folder we created in the article Reinforcement Learning Practice | Customize the Gym Environment to store custom environments.)

Step 2: Write TicTacToe_env.py and __init__.py

gym has a built-in drawing tool, rendering, but its functionality is limited and it is cumbersome for drawing anything complex. This article does not study it in depth and simply uses its basic lines/squares/circles to present the environment (a more vivid presentation could be achieved with pygame). rendering draws one frame at a time: when env.render() is called, the drawing elements currently recorded in self.viewer.geoms are rendered. The basic elements of the environment are designed as follows:

  • State: a two-dimensional numpy.array in which an empty cell has the value 0, a blue mark 1, and a red mark -1.
  • Action: a dictionary of the form action = {'mark': 'blue', 'pos': (x, y)}, where 'mark' is the color of the mark, which identifies the player, and 'pos' is the position of the mark.
  • Reward: fixed to blue's perspective: win +1, lose -1, draw 0.
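As a quick illustration of the state and action encoding in the list above, the sketch below applies two action dictionaries to an empty board. The helper apply_action and the MARK_VALUE table are hypothetical names, and the occupied-cell check is an addition for illustration, not something the design above prescribes.

```python
import numpy as np

MARK_VALUE = {'blue': 1, 'red': -1}  # encoding from the list above


def apply_action(state, action):
    """Return a copy of `state` with the action's mark placed at action['pos'].

    Raises ValueError for an occupied cell (an extra check added for this sketch).
    """
    x, y = action['pos']
    if state[x, y] != 0:
        raise ValueError(f"cell {(x, y)} is already occupied")
    new_state = state.copy()
    new_state[x, y] = MARK_VALUE[action['mark']]
    return new_state


state = np.zeros((3, 3))
state = apply_action(state, {'mark': 'blue', 'pos': (0, 0)})
state = apply_action(state, {'mark': 'red', 'pos': (1, 1)})
print(state)  # blue's 1 at (0, 0), red's -1 at (1, 1), zeros elsewhere
```

Because the two marks are encoded as +1 and -1, a line owned entirely by one player sums to +3 or -3, which is what the end-of-game check in the code below relies on.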

The complete code of TicTacToe_env.py is as follows:

import gym
import random
import time
import numpy as np
from gym.envs.classic_control import rendering

class TicTacToeEnv(gym.Env):
    def __init__(self):
        self.state = np.zeros([3, 3])
        self.winner = None
        WIDTH, HEIGHT = 300, 300 
        self.viewer = rendering.Viewer(WIDTH, HEIGHT)
    
    def reset(self):
        self.state = np.zeros([3, 3])
        self.winner = None
        self.viewer.geoms.clear() # Clear the elements queued for drawing on the board
        self.viewer.onetime_geoms.clear()
        return self.state # gym's reset() is expected to return the initial observation
    
    def step(self, action):
        # Format of action: action = {'mark': 'blue'/'red', 'pos': (x, y)}
        # Update the state
        x = action['pos'][0]
        y = action['pos'][1]
        if action['mark'] == 'blue':
            self.state[x][y] = 1
        elif action['mark'] == 'red':
            self.state[x][y] = -1
        # Reward, from blue's perspective
        done = self.judgeEnd()
        if done:
            if self.winner == 'blue':
                reward = 1
            elif self.winner == 'red':
                reward = -1
            else: # Board is full and nobody won: draw
                reward = 0
        else:
            reward = 0
        # Extra information (unused)
        info = {}
        return self.state, reward, done, info
      
    def judgeEnd(self):
        # Check two diagonals
        check_diag_1 = self.state[0][0] + self.state[1][1] + self.state[2][2]
        check_diag_2 = self.state[2][0] + self.state[1][1] + self.state[0][2]
        if check_diag_1 == 3 or check_diag_2 == 3:
            self.winner = 'blue'
            return True
        elif check_diag_1 == -3 or check_diag_2 == -3:
            self.winner = 'red'
            return True
        # Check three rows and three columns
        state_T = self.state.T
        for i in range(3):
            check_row = sum(self.state[i]) # Check row i
            check_col = sum(state_T[i]) # Check column i
            if check_row == 3 or check_col == 3:
                self.winner = 'blue'
                return True
            elif check_row == -3 or check_col == -3:
                self.winner = 'red'
                return True
        # Check whether any empty cell remains; a full board ends the game
        empty = []
        for i in range(3):
            for j in range(3):
                if self.state[i][j] == 0: empty.append((i,j))
        if empty == []: return True
        
        return False
    
    def render(self, mode='human'):
        SIZE = 100
        # Draw the four grid lines
        line1 = rendering.Line((0, 100), (300, 100))
        line2 = rendering.Line((0, 200), (300, 200))
        line3 = rendering.Line((100, 0), (100, 300))
        line4 = rendering.Line((200, 0), (200, 300))
        line1.set_color(0, 0, 0)
        line2.set_color(0, 0, 0)
        line3.set_color(0, 0, 0)
        line4.set_color(0, 0, 0)
        # Add drawing elements to the drawing board
        self.viewer.add_geom(line1)
        self.viewer.add_geom(line2)
        self.viewer.add_geom(line3)
        self.viewer.add_geom(line4)
        # Draw the marks according to self.state
        for i in range(3):
            for j in range(3):
                if self.state[i][j] == 1:
                    circle = rendering.make_circle(30) # Create a circle with a radius of 30
                    circle.set_color(135/255, 206/255, 250/255) # mark = blue
                    move = rendering.Transform(translation=(i * SIZE + 50, j * SIZE + 50)) # Create a translation to the cell center
                    circle.add_attr(move) # Attach the translation to the circle
                    self.viewer.add_geom(circle) # Add the circle to the drawing board
                if self.state[i][j] == -1:
                    circle = rendering.make_circle(30)
                    circle.set_color(255/255, 182/255, 193/255) # mark = red
                    move = rendering.Transform(translation=(i * SIZE + 50, j * SIZE + 50))
                    circle.add_attr(move)
                    self.viewer.add_geom(circle)
            
        return self.viewer.render(return_rgb_array=mode == 'rgb_array')
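The end-of-game check in judgeEnd can also be written with array sums. The following is a sketch of that idea rather than the code the environment uses; line_sums, outcome and blue_reward are hypothetical names. It returns the winner explicitly, so a draw can be told apart from a red win when the reward is computed.

```python
import numpy as np


def line_sums(state):
    """Sums of the 3 rows, 3 columns and 2 diagonals (8 values)."""
    return np.concatenate([
        state.sum(axis=1),              # rows
        state.sum(axis=0),              # columns
        [np.trace(state)],              # main diagonal
        [np.trace(np.fliplr(state))],   # anti-diagonal
    ])


def outcome(state):
    """Return (done, winner) with winner in {'blue', 'red', None}."""
    sums = line_sums(state)
    if (sums == 3).any():
        return True, 'blue'
    if (sums == -3).any():
        return True, 'red'
    return not (state == 0).any(), None   # a full board without a line is a draw


def blue_reward(state):
    """Reward from blue's perspective: +1 win, -1 loss, 0 for a draw or an unfinished game."""
    done, winner = outcome(state)
    return {'blue': 1, 'red': -1}.get(winner, 0) if done else 0


draw = np.array([[ 1, -1,  1],
                 [ 1, -1, -1],
                 [-1,  1,  1]])
blue_wins = np.array([[ 1,  1,  1],
                      [-1, -1,  0],
                      [ 0,  0,  0]])
print(outcome(draw), blue_reward(draw))            # (True, None) 0
print(outcome(blue_wins), blue_reward(blue_wins))  # (True, 'blue') 1
```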

In __init__.py, import the class by adding:

from gym.envs.user.TicTacToe_env import TicTacToeEnv

Step 3: Register Environment

Go to the directory D:\Anaconda\envs\pytorch1.1\Lib\site-packages\gym, open __init__.py, and add the code:

register(
    id="TicTacToeEnv-v0",
    entry_point="gym.envs.user:TicTacToeEnv",
    max_episode_steps=20,    
)

Step 4: Test environment

In the test code, the main loop keeps the game running. The blue and red players take turns placing a mark on a randomly chosen empty cell at 0.5 s intervals. The code is as follows:

import gym
import random
import time

# View all registered environments
# from gym import envs
# print(envs.registry.all()) 

def randomAction(env_, mark): # Randomly choose an unoccupied cell
    action_space = []
    for i, row in enumerate(env_.state):
        for j, one in enumerate(row):
            if one == 0: action_space.append((i,j))  
    action_pos = random.choice(action_space)
    action = {'mark':mark, 'pos':action_pos}
    return action

def randomFirst():
    if random.random() > 0.5: # Randomly decide who moves first
        first_, second_ = 'blue', 'red'
    else: 
        first_, second_ = 'red', 'blue'
    return first_, second_

env = gym.make('TicTacToeEnv-v0')
env.reset() # Reset the environment before the first step, otherwise an error occurs
first, second = randomFirst()
while True:
    # First player's move
    action = randomAction(env, first)
    state, reward, done, info = env.step(action)
    env.render()
    time.sleep(0.5)
    if done: 
        env.reset()
        env.render()
        first, second = randomFirst()
        time.sleep(0.5)
        continue
    # Second player's move
    action = randomAction(env, second)
    state, reward, done, info = env.step(action)
    env.render()
    time.sleep(0.5)
    if done: 
        env.reset()
        env.render()
        first, second = randomFirst()
        time.sleep(0.5)
        continue
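The two halves of the loop above differ only in which player moves, so the alternation can be expressed once. The sketch below shows one way to do that with itertools.cycle. To stay runnable without gym or a window it uses HeadlessBoard, a hypothetical stand-in that mimics the step() return format of TicTacToeEnv; it is an assumption made for illustration, not part of the article's environment.

```python
import itertools
import random

# The 8 winning lines as lists of (row, col) cells
LINES = ([[(r, c) for c in range(3)] for r in range(3)]
         + [[(r, c) for r in range(3)] for c in range(3)]
         + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])


class HeadlessBoard:
    """Hypothetical stand-in for TicTacToeEnv: same step() contract, no rendering."""

    def reset(self):
        self.state = [[0] * 3 for _ in range(3)]
        return self.state

    def step(self, action):
        x, y = action['pos']
        self.state[x][y] = 1 if action['mark'] == 'blue' else -1
        sums = [sum(self.state[r][c] for r, c in line) for line in LINES]
        full = all(v != 0 for row in self.state for v in row)
        reward = 1 if 3 in sums else -1 if -3 in sums else 0  # blue's perspective
        done = reward != 0 or full
        return self.state, reward, done, {}


def play_episode(env, rng):
    """Play one random game; return (final reward for blue, number of moves)."""
    env.reset()
    order = ['blue', 'red']
    rng.shuffle(order)  # random first player
    for moves, mark in enumerate(itertools.cycle(order), start=1):
        empty = [(i, j) for i, row in enumerate(env.state)
                 for j, v in enumerate(row) if v == 0]
        action = {'mark': mark, 'pos': rng.choice(empty)}
        state, reward, done, info = env.step(action)
        if done:
            return reward, moves


rng = random.Random(0)
results = [play_episode(HeadlessBoard(), rng) for _ in range(100)]
print(sum(r == 1 for r, _ in results), "blue wins out of", len(results))
```

With the real environment, the same play_episode loop would only need env.render() and time.sleep(0.5) after each step to reproduce the animated test.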

The effect is as follows:

[Animation: blue and red circles fill the 3x3 grid in turn, and the board clears when a game ends]

Posted by paruby on Sun, 05 Dec 2021 11:04:47 -0800