Question 2 (5 points): Minimax
First, note that there are two kinds of nodes in the game tree of the minimax algorithm: MAX nodes and MIN nodes. Minimax is a pessimistic algorithm: it maximizes the worst-case return, that is, it looks for the largest value among the minimum payoffs we can guarantee. Pacman therefore acts at the MAX nodes, while the ghosts act at the MIN nodes.
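For example, suppose Pacman has two moves, and the ghost's replies lead to values {3, 12} after the first move and {2, 8} after the second. Minimax computes max(min(3, 12), min(2, 8)) = max(3, 2) = 3 and chooses the first move, because its guaranteed worst case is better.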
In Question 2 we solve the Pacman game with the minimax algorithm. Starting from Pacman's current state, we initialize maxVal to negative infinity and traverse all of Pacman's legal actions. For each action, the value of the resulting successor is computed by the MIN layer (the ghosts) and compared with maxVal; if value > maxVal, we update maxVal and record the action as the current best. After the traversal, the best action is returned.
To carry out the above procedure, two helper functions are defined: _getMin, which computes the value of a MIN node, and _getMax, which computes the value of a MAX node. The two functions share the same structure: each evaluates the next action through recursive calls, and the recursion terminates when the depth limit is reached or there is no legal next step.
` "*** YOUR CODE HERE ***" #initialization maxVal = -float('inf') bestAction = None # From the pac man's original position, traverse all feasible next steps for action in gameState.getLegalActions(0): value = self._getMin(gameState.generateSuccessor(0, action)) if value is not None and value > maxVal: maxVal = value bestAction = action # Finally, return to the best choice return bestAction def _getMax(self, gameState, depth=0, agentIndex=0): # Obtain all legal operations of pac man in the next step legalActions = gameState.getLegalActions(agentIndex) # Termination conditions if depth == self.depth or len(legalActions) == 0: return self.evaluationFunction(gameState) maxVal = -float('inf') for action in legalActions: # Start with the first ghost value = self._getMin(gameState.generateSuccessor(agentIndex, action), depth, 1) if value is not None and value > maxVal: maxVal = value return maxVal def _getMin(self, gameState, depth=0, agentIndex=1): # Get the next legal operation of the ghosts legalActions = gameState.getLegalActions(agentIndex) # Termination conditions if depth == self.depth or len(legalActions) == 0: return self.evaluationFunction(gameState) minVal = float('inf') # ergodic for action in legalActions: # If the current is the last ghost, the next round is to calculate the pac man's behavior, that is, call the MAX function if agentIndex == gameState.getNumAgents() - 1: value = self._getMax(gameState.generateSuccessor(agentIndex, action), depth + 1, 0) else: value = self._getMin(gameState.generateSuccessor(agentIndex, action), depth, agentIndex + 1) if value is not None and value < minVal: minVal = value return minVal
Question 3 (5 points): Alpha-Beta Pruning
The alpha-beta algorithm is an optimization of minimax. Plain minimax is exhaustive and must traverse every node; alpha-beta improves efficiency by pruning branches that cannot affect the result. Here α represents the best (largest) lower bound found so far, and β represents the best (smallest) upper bound found so far. As the search proceeds, α and β gradually approach each other; if at some node α > β, that node cannot produce the optimal solution, so it is not expanded further and pruning is complete.

The code of the alpha-beta algorithm is similar to that of the minimax algorithm, except that at each node the current value is also compared against α and β to decide whether to prune and when to update the bounds.
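For example, suppose a MAX node has already established α = 5 from its first child, and we begin evaluating its second child, a MIN node. As soon as one grandchild evaluates to 3, the MIN node's value can be at most 3 < α = 5, so the MAX node will never choose it and the MIN node's remaining children can be pruned without being expanded.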
"*** YOUR CODE HERE ***" # Expand from the root node to find the MAX value return self._getMax(gameState)[1] def _getMax(self, gameState, depth=0, agentIndex=0, alpha=-float('inf'), beta=float('inf')): # Termination conditions legalActions = gameState.getLegalActions(agentIndex) if depth == self.depth or len(legalActions) == 0: return self.evaluationFunction(gameState), None # Traversing the possible next step of Pac Man maxVal = None bestAction = None for action in legalActions: # Traverse all ghosts value = self._getMin(gameState.generateSuccessor(agentIndex, action), depth, 1, alpha, beta)[0] if value is not None and (maxVal == None or value > maxVal): maxVal = value bestAction = action # according to α-β Pruning algorithm, if v > β, Returns v directly if value is not None and value > beta: return value, action # according to α-β Pruning algorithm needs to be updated here α Value of if value is not None and value > alpha: alpha = value return maxVal, bestAction def _getMin(self, gameState, depth=0, agentIndex=0, alpha=-float('inf'), beta=float('inf')): # Termination conditions legalActions = gameState.getLegalActions(agentIndex) if depth == self.depth or len(legalActions) == 0: return self.evaluationFunction(gameState), None # Traverse the next possible step of the current ghost minVal = None bestAction = None for action in legalActions: if agentIndex >= gameState.getNumAgents() - 1: # It's much different from minimax α and β Value of value = self._getMax(gameState.generateSuccessor(agentIndex, action), depth + 1, 0, alpha, beta)[0] else: # If it is not the last ghost, continue to traverse the next ghost, that is, agentIndex+1 value = \ self._getMin(gameState.generateSuccessor(agentIndex, action), depth, agentIndex + 1, alpha, beta)[0] if value is not None and (minVal == None or value < minVal): minVal = value bestAction = action # according to α-β Pruning algorithm, if v< α, Returns v directly if value is not None and value < alpha: return value, action # according to α-β Pruning algorithm needs to be updated here β Value of if value is not None and value < beta: beta = value return minVal, bestAction`
Question 4 (5 points): Expectimax
In the α-β pruning algorithm we cut off unnecessary search branches: the algorithm maintains an (α, β) window, and during the downward search, as soon as a value falls outside the window we can return it immediately without knowing its exact value, so the rest of the branch is cut. How much gets pruned depends on move ordering: if the children of MAX nodes could be examined in descending order of value and the children of MIN nodes in ascending order, the search would expand the fewest nodes, but this ordering obviously cannot be known in advance. In expectimax we keep the same depth-limited recursive search, expanding a fixed number of plies, but at the ghost layers we replace the minimum with the expected value of the successors, so that good outcomes are not discarded under the assumption of a perfectly adversarial opponent.
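Concretely, at a chance node where the ghost has legal actions a₁, …, aₙ, each assumed equally likely, the node's value is the average (value(a₁) + … + value(aₙ)) / n rather than the minimum min(value(a₁), …, value(aₙ)).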
```python
def _getExpectation(self, gameState, depth=0, agentIndex=0):
    legalActions = gameState.getLegalActions(agentIndex)
    # If the search depth exceeds the limit or there is no next step, return the evaluation function value
    if depth == self.depth or len(legalActions) == 0:
        return self.evaluationFunction(gameState)
    # Initialize the total utility value
    totalUtil = 0
    numActions = len(legalActions)
    # Go through all of the current ghost's possible next steps
    for action in legalActions:
        # If this is the last ghost, the next step computes Pacman's MAX value and adds it to the total utility
        if agentIndex >= gameState.getNumAgents() - 1:
            totalUtil += self._getMax(gameState.generateSuccessor(agentIndex, action), depth + 1, 0)
        # Otherwise, move on to the next ghost, compute its expectation value and add it to the total utility
        else:
            totalUtil += self._getExpectation(gameState.generateSuccessor(agentIndex, action), depth, agentIndex + 1)
    # Finally, average over all possible next steps and return the result
    return totalUtil / float(numActions)
```
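The excerpt above only shows the chance layer. For completeness, a minimal sketch of the corresponding MAX side might look like the following; it mirrors the minimax code, and the exact structure of getAction here is an assumption rather than the original implementation.

```python
def getAction(self, gameState):
    # Sketch of the entry point (assumed, mirroring the minimax agent):
    # try each of Pacman's legal actions and keep the one with the highest expected value.
    bestVal, bestAction = -float('inf'), None
    for action in gameState.getLegalActions(0):
        value = self._getExpectation(gameState.generateSuccessor(0, action), 0, 1)
        if value > bestVal:
            bestVal, bestAction = value, action
    return bestAction

def _getMax(self, gameState, depth=0, agentIndex=0):
    # MAX layer for Pacman, identical in shape to the minimax version,
    # except that successors are scored by the expectation layer instead of a MIN layer.
    legalActions = gameState.getLegalActions(agentIndex)
    if depth == self.depth or len(legalActions) == 0:
        return self.evaluationFunction(gameState)
    return max(self._getExpectation(gameState.generateSuccessor(agentIndex, action), depth, 1)
               for action in legalActions)
```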
Question 5 (6 points): Evaluation Function
In this problem we need to design a better evaluation function so that Pacman scores higher and plays more efficiently.

A better evaluation function should take into account both the positions of the food and the ghosts, so that Pacman can eat food quickly while avoiding collisions with the ghosts. Manhattan distance is used for both measurements.
```python
# Distance assessment of the nearest food
distancesToFoodList = [util.manhattanDistance(newPos, foodPos) for foodPos in newFood.asList()]
if len(distancesToFoodList) > 0:
    score += WEIGHT_FOOD / min(distancesToFoodList)
else:
    score += WEIGHT_FOOD

# Ghost distance assessment
for ghost in newGhostStates:
    distance = manhattanDistance(newPos, ghost.getPosition())
    if distance > 0:
        if ghost.scaredTimer > 0:
            # Scared ghost: being close is good, so add points
            score += WEIGHT_SCARED_GHOST / distance
        else:
            # Active ghost: being close is bad; WEIGHT_GHOST is negative, so this decreases the score
            score += WEIGHT_GHOST / distance
    else:
        return -INF  # Pacman is caught by a ghost at this point
```
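Putting the two fragments together, a minimal self-contained sketch of the whole evaluation function could look as follows. The weight constants (WEIGHT_FOOD, WEIGHT_GHOST, WEIGHT_SCARED_GHOST, INF) and their specific values are assumptions, since they are not shown in the fragments above.

```python
import util
from util import manhattanDistance

# Assumed weight constants; the actual values used are not shown in the fragments above.
INF = float('inf')
WEIGHT_FOOD = 10.0            # reward for being near food
WEIGHT_GHOST = -10.0          # penalty for being near an active ghost (negative)
WEIGHT_SCARED_GHOST = 100.0   # reward for being near a scared (edible) ghost

def betterEvaluationFunction(currentGameState):
    """Score a state from the current game score, the food distances and the ghost distances."""
    newPos = currentGameState.getPacmanPosition()
    newFood = currentGameState.getFood()
    newGhostStates = currentGameState.getGhostStates()

    # Start from the built-in game score
    score = currentGameState.getScore()

    # Distance assessment of the nearest food
    distancesToFoodList = [util.manhattanDistance(newPos, foodPos) for foodPos in newFood.asList()]
    if len(distancesToFoodList) > 0:
        score += WEIGHT_FOOD / min(distancesToFoodList)
    else:
        score += WEIGHT_FOOD

    # Ghost distance assessment
    for ghost in newGhostStates:
        distance = manhattanDistance(newPos, ghost.getPosition())
        if distance > 0:
            if ghost.scaredTimer > 0:
                score += WEIGHT_SCARED_GHOST / distance
            else:
                score += WEIGHT_GHOST / distance
        else:
            return -INF  # Pacman is caught

    return score
```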