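This post walks through an example of solving a maze with reinforcement learning, specifically Q-learning.
In this problem the robot can move in only four directions: up, down, left and right. After each step it receives a reward that depends on the outcome of its action: +1 for reaching the goal, -1 for bumping into a wall, and 0 otherwise. It uses that reward to update its estimate of how good the move was, and this is repeated until the robot reaches the destination. The whole run then starts over, so that what has been learned can be verified and the unnecessary moves made in earlier passes are gradually forgotten. This makes it a good teaching example for situations where learning has to happen on the go, i.e. without training examples. The same idea can be used in games, to learn and improve an AI algorithm's ability to compete with human players, and in several other scenarios. On a small maze convergence is fast, while on a large maze it can take a while; the code can be modified to make Q-learning converge faster.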
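The update the robot applies after each move is easiest to see on a much smaller problem. The sketch below is a hypothetical 1-D corridor, not the author's maze: five cells, cell 5 is the goal, and there are two actions, 1 (left) and 2 (right). It reuses the reward scheme described above, the settings alpha = 0.8 and gamma = 0.5, and the greedy action choice with random tie-breaking from QLearning_Maze_Walk; the corridor size and the 200 episodes are assumptions chosen only for illustration.

```matlab
% Hypothetical 1-D corridor: cells 1..5, cell 5 is the goal (terminal state).
% Actions: 1 = move left, 2 = move right.
% Rewards follow the maze code: +1 goal, -1 bump, 0 otherwise.
nCells = 5;
goal = 5;
alpha = 0.8;               % learning rate (same value as QLearning_Maze_Walk)
gamma = 0.5;               % discount factor (same value as QLearning_Maze_Walk)
Q = zeros(nCells, 2);      % one row per state, one column per action
for episode = 1:200        % number of episodes is an arbitrary choice
    s = 1;                 % every episode starts in cell 1
    while s ~= goal
        % greedy action; ties are broken uniformly at random
        best = find(Q(s,:) == max(Q(s,:)));
        a = best(randi(numel(best)));
        if a == 1
            sNext = max(s - 1, 1);   % moving left from cell 1 hits the wall
        else
            sNext = s + 1;
        end
        if sNext == s
            r = -1;                  % bumped into the wall, robot stays put
        elseif sNext == goal
            r = 1;                   % reached the goal
        else
            r = 0;
        end
        % Q-learning update toward r + gamma * max over a' of Q(sNext,a')
        Q(s,a) = Q(s,a) + alpha * (r + gamma * max(Q(sNext,:)) - Q(s,a));
        s = sNext;
    end
end
[~, greedyAction] = max(Q(1:goal-1, :), [], 2);   % best action per non-goal cell
disp(Q)
```

After training, the greedy action in every non-goal cell should be 2 (move right). Q(4,2) approaches the goal reward of 1, and each extra step away from the goal is discounted by gamma, so Q(3,2), Q(2,2) and Q(1,2) approach 0.5, 0.25 and 0.125.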
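The example consists of four m-files:
QLearning_Maze_Walk.m: runs the Q-learning algorithm on a selected maze
Random_Maze_Walk.m: moves the robot at random, for comparison with Q-learning
Read_Maze.m: reads the input maze and translates it into a numeric matrix
Textscanu.m: reads the raw maze text file
Two maps are included, maze-9-9.txt and maze-61-21.txt; either one can be selected by changing fileName in the code.
Below is the 9×9 map: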
% 3rd party file used for reading the maze
function C = textscanu(filename, encoding, del_sym, eol_sym, wb)
% C = textscanu(filename, encoding) reads Unicode
% strings from a file and outputs a cell array of strings.
%
% Syntax:
% -------
% filename - string with the file's name and extension
% example: 'unicode.txt'
% encoding - encoding of the file
% default: UTF-16LE
% examples: UTF-16LE (little endian), UTF-8.
% See http://www.iana.org/assignments/character-sets
% MS Notepad saves in UTF-16LE ('Unicode'),
% UTF-16BE ('Unicode big endian'), UTF-8 and ANSI.
% del_sym - column delimiter symbol as an ASCII numeric code
% default: 9 (tab)
% eol_sym - end-of-line delimiter symbol as an ASCII numeric code
% default: 13 (carriage return) [Note: line feed=10]
% wb - displays a waitbar if wb = 'waitbar'
%
% Example:
% -------
% C = textscanu('unicode.txt', 'UTF-8', 9, 13);
% Reads the UTF-8 encoded file 'unicode.txt', whose columns are
% delimited by tabs and whose lines end with carriage returns.
% Pass 'waitbar' as a fifth argument to show a waitbar that
% makes the progress of the function visible.
%
% Created by: Vlad Atanasiu
% fill in default arguments
if nargin < 2
encoding = 'UTF-16LE'; % default encoding
end
if nargin < 3
del_sym = 9; % column delimiter symbol (TAB=9)
end
if nargin < 4
eol_sym = 13; % end of line symbol (CR=13, LF=10)
end
h = 0; % 0 means no waitbar
if nargin == 5 && strcmp(wb, 'waitbar')
h = waitbar(0,''); % display waitbar
end
warning off MATLAB:iofun:UnsupportedEncoding;
% read input
fid = fopen(filename, 'r', 'l', encoding);
if fid == -1
error('textscanu:fileNotFound', 'Cannot open file ''%s''.', filename);
end
S = fscanf(fid, '%c');
fclose(fid);
% remove the Byte Order Mark (U+FEFF) if present, and add an
% end of line mark at the end of the file
if ~isempty(S) && double(S(1)) == 65279
S = S(2:end);
end
S = [S char(eol_sym)];
% locate column delimiters and line ends
del = find(abs(S) == del_sym);
eol = find(abs(S) == eol_sym);
% get number of rows and columns in input
row = numel(eol);
col = 1 + numel(del) / row;
C = cell(row,col); % output cell array
% catch errors in file
if col - fix(col) ~= 0
error(['The file has an inconsistent number of columns ',...
'or its line ends are malformed.'])
end
m = 1;
n = 1;
sos = 1;
% parse input
if col == 1
% single column input
for r = 1:row
if h ~= 0
waitbar( r/row, h, [num2str(r), '/', num2str(row)] )
end
eos = eol(n) - 1;
C(r,col) = {S(sos:eos)};
n = n + 1;
sos = eos + 3; % skip the CR/LF pair (assumes CRLF line ends)
end
else
% multiple column input
for r = 1:row
if h ~= 0
waitbar( r/row, h, [num2str(r), '/', num2str(row)] )
end
for c = 1:col-1
eos = del(m) - 1;
C(r,c) = {S(sos:eos)};
sos = eos + 2;
m = m + 1;
end
% last string in the row
sos = eos + 2;
eos = eol(n) - 1;
C(r,col) = {S(sos:eos)};
n = n + 1;
sos = eos + 3; % skip the CR/LF pair (assumes CRLF line ends)
end
end
if h ~= 0, close(h); end % close the waitbar if one was opened
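For reference, the core of what textscanu does, splitting text into a cell array at column delimiters and line ends, can be sketched with MATLAB's built-in regexp on a char vector. This is an illustrative alternative rather than the original function: the input string is hypothetical, and a real file would first have to be read into a char vector (for example with fopen and fscanf, as textscanu does).

```matlab
% Hypothetical input: two lines, two tab-separated columns, CR LF line ends
S = sprintf('a\tb\r\nc\td\r\n');
S = regexprep(S, '(\r?\n)+$', '');       % drop trailing line break(s)
textLines = regexp(S, '\r?\n', 'split'); % one cell per line
C = cell(numel(textLines), 0);           % grows to the widest row
for r = 1:numel(textLines)
    fields = regexp(textLines{r}, '\t', 'split');  % split the line at tabs
    C(r, 1:numel(fields)) = fields;
end
disp(C)
```

Unlike textscanu, this version accepts both CRLF and LF line ends, and rows with fewer fields are padded with empty cells instead of raising an error.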
%Copyright (c) Asad Ali
%Website: https://sites.google.com/site/asad82/code
function [maze2D,row,col] = Read_Maze(fileName)
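% read the maze from file; the file is expected to use LF line ends, so with
% eol_sym = 13 (CR) textscanu returns the whole maze as a single string in C{1}
C = textscanu(fileName, 'UTF-8', 9, 13);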
% convert the maze into a 2D matrix
maze1D = C{1};
[xx,yy] = find(maze1D == 10);
numCol = round(size(maze1D,2)/size(xx,2));
numRow = size(xx,2);
%maze2D = zeros(numRow,numCol);
rowIndex = 1; colIndex = 1;
for i=1:size(maze1D,2)
if maze1D(1,i) == 10
% line feed: move to the next row
rowIndex = rowIndex + 1;
colIndex = 1;
elseif maze1D(1,i) == 'G'
% goal
maze2D(rowIndex,colIndex) = 100;
colIndex = colIndex + 1;
elseif maze1D(1,i) == 'S'
% start point
maze2D(rowIndex,colIndex) = 60;
row = rowIndex; col = colIndex;
colIndex = colIndex + 1;
elseif maze1D(1,i) == ' '
% space
maze2D(rowIndex,colIndex) = 50;
colIndex = colIndex + 1;
else
% any other character is a wall (bump)
maze2D(rowIndex,colIndex) = 0;
colIndex = colIndex + 1;
end
end
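Read_Maze walks through the file one character at a time. The same character-to-value mapping (wall 0, space 50, start 60, goal 100) can be sketched with logical indexing on a char matrix. The three-line maze below is hypothetical, and the sketch assumes all lines have the same length, because char pads shorter lines with spaces, which would then be read as free space.

```matlab
% Hypothetical 3x5 maze; '#' is a wall (in Read_Maze any character other
% than 'G', 'S' or a space counts as a wall)
mazeLines = {'#####', '#S G#', '#####'};
M = char(mazeLines);            % char matrix, one row per maze line
maze2D = zeros(size(M));        % walls (bumps) stay 0
maze2D(M == ' ') = 50;          % free space
maze2D(M == 'S') = 60;          % start point
maze2D(M == 'G') = 100;         % goal
[row, col] = find(M == 'S');    % start position, as returned by Read_Maze
disp(maze2D)
```

Preallocating with zeros also avoids growing maze2D cell by cell, which the loop in Read_Maze does.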
% This work was done as part of a course while I was a graduate student at
% the University of Tokyo in spring 2011, working for the late Professor Carson
% Reynolds of the Masatoshi Ishikawa Lab, Graduate School of Information
% Science and Technology.
% This code demonstrates the reinforcement learning (Q-learning) algorithm using the example of a maze
% in which a robot has to reach its destination by moving left, right,
% up and down only. At each step, based on the outcome of its
% action, the robot is taught and re-taught whether it was a good move or not,
% and the whole process is repeated again and again until it reaches
% its destination. The process then starts again so
% that whatever has been learned can be verified and unnecessary moves
% made during the first pass can be forgotten, and so on. It is a good tutorial example
% for situations in which learning has to be done on the go, i.e. without
% the use of training examples. It can be used in games to learn and improve the
% ability of an AI algorithm to compete with human players, and in
% several other scenarios.
% This file is a random-walk version, used to compare its convergence time
% with that of the Q-learning algorithm.
% There are four m-files:
% QLearning_Maze_Walk.m - demonstrates the Q-learning algorithm on a selected maze
% Random_Maze_Walk.m - demonstrates random action selection for comparison
% Read_Maze.m - reads the maze given as input and translates it into a numeric representation for processing
% Textscanu.m - reads the raw maze text file
% Two maze files are included:
% maze-9-9.txt
% maze-61-21.txt
% either of which can be selected by changing fileName in the code.
function Random_Maze_Walk
clear all;
close all;
global maze2D;
global tempMaze2D;
DISPLAY_FLAG = 1; % 1 means display maze and 0 means no display
NUM_ITERATIONS = 10; % change this value to set max iterations
% initialize the robot's orientation
currentDirection = 1; % robot is facing up
% row and col are set by Read_Maze to the position of the starting point
% when the maze is read below
fileName = 'maze-9-9.txt';
[maze2D,row,col] = Read_Maze(fileName);
% show the maze
imagesc(maze2D),colorbar
% make some copies of maze to use later for display
orgMaze2D = maze2D;
orgMaze2D(row,col) = 50;
[goalX,goalY,val] = find(orgMaze2D == 100);
tempMaze2D = orgMaze2D;
% robots starting position
startX = row;
startY = col;
% Direction selection
% 0 means turn left
% 1 means turn right
% 2 means move ahead in the current direction without turning
% with NUM_DIRECTIONS = 2 the random walker gets this third option
NUM_DIRECTIONS = 1; % 1 or 2
for j=1:NUM_ITERATIONS
status = -1;
countActions = 0;
countSteps = 0;
tempMaze2D(goalX,goalY) = 100;
row = startX; col = startY;
currentDirection = 1;
while status ~= 3
% select whether to call TurnLeft or TurnRight below (uniformly at random)
direction = randi([0 NUM_DIRECTIONS]);
% draw a random integer in 0..3; the loop below turns randMove+1 times
% in the selected direction
randMove = randi([0 3]);
for i=0:randMove
if direction == 0
% Turn Left and then move ahead
currentDirection = TurnLeft(currentDirection);
elseif direction == 1
% Turn Right and then move ahead
currentDirection = TurnRight(currentDirection);
end
end
[row,col,status] = MoveAhead(row,col,currentDirection);
% count the steps required to reach the goal
if status == 1
countSteps = countSteps + 1;
end
% count actions taken to reach the goal
countActions = countActions + randMove + 1;
% display the maze; rem(countActions,1) is always 0, so the maze is redrawn
% after every move (a larger divisor redraws less often)
if rem(countActions,1) == 0 && DISPLAY_FLAG == 1
% calculate Manhattan distance between current location and goal
X = [row col];
Y = [goalX goalY];
dist = norm(X-Y,1);
s = sprintf('Manhattan Distance = %f',dist);
imagesc(tempMaze2D)%,colorbar;
title(s);
drawnow
end
end
% display the final maze
imagesc(tempMaze2D);
disp(countActions);
disp(countSteps);
iterationCountA(j,1) = countActions;
iterationCountS(j,1) = countSteps;
%bar(iterationCountA);
%drawnow
end
figure,bar(iterationCountS); title('Steps Plot')
figure,bar(iterationCountA); title('Actions Plot')
meanA = mean(iterationCountA);
disp('----Mean Result Actions -----')
disp(meanA);
disp('----Mean Result Steps -----')
meanS = mean(iterationCountS);
disp(meanS);
%-------------------------------%
%        1
%     2     3
%        4
% Current Direction
% 1 - means robot facing up
% 2 - means robot facing left
% 3 - means robot facing right
% 4 - means robot facing down
%------------------------------%
% based on the current direction and convention rotate the robot left
function currentDirection = TurnLeft(currentDirection)
if currentDirection == 1
currentDirection = 2;
elseif currentDirection == 2
currentDirection = 4;
elseif currentDirection == 4
currentDirection = 3;
elseif currentDirection == 3
currentDirection = 1;
end
% based on the current direction and convention rotate the robot right
function currentDirection = TurnRight(currentDirection)
if currentDirection == 1
currentDirection = 3;
elseif currentDirection == 3
currentDirection = 4;
elseif currentDirection == 4
currentDirection = 2;
elseif currentDirection == 2
currentDirection = 1;
end
% return the information just in front of the robot (local)
function [val,valid] = LookAhead(row,col,currentDirection)
global maze2D;
val = 0; % default so that val is always assigned, even outside the maze
valid = 0;
if currentDirection == 1
if row-1 >= 1 && row-1 <= size(maze2D,1)
val = maze2D(row-1,col);
valid = 1;
end
elseif currentDirection == 2
if col-1 >= 1 && col-1 <= size(maze2D,2)
val = maze2D(row,col-1);
valid = 1;
end
elseif currentDirection == 3
if col+1 >= 1 && col+1 <= size(maze2D,2)
val = maze2D(row,col+1);
valid = 1;
end
elseif currentDirection == 4
if row+1 >= 1 && row+1 <= size(maze2D,1)
val = maze2D(row+1,col);
valid = 1;
end
end
% status = 1 then move ahead successful
% status = 2 then bump into wall or boundary
% status = 3 then goal achieved
% Move the robot to the next location if no bump
function [row,col,status] = MoveAhead(row,col,currentDirection)
global tempMaze2D;
% based on the current direction, check whether the next location is space or
% a bump, and get the information used below
[val,valid] = LookAhead(row,col,currentDirection);
% valid == 0 means the next location lies outside the maze boundary,
% which is treated as a bump (status 2)
if valid == 1
% now check if the next location for space or bump
% this is for walls inside the maze
if val > 0
oldRow = row; oldCol = col;
if currentDirection == 1
row = row - 1;
elseif currentDirection == 2
col = col - 1;
elseif currentDirection == 3
col = col + 1;
elseif currentDirection == 4
row = row + 1;
end
status = 1;
if val == 100
% goal achieved
status = 3;
disp(status);
end
% update the current position of the robot in maze for display
tempMaze2D(oldRow,oldCol) = 50;
tempMaze2D(row,col) = 60;
elseif val == 0
% bump into wall
status = 2;
end
else
% return a bump signal if valid is 0
status = 2;
end
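The if-chains in TurnLeft, TurnRight, LookAhead and MoveAhead encode a fixed mapping between direction codes. The sketch below writes that mapping as lookup vectors, using the listing's convention (1 up, 2 left, 3 right, 4 down); the vector names turnLeft, turnRight, dRow and dCol are hypothetical and do not appear in the original files.

```matlab
% Direction codes as in the listing: 1 = up, 2 = left, 3 = right, 4 = down
turnLeft  = [2 4 1 3];   % turnLeft(d) is the direction after a left turn from d
turnRight = [3 1 4 2];   % turnRight(d) is the direction after a right turn from d
dRow = [-1 0 0 1];       % row change when moving ahead in direction d
dCol = [0 -1 1 0];       % column change when moving ahead in direction d

d = 1;                                   % facing up
d = turnLeft(d);                         % now facing left (2)
nextCell = [5 5] + [dRow(d) dCol(d)];    % one step ahead from cell (5,5)
disp(nextCell)
```

Because the tables are plain vectors, whole-table properties are easy to check: a right turn undoes a left turn, and four left turns bring the robot back to its original heading.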
% This work was done as part of a course while I was a graduate student at
% the University of Tokyo in spring 2011, working for the late Professor Carson
% Reynolds of the Masatoshi Ishikawa Lab, Graduate School of Information
% Science and Technology.
% This code demonstrates the reinforcement learning (Q-learning) algorithm using the example of a maze
% in which a robot has to reach its destination by moving left, right,
% up and down only. At each step, based on the outcome of its
% action, the robot is taught and re-taught whether it was a good move or not,
% and the whole process is repeated again and again until it reaches
% its destination. The process then starts again so
% that whatever has been learned can be verified and unnecessary moves
% made during the first pass can be forgotten, and so on. It is a good tutorial example
% for situations in which learning has to be done on the go, i.e. without
% the use of training examples. It can be used in games to learn and improve the
% ability of an AI algorithm to compete with human players, and in
% several other scenarios.
% On a small maze convergence is fast, whereas on a large maze
% convergence can take some time. You can improve the convergence speed by
% modifying the code to make Q-learning more efficient.
% There are four m-files:
% QLearning_Maze_Walk.m - demonstrates the Q-learning algorithm on a selected maze
% Random_Maze_Walk.m - demonstrates random action selection for comparison
% Read_Maze.m - reads the maze given as input and translates it into a numeric representation for processing
% Textscanu.m - reads the raw maze text file
% Two maze files are included:
% maze-9-9.txt
% maze-61-21.txt
% either of which can be selected by changing fileName in the code.
function QLearning_Maze_Walk
clear all;
close all;
global maze2D;
global tempMaze2D;
DISPLAY_FLAG = 1; % 1 means display maze and 0 means no display
NUM_ITERATIONS = 100; % change this value to set max iterations
% initialize the robot's orientation
currentDirection = 1; % robot is facing up
% row and col are set by Read_Maze to the position of the robot's starting
% point when the maze is read below
fileName = 'maze-9-9.txt';
[maze2D,row,col] = Read_Maze(fileName);
imagesc(maze2D) % show the maze
% make some copies of maze to use later for display
orgMaze2D = maze2D;
orgMaze2D(row,col) = 50;
[goalX,goalY,val] = find(orgMaze2D == 100);
tempMaze2D = orgMaze2D;
% record robots starting location for use later
startX = row;
startY = col;
% build the state-action table Q: one state per maze cell (walls included)
% and four actions (directions) for each state
Q = zeros(size(maze2D,1),size(maze2D,2),4);
% only used for priority visiting for larger maze
%visitFlag = zeros(size(maze2D,1),size(maze2D,2));
% status message for goal and bump
GOAL = 3;
BUMP = 2;
% alpha is the learning rate, gamma the discount factor
alpha = 0.8;
gamma = 0.5;
for i=1:NUM_ITERATIONS
tempMaze2D(goalX,goalY) = 100;
row = startX; col = startY;
status = -1;
countActions = 0;
currentDirection = 1;
% only used for priority visiting for larger maze
% visitFlag = zeros(size(maze2D,1),size(maze2D,2));
% visitFlag(row,col) = 1;
while status ~= GOAL
% record the current position of the robot for use later
prvRow = row; prvCol = col;
% select the action (i.e. direction) with the maximum Q value;
% if several actions share that value, pick one of them uniformly at random
qVals = squeeze(Q(row,col,:)); % 4x1 vector of action values
bestActions = find(qVals == max(qVals)); % all actions tied for the maximum
action = bestActions(randi(numel(bestActions)));
% turn the robot left until it faces the selected direction, since the
% simulator only lets it move straight ahead
while currentDirection ~= action
currentDirection = TurnLeft(currentDirection);
% count the actions required to reach the goal
countActions = countActions + 1;
end
% do the selected action i.e. MoveAhead
[row,col,status] = MoveAhead(row,col,currentDirection);
% count the actions required to reach the goal
countActions = countActions + 1;
% get the reward: +1 for reaching the goal (final state),
% -1 for bumping into a wall, and 0 otherwise
if status == BUMP
rewardVal = -1;
elseif status == GOAL
rewardVal = 1;
else
rewardVal = 0;
end
% enable this piece of code if testing larger maze
% if visitFlag(row,col) == 0
% rewardVal = rewardVal + 0.2;
% visitFlag(row,col) = 1;
% else
% rewardVal = rewardVal - 0.2;
% end
% Q-learning update: move Q(s,a) toward reward + gamma * max over a' of Q(s',a')
Q(prvRow,prvCol,action) = Q(prvRow,prvCol,action) + alpha*(rewardVal+gamma*max(Q(row,col,:)) - Q(prvRow,prvCol,action));
% display the maze; rem(countActions,1) is always 0, so the maze is redrawn
% after every move (a larger divisor redraws less often)
if rem(countActions,1) == 0 && DISPLAY_FLAG == 1
X = [row col];
Y = [goalX goalY];
dist = norm(X-Y,1);
s = sprintf('Manhattan Distance = %f',dist);
imagesc(tempMaze2D);%,colorbar;
title(s);
drawnow
end
end
iterationCount(i,1) = countActions;
% display the final maze
imagesc(tempMaze2D);%,colorbar;
disp(countActions);
%bar(iterationCount);
drawnow
end
figure,bar(iterationCount)
disp('----- Mean Result -----')
meanA = mean(iterationCount);
disp(meanA);
%save Q_Learn_9-9.mat;
%-------------------------------%
%        1
%     2     3
%        4
% Current Direction
% 1 - means robot facing up
% 2 - means robot facing left
% 3 - means robot facing right
% 4 - means robot facing down
%------------------------------%
% based on the current direction and convention rotate the robot left
function currentDirection = TurnLeft(currentDirection)
if currentDirection == 1
currentDirection = 2;
elseif currentDirection == 2
currentDirection = 4;
elseif currentDirection == 4
currentDirection = 3;
elseif currentDirection == 3
currentDirection = 1;
end
% based on the current direction and convention rotate the robot right
function currentDirection = TurnRight(currentDirection)
if currentDirection == 1
currentDirection = 3;
elseif currentDirection == 3
currentDirection = 4;
elseif currentDirection == 4
currentDirection = 2;
elseif currentDirection == 2
currentDirection = 1;
end
% return the information just in front of the robot (local)
function [val,valid] = LookAhead(row,col,currentDirection)
global maze2D;
val = 0; % default so that val is always assigned, even outside the maze
valid = 0;
if currentDirection == 1
if row-1 >= 1 && row-1 <= size(maze2D,1)
val = maze2D(row-1,col);
valid = 1;
end
elseif currentDirection == 2
if col-1 >= 1 && col-1 <= size(maze2D,2)
val = maze2D(row,col-1);
valid = 1;
end
elseif currentDirection == 3
if col+1 >= 1 && col+1 <= size(maze2D,2)
val = maze2D(row,col+1);
valid = 1;
end
elseif currentDirection == 4
if row+1 >= 1 && row+1 <= size(maze2D,1)
val = maze2D(row+1,col);
valid = 1;
end
end
% status = 1 then move ahead successful
% status = 2 then bump into wall or boundary
% status = 3 then goal achieved
% Move the robot to the next location if no bump
function [row,col,status] = MoveAhead(row,col,currentDirection)
global tempMaze2D;
% based on the current direction, check whether the next location is space or
% a bump, and get the information used below
[val,valid] = LookAhead(row,col,currentDirection);
% valid == 0 means the next location lies outside the maze boundary,
% which is treated as a bump (status 2)
if valid == 1
% now check if the next location for space or bump
% this is for walls inside the maze
if val > 0
oldRow = row; oldCol = col;
if currentDirection == 1
row = row - 1;
elseif currentDirection == 2
col = col - 1;
elseif currentDirection == 3
col = col + 1;
elseif currentDirection == 4
row = row + 1;
end
status = 1;
if val == 100
% goal achieved
status = 3;
disp(status);
end
% update the current position of the robot in maze for display
tempMaze2D(oldRow,oldCol) = 50;
tempMaze2D(row,col) = 60;
elseif val == 0
% bump into wall
status = 2;
end
else
% return a bump signal if valid is 0
status = 2;
end
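To inspect what the robot has learned, the greedy policy can be read off the Q array after training; Q is local to QLearning_Maze_Walk, so it would first have to be exported, for example by enabling the commented-out save line. The sketch below uses a hypothetical hand-filled Q for a 1×4 corridor rather than real training output, takes the best direction per cell with max along the third dimension, and follows it from the start. It assumes the greedy policy never points into a wall or off the grid, which the listing handles through LookAhead and MoveAhead.

```matlab
% Hypothetical learned Q for a 1x4 corridor; the goal is at (1,4).
% Third dimension = direction code: 1 up, 2 left, 3 right, 4 down.
Q = zeros(1, 4, 4);
Q(1, :, 3) = [0.25 0.5 1 0];    % moving right is best in cells 1-3
Q(1, :, 2) = [-1 0.1 0.2 0];    % moving left is worse
dRow = [-1 0 0 1];              % row change per direction code
dCol = [0 -1 1 0];              % column change per direction code

[~, policy] = max(Q, [], 3);    % greedy direction code for every cell
pos = [1 1];                    % start position
goal = [1 4];
visited = pos;                  % cells visited along the greedy path
for step = 1:numel(policy)      % step limit guards against endless loops
    if isequal(pos, goal), break; end
    d = policy(pos(1), pos(2));
    pos = pos + [dRow(d) dCol(d)];
    visited(end+1, :) = pos;    %#ok<AGROW>
end
disp(visited)
```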