[Matlab] Solving a Maze Problem with the Reinforcement Q-Learning Algorithm

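This post walks through an example that uses reinforcement Q-learning to solve a maze problem.

In this problem, the robot can only move in four directions: up, down, left, and right. After each step, the outcome of the action is used to teach (and re-teach) the robot whether the move was a good one, and this repeats until the robot reaches its destination. The process then starts over, so that what has been learned can be verified and the unnecessary moves made during the first pass can be forgotten. It is a good tutorial example for situations where learning has to happen on the go, that is, without training examples. The same approach can be used in games to learn and improve how well an AI algorithm competes against human players, and in several other scenarios. On a small maze convergence is fast, while on a large maze it can take some time. You can improve the convergence speed by modifying the code to make Q-learning more efficient.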
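The teach-and-re-teach step can be written as a single update rule. The form below is read off the update line in QLearning_Maze_Walk.m further down. The symbols s, a, s' and r are notation introduced here (current cell, chosen direction, cell occupied after the move, and reward); they are not names used in the code.

```latex
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right],
\qquad
r = \begin{cases}
  +1 & \text{goal reached} \\
  -1 & \text{bump into a wall or the maze boundary} \\
  0  & \text{otherwise}
\end{cases}
```

The listing sets the learning rate to α = 0.8 and the discount factor to γ = 0.5.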

The example consists of four m-files, with the following roles:

  • QLearning_Maze_Walk.m - the Q-learning algorithm
  • Random_Maze_Walk.m - the random walk algorithm
  • Read_Maze.m - reads the maze
  • Textscanu.m - loads the file

Two maps are included:

  • maze-9-9.txt
  • maze-61-21.txt

Below is the 9×9 map:

[Figure: the 9×9 maze map]

1. Loading the maze map

% 3rd party file used for reading the maze
function C = textscanu(filename, encoding, del_sym, eol_sym, wb)

% C = textscanu(filename, encoding) reads Unicode 
% strings from a file and outputs a cell array of strings. 
% 
% Syntax:
% -------
% filename - string with the file's name and extension
%                 example: 'unicode.txt'
% encoding - encoding of the file
%                 default: UTF-16LE
%                 examples: UTF16-LE (little Endian), UTF8.
%                 See http://www.iana.org/assignments/character-sets
%                 MS Notepad saves in UTF-16LE ('Unicode'), 
%                 UTF-16BE ('Unicode big endian'), UTF-8 and ANSI.
% del_sym - column delimiter symbol in ASCII numeric code
%                 default: 9 (tabulator)
% eol_sym - end of line delimiter symbol in ASCII numeric code
%                 default: 13 (carriage return) [Note: line feed=10]
% wb          - displays a waitbar if wb = 'waitbar'
% 
% Example:
% -------
% C = textscanu('unicode.txt', 'UTF8', 9, 13, 'waitbar');
% Reads the UTF8 encoded file 'unicode.txt', whose columns are
% delimited by tabulators and whose lines are delimited by
% carriage returns. Shows a waitbar to make the progress
% of the function visible.
%
% Created by: Vlad Atanasiu

h = 0; % waitbar handle; 0 means no waitbar is shown
switch nargin
    case 5
        if strcmp(wb, 'waitbar')
            h = waitbar(0,''); % display waitbar
        end
    case 3
        eol_sym = 13;
    case 2
        eol_sym = 13;   % end of line symbol (CR=13, LF=10)
        del_sym = 9;    % column delimiter symbol (TAB=9)
    case 1
        eol_sym = 13;
        del_sym = 9;
        encoding = 'UTF16-LE';
end
warning off MATLAB:iofun:UnsupportedEncoding;

% read input
fid = fopen(filename, 'r', 'l', encoding);
S = fscanf(fid, '%c');
fclose(fid);

% remove Byte Order Marker and add an 
% end of line mark at the end of the file
S = [S(2:end) char(eol_sym)]; 

% locate column delimiters and ends of lines
del = find(abs(S) == del_sym); 
eol = find(abs(S) == eol_sym);

% get number of rows and columns in input
row = numel(eol);
col = 1 + numel(del) / row;
C = cell(row,col); % output cell array

% catch errors in file
if col - fix(col) ~= 0
    error(['Error: The file has an odd number of columns ',...
        'or line ends are malformed.'])
end

m = 1;
n = 1;
sos = 1;

% parse input
if col == 1
    % single column input
    for r = 1:row
        if h ~= 0
            waitbar( r/row, h, [num2str(r), '/', num2str(row)] )
        end
        eos = eol(n) - 1;
        C(r,col) = {S(sos:eos)};
        n = n + 1;
        sos = eos + 3;
    end
else
    % multiple column input
    for r = 1:row
        if h ~= 0
            waitbar( r/row, h, [num2str(r), '/', num2str(row)] )
        end
        for c = 1:col-1
            eos = del(m) - 1;
            C(r,c) = {S(sos:eos)};
            sos = eos + 2;
            m = m + 1;
        end
        % last string in the row
        sos = eos + 2;
        eos = eol(n) - 1;
        C(r,col) = {S(sos:eos)};
        n = n + 1;
        sos = eos + 3;
    end
end
%close(h)
%Copyright (c) Asad Ali 
%Website: https://sites.google.com/site/asad82/code

function [maze2D,row,col] = Read_Maze(fileName)

% read the maze from file
C = textscanu(fileName, 'UTF8', 9, 13);

% convert the maze into a 2D matrix
maze1D = C{1};
[xx,yy] = find(maze1D == 10);
numCol = round(size(maze1D,2)/size(xx,2));
numRow = size(xx,2);
%maze2D = zeros(numRow,numCol);
rowIndex = 1; colIndex = 1;
for i=1:size(maze1D,2)
    if maze1D(1,i) == 10
        % line feed (char 10): start the next maze row
        rowIndex = rowIndex + 1;
        colIndex = 1;
    elseif maze1D(1,i) == 'G'
        % goal
        maze2D(rowIndex,colIndex) = 100;
        colIndex = colIndex + 1;        
    elseif maze1D(1,i) == 'S'
        % start point
        maze2D(rowIndex,colIndex) = 60;
        row = rowIndex; col = colIndex;
        colIndex = colIndex + 1;        
    elseif maze1D(1,i) == ' '
        % space
        maze2D(rowIndex,colIndex) = 50;
        colIndex = colIndex + 1;        
    else
        % bump
        maze2D(rowIndex,colIndex) = 0;
        colIndex = colIndex + 1;        
    end
end
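
To make the numeric encoding concrete, here is a minimal sketch that applies the same character-to-number mapping as Read_Maze to a tiny maze written inline. The 3x5 maze, the use of '#' for walls and the variable mazeText are assumptions made for illustration; the layout of the real maze-9-9.txt is not shown in this post.

```matlab
% Hypothetical 3x5 maze, written inline instead of being read from a file.
% '#' stands for a wall here; Read_Maze treats any character other than
% 'S', 'G', a space, or a line feed as a wall.
mazeText = ['#####' char(10) '#S G#' char(10) '#####' char(10)];

maze2D = [];
rowIndex = 1; colIndex = 1;
for k = 1:numel(mazeText)
    ch = mazeText(k);
    if ch == char(10)
        % line feed: start the next maze row
        rowIndex = rowIndex + 1;
        colIndex = 1;
        continue
    end
    if ch == 'G'
        code = 100;            % goal
    elseif ch == 'S'
        code = 60;             % start point
        startRow = rowIndex;
        startCol = colIndex;
    elseif ch == ' '
        code = 50;             % free space
    else
        code = 0;              % wall
    end
    maze2D(rowIndex, colIndex) = code;
    colIndex = colIndex + 1;
end

disp(maze2D)
% the middle row is encoded as 0 60 50 100 0, and the start is at (2,2)
```

Because walls are 0 and every walkable cell is positive, the later MoveAhead helper only needs the test val > 0 to decide whether a move is allowed.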

2. Random walk algorithm

% This work was done as part of a course while I was a graduate student at
% the University of Tokyo in spring 2011, working for the late Professor Carson
% Reynolds of the Masatoshi Ishikawa Lab, Graduate School of Information
% Science and Technology

% This code demonstrates the reinforcement learning (Q-learning) algorithm using
% the example of a maze in which a robot has to reach its destination by moving
% in the left, right, up and down directions only. At each step, based on the
% outcome of the robot's action, it is taught and re-taught whether it was a
% good move or not. The whole process is repeated time and again until the robot
% reaches its destination. At this point the process starts again so that
% whatever has been learned can be verified and unnecessary moves made during
% the first pass can be forgotten, and so on. It is a good tutorial example
% for situations in which learning has to be done on the go, i.e. without
% the use of training examples. It can be used in games to learn and improve the
% competitive capability of an AI algorithm against human players, and in
% several other scenarios.

% This is a random version for comparison of convergence time with that of
% Q-learning algorithm

% There are four m-files
% QLearning_Maze_Walk.m - demonstrates the working of Q-learning algorithm on a selected maze
% Random_Maze_Walk.m - demonstrates the working of random selection for comparison
% Read_Maze.m - will read the maze provided as input and translate into numeric representation for processing
% Textscanu.m - reads the raw maze text file

% Two maze files are included:
% maze-9-9.txt
% maze-61-21.txt
% which can be provided as input by changing the fileName in the code


function Random_Maze_Walk
clear all;
close all;

global maze2D;
global tempMaze2D;

DISPLAY_FLAG = 1; % 1 means display maze and 0 means no display
NUM_ITERATIONS = 10; % change this value to set max iterations 

% initialize the robot orientation
currentDirection = 1; % robot is facing up
% row and col are initialized with the position of the starting point,
% which Read_Maze returns below
fileName = 'maze-9-9.txt';
[maze2D,row,col] = Read_Maze(fileName);
% show the maze
imagesc(maze2D),colorbar

% make some copies of maze to use later for display
orgMaze2D = maze2D;
orgMaze2D(row,col) = 50;
[goalX,goalY,val] = find(orgMaze2D == 100);
tempMaze2D = orgMaze2D;

% robots starting position
startX = row;
startY = col;

% Direction selection
% 0 means turn left
% 1 means turn right
% 2 means Move Ahead in current direction 
% if this value is set to 2 then random walker will have one more action 
NUM_DIRECTIONS = 1; % 1, 2

for j=1:NUM_ITERATIONS
    status = -1;
    countActions = 0;
    countSteps = 0;
    tempMaze2D(goalX,goalY) = 100;
    row = startX; col = startY;
    currentDirection = 1;
    
    while status ~= 3
        % pick the turn direction uniformly from 0..NUM_DIRECTIONS
        % (round(rand*n) would make the two end values half as likely)
        direction = randi([0 NUM_DIRECTIONS]);

        % pick a random integer in 0..3; the loop below turns randMove+1
        % times in the selected direction
        randMove = randi([0 3]);
        
        for i=0:randMove
            if direction == 0
                % Turn Left and then move ahead                
                currentDirection = TurnLeft(currentDirection);            
            elseif direction == 1
                % Turn Right and then move ahead                
                currentDirection = TurnRight(currentDirection);            
            end                
        end
        
        [row,col,status] = MoveAhead(row,col,currentDirection);
        
        % count the steps required to reach the goal
        if status == 1
            countSteps = countSteps + 1;
        end
        % count actions taken to reach the goal
        countActions = countActions + randMove + 1;
        
        % redraw the maze; rem(countActions,1) is always 0, so this
        % runs after every move while DISPLAY_FLAG is 1
        if rem(countActions,1) == 0 && DISPLAY_FLAG == 1
            % calculate Manhattan distance between current location and goal
            X = [row col];
            Y = [goalX goalY];
            dist = norm(X-Y,1);
            s = sprintf('Manhattan Distance = %f',dist);
            imagesc(tempMaze2D)%,colorbar;
            title(s);
            drawnow
        end
    end
    % display the final maze
    imagesc(tempMaze2D);
    disp(countActions);    
    disp(countSteps);
    iterationCountA(j,1) = countActions;    
    iterationCountS(j,1) = countSteps;     
    %bar(iterationCountA);  
    %drawnow    
end

figure,bar(iterationCountS); title('Steps Plot')
figure,bar(iterationCountA); title('Actions Plot')
meanA = mean(iterationCountA); 
disp('----Mean Result Actions -----')
disp(meanA);
disp('----Mean Result Steps -----')
meanS = mean(iterationCountS);
disp(meanS);


%-------------------------------%
%  1
% 2 3
%  4
% Current Direction
% 1 - means robot facing up
% 2 - means robot facing left
% 3 - means robot facing right
% 4 - means robot facing down
%------------------------------%

% based on the current direction and convention rotate the robot left
function currentDirection = TurnLeft(currentDirection)
if currentDirection == 1
    currentDirection = 2;
elseif currentDirection == 2
    currentDirection = 4;
elseif currentDirection == 4
    currentDirection = 3;
elseif currentDirection == 3
    currentDirection = 1;
end

% based on the current direction and convention rotate the robot right
function currentDirection = TurnRight(currentDirection)
if currentDirection == 1
    currentDirection = 3;
elseif currentDirection == 3
    currentDirection = 4;
elseif currentDirection == 4
    currentDirection = 2;
elseif currentDirection == 2
    currentDirection = 1;
end

% return the information just in front of the robot (local)
function [val,valid] = LookAhead(row,col,currentDirection)  
global maze2D;
valid = 0;
if currentDirection == 1
    if row-1 >= 1 & row-1 <= size(maze2D,1)
        val = maze2D(row-1,col);
        valid = 1;
    end
elseif currentDirection == 2
    if col-1 >= 1 & col-1 <= size(maze2D,2)
        val = maze2D(row,col-1);
        valid = 1;
    end
elseif currentDirection == 3
    if col+1 >= 1 & col+1 <= size(maze2D,2)
        val = maze2D(row,col+1);
        valid = 1;
    end
elseif currentDirection == 4
    if row+1 >= 1 & row+1 <= size(maze2D,1)
        val = maze2D(row+1,col);
        valid = 1;
    end
end

% status = 1 then move ahead successful
% status = 2 then bump into wall or boundary
% status = 3 then goal achieved
% Move the robot to the next location if no bump 
function [row,col,status] = MoveAhead(row,col,currentDirection)  
global tempMaze2D;

% based on the current direction, check whether the next location is a
% space or a wall and get the value used below
[val,valid] = LookAhead(row,col,currentDirection);
% valid == 0 means the next location lies outside the maze boundary;
% in that case only the status is set
if valid == 1
    % now check if the next location for space or bump
    % this is for walls inside the maze
    if val > 0
        oldRow = row; oldCol = col;
        if currentDirection == 1
            row = row - 1;
        elseif currentDirection == 2 
            col = col - 1;
        elseif currentDirection == 3 
            col = col + 1;
        elseif currentDirection == 4 
            row = row + 1;    
        end
        status = 1;        
        
        if val == 100
            % goal achieved             
            status = 3;
            disp(status);            
        end
        
        % update the current position of the robot in maze for display
        tempMaze2D(oldRow,oldCol) = 50;                 
        tempMaze2D(row,col) = 60; 
    elseif val == 0
        % bump into wall
        status = 2;        
    end
else
    % return a bump signal if valid is 0
    status = 2;
end 
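
The TurnLeft, TurnRight and MoveAhead helpers above encode the direction convention (1 up, 2 left, 3 right, 4 down) with if/elseif chains. The same convention can be sketched as lookup tables, which makes it easier to check at a glance. The names leftOf, rightOf, dRow and dCol are hypothetical and do not appear in the original files.

```matlab
% Direction convention from the listing: 1 up, 2 left, 3 right, 4 down.
% Each table is indexed by the current direction.
leftOf  = [2 4 1 3];    % TurnLeft:  1->2, 2->4, 3->1, 4->3
rightOf = [3 1 4 2];    % TurnRight: 1->3, 2->1, 3->4, 4->2
dRow    = [-1 0 0 1];   % row change when moving ahead
dCol    = [0 -1 1 0];   % column change when moving ahead

% Four left turns starting from "up" visit every direction and come back.
d = 1;
turns = zeros(1,4);
for k = 1:4
    d = leftOf(d);
    turns(k) = d;
end
disp(turns)             % prints 2 4 3 1

% Turning left and then right restores the original direction.
roundTrip = rightOf(leftOf(1:4));

% One step to the right (direction 3) from cell (5,5) lands on (5,6).
row = 5; col = 5;
next = [row + dRow(3), col + dCol(3)];
```

Since four left turns form a full cycle, a loop that keeps turning left until the robot faces the wanted direction takes at most three turns, which is what the Q-learning listing relies on when it aligns the robot with the selected action.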

3. Reinforcement Q-learning algorithm

% This work was done as part of a course while I was a graduate student at
% the University of Tokyo in spring 2011, working for the late Professor Carson
% Reynolds of the Masatoshi Ishikawa Lab, Graduate School of Information
% Science and Technology

% This code demonstrates the reinforcement learning (Q-learning) algorithm using
% the example of a maze in which a robot has to reach its destination by moving
% in the left, right, up and down directions only. At each step, based on the
% outcome of the robot's action, it is taught and re-taught whether it was a
% good move or not. The whole process is repeated time and again until the robot
% reaches its destination. At this point the process starts again so that
% whatever has been learned can be verified and unnecessary moves made during
% the first pass can be forgotten, and so on. It is a good tutorial example
% for situations in which learning has to be done on the go, i.e. without
% the use of training examples. It can be used in games to learn and improve the
% competitive capability of an AI algorithm against human players, and in
% several other scenarios.

% On a small maze convergence will be fast, whereas on a large maze
% convergence can take some time. You can improve convergence speed by
% modifying the code to make Q-learning more efficient.

% There are four m-files
% QLearning_Maze_Walk.m - demonstrates the working of Q-learning algorithm on a selected maze
% Random_Maze_Walk.m - demonstrates the working of random selection for comparison
% Read_Maze.m - will read the maze provided as input and translate into numeric representation for processing
% Textscanu.m - reads the raw maze text file

% Two maze files are included:
% maze-9-9.txt
% maze-61-21.txt
% which can be provided as input by changing the fileName in the code

function QLearning_Maze_Walk
clear all;
close all;

global maze2D;
global tempMaze2D;

DISPLAY_FLAG = 1; % 1 means display maze and 0 means no display
NUM_ITERATIONS = 100; % change this value to set max iterations 
% initialize the robot orientation
currentDirection = 1; % robot is facing up

% row and col are initialized with the position of the robot's starting
% point, which Read_Maze returns below
fileName = 'maze-9-9.txt';
[maze2D,row,col] = Read_Maze(fileName);
imagesc(maze2D) % show the maze

% make some copies of maze to use later for display
orgMaze2D = maze2D;
orgMaze2D(row,col) = 50;
[goalX,goalY,val] = find(orgMaze2D == 100);
tempMaze2D = orgMaze2D;

% record robots starting location for use later
startX = row;
startY = col;

% build a state action matrix by finding all valid states from maze
% we have four actions for each state.
Q = zeros(size(maze2D,1),size(maze2D,2),4);

% only used for priority visiting for larger maze
%visitFlag = zeros(size(maze2D,1),size(maze2D,2));

% status message for goal and bump
GOAL = 3;
BUMP = 2;

% learning rate (alpha) and discount factor (gamma)
alpha = 0.8; 
gamma = 0.5;

for i=1:NUM_ITERATIONS   
    tempMaze2D(goalX,goalY) = 100;
    row = startX; col = startY;
    status = -1;
    countActions = 0;
    currentDirection = 1;

    % only used for priority visiting for larger maze 
%    visitFlag = zeros(size(maze2D,1),size(maze2D,2));
%    visitFlag(row,col) = 1;            
    
    while status ~= GOAL
        % record the current position of the robot for use later
        prvRow = row; prvCol = col;
        
        % select the action (direction) with the maximum Q value in the
        % current state; if several actions tie, pick one of them
        % uniformly at random
        [val,index] = max(Q(row,col,:));
        [xx,yy] = find(Q(row,col,:) == val);
        if size(yy,1) > 1
            index = randi(size(yy,1));
            action = yy(index,1);
        else
            action = index;
        end

        % based on the selected actions correct the orientation of the
        % robot to conform to rules of simulator
        while currentDirection ~= action
            currentDirection = TurnLeft(currentDirection);
            % count the actions required to reach the goal
            countActions = countActions + 1;            
        end
                
        % do the selected action i.e. MoveAhead
        [row,col,status] = MoveAhead(row,col,currentDirection);

        % count the actions required to reach the goal        
        countActions = countActions + 1;            
        
        % get the reward: +1 for reaching the goal, -1 for bumping into
        % a wall or the boundary, and 0 otherwise
        if status == BUMP
            rewardVal = -1;
        elseif status == GOAL
            rewardVal = 1;
        else
            rewardVal = 0;
        end

        % enable this piece of code if testing larger maze
%         if visitFlag(row,col) == 0
%             rewardVal = rewardVal + 0.2;
%             visitFlag(row,col) = 1;            
%         else
%             rewardVal = rewardVal - 0.2;
%         end
                
        % update information for robot in Q for later use
        Q(prvRow,prvCol,action) = Q(prvRow,prvCol,action) + alpha*(rewardVal+gamma*max(Q(row,col,:)) - Q(prvRow,prvCol,action));
        
        % redraw the maze; rem(countActions,1) is always 0, so this
        % runs after every move while DISPLAY_FLAG is 1
        if rem(countActions,1) == 0 && DISPLAY_FLAG == 1
            X = [row col];
            Y = [goalX goalY];        
            dist = norm(X-Y,1);            
            s = sprintf('Manhattan Distance = %f',dist);
            imagesc(tempMaze2D);%,colorbar;
            title(s);            
            drawnow
        end
    end
    
    iterationCount(i,1) = countActions;
    
    % display the final maze
    imagesc(tempMaze2D);%,colorbar;
    disp(countActions);
    %bar(iterationCount);  
    drawnow
end

figure,bar(iterationCount)
disp('----- Mean Result -----')
meanA = mean(iterationCount);
disp(meanA);
%save Q_Learn_9-9.mat;


%-------------------------------%
%  1
% 2 3
%  4
% Current Direction
% 1 - means robot facing up
% 2 - means robot facing left
% 3 - means robot facing right
% 4 - means robot facing down
%------------------------------%
% based on the current direction and convention rotate the robot left
function currentDirection = TurnLeft(currentDirection)
if currentDirection == 1
    currentDirection = 2;
elseif currentDirection == 2
    currentDirection = 4;
elseif currentDirection == 4
    currentDirection = 3;
elseif currentDirection == 3
    currentDirection = 1;
end

% based on the current direction and convention rotate the robot right
function currentDirection = TurnRight(currentDirection)
if currentDirection == 1
    currentDirection = 3;
elseif currentDirection == 3
    currentDirection = 4;
elseif currentDirection == 4
    currentDirection = 2;
elseif currentDirection == 2
    currentDirection = 1;
end


% return the information just in front of the robot (local)
function [val,valid] = LookAhead(row,col,currentDirection)  
global maze2D;
valid = 0;
if currentDirection == 1
    if row-1 >= 1 & row-1 <= size(maze2D,1)
        val = maze2D(row-1,col);
        valid = 1;
    end
elseif currentDirection == 2
    if col-1 >= 1 & col-1 <= size(maze2D,2)
        val = maze2D(row,col-1);
        valid = 1;
    end
elseif currentDirection == 3
    if col+1 >= 1 & col+1 <= size(maze2D,2)
        val = maze2D(row,col+1);
        valid = 1;
    end
elseif currentDirection == 4
    if row+1 >= 1 & row+1 <= size(maze2D,1)
        val = maze2D(row+1,col);
        valid = 1;
    end
end

% status = 1 then move ahead successful
% status = 2 then bump into wall or boundary
% status = 3 then goal achieved
% Move the robot to the next location if no bump 
function [row,col,status] = MoveAhead(row,col,currentDirection)  
global tempMaze2D;

% based on the current direction, check whether the next location is a
% space or a wall and get the value used below
[val,valid] = LookAhead(row,col,currentDirection);
% valid == 0 means the next location lies outside the maze boundary;
% in that case only the status is set
if valid == 1
    % now check if the next location for space or bump
    % this is for walls inside the maze
    if val > 0
        oldRow = row; oldCol = col;
        if currentDirection == 1
            row = row - 1;
        elseif currentDirection == 2 
            col = col - 1;
        elseif currentDirection == 3 
            col = col + 1;
        elseif currentDirection == 4 
            row = row + 1;    
        end
        status = 1;        
        
        if val == 100
            % goal achieved             
            status = 3;
            disp(status);            
        end
        
        % update the current position of the robot in maze for display
        tempMaze2D(oldRow,oldCol) = 50;                 
        tempMaze2D(row,col) = 60; 
    elseif val == 0
        % bump into wall
        status = 2;        
    end
else
    % return a bump signal if valid is 0
    status = 2;
end 
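
To see how the Q table fills in, the update line from the listing can be traced by hand on a very small case. The sketch below uses a hypothetical 1x3 corridor (start, free cell, goal) and a hypothetical helper named update; only alpha, gamma, the reward values and the update expression itself are taken from the listing.

```matlab
% Worked example of the Q update on a hypothetical 1x3 corridor:
% cell (1,1) is the start, (1,2) is free space, (1,3) is the goal.
alpha = 0.8;        % learning rate, as in the listing
gamma = 0.5;        % discount factor, as in the listing
Q = zeros(1,3,4);   % Q(row,col,action); actions: 1 up, 2 left, 3 right, 4 down

% the same update expression as in the listing, wrapped in a helper
update = @(q, r, qNextMax) q + alpha*(r + gamma*qNextMax - q);

% Episode 1, step A: at (1,1) the robot tries "up" and bumps (reward -1).
% It does not move, so the next state is (1,1) itself.
Q(1,1,1) = update(Q(1,1,1), -1, max(Q(1,1,:)));   % becomes -0.8

% Episode 1, step B: "right" from (1,1) to (1,2), reward 0.
Q(1,1,3) = update(Q(1,1,3), 0, max(Q(1,2,:)));    % stays 0

% Episode 1, step C: "right" from (1,2) to the goal (1,3), reward +1.
Q(1,2,3) = update(Q(1,2,3), 1, max(Q(1,3,:)));    % becomes 0.8

% Episode 2: "right" from (1,1) again. The value learned next to the
% goal is now propagated one cell back, discounted by gamma.
Q(1,1,3) = update(Q(1,1,3), 0, max(Q(1,2,:)));    % becomes about 0.32

% Greedy choice at the start cell after these updates.
[~, best] = max(Q(1,1,:));
disp(squeeze(Q(1,1,:))')
disp(best)                                        % 3, i.e. "right"
```

The trace shows why the first pass through a maze is slow: the goal reward only reaches a cell once its successor has already been updated, so the value moves back roughly one cell per episode along the path.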

 
