Q Learning Game (Artificial Intelligence)
by Suman_Shashi in Circuits > Computers
Hello, friends. I guess most DIY users here already know about Q-learning. It is a branch of machine learning, or better said, a branch of reinforcement learning.
Q-learning is a trial-and-error way of learning in which the bot is rewarded for correct moves and punished for wrong moves, with the results recorded in a Q matrix. Let me explain it in a better way.
Consider the grid of squares in the image.
Our bot starts at the yellow circle, i.e. box 16. She has to reach the green box. If she touches a red box she pays a penalty of -1. If she touches a black box she pays a penalty of -2. If she reaches the green box she is rewarded +100. If she moves to any other box she is rewarded -0.2.
So she has to find the shortest path, which will reward her the most. To our eyes the shortest paths are clearly
Path a = 16 - 17 - 18 - 19 - 20 - 15 - 10 - 5
Reward received = -0.2*7+100 = 98.6 pts
Path b = 16 - 17 - 12 - 7 - 2 - 3 - 4 - 5
Reward received = -0.2*7+100 = 98.6 pts
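The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes, as the article's count of 7 suggests, that each of the seven non-green boxes on a path costs -0.2 and that the green box pays +100. The helper name `path_reward` is made up for this illustration.

```python
GOAL = 5          # the green box
STEP_COST = -0.2  # cost of every non-green box on the path
GOAL_REWARD = 100

def path_reward(path):
    """Sum the rewards along a path, following the article's count."""
    total = sum(STEP_COST for box in path if box != GOAL)
    if path[-1] == GOAL:
        total += GOAL_REWARD
    return round(total, 1)

path_a = [16, 17, 18, 19, 20, 15, 10, 5]
path_b = [16, 17, 12, 7, 2, 3, 4, 5]
print(path_reward(path_a), path_reward(path_b))
```

Both paths contain the same number of boxes, so they earn the same reward.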
So here each box is a state and each state has a reward. Thus the Q matrix will be (see images).
In the matrix, 0 means the move is not possible, -1 and -0.2 are penalties, and +100 is a reward.
This Q matrix is the BRAIN of the artificial bot. The bot takes random actions, moving only in 4 directions: Up, Down, Left and Right. Using the data in this matrix, the bot tries to maximize the reward and minimize the penalty by trial and error.
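To make the idea concrete, here is a minimal sketch of Q-learning on a grid like the one described. Several things in it are assumptions and not taken from the article: the boxes are numbered 1 to 20 in rows of five (which matches the two paths listed), the red box is placed at 8 and the black box at 14 purely for illustration, the discount factor is 0.9, a move off the grid leaves the bot where she is, and the learning rate is 1 because the grid is deterministic.

```python
import random

ROWS, COLS = 4, 5            # assumed layout: boxes 1..20, five per row
START, GOAL = 16, 5
RED, BLACK = {8}, {14}       # hypothetical penalty boxes, not from the article
MOVES = {"up": -COLS, "down": COLS, "left": -1, "right": 1}
ACTIONS = list(MOVES)
GAMMA = 0.9                  # assumed discount factor

def step(box, action):
    """Return (next_box, reward); a move off the grid leaves the bot in place."""
    row, col = divmod(box - 1, COLS)
    blocked = {"up": row == 0, "down": row == ROWS - 1,
               "left": col == 0, "right": col == COLS - 1}
    if blocked[action]:
        return box, -0.2
    nxt = box + MOVES[action]
    if nxt == GOAL:
        return nxt, 100.0
    if nxt in RED:
        return nxt, -1.0
    if nxt in BLACK:
        return nxt, -2.0
    return nxt, -0.2

def train(episodes=500, max_steps=200, seed=0):
    """Fill the Q table by letting the bot wander at random."""
    rng = random.Random(seed)
    q = {box: {a: 0.0 for a in ACTIONS} for box in range(1, ROWS * COLS + 1)}
    for _ in range(episodes):
        box = START
        for _ in range(max_steps):
            action = rng.choice(ACTIONS)
            nxt, reward = step(box, action)
            # Q(s, a) = reward + gamma * best value reachable from the next box
            q[box][action] = reward + GAMMA * max(q[nxt].values())
            box = nxt
            if box == GOAL:
                break
    return q

def greedy_path(q, limit=50):
    """Follow the highest-valued action from START until the goal is reached."""
    path = [START]
    while path[-1] != GOAL and len(path) < limit:
        best = max(ACTIONS, key=lambda a: q[path[-1]][a])
        path.append(step(path[-1], best)[0])
    return path

q_table = train()
path = greedy_path(q_table)
print(path)
```

After training, following the best action in each box should trace one of the shortest routes from box 16 to box 5 while steering clear of the penalty boxes.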
Installing Python 3.5 and Module
Go to the official Python website and download Python 3.5 64-bit.
Then go to the folder where Python got installed and find the Scripts folder. Open it. Then Shift+Right Click on some empty white space to open Command Prompt (Admin) at that location. Your cmd should show an address like this
C:\Python3\Scripts>
Otherwise, if you have the Windows 10 Anniversary Update, right-click the Windows icon and open Command Prompt as admin.
Then type cd \
Then type E: or F: or the drive letter where you installed Python. Skip this if you installed it on the C drive.
Then, to locate the Scripts folder, type this (I have mine on the E drive)
E:\>cd E:\Python3\Scripts
You will get this
E:\Python3\Scripts>
Once you are there, type this
pip install numpy
Let it install; till then, do something productive
then type
pip install pygame
then type
pip install tensorflow
pip install tflearn
If everything installed correctly, proceed.
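One quick way to confirm that the installs worked is to ask Python whether it can find each module. This is a small sketch using only the standard library; the helper name `check_modules` is made up for this illustration.

```python
import importlib.util

REQUIRED = ["numpy", "pygame", "tensorflow", "tflearn"]

def check_modules(names):
    """Map each module name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, found in check_modules(REQUIRED).items():
    print(name, "OK" if found else "missing")
```

Any module reported as missing can be installed again with pip from the Scripts folder.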
Chase the Pong Ball Code
Our game is to catch the ball that falls from above. If the bot misses, it is rewarded with -1; otherwise it is rewarded with +1.
The code begins like this
import sys
from pygame.locals import *
import pygame as py #IMPORTING PYGAME
import numpy as np # IMPORTING NUMPY
import random # IMPORTING RANDOM FOR RANDOM GENERATION
Choose your game FPS (sounds cool, right?)
FPS = 30 # lower for slow, higher for fast
Clk = py.time.Clock()
py.init() # pygame initialization
surface = py.display.set_mode((800, 600)) # width = 800, height = 600
py.display.set_caption('Q learning Example!')
font = py.font.Font(None, 24) # default pygame font for the on-screen text (size is an assumption)
Left = 400
Top = 370
Width = 200
Height = 20
BLACK = (0,0,0)
GREEN = (0,255,0)
WHITE = (255,255,255)
rectangle = py.Rect(Left,Top,Width,Height)
dict = {} # dictionary for Q temp matrix (note: this name shadows the built-in dict)
Brain = np.zeros([4000,3])
action = 1 # 0 for still, -1 for left, 1 for right
score = 0
reward = 0
penalty = 0
cirX = 400 # circle center x coordinate
cirY = 0 # circle center y coordinate
radius = 10
pixeljump = 10 # pixels the ball falls per frame (value is an assumption)
y = 0.98 # gamma
class State:
    def __init__(self,rect,circle):
        self.rect = rect
        self.circle = circle

class Circle:
    def __init__(self,cirX,cirY):
        self.cirX=cirX
        self.cirY=cirY
def changeCirX(radius):
    # pick a new x position for the ball: a random multiple of (100 - radius)
    newx = 100-radius
    mul = random.randint(1,8)
    newx *=mul
    return newx

def calscore(rectangle,circle):
    # +1 if the ball is above the paddle, -1 otherwise
    if rectangle.left <= circle.cirX <= rectangle.right:
        return 1
    else:
        return -1

def convert(state):
    # map each distinct state to its own row index in Brain
    alpha = (state.rect.left, state.rect.right, state.circle.cirX, state.circle.cirY)
    if alpha not in dict:
        dict[alpha] = len(dict) # next unused row
    return dict[alpha]

def best_action(state):
    # Brain columns 0, 1, 2 stand for actions -1, 0, 1
    return int(np.argmax(Brain[convert(state),:])) - 1
def afteraction(state,act):
    # state that results from taking action act: paddle moved, ball one step lower
    rect = state.rect
    if act == 1:
        if rect.right + rect.width > 800:
            rectangle = rect
        else:
            rectangle = py.Rect(rect.left+rect.width, rect.top, rect.width, rect.height)
    elif act == -1:
        if rect.left - rect.width < 0:
            rectangle = rect
        else:
            rectangle = py.Rect(rect.left-rect.width, rect.top, rect.width, rect.height)
    else:
        rectangle = rect
    C = Circle(state.circle.cirX, state.circle.cirY+pixeljump)
    return State(rectangle,C)

def newRectangle(rectangle,action):
    # paddle position after the action, kept inside the 800-pixel-wide window
    if action == 1:
        if rectangle.right + rectangle.width > 800:
            return rectangle
        else:
            return py.Rect(rectangle.left+rectangle.width, rectangle.top, rectangle.width, rectangle.height)
    elif action == -1:
        if rectangle.left - rectangle.width < 0:
            return rectangle
        else:
            return py.Rect(rectangle.left-rectangle.width, rectangle.top, rectangle.width, rectangle.height)
    return rectangle
while True:
    for event in py.event.get():
        if event.type == QUIT:
            py.quit()
            sys.exit()
    surface.fill(BLACK)
    if cirY >= 600-radius: # ball reached the bottom: score it and drop a new one
        reward = calscore(rectangle,Circle(cirX,cirY))
        cirX = changeCirX(radius)
        cirY = 0
    else:
        reward = 0
        cirY += pixeljump
    state = State(rectangle,Circle(cirX,cirY))
    action = best_action(state) # -1, 0 or 1
    newstate = afteraction(state,action)
    # Q update; Brain columns 0, 1, 2 hold actions -1, 0, 1
    Brain[convert(state),action+1] = reward + y*np.max(Brain[convert(newstate),:])
    rectangle = newRectangle(state.rect,action)
    cirX = state.circle.cirX
    cirY = state.circle.cirY
    if reward == 1:
        score += 1 # caught the ball
    elif reward == -1:
        penalty += 1 # missed the ball
    py.draw.rect(surface, GREEN, rectangle) # draw the paddle
    py.draw.circle(surface, WHITE, (cirX, cirY), radius) # draw the ball
    el = py.time.get_ticks() / 1000 # seconds elapsed since py.init()
    text = font.render('Score: ' + str(score), True, (243, 160, 90)) # update the score on the screen
    text1 = font.render('Penalty: ' + str(penalty), True, (125, 157, 207)) # update the penalty on the screen
    text2 = font.render('Time Taken : ' + str(int(el/60))+'m'+str(int(el%60))+'s', True, (0, 0, 255))
    surface.blit(text, (670, 10)) # render score
    surface.blit(text1, (10, 10)) # render penalty
    surface.blit(text2, (250, 10)) # render elapsed time
    py.display.update() # update display
    Clk.tick(FPS)
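The game code always picks the action with the highest Q value, so the bot never tries anything new on purpose. A common variation, which is not part of the code above, is epsilon-greedy selection: take a random action now and then, and nudge each Q value toward its target with a learning rate. The sketch below shows the idea on a single row of three Q values; the names `choose_action` and `q_update` and the values of `epsilon` and `alpha` are assumptions made for this illustration.

```python
import random

ACTIONS = (-1, 0, 1)   # left, still, right, as in the game code

def choose_action(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best_column = max(range(len(q_row)), key=lambda i: q_row[i])
    return ACTIONS[best_column]

def q_update(q_row, action, reward, next_row, alpha=0.1, gamma=0.98):
    """Move Q(s, a) a fraction alpha toward reward + gamma * max Q(s', .)."""
    column = ACTIONS.index(action)
    target = reward + gamma * max(next_row)
    q_row[column] += alpha * (target - q_row[column])
    return q_row[column]

rng = random.Random(0)
row = [0.0, 0.0, 0.0]
print(choose_action(row, epsilon=0.1, rng=rng))
print(q_update(row, 1, reward=1, next_row=[0.0, 0.0, 0.0]))
```

A small epsilon such as 0.1 keeps the bot mostly greedy while still letting it stumble onto better moves.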