Gopesh Bajaj

Session: Session 2
Board Number: 24

Analyzing Reward-Penalty Algorithms in Virtually Simulated Environments for Multi-Robot Systems

Robotic automation is advancing steadily, and robots now perform medical, educational, and logistical tasks reliably. Connecting robotic automation with psychology gives rise to an interesting model: using Q-learning (a Reinforcement Learning algorithm) to learn how to automate physical tasks such as taxi drop-off. Reinforcement Learning is a branch of Machine Learning concerned specifically with self-learning through trial and error, and it is useful for producing improved results over multiple iterations [4].
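As a sketch of the trial-and-error update behind tabular Q-learning (the step size alpha and discount gamma below are illustrative values, not the project's actual settings):

```python
import numpy as np

# Minimal tabular Q-learning update (a sketch; alpha and gamma are illustrative).
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # Move Q(s, a) toward the observed reward plus the discounted best estimate
    # of the next state's value (trial-and-error bootstrapping).
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
    return Q
```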
OpenAI Gym [2, 3] provides numerous environments for developing and benchmarking Reinforcement Learning algorithms. Given the future scope of vehicle automation and the goal of increasing the effectiveness of the process, we use the Taxi drop-off environment for this research project. The goal of the taxi is to pick up passengers and drop them off at the desired location in the least amount of time.
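A minimal sketch of interacting with Gym's Taxi environment using the q_update function above; it assumes Gym version 0.26 or later (older releases return a bare observation from reset() and a 4-tuple from step()), and the episode count and exploration rate are illustrative:

```python
import gym
import numpy as np

# Create the Taxi environment and an all-zero Q-table over its discrete spaces.
env = gym.make("Taxi-v3")
Q = np.zeros((env.observation_space.n, env.action_space.n))
epsilon = 0.1  # illustrative exploration rate

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: mostly exploit the current Q-table, sometimes explore.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        Q = q_update(Q, state, action, reward, next_state)
        state = next_state
        done = terminated or truncated
```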
Hypothesis: Applying known reward shaping algorithms [1] in a project tailored to reward shaping ideas could allow the robot to achieve more efficient results. This efficiency could then carry over to various automation projects and real-world applications.
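One widely used form of reward shaping is potential-based shaping, where the environment reward is augmented with gamma * phi(next_state) - phi(state) for a potential function phi; the sketch below uses a hypothetical placeholder potential and is not necessarily the shaping scheme from [1]:

```python
# Sketch of potential-based reward shaping layered on top of the Taxi reward.
# The potential function `phi` is a hypothetical placeholder; an actual shaping
# function would come from the reward shaping algorithms cited in [1].
def phi(state):
    # Placeholder potential: a scalar "progress" estimate for a state.
    return 0.0

def shaped_reward(reward, state, next_state, gamma=0.99):
    # Potential-based shaping densifies the otherwise sparse Taxi rewards
    # while leaving the optimal policy unchanged.
    return reward + gamma * phi(next_state) - phi(state)
```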