Sasha Hydrie

Session
Session 1
Board Number
9

Keypoint Extraction for State-Space Representation in Reinforcement Learning

Reinforcement learning is a form of machine learning in which an agent explores an environment to learn how to complete a given task. The agent receives a stream of data, known as observations, and chooses actions to take. In robotics, observations are typically frames from a camera feed, represented as vectors with tens of thousands of entries, yet at each time step only a few components of the environment are relevant. Our technique algorithmically determines and tracks keypoints across frames, then trains the agent on those keypoints as observations instead of raw pixel data. We expect keypoints to distill the essential information into a simpler format that enables faster learning.

We created a virtual environment with the CausalWorld library to train the keypoint and control agents across a variety of grasping and manipulation tasks. We trained the agents with two state-of-the-art learning algorithms, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), using DeepMind's Transporter architecture as a preprocessing layer to extract keypoints.
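To illustrate how a keypoint layer can shrink the observation, the following is a minimal sketch rather than our implementation. It assumes the preprocessing network has already produced K feature maps per frame, and it reduces each map to one (x, y) coordinate with a spatial soft-argmax, a common way to read keypoints from feature maps. The function names, the map sizes, and the random stand-in feature maps are hypothetical.

```python
import numpy as np


def spatial_soft_argmax(feature_maps):
    """Reduce K feature maps of shape (K, H, W) to K (x, y) keypoints in [-1, 1]."""
    k, h, w = feature_maps.shape
    flat = feature_maps.reshape(k, h * w)
    # Softmax over all pixels of each map (shifted by the max for stability).
    flat = flat - flat.max(axis=1, keepdims=True)
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(k, h, w)
    # Expected coordinate under each map's probability distribution.
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    expected_y = (probs.sum(axis=2) * ys).sum(axis=1)
    expected_x = (probs.sum(axis=1) * xs).sum(axis=1)
    return np.stack([expected_x, expected_y], axis=1)


def keypoint_observation(feature_maps):
    """Flatten the K keypoints into a single observation vector of length 2K."""
    return spatial_soft_argmax(feature_maps).reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: a 128x128 RGB frame and 10 feature maps of 32x32.
    frame = rng.random((128, 128, 3))
    feature_maps = rng.normal(size=(10, 32, 32))  # stand-in for learned maps
    obs = keypoint_observation(feature_maps)
    print("raw pixel entries:", frame.size)
    print("keypoint entries:", obs.size)
```

Under these assumed sizes the control agent would see 20 numbers per frame instead of 49,152, which is the reduction the keypoint representation is meant to provide.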

Training with virtual camera data rather than ground-truth pose data slows the process by two orders of magnitude, which is prohibitively slow on available hardware, so we do not yet have a mature model. Our preliminary results nonetheless suggest that the agent can learn exclusively from keypoint data.