I'm very interested in AI methods that improve learning efficiency. I found this new paper from DeepMind interesting and thought you might too.
Motivation:
Humans explicitly learn notions of objects, relations, geometry and cardinality in a task-agnostic manner and repurpose this knowledge for future tasks.
- Transporter learns object keypoints across commonly used RL environments; the architecture is robust to varying number, size & motion of objects
- Using learned keypoints as state input leads to policies that perform better than model-free & model-based RL baselines
- Demonstrates drastic reductions in exploration search complexity
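The core of the architecture is a "transport" operation that moves features between frames at the learned keypoint locations. A minimal NumPy sketch, based on my reading of the paper (array shapes and the `transport` helper name are chosen for illustration, not taken from the authors' code):

```python
import numpy as np

def transport(phi_src, phi_tgt, heat_src, heat_tgt):
    """Feature transport between a source and target frame.

    Suppresses source-frame features at both frames' keypoint locations,
    then pastes in target-frame features at the target keypoint locations.

    phi_*:  (H, W, C) convolutional feature maps
    heat_*: (H, W, 1) keypoint heatmaps in [0, 1]
    """
    return ((1.0 - heat_src) * (1.0 - heat_tgt) * phi_src
            + heat_tgt * phi_tgt)
```

Reconstructing the target frame from the transported features forces the heatmaps to land on whatever moves between frames, which is why the discovered keypoints track objects.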
First hypothesis: task-agnostic learning of object keypoints can enable fast learning of goal-directed policies.
Second hypothesis: learned keypoints can enable significantly better task-independent exploration.
Search efficiency improvement:
A random-action agent would need to search a space of 18^100 raw action sequences. Observing 5 keypoints with option length T = 20, however, gives only (5×4)^(100/20) = 20^5 option sequences, a search-space reduction of more than a factor of 10^100
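This back-of-envelope arithmetic can be checked directly (a quick sketch using the figures from the text: 18 raw actions, horizon 100, 5 keypoints, 4 directions, T = 20):

```python
import math

# 18 raw actions over a 100-step horizon vs.
# 5 keypoints x 4 directions = 20 options, each held for T = 20 steps.
raw_log10 = 100 * math.log10(18)                 # log10 of 18^100
option_log10 = (100 // 20) * math.log10(5 * 4)   # log10 of 20^5

print(f"raw search space    ~ 10^{raw_log10:.1f}")
print(f"option search space ~ 10^{option_log10:.1f}")
print(f"reduction factor    ~ 10^{raw_log10 - option_log10:.0f}")  # well over 10^100
```

Working in log10 avoids materializing 18^100 as a number; the point is that temporally extended keypoint options collapse an astronomically large raw-action search space to a few million option sequences.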
Results:
- Transporter consistently tracks the salient object keypoints over long time horizons and outperforms prior unsupervised keypoint-detection baselines
- Using the learned keypoints and corresponding features within a reinforcement learning context can lead to data-efficient learning in Atari games
- Surprisingly, the learned options model is able to play several Atari games via random sampling of options; this is possible because it learns skills that move the discovered game avatar as far as possible without dying
- Learned keypoint options consistently outperform the random actions baseline by a large margin
- Most notably, this is achieved without rewards or (extrinsic) task-directed learning. The learned keypoints are therefore stable enough to support learning complex object-oriented skills in the Atari domain
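As a toy illustration of the option space these results describe (the names and the uniform-sampling loop are my own sketch, not the paper's code): each option picks one keypoint and one movement direction, and is held for T steps before resampling.

```python
import random

# Figures from the text: 5 keypoints, 4 directions, option length T = 20.
KEYPOINTS = 5
DIRECTIONS = ["up", "down", "left", "right"]
T = 20

def sample_option(rng=random):
    """Pick one of the 5 x 4 = 20 keypoint-movement options uniformly."""
    return rng.randrange(KEYPOINTS), rng.choice(DIRECTIONS)

def random_option_rollout(horizon=100, rng=random):
    """A horizon-100 episode needs only horizon // T = 5 option choices."""
    return [sample_option(rng) for _ in range(horizon // T)]
```

Random search over this 20-element option space is what the "random sampling of options" baseline above amounts to, and it already suffices to play several games.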