Robots have long captured our imagination, from helpful companions like R2-D2 and WALL-E to nightmarish villains in sci-fi thrillers. But the reality of robotics technology has lagged far behind fiction.
Robots still struggle with even basic real-world tasks like picking up unfamiliar objects.
Now, Google DeepMind’s new Robotic Transformer 2 (RT-2) AI model brings us markedly closer to the goal of adaptable helper robots.
Training Robots Is Hard… Until Now
Traditionally, robots must be painstakingly trained through billions of real-world repetitions covering every object and scenario they might encounter. This makes training robots impractical for most researchers and companies.
As Vincent Vanhoucke, Google DeepMind’s head of robotics, explained: “Learning is a challenging endeavour, and even more so for robots.”
With RT-2, robots can learn more like humans do – by transferring concepts from one situation to another.
RT-2 leverages two existing vision-language models: Pathways Language and Image model (PaLI-X) and Pathways Language Model Embodied (PaLM-E).
After ‘watching’ extensive real-world images and descriptions, RT-2 develops an innate understanding of visual concepts and language.
RT-2 Bridges the Gap Between Seeing, Understanding, and Doing
RT-2 gives robots a groundbreaking new ability: translating visual and language concepts directly into physical actions. This eliminates the need for tedious explicit training on each individual task.
For example, a traditional robot vacuum can only vacuum floors. But with RT-2’s flexible learning approach, a robot that learned to pick up balls could figure out how to pick up cubes or other objects it’s never seen before.
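One way to picture this seeing-to-doing translation: the model emits its action as a short string of tokens, which a small decoder turns into motor commands. The sketch below is purely illustrative; the token layout, value ranges, and function names are assumptions, not DeepMind's actual format.

```python
# Illustrative sketch of a vision-language-action (VLA) style action decoder.
# Assumed token layout: "terminate dx dy dz droll dpitch dyaw gripper",
# each value an integer in [0, 255] mapped back onto a continuous range.

def decode_action_tokens(token_string: str) -> dict:
    """Convert a model-emitted token string into a structured robot action."""
    tokens = [int(t) for t in token_string.split()]
    terminate, *deltas, gripper = tokens

    def to_range(value: int, low: float, high: float) -> float:
        # Map a 0-255 token back onto a continuous range.
        return low + (value / 255.0) * (high - low)

    return {
        "terminate": bool(terminate),
        # Assumed end-effector displacement of up to ±5 cm per step.
        "translation_m": [to_range(v, -0.05, 0.05) for v in deltas[:3]],
        # Assumed rotation of up to ±0.25 rad per step.
        "rotation_rad": [to_range(v, -0.25, 0.25) for v in deltas[3:]],
        "gripper_closed": gripper > 127,
    }

action = decode_action_tokens("0 128 255 0 127 127 127 255")
```

Because actions live in the same token space as language, the same network that "reads" an image and an instruction can "write" the motion, which is what lets concepts learned on one object carry over to another.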
Huge Performance Gains in Early Testing
In over 6,000 test trials by DeepMind researchers, RT-2 performed on par with DeepMind's prior model, RT-1, on trained tasks. But it nearly doubled the success rate – 62% versus 32% – on completely novel, untrained scenarios.
This marked improvement shows RT-2's ability to generalise to tasks it has never seen. RT-2 also achieved 90% success on external benchmark tasks from the Language Table suite, far exceeding previous results.
Even More Lifelike Robot Reasoning on the Horizon
Looking ahead, DeepMind aims to enhance RT-2’s AI-powered reasoning abilities using a technique called chain-of-thought prompting.
As Vanhoucke described: “Chain-of-thought reasoning enables learning a self-contained model that can both plan long-horizon skill sequences and predict robot actions.”
For example, RT-2 could determine that it needs stamps before mailing a package, then plan the steps to acquire stamps and complete the task.
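The stamps example boils down to decomposing a goal into sub-goals before acting. The toy sketch below substitutes a hand-written plan library for a real language model, so it is purely illustrative of the idea, not DeepMind's implementation.

```python
# Toy sketch of chain-of-thought-style task decomposition. A hand-written
# plan library stands in for the language model; all names are assumptions.

PLAN_LIBRARY = {
    "mail package": ["acquire stamps", "attach stamps", "drop package in mailbox"],
    "acquire stamps": ["go to drawer", "open drawer", "pick up stamps"],
}

def expand_plan(goal: str) -> list[str]:
    """Recursively expand a goal into primitive steps, one sub-goal at a time."""
    steps = PLAN_LIBRARY.get(goal)
    if steps is None:
        return [goal]  # no further decomposition: treat as a primitive action
    expanded = []
    for step in steps:
        expanded.extend(expand_plan(step))
    return expanded

plan = expand_plan("mail package")
# e.g. a flat sequence starting with 'go to drawer' and ending at the mailbox
```

In chain-of-thought prompting, a language model generates this intermediate reasoning itself rather than looking it up; the point of the sketch is simply the shape of the output: a long-horizon goal flattened into an ordered sequence of executable actions.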
Towards Helpful Robots of the Future
While still an early research achievement, RT-2 represents remarkable progress towards flexible, adaptable robots thanks to advances in artificial intelligence.
In Vanhoucke’s words: “RT-2 shows enormous promise for more general-purpose robots. While there is still a tremendous amount of work to be done…RT-2 shows us an exciting future for robotics just within grasp.”
With further development, RT-2 could enable robots to lend a hand around the house, assist elderly and disabled people, deliver packages, and take over dangerous work.
But significant challenges around safety and control will need to be resolved first.
Still, thanks to the power of artificial intelligence, sci-fi’s vision of helpful household robots now seems tantalisingly close. The coming years promise to see rapid evolution in real-world robotics.