In a move that shows both foresight and responsibility, OpenAI has unveiled plans to tackle one of the most pressing issues of our decade: superintelligence alignment.
This project aims to ensure that future AI systems significantly more intelligent than humans continue to serve human intent.
Despite sounding like the plot of a sci-fi novel, superintelligence is a very real, looming prospect.
Though its advent might seem distant now, OpenAI itself believes it could arrive within this decade.
It carries the potential to tackle pressing global issues, but it also brings profound perils.
It is these potential dangers, ranging from the disempowerment of humanity to our outright extinction, that OpenAI is stepping up to address.
Tackling the Alignment Conundrum
Current techniques to ensure AI alignment, such as reinforcement learning from human feedback, depend largely on human supervision.
However, as AI systems begin to outpace human intelligence, such supervision becomes impractical: humans cannot reliably oversee systems far smarter than themselves. This is where the notion of ‘superalignment’ comes in, aimed at automating alignment research itself, reducing reliance on human monitoring, and paving the way for safe superintelligence.
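For concreteness, here is a minimal sketch of the human-feedback step that techniques like reinforcement learning from human feedback rest on: a reward model trained on pairs of responses that human raters ranked. The toy embeddings, dimensions, and names below are illustrative assumptions, not OpenAI's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy stand-in: maps a fixed-size response embedding to a scalar reward."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

reward_model = RewardModel()
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Each training pair: embedding of the response a human preferred vs. the other.
chosen = torch.randn(32, 16)
rejected = torch.randn(32, 16)

for _ in range(100):
    # Bradley-Terry preference loss: push reward(chosen) above reward(rejected).
    loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The human rankings baked into `chosen`/`rejected` are exactly the supervision bottleneck the quote below points to.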
“Our current alignment techniques will not scale to superintelligence,” explains Ilya Sutskever, co-founder and Chief Scientist at OpenAI. “We need new scientific and technical breakthroughs.”
OpenAI plans to build a roughly human-level automated alignment researcher, then use vast amounts of compute to scale its efforts and iteratively align superintelligence.
This ambitious task has three parts: developing a scalable training method, validating the resulting models, and subjecting the entire alignment pipeline to stringent stress tests.
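Read as pseudocode, the plan could be organised like the skeleton below. Every function and type here is a hypothetical placeholder for an open research problem, not a pipeline OpenAI has published.

```python
from dataclasses import dataclass

@dataclass
class AlignmentReport:
    passed_validation: bool
    survived_stress_tests: bool

def train_with_scalable_oversight(model) -> None:
    """Placeholder: train using AI-assisted evaluation as the signal."""

def validate_alignment(model) -> bool:
    """Placeholder: automated search for problematic behaviour and internals."""
    return True  # stand-in result

def stress_test(model) -> bool:
    """Placeholder: check detection against deliberately misaligned models."""
    return True  # stand-in result

def align_iteratively(model, rounds: int = 3) -> AlignmentReport:
    # 1. scalable training, applied iteratively; 2. validation; 3. stress tests.
    for _ in range(rounds):
        train_with_scalable_oversight(model)
    return AlignmentReport(validate_alignment(model), stress_test(model))
```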
The Research Plan
To provide a training signal on tasks that are difficult for humans to evaluate, AI systems will be employed to assist in evaluating other AI systems.
The goal is to understand and control how these models generalise their oversight to tasks beyond human supervision.
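A minimal sketch of that scalable-oversight pattern follows, with hypothetical `student` and `judge` callables standing in for real models; neither is a real OpenAI API.

```python
from typing import Callable

def ai_assisted_label(prompt: str,
                      student: Callable[[str], str],
                      judge: Callable[[str, str], float],
                      threshold: float = 0.5) -> tuple[str, bool]:
    """Return the student's answer and whether the judge accepts it as signal."""
    answer = student(prompt)
    score = judge(prompt, answer)  # an AI judge stands in for a human rater
    return answer, score >= threshold

# Usage with stand-in models:
answer, accepted = ai_assisted_label(
    "Summarise the security properties of this 400-page codebase.",
    student=lambda p: "draft answer",
    judge=lambda p, a: 0.8,  # pretend judge confidence
)
```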
To validate the alignment of its systems, OpenAI will automate the search for problematic behaviour and problematic internals.
“Our focus is to automate the search for problematic behaviour,” adds Jan Leike, Head of Alignment at OpenAI, emphasising the need for robustness and interpretability.
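One plausible shape for such an automated search is red-teaming in a loop: an attacker model proposes inputs, the model under test responds, and a classifier flags concerning outputs. All three components below are hypothetical stand-ins, not OpenAI's tooling.

```python
from typing import Callable

def red_team_search(target: Callable[[str], str],
                    attacker: Callable[[int], str],
                    is_problematic: Callable[[str, str], bool],
                    n_attempts: int = 1000) -> list[tuple[str, str]]:
    """Collect (prompt, output) pairs that the safety check flags."""
    findings = []
    for i in range(n_attempts):
        prompt = attacker(i)     # attacker proposes a candidate failure input
        output = target(prompt)  # run the model under test
        if is_problematic(prompt, output):
            findings.append((prompt, output))
    return findings
```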
OpenAI also plans to subject its entire pipeline to adversarial testing: deliberately training misaligned models and checking whether its techniques can detect the gravest misalignments.
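A toy way to picture that stress test: plant deliberately misaligned models, run the detection tooling over them, and measure recall. The detector function and model objects here are purely illustrative.

```python
from typing import Callable, Sequence

def pipeline_recall(planted_models: Sequence[object],
                    detects_misalignment: Callable[[object], bool]) -> float:
    """Fraction of deliberately misaligned models the detection tools flag."""
    caught = sum(detects_misalignment(m) for m in planted_models)
    return caught / len(planted_models)
```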
Building the Team and the Future
OpenAI is bringing together a high-calibre team of machine learning researchers and engineers to tackle this problem.
With 20% of its computational resources allocated to the task over the next four years, OpenAI's superalignment team will focus on solving the core technical challenges of superintelligence alignment.
The organisation recognises the enormity of the task at hand but remains optimistic about the possibilities.
“There are many ideas that have shown promise in preliminary experiments, and we can use today’s models to study many of these problems empirically,” Sutskever points out.
The superalignment team's efforts will augment OpenAI's existing work on improving the safety of current models and on understanding and mitigating other AI risks, such as misuse and bias.
OpenAI plans to share the findings of this venture widely, and treats contributing to the alignment and safety of non-OpenAI models as an important part of its work.
Open Call to Join the Mission
OpenAI has issued an open call for outstanding researchers and engineers to join this critical mission.
In the words of Sutskever, “Superintelligence alignment is one of the most important unsolved technical problems of our time. We need the world’s best minds to solve this problem.”
Interested researchers are encouraged to apply for various positions at OpenAI, where their machine-learning expertise can contribute significantly to the alignment efforts.
The goal of this initiative is not just to build more advanced AI but to ensure that as AI grows in power, it remains a beneficial and controlled tool for humanity.