A SPACE ODYSSEY (1968)

EVER since Alan Turing’s “imitation game,” we’ve been acutely aware of the importance of measuring the capabilities of computers against our own miraculous brains. The British pioneer’s method, outlined in 1950, looks primitive today, but it sought to answer a persistent question: How will we tell when a machine has become as intelligent as, or more intelligent than, a human being?

Defining such progress is imperative for productive conversations about artificial intelligence (AI). Specifically, the question of what can be considered artificial general intelligence (AGI), a “mind” as adaptable as our own, needs to be considered using a set of shared parameters. Currently, the term lacks a precise definition, leaving predictions of AGI’s arrival and impact either unnecessarily alarmist or insufficiently concerned.

Consider the hopeless spread of predictions on AGI. Earlier this year, the preeminent AI researcher Geoffrey Hinton predicted “without much confidence” that AGI could be present within five to 20 years. One attempt to collate the views of approximately 1,700 experts produced timing estimates ranging from next year to never. One reason for the chasm is that we haven’t decided collectively what we’re even talking about. “If you were to ask 100 AI experts to define what they mean by ‘AGI,’ you would likely get 100 related but different definitions,” notes a recent paper from a team at DeepMind, the AI unit within Google.

One of the paper’s co-authors, Shane Legg, is credited with popularizing the AGI term. Now he and his team are seeking to set up a sensible framework with which to measure and define the technology: a taxonomy that can be used to help assuage or heighten fears and offer straightforward context to non-experts and legislators.

The effort is modeled on the system for describing the capabilities of self-driving cars. In 2014, SAE International (formerly the Society of Automotive Engineers) defined six distinct levels of autonomous capability, from Level 0 (human driver in full control of the vehicle’s operation) to Level 5 (full automation of all the vehicle’s functions in all conditions). The scale has proved useful for lawmakers to set rules of the road and for the public to understand their cars’ capabilities. A car with Level 2 automation (steering, lane changes, acceleration and deceleration, in some settings, mostly on highways) can be legally driven on the road today on the condition that a human is sitting alert to take over immediately. But Level 4 or 5 cars, such as Alphabet’s Waymo cars on trial in San Francisco, need special permission to be used in public and are subject to additional oversight on their performance.

Classifying AGI will be much more complex than classifying autonomous vehicles, because the latter are merely a subset of the former. But the leveling system is useful for AI, too. In assessing capabilities, the DeepMind team split AI into two groups: narrow and general. A narrow AI, for instance, could have superhuman capability for one application, such as protein folding, but be incapable of writing a simple short story. To be considered AGI, according to DeepMind, a system must be able to perform a “wide range of non-physical tasks, including metacognitive abilities like learning new skills.”

A system’s level is determined by its capabilities when compared with those of humans. At Level 1, “Emerging,” an AGI should be “equal to or somewhat better than an unskilled human.” That’s where the famous chatbots like ChatGPT are today, just about. Level 2, “Competent,” would require performing at the standard of the top 50% of skilled adults. No AGI has yet achieved Level 2, the DeepMind team determined. From there, it envisioned “Expert” (more capable than 90% of skilled humans), “Virtuoso” (99%), and “Superhuman” (100%).
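
For readers who think in code, the performance scale described above can be sketched as a simple lookup. This is only an illustration: the function name `performance_level`, the idea of scoring a system with a single percentile among skilled adults, and the choice to treat each percentage as an inclusive lower bound are assumptions made here, not details taken from the DeepMind paper.

```python
# Performance levels as listed in the article, highest first.
# Each entry is (minimum percentile among skilled adults, level number, name).
# Assumption: the percentages act as inclusive lower bounds, and any score
# below 50 is treated as "Emerging" (roughly an unskilled human or a bit better).
LEVELS = [
    (100.0, 5, "Superhuman"),
    (99.0, 4, "Virtuoso"),
    (90.0, 3, "Expert"),
    (50.0, 2, "Competent"),
    (0.0, 1, "Emerging"),
]


def performance_level(percentile):
    """Map a percentile score (0 to 100) to a (level number, level name) pair."""
    if not 0.0 <= percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    # Walk from the hardest level down and return the first one the score clears.
    for threshold, number, name in LEVELS:
        if percentile >= threshold:
            return number, name
    raise AssertionError("unreachable: the 0.0 threshold always matches")


if __name__ == "__main__":
    for score in (10, 50, 95, 99.5, 100):
        print(score, performance_level(score))
```

Under these assumptions, a system that outperforms 95% of skilled adults would sit at Level 3, “Expert.” A real assessment would rest on many benchmarks across many tasks, not a single number.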

But these levels alone wouldn’t be sufficient to determine the capability, or danger, of AGI. One distinct fear among those who worry about existential risk is the possibility that a smart-enough machine could act autonomously, possibly against humans, otherwise known as the “I’m sorry Dave, I’m afraid I can’t do that” scenario. For this reason, the DeepMind team applies an accompanying classification for levels of AI autonomy, running from Level 1, in which a human is in full control and the AI automates mundane tasks, up to Level 5, a fully autonomous AI capable of working without human oversight.

Understanding these levels helps us better classify risk and react accordingly. A company developing an AGI with Level 1 autonomy (like ChatGPT) is of relatively little regulatory concern. But expert AGI, with Level 4 autonomy, is the point at which researchers foresee mass labor displacement and the “decline of human exceptionalism.”

As well as protecting against societal harm, an agreed-upon standard will be particularly useful in dispelling disingenuous attempts to overplay the capabilities of an AI as a marketing gimmick. It has helped, for example, that under the SAE standard for autonomy, Tesla’s “Autopilot” can be more accurately described as merely Level 2 automation.

For the system to work, relevant and rigorous tests will be needed to determine the appropriate level on this scale. The nature of these tests is still to be determined, but, researchers said, they should cover mathematical tasks, spatial reasoning, social intelligence, and more: an AI pentathlon of sorts,* with benchmarks that must be iterated and improved with the same vigor and regularity as the AI systems they’re seeking to measure.

Proper classification will settle some nerves and bring some much-needed composure to the AGI conversation. It serves everyone to have clear definitions in that space between “benign” and “annihilation of the human race.”

Bloomberg Opinion

* Our imagination could run wild with these. My favorite is an idea for a test generally attributed to Apple co-founder Steve Wozniak, though it’s not clear if he’s paraphrasing others. His “coffee test” would assess if an AI-powered robot was smart enough, without specific programming, to enter a typical American home, find a coffee machine and make a cup of coffee. Now that’s the kind of progress I can get behind.