OpenAI, Google and Anthropic are struggling to build more advanced AI

OPENAI was on the cusp of a milestone. The startup finished an initial round of training in September for a massive new artificial intelligence model that it hoped would significantly surpass prior versions of the technology behind ChatGPT and move closer to its goal of powerful AI that outperforms humans.
But the model, known internally as Orion, did not hit the company’s desired performance, according to two people familiar with the matter, who spoke on condition of anonymity to discuss company matters. As of late summer, for example, Orion fell short when trying to answer coding questions that it hadn’t been trained on, the people said. Overall, Orion is so far not considered to be as big a step up from OpenAI’s existing models as GPT-4 was from GPT-3.5, the system that originally powered the company’s flagship chatbot, the people said.
OpenAI isn’t alone in hitting stumbling blocks recently. After years of pushing out increasingly sophisticated AI products at a breakneck pace, three of the leading AI companies are now seeing diminishing returns from their costly efforts to build newer models. At Alphabet Inc.’s Google, an upcoming iteration of its Gemini software is not living up to internal expectations, according to three people with knowledge of the matter. Anthropic, meanwhile, has seen the timetable slip for the release of its long-awaited Claude model called 3.5 Opus.
The companies are facing several challenges. It’s become increasingly difficult to find new, untapped sources of high-quality, human-made training data that can be used to build more advanced AI systems. Orion’s unsatisfactory coding performance was due in part to the lack of sufficient coding data to train on, two people said. At the same time, even modest improvements may not be enough to justify the tremendous costs associated with building and operating new models, or to live up to the expectations that come with branding a product as a major upgrade.
There is plenty of potential to make these models better. OpenAI has been putting Orion through a months-long process often referred to as post-training, according to one of the people. That procedure, which is routine before a company releases new AI software publicly, includes incorporating human feedback to improve responses and refining the tone for how the model should interact with users, among other things. But Orion is still not at the level OpenAI would want in order to release it to users, and the company is unlikely to roll out the system until early next year, one person said. The Information previously reported some details of OpenAI’s challenges developing its new model, including with coding tasks.
These issues challenge the gospel that has taken hold in Silicon Valley in recent years, particularly since OpenAI released ChatGPT two years ago. Much of the tech industry has bet on so-called scaling laws that say more computing power, data and larger models will inevitably pave the way for greater leaps forward in the power of AI.
The recent setbacks also raise doubts about the heavy investment in AI and the feasibility of reaching an overarching goal these companies are aggressively pursuing: artificial general intelligence. The term typically refers to hypothetical AI systems that would match or exceed humans on many intellectual tasks. The chief executives of OpenAI and Anthropic have previously said AGI may be only several years away.
“The AGI bubble is bursting a little bit,” said Margaret Mitchell, chief ethics scientist at AI startup Hugging Face. It’s become clear, she said, that “different training approaches” may be needed to make AI models work really well on a variety of tasks, an idea a number of experts in artificial intelligence echoed to Bloomberg News.
In a statement, a Google DeepMind spokesperson said the company is “pleased with the progress we’re seeing on Gemini and we’ll share more when we’re ready.” OpenAI declined to comment. Anthropic declined to comment, but referred Bloomberg News to a five-hour podcast featuring Chief Executive Officer Dario Amodei.
“People call them scaling laws. That’s a misnomer,” he said on the podcast. “They’re not laws of the universe. They’re empirical regularities. I am going to bet in favor of them continuing, but I’m not certain of that.”
Amodei said there are “lots of things” that could “derail” the process of reaching more powerful AI in the next few years, including the possibility that “we could run out of data.” But Amodei said he’s optimistic AI companies will find a way to get over any hurdles.
PLATEAUING PERFORMANCE
The technology that underpins ChatGPT and a wave of rival AI chatbots was built on a trove of social media posts, online comments, books and other data freely scraped from around the web. That was enough to create products that can spit out clever essays and poems, but building AI systems that are smarter than a Nobel laureate, as some companies hope to do, may require data sources other than Wikipedia posts and YouTube captions.
OpenAI, in particular, has inked deals with publishers to fill some of the need for high-quality data, and also to adapt to growing legal pressure from publishers and artists over the data used to build generative AI products. Some tech companies are also hiring people with graduate degrees who can label data related to their own subject expertise, such as math and coding. The goal is to make these systems better at responding to queries about certain topics.
These efforts are slower going and costlier than simply scraping the web. Tech companies are also turning to synthetic data, such as computer-generated images or text meant to mimic content created by real people. But here, too, there are limits.
“It is less about quantity and more about quality and diversity of data,” said Lila Tretikov, head of AI strategy at New Enterprise Associates and former deputy chief technology officer at Microsoft. “We can generate quantity synthetically, yet we struggle to get unique, high-quality datasets without human guidance, especially when it comes to language.”
Still, AI companies continue to pursue a more-is-better playbook. In their quest to build products that approach the level of human intelligence, tech firms are increasing the amount of computing power, data and time they use to train new models, and driving up costs in the process. Amodei has said companies will spend $100 million to train a bleeding-edge model this year and that amount will hit $100 billion in the coming years.
As costs rise, so do the stakes and expectations for each new model under development. Noah Giansiracusa, an associate professor of mathematics at Bentley University in Waltham, Massachusetts, said AI models will keep improving, but the rate at which that will happen is questionable.
“We got very excited for a brief period of very fast progress,” he said. “That just wasn’t sustainable.”
SILICON VALLEY’S CONUNDRUM
This conundrum has come into focus in recent months inside Silicon Valley. In March, Anthropic released a set of three new models and said the most powerful option, called Claude Opus, outperformed OpenAI’s GPT-4 and Google’s Gemini systems on key benchmarks, such as graduate-level reasoning and coding.
Over the next few months, Anthropic pushed out updates to the other two Claude models, but not Opus. “That was the one everyone was excited about,” said Simon Willison, an independent AI researcher. By October, Willison and other industry watchers noticed that wording related to 3.5 Opus, including an indication that it would arrive “later this year” and was “coming soon,” was removed from some pages on the company’s website.
Similar to its competitors, Anthropic has been facing challenges behind the scenes to develop 3.5 Opus, according to two people familiar with the matter. After training it, Anthropic found 3.5 Opus performed better on evaluations than the older version but not by as much as it should, given the size of the model and how costly it was to build and run, one of the people said.
An Anthropic spokesperson said the language about Opus was removed from the website as part of a marketing decision to only show available and benchmarked models. Asked whether Opus 3.5 would still be coming out this year, the spokesperson pointed to Amodei’s podcast remarks. In the interview, the CEO said Anthropic still plans to release the model but repeatedly declined to commit to a timetable.
Tech companies are also beginning to wrestle with whether to keep offering their older AI models, perhaps with some additional improvements, or to shoulder the costs of supporting hugely expensive new versions that may not perform much better.
Google has released updates to its flagship AI model Gemini to make it more useful, including restoring the ability to generate images of people, but introduced few major breakthroughs in the quality of the underlying model. OpenAI, meanwhile, has focused on a number of comparatively incremental updates this year, such as a new version of a voice assistant feature that lets users have more fluid spoken conversations with ChatGPT.
More recently, OpenAI rolled out a preview version of a model called o1 that spends extra time computing an answer before responding to a query, a process the company refers to as reasoning. Google is working on a similar approach, with the goal of handling more complex queries and yielding better responses over time.
Tech firms also face meaningful trade-offs in diverting too much of their coveted computing resources to developing and running larger models that may not be significantly better.
“All of these models have gotten quite complex and we can’t ship as many things in parallel as we’d like to,” OpenAI CEO Sam Altman wrote in response to a question on a recent Ask Me Anything session on Reddit. The ChatGPT-maker faces “a lot of limitations and hard decisions,” he said, about how it decides what to do with its available computing power.
Altman said OpenAI will have some “very good releases” later this year, but that list won’t include GPT-5, a name many in the AI industry would expect the company to use for a big release following GPT-4, which was introduced more than 18 months ago.
Like Google and Anthropic, OpenAI is now shifting attention from the size of these models to newer use cases, including a crop of AI tools called agents that can book flights or send emails on a user’s behalf. “We will have better and better models,” Altman wrote on Reddit. “But I think the thing that will feel like the next giant breakthrough will be agents.” - Bloomberg


