Has AI Progress Really Slowed Down?


For over a decade, companies have bet on a tantalizing rule of thumb: that artificial intelligence systems would keep getting smarter as long as they kept finding ways to make them bigger. This wasn’t merely wishful thinking. In 2017, researchers at Chinese technology firm Baidu demonstrated that pouring more data and computing power into machine learning algorithms yielded mathematically predictable improvements, whether the system was designed to recognize images or speech or to generate language. Noticing the same trend, OpenAI coined the term “scaling laws” in 2020, and it has since become a touchstone of the industry.
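“Mathematically predictable” here refers to power laws: in the Baidu and OpenAI work, a model’s error or loss was found to fall as a power of compute, data, or model size. The short Python sketch below assumes the simplest such form, loss = A · compute^(−B), with made-up constants A and B (hypothetical values, not fitted to any real model). It illustrates how one curve can be both predictable and diminishing: every tenfold increase in compute cuts the loss by the same fraction, while the absolute gain from each step shrinks.

```python
# A minimal sketch of a power-law scaling curve: loss = A * compute ** (-B).
# A and B are hypothetical constants chosen for illustration only; they are not
# fitted values from Baidu, OpenAI, or any real model.
A = 10.0   # hypothetical scale constant
B = 0.05   # hypothetical scaling exponent


def predicted_loss(compute: float) -> float:
    """Loss predicted by the assumed power law for a given compute budget."""
    return A * compute ** (-B)


def loss_ratio_per_10x(compute: float) -> float:
    """Ratio of the loss after a tenfold compute increase to the loss before it."""
    return predicted_loss(10 * compute) / predicted_loss(compute)


if __name__ == "__main__":
    # Each tenfold increase in compute multiplies the loss by the same factor,
    # so the absolute improvement per step keeps getting smaller.
    for exponent in range(18, 22):
        c = 10.0 ** exponent
        gain = predicted_loss(c) - predicted_loss(10 * c)
        print(f"compute=1e{exponent}  loss={predicted_loss(c):.4f}  "
              f"gain from next 10x={gain:.4f}  ratio={loss_ratio_per_10x(c):.4f}")
```

Under these assumptions the ratio column stays at about 0.89 on every row while the gain column shrinks. Real scaling curves have their own fitted exponents and often include an irreducible-loss term that this sketch leaves out.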

This thesis prompted AI firms to stake hundreds of millions of dollars on ever-larger computing clusters and datasets. The gamble paid off handsomely, transforming crude text machines into today's articulate chatbots.

But now, that bigger-is-better gospel is being called into question.

Last week, reports by Reuters and Bloomberg suggested that leading AI companies are experiencing diminishing returns on scaling their AI systems. Days earlier, The Information reported doubts at OpenAI about continued advancement after the unreleased Orion model failed to meet expectations in internal testing. The co-founders of Andreessen Horowitz, a prominent Silicon Valley venture capital firm, have echoed these sentiments, noting that increasing computing power is no longer yielding the same "intelligence improvements."

What are tech companies saying?

Many leading AI companies, though, seem confident that progress is marching full steam ahead. In a statement, a spokesperson for Anthropic, developer of the popular chatbot Claude, said “we haven't seen any signs of deviations from scaling laws.” OpenAI declined to comment. Google DeepMind did not respond to a request for comment. However, last week, after an experimental new version of Google’s Gemini model took GPT-4o’s top spot on a popular AI-performance leaderboard, Google CEO Sundar Pichai posted to X: “more to come.”

Recent releases paint a somewhat mixed picture. Anthropic has updated its medium-sized model, Sonnet, twice since its release in March, making it more capable than the company’s largest model, Opus, which has not received such updates. In June, the company said Opus would be updated “later this year,” but last week, speaking on the Lex Fridman podcast, co-founder and CEO Dario Amodei declined to give a specific timeline. Google updated its smaller Gemini Pro model in February, but its larger Gemini Ultra model has yet to receive an update. OpenAI’s recently released o1-preview model outperforms GPT-4o on several benchmarks but falls short on others. o1-preview was reportedly called “GPT-4o with reasoning” internally, suggesting the underlying model is similar in scale to GPT-4.

Parsing the truth is complicated by competing interests on all sides. If Anthropic cannot produce more powerful models, “we’ve failed deeply as a company,” Amodei said last week, offering a glimpse of the stakes for AI companies that have bet their futures on relentless progress. A slowdown could spook investors and trigger an economic reckoning. Meanwhile, Ilya Sutskever, OpenAI’s former chief scientist and once an ardent proponent of scaling, now says performance gains from bigger models have plateaued. But his stance carries its own baggage: Sutskever’s new AI startup, Safe Superintelligence Inc., launched in June with less funding and computational firepower than its rivals. A breakdown in the scaling hypothesis would conveniently help level the playing field.

“They had these things they thought were mathematical laws and they're making predictions relative to those mathematical laws and the systems are not meeting them,” says Gary Marcus, a leading voice on AI and the author of several books, including Taming Silicon Valley. He says the recent reports of diminishing returns suggest we have finally “hit a wall,” something he has warned could happen since 2022. “I didn't know exactly when it would happen, and we did get some more progress. Now it seems like we are stuck,” he says.

A slowdown could be a reflection of the limits of current deep learning techniques, or simply that “there's not enough fresh data anymore,” Marcus says. It’s a hypothesis that has gained ground among some close observers of AI. Sasha Luccioni, AI and climate lead at Hugging Face, says there are limits to how much information can be learned from text and images. As an example of text data’s limitations, she points to how people are more likely to misread someone’s intentions over text message than in person. “I think it's like that with language models,” she says.

The lack of data is particularly acute in certain domains like reasoning and mathematics, where we “just don't have that much high quality data,” says Ege Erdil, senior researcher at Epoch AI, a nonprofit that studies trends in AI development. That doesn’t mean scaling is likely to stop—just that scaling alone might be insufficient. “At every order of magnitude scale up, different innovations have to be found,” he says, noting that it does not mean AI progress will slow overall.

It's not the first time critics have pronounced scaling dead. “At every stage of scaling, there are always arguments,” Amodei said last week. “The latest one we have today is, ‘we’re going to run out of data, or the data isn’t high quality enough, or models can’t reason.’ … I’ve seen the story happen for enough times to really believe that probably the scaling is going to continue,” he said. Reflecting on OpenAI’s early days on Y Combinator’s podcast, company CEO Sam Altman credited the company’s success in part to a “religious level of belief” in scaling, a concept he says was considered “heretical” at the time. In response to a recent post on X from Marcus saying his predictions of diminishing returns were right, Altman posted “there is no wall.”

There could be another reason for the reports of new models falling short of internal expectations, says Jaime Sevilla, director of Epoch AI. After conversations with people at OpenAI and Anthropic, he came away with a sense that people had extremely high expectations. “They expected AI was going to be able to, already write a PhD thesis,” he says. “Maybe it feels a bit... anticlimactic.”

A temporary lull does not necessarily signal a wider slowdown, Sevilla says. History shows significant gaps between major advances: GPT-4, released just 19 months ago, itself arrived 33 months after GPT-3. “We tend to forget that GPT three from GPT four was like 100x scale in compute,” Sevilla says. “If you want to do something like 100 times bigger than GPT-4, you're gonna need up to a million GPUs.” That is larger than any known cluster currently in existence, though he notes that there have been concerted efforts to build AI infrastructure this year, such as Elon Musk’s 100,000-GPU supercomputer in Memphis, the largest of its kind, which was reportedly built from start to finish in three months.

In the interim, AI companies are likely exploring other methods to improve performance after a model has been trained. OpenAI’s o1-preview has been heralded as one such example: it outperforms previous models on reasoning problems by being allowed more time to think. “This is something we already knew was possible,” Sevilla says, pointing to an Epoch AI report published in July 2023.

Policy and geopolitical implications

Prematurely diagnosing a slowdown could have repercussions beyond Silicon Valley and Wall Street. The perceived speed of technological advancement following GPT-4’s release prompted an open letter calling for a six-month pause on the training of larger systems to give researchers and governments a chance to catch up. The letter garnered over 30,000 signatories, including Musk and Turing Award recipient Yoshua Bengio. It’s an open question whether a perceived slowdown could have the opposite effect, causing AI safety to slip from the agenda.

Much of the U.S.’s AI policy has been built on the belief that AI systems would continue to balloon in size. A provision in Biden’s sweeping executive order on AI, signed in October 2023 (and expected to be repealed by the Trump White House), required AI developers to share information with the government regarding models trained using computing power above a certain threshold. That threshold was set above the largest models available at the time, on the assumption that it would target future, larger models. The same assumption underpins export controls, which restrict the sale of AI chips and related technologies to certain countries and are designed to limit China’s access to the powerful semiconductors needed to build large AI models. However, if breakthroughs in AI development begin to rely less on computing power and more on factors like better algorithms or specialized techniques, these restrictions may do less to slow China’s AI progress.

“The overarching thing that the U.S. needs to understand is that to some extent, export controls were built on a theory of timelines of the technology,” says Scott Singer, a visiting scholar in the Technology and International Affairs Program at the Carnegie Endowment for International Peace. In a world where the U.S. “stalls at the frontier,” he says, we could see a national push to drive breakthroughs in AI. He says a slip in the U.S.’s perceived lead in AI could spur a greater willingness to negotiate with China on safety principles.

Whether we're seeing a genuine slowdown or just another pause ahead of a leap remains to be seen. “It’s unclear to me that a few months is a substantial enough reference point,” Singer says. “You could hit a plateau and then hit extremely rapid gains.”
