Baynard Media
How can we prevent AI models from cannibalizing themselves when human-generated data runs out? Scientists say they’ve found the answer.

By Editor | May 21, 2026

While the evolution of artificial intelligence (AI) systems has shown no sign of slowing, there’s a growing concern that large language models (LLMs) will soon run out of human-made data to ingest and learn from.

Once this happens, scientists say, AI models will increasingly rely on synthetic, AI-made data, which can lead to an effect called “model collapse.” In model collapse, LLMs spout gibberish, and the AI systems they underpin give inaccurate answers and hallucinate information in response to queries far more often than they do today.

“That’s especially worrying considering some experts think that we will run out of high-quality human-generated data by the end of the year — so if you’re relying on this synthetic data, but there’s an almost existential threat it will sink your AI, you’re in trouble,” Yasser Roudi, a professor of disordered systems in the Department of Mathematics at King’s College London (KCL), told Live Science. “If, for example, you had LLMs that were used in hospitals to analyze brain scans and find cancers, if while training another model they experienced model collapse, these machines could misdiagnose people.”


However, Roudi and his colleagues recently found that, in the models they studied, model collapse can be avoided by adding a single human-made data point to an AI’s training data, even if all the other data are AI-generated.

The study, which involved researchers from KCL, the Norwegian University of Science and Technology and the Abdus Salam International Centre for Theoretical Physics in Italy, was published May 14 in the journal Physical Review Letters.

Model collapse has not yet been observed in an actively deployed, real-world AI system, although anyone who uses tools like ChatGPT or Gemini to generate answers or text has very likely encountered errors or hallucinations. Roudi hopes the new findings point to a way of sidestepping this potential emerging threat.

Countering collapse

Beyond the well-known hallucinations of generative AI products, we may not yet have seen any dramatic examples of model collapse, in which sophisticated AIs seemingly “go mad” and output complete nonsense. But signs of minor collapse could show up when AI delivers increasingly inaccurate or bland answers to queries, or fabricates information outright while trying to produce the kind of output it assumes a user wants.

When LLMs are repeatedly trained on data generated by other LLMs, the core truth and original source of the information, along with spikes of variance between generations of models, get “smoothed out,” producing homogenized answers and outputs. For example, text that reads well enough at first glance may lack any real detail or nuance. Model collapse can be split into “early” and “late” stages: in the early stage, an AI loses the ability to serve up edge-case (rare or less common) information and produces bland, synthetic-feeling responses; in the late stage, LLMs output gibberish.
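The loss of edge cases can be illustrated with the simplest possible model, a biased coin. The sketch below is a toy of our own, not the study’s setup: generation 0 estimates the probability of a rare outcome from 20 real flips, and every later generation re-estimates it only from 20 flips simulated by the previous generation’s model. The function name `retrain_coin`, the true probability of 0.1 and the sample sizes are illustrative assumptions.

```python
import random


def retrain_coin(p_true=0.1, n=20, generations=1000, seed=1):
    """Closed-loop retraining of a coin-flip (Bernoulli) model.

    Generation 0 is fit to n real flips with P(heads) = p_true; every later
    generation is fit only to n flips sampled from the previous generation's
    model. Returns the estimated P(heads) after each generation.
    """
    rng = random.Random(seed)
    # The maximum-likelihood estimate of P(heads) is the fraction of heads seen.
    p = sum(rng.random() < p_true for _ in range(n)) / n
    history = [p]
    for _ in range(generations):
        # Train the next generation only on the current model's own output.
        heads = sum(rng.random() < p for _ in range(n))
        p = heads / n
        history.append(p)
    return history


history = retrain_coin()
# Once p hits exactly 0.0 or 1.0, every later sample agrees with it, so the
# estimate can never move again: one outcome has been forgotten for good.
print("final estimate of P(heads):", history[-1])
```

Because the estimate drifts randomly and 0 and 1 are values it can never leave, this toy eventually gets stuck at one of them; most often the rare outcome is driven to probability 0 and is never generated again.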

The huge scale of LLMs and the data they process can make it hard to establish how and why they hallucinate information, and how certain choices lead to model collapse.

To tackle this, the researchers used smaller models belonging to exponential families, a broad class of probability distributions that describe how likely the different outcomes of a random event are and that share a common mathematical form. The bell curve (the normal distribution) is one example, as is the distribution describing the chance that a coin flip lands on heads (the Bernoulli distribution).
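For readers who want the formal definition, the standard textbook form of an exponential family is shown below, with the coin flip and the bell curve written in that form. This is general background rather than a formula taken from the study. The last line is the property that makes these models analytically tractable: fitting one by maximum likelihood simply matches the average of the sufficient statistics $T(x)$ in the training data, so each retrained generation is determined by averages over the previous generation’s samples.

```latex
% General form: natural parameter \eta, sufficient statistic T(x),
% base measure h(x), log-partition function A(\eta)
p(x \mid \eta) = h(x)\,\exp\!\big(\eta^\top T(x) - A(\eta)\big)

% Coin flip (Bernoulli), x \in \{0, 1\}, P(\text{heads}) = p
p^{x}(1-p)^{1-x} = \exp\!\Big(x \log\tfrac{p}{1-p} + \log(1-p)\Big),
\quad \eta = \log\tfrac{p}{1-p},\; T(x) = x,\; h(x) = 1,\; A(\eta) = \log(1 + e^{\eta})

% Bell curve (Gaussian) with mean \mu and variance \sigma^2
\tfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / (2\sigma^2)},
\quad \eta = \Big(\tfrac{\mu}{\sigma^2},\, -\tfrac{1}{2\sigma^2}\Big),\;
T(x) = (x,\, x^2),\; h(x) = \tfrac{1}{\sqrt{2\pi}},\;
A(\eta) = \tfrac{\mu^2}{2\sigma^2} + \log\sigma

% Maximum-likelihood fit to data x_1, \dots, x_n (moment matching)
\nabla A(\hat\eta) = \mathbb{E}_{\hat\eta}[T(X)] = \tfrac{1}{n}\sum_{i=1}^{n} T(x_i)
```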


“By looking at analytically tractable models such as the exponential families, you can answer those ‘why’ and ‘how’ questions,” Roudi said. “By that same logic, you can come up with ways to mitigate its dangerous effects, how those ways work, and ultimately apply them to real-life examples.”

The researchers found that adding a single external, human-made data point to the pool of synthetic data used in closed-loop training, in which each new model is trained on data generated by the previous model, was enough to avoid model collapse.
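A toy simulation can make the closed-loop idea concrete. The sketch below is our own illustration, not the authors’ code or their exact setup: it repeatedly refits a one-dimensional bell curve (Gaussian) to samples drawn from the previous generation’s fit. The function names, the sample size of 20, the 500 generations and the choice to add one fresh sample from the true distribution N(0, 1) in every generation, standing in for the single human-made data point, are all assumptions made for illustration.

```python
import math
import random


def fit_gaussian(data):
    """Maximum-likelihood fit of a 1-D Gaussian: sample mean and (biased) variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, var


def closed_loop(generations=500, n_synthetic=20, anchor=False, seed=0):
    """Retrain a Gaussian on samples drawn from the previous generation's fit.

    If anchor is True, each generation's training set also gets ONE fresh
    sample from the true distribution N(0, 1), a stand-in (assumed here) for
    a single human-made data point. Returns the fitted variance per generation.
    """
    rng = random.Random(seed)
    # Generation 0 is fit to real data from the true distribution N(0, 1).
    mu, var = fit_gaussian([rng.gauss(0.0, 1.0) for _ in range(n_synthetic)])
    history = [var]
    for _ in range(generations):
        sigma = math.sqrt(var)
        # Synthetic training data generated by the previous model.
        data = [rng.gauss(mu, sigma) for _ in range(n_synthetic)]
        if anchor:
            data.append(rng.gauss(0.0, 1.0))  # the single "real" data point
        mu, var = fit_gaussian(data)
        history.append(var)
    return history


if __name__ == "__main__":
    pure = closed_loop(anchor=False)
    anchored = closed_loop(anchor=True)
    print(f"final variance, synthetic data only: {pure[-1]:.3e}")
    print(f"final variance, plus one real point: {anchored[-1]:.3e}")
```

In this toy, the synthetic-only variance drifts toward zero, meaning the fitted bell curve narrows to a spike and its tails (the edge cases) disappear, while the run with one real point per generation keeps its variance bounded away from zero. The sketch only shows the qualitative effect; it does not reproduce the paper’s analysis.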

Roudi said one example could be an AI-based image or video classifier trained on data that include a real image correctly labeled by a human, rather than only AI-generated media or media labeled by an AI.

“In other words, this data point would be linked to a ‘ground truth,’ something we know undeniably to be true and independently verifiable,” Roudi said.

The next step for Roudi and the researchers is to apply this approach to larger and more complex models to see if this principle still holds true. This method could mitigate potentially “disastrous” scenarios of model collapse, especially within the AI models we use in everyday life, the team said.

“This research is the first step in setting out some ground rules for preventing this [from] happening in the future,” Roudi concluded. “While more work should be done, AI engineers making things like the next ChatGPT can use what we’ve found to develop models that don’t collapse.”

Jangjoo, F., Di Sarra, G., Marsili, M., & Roudi, Y. (2026). Lost in Retraining: Closed-Loop learning and model collapse in exponential families. Physical Review Letters, 136(19). https://doi.org/10.1103/156q-3ngc
