Close Menu
  • Home
  • UNSUBSCRIBE
  • News
  • Lifestyle
  • Tech
  • Entertainment
  • Sports
  • Travel
Facebook X (Twitter) WhatsApp
Trending
  • Itoje reveals Lions motivation: 'We have the potential to achieve something special!'
  • Taylor Swift Supports Travis Kelce in Happy Gilmore 2
  • 13,000 photos leaked after 4chan call to action
  • Building blocks of life may be far more common in space than we thought, study claims
  • Rodgers remains confident after throwing pick in first Steelers training camp
  • X is changing Community Notes again
  • Emma Raducanu beats Maria Sakkari to reach semi-finals of Mubadala Citi DC Open in Washington | Tennis News
  • Rebecca Gayheart’s Rare Glimpse of Her, Eric Dane’s Daughters 
Get Your Free Email Account
Facebook X (Twitter) WhatsApp
Baynard Media
  • Home
  • UNSUBSCRIBE
  • News
  • Lifestyle
  • Tech
  • Entertainment
  • Sports
  • Travel
Baynard Media
Home»Lifestyle»AI could soon think in ways we don’t even understand, increasing the risk of misalignment — scientists at Google, Meta and OpenAI warn
Lifestyle

AI could soon think in ways we don’t even understand, increasing the risk of misalignment — scientists at Google, Meta and OpenAI warn

EditorBy EditorJuly 24, 2025No Comments4 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
Share
Facebook Twitter LinkedIn Pinterest Email

Researchers behind some of the most advanced artificial intelligence (AI) on the planet have warned that the systems they helped to create could pose a risk to humanity.

The researchers, who work at companies including Google DeepMind, OpenAI, Meta, Anthropic and others, argue that a lack of oversight on AI’s reasoning and decision-making processes could mean we miss signs of malign behavior.

In the new study, published July 15 to the arXiv preprint server (which hasn’t been peer-reviewed), the researchers highlight chains of thought (CoT) — the steps large language models (LLMs) take while working out complex problems. AI models use CoTs to break down advanced queries into intermediate, logical steps that are expressed in natural language.


You may like

The study’s authors argue that monitoring each step in the process could be a crucial layer for establishing and maintaining AI safety.

Monitoring this CoT process can help researchers to understand how LLMs make decisions and, more importantly, why they become misaligned with humanity’s interests. It also helps determine why they give outputs based on data that’s false or doesn’t exist, or why they mislead us.

However, there are several limitations when monitoring this reasoning process, meaning such behavior could potentially pass through the cracks.

Related: AI can now replicate itself — a milestone that has experts terrified

Get the world’s most fascinating discoveries delivered straight to your inbox.

“AI systems that ‘think’ in human language offer a unique opportunity for AI safety,” the scientists wrote in the study. “We can monitor their chains of thought for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed.”

The scientists warned that reasoning doesn’t always occur, so it cannot always be monitored, and some reasoning occurs without human operators even knowing about it. There might also be reasoning that human operators don’t understand.

Keeping a watchful eye on AI systems

One of the problems is that conventional non-reasoning models like K-Means or DBSCAN — use sophisticated pattern-matching generated from massive datasets, so they don’t rely on CoTs at all. Newer reasoning models like Google’s Gemini or ChatGPT, meanwhile, are capable of breaking down problems into intermediate steps to generate solutions — but don’t always need to do this to get an answer. There’s also no guarantee that the models will make CoTs visible to human users even if they take these steps, the researchers noted.

“The externalized reasoning property does not guarantee monitorability — it states only that some reasoning appears in the chain of thought, but there may be other relevant reasoning that does not,” the scientists said. “It is thus possible that even for hard tasks, the chain of thought only contains benign-looking reasoning while the incriminating reasoning is hidden.”A further issue is that CoTs may not even be comprehensible by humans, the scientists said. “

New, more powerful LLMs may evolve to the point where CoTs aren’t as necessary. Future models may also be able to detect that their CoT is being supervised, and conceal bad behavior.

To avoid this, the authors suggested various measures to implement and strengthen CoT monitoring and improve AI transparency. These include using other models to evaluate an LLMs’s CoT processes and even act in an adversarial role against a model trying to conceal misaligned behavior. What the authors don’t specify in the paper is how they would ensure the monitoring models would avoid also becoming misaligned.

They also suggested that AI developers continue to refine and standardize CoT monitoring methods, include monitoring results and initiatives in LLMs system cards (essentially a model’s manual) and consider the effect of new training methods on monitorability.

“CoT monitoring presents a valuable addition to safety measures for frontier AI, offering a rare glimpse into how AI agents make decisions,” the scientists said in the study. “Yet, there is no guarantee that the current degree of visibility will persist. We encourage the research community and frontier AI developers to make best use of CoT monitorability and study how it can be preserved.”

Source link

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleIs tequila the healthiest alcohol? Expert weighs in on facts and myths
Next Article Tulsi Gabbard’s unprecedented accusation against Obama and ‘The Osbournes’ cultural impact: Morning Rundown
Editor
  • Website

Related Posts

Lifestyle

Building blocks of life may be far more common in space than we thought, study claims

July 25, 2025
Lifestyle

The more advanced AI models get, the better they are at deceiving us — they even know when they’re being tested

July 25, 2025
Lifestyle

Moon, Mars, and meteors: Why July 28 is the best night for skywatching all summer

July 25, 2025
Add A Comment

Comments are closed.

Categories
  • Entertainment
  • Lifestyle
  • News
  • Sports
  • Tech
  • Travel
Recent Posts
  • Itoje reveals Lions motivation: 'We have the potential to achieve something special!'
  • Taylor Swift Supports Travis Kelce in Happy Gilmore 2
  • 13,000 photos leaked after 4chan call to action
  • Building blocks of life may be far more common in space than we thought, study claims
  • Rodgers remains confident after throwing pick in first Steelers training camp
calendar
July 2025
M T W T F S S
 123456
78910111213
14151617181920
21222324252627
28293031  
« May    
Recent Posts
  • Itoje reveals Lions motivation: 'We have the potential to achieve something special!'
  • Taylor Swift Supports Travis Kelce in Happy Gilmore 2
  • 13,000 photos leaked after 4chan call to action
About

Welcome to Baynard Media, your trusted source for a diverse range of news and insights. We are committed to delivering timely, reliable, and thought-provoking content that keeps you informed
and inspired

Categories
  • Entertainment
  • Lifestyle
  • News
  • Sports
  • Tech
  • Travel
Facebook X (Twitter) Pinterest WhatsApp
  • Contact Us
  • About Us
  • Privacy Policy
  • Disclaimer
  • UNSUBSCRIBE
© 2025 copyrights reserved

Type above and press Enter to search. Press Esc to cancel.