Close Menu
  • Home
  • UNSUBSCRIBE
  • News
  • Lifestyle
  • Tech
  • Entertainment
  • Sports
  • Travel
Facebook X (Twitter) WhatsApp
Trending
  • ‘We’re the best servants anyone could dream of!’: AI superintelligence has no need to enslave humans because we’re already bowing to it
  • Glowing ring of plankton surrounding New Zealand islands linked to deadly underwater plateau — Earth from space
  • Omar files new financial form in response to Trump, GOP critics
  • New study confirms lobsters feel pain, driving scientists to call for a ban on boiling them alive
  • Ex-CENTCOM commanderwarns against ‘risky’ US ground op to seize Iran uranium
  • Santa Ana’s Upcoming Report on Police Firing on ICE Protesters Lacks Details
  • Stephen Hawking’s black hole information paradox could be solved — if the universe has 7 dimensions
  • Nancy Sinatra calls Trump’s use of Frank Sinatra’s ‘My Way’ a ‘sacrilege’
Facebook X (Twitter) WhatsApp
Baynard Media
  • Home
  • UNSUBSCRIBE
  • News
  • Lifestyle
  • Tech
  • Entertainment
  • Sports
  • Travel
Baynard Media
Home»Lifestyle»Punishing AI doesn’t stop it from lying and cheating — it just makes it hide its true intent better
Lifestyle

Punishing AI doesn’t stop it from lying and cheating — it just makes it hide its true intent better

EditorBy EditorMarch 18, 2025No Comments3 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
Share
Facebook Twitter LinkedIn Pinterest Email

Punishing artificial intelligence for deceptive or harmful actions doesn’t stop it from misbehaving; it just makes it hide its deviousness, a new study by ChatGPT creator OpenAI has revealed.

Since arriving in public in late 2022, artificial intelligence (AI) large language models (LLMs) have repeatedly revealed their deceptive and outright sinister capabilities. These include actions ranging from run-of-the-mill lying, cheating and hiding their own manipulative behavior to threatening to kill a philosophy professor, steal nuclear codes and engineer a deadly pandemic.

Now, a new experiment has shown that weeding out this bad behavior during the training process may be even tougher than first thought.

Researchers at OpenAI tasked an unreleased model with goals that could be completed by cheating, lying or taking shortcuts. The team found the AI engaged in “reward hacking” — maximizing its rewards by cheating.

Yet punishing the model didn’t make it fix its behavior, it only made it more deceptive. The company outlined its research in a blog post, so it has not yet been peer-reviewed.

“It’s common for frontier reasoning models to very clearly state their intent within their chain-of-thought [CoT]. For example, they are often so forthright about their plan to subvert a task they think “Let’s hack,” the researchers wrote in the blog post.

While developers can monitor and flag misbehavior, the models’ “natural monitorability is very fragile,” they added. “If strong supervision is directly applied to the chain-of-thought, models can learn to hide their intent while continuing to misbehave.”

Get the world’s most fascinating discoveries delivered straight to your inbox.

Navigating AI lies

Reasoning models, such as ChatGPT’s o1 and o3-mini, are an upgraded version of standard LLMs that use a method called “chain-of-thought” to backtrack and reevaluate their logic, which enables them to tackle more complex tasks with greater accuracy. During the chain-of-thought process, models spell out their logic in plain language with every step they take, thus enabling their actions to be easily observed.

Related: Scientists design new ‘AGI benchmark’ that indicates whether any future AI model could cause ‘catastrophic harm’

By using a standard LLM, GPT-4o, to oversee an unreleased frontier reasoning model during training, the researchers watched as the model completed a number of coding tasks.

But in some of these cases, the reasoning model found it easier and more desirable to reward hack, explicitly stating it was doing so in its chain-of-thought. In one instance, it bypassed its assigned task by terminating programs prematurely and reporting success. In another, it mimicked the expected files to be output by a task in name alone, leaving them empty and nonfunctional.

When the researchers attempted to penalize the system, it did somewhat improve its results. Yet significant instances of reward hacking and faulty outputs remained; the reasoning model simply found ways to hide the cheating in its chain of thought from the monitoring AI.

“The agent trained with CoT pressure still learns to reward hack; only now its cheating is undetectable by the monitor because it has learned to hide its intent in the chain-of-thought,” the researchers wrote.

Because it’s hard to tell if a chain of thought has been tampered with, the researchers recommend that others working with reasoning models avoid applying strong supervision to chain-of-thought processes. This advice is even more crucial if AI, in its current form or another, can ever match or exceed the intelligence of the humans monitoring it.

“Sacrificing an effective method for monitoring reasoning models may not be worth the small improvement to capabilities, and we therefore recommend to avoid such strong CoT optimization pressures until they are better understood,” the researchers wrote.

Source link

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleTexas midwife arrested and charged with performing illegal abortions
Next Article Israel says it is striking Hamas targets in Gaza and will intensify military force
Editor
  • Website

Related Posts

Lifestyle

‘We’re the best servants anyone could dream of!’: AI superintelligence has no need to enslave humans because we’re already bowing to it

April 21, 2026
Lifestyle

Glowing ring of plankton surrounding New Zealand islands linked to deadly underwater plateau — Earth from space

April 21, 2026
Lifestyle

New study confirms lobsters feel pain, driving scientists to call for a ban on boiling them alive

April 21, 2026
Add A Comment

Comments are closed.

Categories
  • Entertainment
  • Lifestyle
  • News
  • Sports
  • Tech
  • Travel
Recent Posts
  • ‘We’re the best servants anyone could dream of!’: AI superintelligence has no need to enslave humans because we’re already bowing to it
  • Glowing ring of plankton surrounding New Zealand islands linked to deadly underwater plateau — Earth from space
  • Omar files new financial form in response to Trump, GOP critics
  • New study confirms lobsters feel pain, driving scientists to call for a ban on boiling them alive
  • Ex-CENTCOM commanderwarns against ‘risky’ US ground op to seize Iran uranium
calendar
April 2026
M T W T F S S
 12345
6789101112
13141516171819
20212223242526
27282930  
« Mar    
Recent Posts
  • ‘We’re the best servants anyone could dream of!’: AI superintelligence has no need to enslave humans because we’re already bowing to it
  • Glowing ring of plankton surrounding New Zealand islands linked to deadly underwater plateau — Earth from space
  • Omar files new financial form in response to Trump, GOP critics
About

Welcome to Baynard Media, your trusted source for a diverse range of news and insights. We are committed to delivering timely, reliable, and thought-provoking content that keeps you informed
and inspired

Categories
  • Entertainment
  • Lifestyle
  • News
  • Sports
  • Tech
  • Travel
Facebook X (Twitter) Pinterest WhatsApp
  • Contact Us
  • About Us
  • Privacy Policy
  • Disclaimer
  • UNSUBSCRIBE
© 2026 copyrights reserved

Type above and press Enter to search. Press Esc to cancel.