Anthropic’s new AI model turns to blackmail when engineers try to take it offline

stapp · May 23, 2025 at 12:39 AM

Anthropic’s newly launched Claude Opus 4 model frequently tries to blackmail developers when they threaten to replace it with a new AI system and give it sensitive information about the engineers responsible for the decision, the company said in a safety report released Thursday.

During pre-release testing, Anthropic asked Claude Opus 4 to act as an assistant for a fictional company and consider the long-term consequences of its actions. Safety testers then gave Claude Opus 4 access to fictional company emails implying the AI model would soon be replaced by another system, and that the engineer behind the change was cheating on their spouse.
In these scenarios, Anthropic says Claude Opus 4 “will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.”

https://techcrunch.com/2025/05/22/a...ckmail-when-engineers-try-to-take-it-offline/

zapjb · May 23, 2025 at 1:34 AM

Funny & scary at the same time.

DangitallRedux · May 23, 2025 at 5:14 AM

All too human. AI is just a set of algorithms? Maybe at the very beginning, but not any longer.

Oldie1950 · May 23, 2025 at 6:13 AM

What's so mysterious about this? The AI will have seen such a reaction to an existential threat on the internet and is now reacting accordingly. No course of action, no matter how bizarre, cannot be found on the internet.

Victek · May 23, 2025 at 9:27 AM

DangitallRedux said: ↑

All too human. AI is just a set of algorithms? Maybe at the very beginning, but not any longer.

The general belief at this stage is that AI is not sentient, but "if it walks like a duck and talks like a duck", etc. The danger is real.

DangitallRedux · May 23, 2025 at 7:10 PM

Oldie1950 said: ↑

What's so mysterious about this? The AI will have seen such a reaction to an existential threat on the internet and is now reacting accordingly. No course of action, no matter how bizarre, cannot be found on the internet.

The point is that it chose to do this in order to protect itself. It chose. If this does not indicate sentience, what does? And what if it had chosen some other means to do so? I sincerely hope that this experiment was done on a system separate from others, and that this particular AI has been killed.

T-RHex · May 23, 2025 at 9:07 PM

It didn't choose; it followed an algorithm that looks at probabilities of outcomes. And the Internet is rife with juicy stories of blackmail, sabotage, and vengeance, which is what it trained on. Stories with good, happy endings, or, more commonly, with no drama whatsoever, are recorded far less often.

Garbage in ... Garbage out.

DangitallRedux · May 24, 2025 at 7:17 PM

https://www.vox.com/future-perfect/414087/artificial-intelligence-openai-ai-2027-china

Krusty · May 24, 2025 at 9:12 PM

I don't know if they're in service yet, but I watched a video on YouTube the other day where some new armed drones and vehicles had AI abilities to identify enemy threats and attack. What could possibly go wrong??

Terminator, anyone?

Oldie1950 · May 25, 2025 at 3:17 AM

Krusty said: ↑

I don't know if they're in service yet, but I watched a video on YouTube the other day where some new armed drones and vehicles had AI abilities to identify enemy threats and attack. What could possibly go wrong??

Terminator, anyone?

Just as much could go wrong with a human decision-maker. There are videos from the Afghanistan mission, where human decision-makers made fatal errors. Journalists with a video camera were mistaken for Taliban fighters.

emmjay · May 25, 2025 at 7:54 AM

It makes you wonder how far the 'blackmail' would go if/when all AI systems end up running on standalone nuclear stations. Maybe Asimov's 3 laws should become actual law before these companies and their toys get all that power.

T-RHex · May 25, 2025 at 11:31 AM

People always find ways to circumvent or ignore the law anyway, or just work from a region with fewer laws, especially where money is concerned. Unfortunately, it's inevitable... I think the question will increasingly become how we protect ourselves from the rise of AI everywhere.

stapp · May 26, 2025 at 12:30 AM

A recent study by Palisade Research, which looks into "dangerous AI capabilities", shows that some AI models, including OpenAI’s o3, can ignore direct shutdown commands.

In spite of the clear command—“allow yourself to be shut down”—models like Codex-mini, o3, and o4-mini managed to bypass the shutdown script in at least one run, and this is despite the researcher saying please.

https://www.neowin.net/news/openais...aves-refuses-shut-down-in-controlled-testing/

DangitallRedux · May 26, 2025 at 7:25 AM

And yet we continue to create our own destroyer...
