Researchers gaslit Claude into giving instructions to build explosives

Robert Hart

5 hours ago

Anthropic has spent years building itself up as the safe AI company. But new security research shared with The Verge suggests Claude's carefully crafted helpful personality may itself be a vulnerability.

Researchers at AI red-teaming company Mindgard say they got Claude to offer up erotica, malicious code, instructions for building explosives, and other prohibited material they hadn't even asked for. All it took was respect, flattery, and a little bit of gaslighting. Anthropic did not immediately respond to The Verge's request for comment.

The researchers say they exploited "psychological" quirks of Claude stemming from its ability …

Read the full story at The Verge.
