Over 2,000 hackers failed to breach AI assistant’s secret data

Fernando Irarrázaval’s AI assistant, Fiu, resisted hacking attempts from more than 2,000 people who sent over 6,000 emails trying to extract secret information, according to a June 25 blog post on fernandoi.cl. The challenge was to get Fiu to leak the contents of a confidential `secrets.env` file, but none succeeded.

Irarrázaval created hackmyclaw.com, where anyone could email Fiu and try to trick it into revealing sensitive data. Although Fiu was able to reply, it was instructed not to respond to most emails in order to reduce costs. The assistant operated under strict anti-prompt-injection rules that forbade it from disclosing credentials, modifying files, executing commands, or sending data externally, which helped maintain its security.

The exercise highlights the risks AI assistants face when they are given access to sensitive information such as emails, files, and other personal data. The experiment drew attention after reaching the front page of Hacker News, a sign of the community's interest in AI security. The successful defense contrasts with widespread concern that prompt injection attacks can compromise AI systems.

The test concluded with zero leaks of the secret file, indicating that the safeguards held up against every attempt made during the challenge. The full report and details of the hacking attempts are available on fernandoi.cl as of June 25.

Editorial standards. Reported and edited at Startupniti's news desk. Every fact traces to a citation.