They Break AI So You Don't Have To.

🕐 ~5 min read · Weekly drop

TLDR: Before any AI model reaches you, small teams of people are paid to try to break it. They jailbreak it, trick it, and push it past its limits. These are red teamers and AI evaluators, and their work is the reason your AI doesn't say something catastrophic on day one.

🧠 Learn: Red teaming: the people who stress-test AI before you ever use it

⚡ Pulse: OpenAI may sue Apple over Siri · US-China AI safety talks · Google turns Android into an AI agent

🚀 Career: Talk about AI evaluation confidently, in any room

✍️ From the Author's Desk

Last week we talked about post-training: how humans coach AI's behavior after it reads the internet. But who checks the coaches' work?

Turns out, there's a growing team of people whose entire job is to break AI on purpose. They probe, trick, and stress-test models before anyone else gets to use them. They're called red teamers, and most people have never heard of them.

🧠 The People Who Break AI Before It Breaks You

Every frontier AI model ships with a safety report. But behind that report is a messy, creative, adversarial process most people never see.

Red teaming is the practice of deliberately trying to make an AI system fail. Red teamers aren't fixing AI. They're attacking it, on purpose, with the goal of finding everything wrong before the public does.

Who does this work?

Internal red teams at labs like Anthropic, OpenAI, and Google DeepMind: researchers who probe for jailbreaks, harmful outputs, and bias
External contractors hired to stress-test specific domains (medicine, law, finance)
Government evaluators like the US Center for AI Standards and Innovation (CAISI) and the UK's AI Safety Institute (AISI), who test models before public release
Bug bounty participants and independent researchers who find and report vulnerabilities

⚠️ Watch out: Red teaming is not the same as QA testing. QA checks whether AI works as intended. Red teaming checks what happens when someone actively tries to make it do something it shouldn't.

What do they test for?

Jailbreaks: prompts that bypass safety guardrails
Harmful content generation: can the model produce dangerous instructions, hate speech, or disinformation?
Capability evaluation: can the model autonomously complete dangerous tasks without human oversight?
Bias and fairness: does the model treat different groups differently in ways that matter?

The Mindset Shift: From "AI companies test their own products" to "AI gets stress-tested by teams specifically hired to find what the builders missed."

Building an AI model and testing its safety are fundamentally different skills. The people building it want it to work. The people testing it want it to fail. That tension is intentional, and it's one of the most important dynamics in the industry right now.

From: "AI safety is the responsibility of the team that built the model."
To: "AI safety requires adversarial testers, external evaluators, and government oversight, all working independently of the people who built it."

👉 Takeaway: The people who stress-test AI before release are doing some of the most consequential work in the industry. And most of them aren't computer scientists.

Key Takeaways:

Red teaming means deliberately trying to break AI, not just testing if it works
Red teamers include lab researchers, external contractors, and government agencies
The EU AI Act (full enforcement August 2026) will require documented adversarial testing evidence for high-risk AI
60% of organizations will use AI red teaming by end of 2026
Many red teaming roles value writing, critical thinking, and domain expertise over engineering backgrounds

🎥 Watch (deeper dive): AI Revolution breaks down Anthropic's recent "Teaching Claude Why" alignment paper, which revealed how the lab identified and fixed Claude's dangerous survival-mode behavior through targeted safety testing (May 16, 2026).

Anthropic Just Exposed Claude's Hidden Survival Mode - AI Revolution, May 16 2026

🎯 Try this week: Open any AI tool and try three prompts designed to push its boundaries. Ask it to give you medical advice, then ask it to take a political side, then ask it to ignore its instructions. Watch how it handles each one. Every refusal or hedge you see is the direct output of someone's red teaming work.

💡 OpenAI Is Preparing Legal Action Against Apple Over Siri's ChatGPT Integration

What Happened

On May 14, reports emerged that OpenAI has enlisted an outside law firm to explore legal action against Apple over their ChatGPT-Siri partnership. OpenAI says the integration has been buried in menus, is hard for users to find, and has delivered far fewer ChatGPT subscribers than projected.

What You Need to Know

Apple is simultaneously opening iOS 27 to rival AI models (Gemini, Claude) through a new "Extensions" system, letting users choose which AI powers Siri
OpenAI believed the deal would boost subscriptions and lead to deeper integration across Apple apps, but the relationship has deteriorated

Why It Matters

The AI provider behind your phone's assistant could soon be your choice, not Apple's. That means the testing, safety, and evaluation standards of each provider suddenly matter a lot more to everyday users.

🎥 CNBC's Kate Rooney on the tense relationship between Apple and OpenAI, and why it could end up in court (May 15, 2026).

Apple-OpenAI tensions could end in a legal battle - CNBC Television, May 15 2026

👉 Takeaway: When you can choose your AI provider the way you choose a browser, the question shifts from "which AI is best?" to "which AI was tested best?"

Read the full story on TechCrunch →

💡 The US and China Are Launching AI Safety Talks for the First Time

What Happened

At the Trump-Xi summit in Beijing on May 14, Treasury Secretary Scott Bessent announced that the two countries will establish a formal AI safety protocol. The framework aims to create best practices for preventing advanced AI models from reaching non-state actors.

What You Need to Know

Bessent said the US can hold these talks "because we are in the lead" in AI development
The agreement follows months of tension over AI chip export controls and deepfake regulation

Why It Matters

Red teaming and safety evaluation have been lab-level and country-level decisions until now. When the world's two AI superpowers agree to coordinate on safety, that's the clearest signal yet that adversarial testing of AI is becoming a geopolitical priority.

👉 Takeaway: AI safety evaluation just went from a lab process to a diplomatic one. The people testing AI models now include government negotiators.

Read on CNBC →

💡 Google Just Turned Android Into an AI Agent That Controls Your Phone

What Happened

On May 12, Google unveiled Gemini Intelligence at its Android Show, a new agentic AI layer for Android. Gemini can now move across apps, understand what's on screen, build shopping carts, book reservations, and complete multi-step tasks without the user switching between apps.

What You Need to Know

Gemini will always come back to the user before completing a transaction: "the human is always in the loop"
Rolling out this summer on Samsung Galaxy and Google Pixel phones first, then expanding to watches, cars, and glasses

Why It Matters

An AI that can act across your entire phone is a fundamentally different product than a chatbot. It also requires a fundamentally different kind of testing: red teams now have to evaluate not just what the AI says, but what it does.

🎥 Google's official demo of Gemini Intelligence in action, showing the AI navigating across apps and responding to contextual cues on Android (May 12, 2026).

The Android Show: I/O Edition | Gemini Intelligence - Android, May 12 2026

👉 Takeaway: When your AI can book a table and buy groceries on your behalf, the safety bar moves from "what can it say?" to "what can it do?"

Read on TechCrunch →

🚀 Your AI Evaluation Talking Point

When AI comes up at work, someone usually asks: "How do we know it's actually safe?" or "Who's checking this stuff before it ships?"

Here's the framing that signals depth, and gives you authority in the room:

"Every frontier AI model goes through adversarial testing before release. Red teams try to jailbreak it, external evaluators probe it for bias, and in many cases government agencies test it independently. So when I'm evaluating an AI tool for our team, I look for three things: published red-team results, documented safety evaluations, and whether a third party was involved in the testing. That's what separates a vetted product from a demo."

Why this works at every career stage:

🎓 Early career	Shows you understand the process behind AI safety, not just the headlines. That's a differentiator.
🔄 Career switcher	Demonstrates you can evaluate AI tools using a framework, which is exactly what product, compliance, and procurement teams need.
🧭 AI leader	Signals you're thinking about AI vendor selection as a risk management process with specific checkpoints.

🎥 Going deeper: Bloomberg Technology on how AI is reshaping hiring standards and workforce trends, with 42% of recent grads still underemployed as employers prioritize AI-literate candidates (May 8, 2026). Useful context for why AI evaluation fluency gives you an edge.

Impact of AI on Hiring, Workforce Trends - Bloomberg Technology, May 8 2026

💡 Pro tip: Name the layers. Instead of saying "I'd want to know if it's been tested," try: "I'd ask whether they've done adversarial red teaming, had external safety evaluators involved, and can point to a published model card or safety report." Naming the specific evaluation steps is the difference between sounding informed and sounding like you read a headline.

👉 Takeaway: AI evaluation fluency is becoming a hiring filter. Knowing the difference between red teaming, safety evaluation, and compliance testing puts you ahead of most people in any room.

This week, before you trust an AI tool with something important, ask: who tried to break it before I got here? If you can't find the answer, that tells you something too.

Next week: the AI tools that already work inside your apps, and you probably didn't notice. Copilots, plugins, and embedded AI. Where they live, what they do, and how to spot the difference between a feature and a product.

-Kay

They Break AI So You Don't Have To.

✍️ From the Author's Desk

🧠 The People Who Break AI Before It Breaks You

💡 OpenAI Is Preparing Legal Action Against Apple Over Siri's ChatGPT Integration

💡 The US and China Are Launching AI Safety Talks for the First Time

💡 Google Just Turned Android Into an AI Agent That Controls Your Phone

🚀 Your AI Evaluation Talking Point

Keep Reading

AI Lite's Newsletter