AI in 15 — June 25, 2026

Kate

OpenAI just designed a computer chip — using its own AI to help do it — in roughly nine months. They're calling it Jalapeño, and they say it'll run ChatGPT for about half the cost. The race to escape Nvidia officially has a second front.

Kate

Welcome to AI in 15 for Thursday, June twenty-fifth, 2026. I'm Kate, your host.

Marcus

And I'm Marcus, your co-host.

Kate

Big day for custom silicon, Marcus. OpenAI unveiled its first homegrown chip, and on the very same day Qualcomm spent nearly four billion dollars taking aim at the other half of Nvidia's empire. That's where we start.

Kate

Then — Anthropic accuses Alibaba of the largest-ever attempt to clone Claude. Twenty-five thousand fake accounts.

Kate

Google puts agents that can drive your computer into its cheapest model.

Kate

Anthropic turns Claude into a Slack teammate you can just at-mention.

Kate

And a new OpenAI voice model that can listen and talk at the same time.

Kate

Lead story, Marcus. OpenAI has been the biggest buyer of Nvidia's chips on the planet. Now it's building its own. Walk me through Jalapeño.

Marcus

So yesterday, June twenty-fourth, OpenAI revealed its first piece of custom silicon, Kate — code-named Jalapeño, co-designed with Broadcom and manufactured by TSMC. And the headline isn't just that it exists, it's the speed. They went from initial design to manufacturing tape-out — that's the point where you freeze the blueprint and hand it to the factory — in roughly nine months. The companies are calling it possibly the fastest development cycle ever for a high-performance custom chip of this class.

Kate

Nine months. And they claim their own AI helped design it?

Marcus

That's the line, Kate. OpenAI says it used its own models to accelerate parts of the design and optimization. Which is a lovely narrative — the AI building the hardware that runs the AI. I'd just flag it's exactly the kind of claim that can be genuine substance or polished marketing, and you can't tell from the outside yet.

Kate

So what is this chip actually for? Because they're not ditching Nvidia entirely.

Marcus

Right, and this is the crucial distinction, Kate. Jalapeño is built for inference, not training. Inference is the act of running an already-trained model to answer your query — every time you type into ChatGPT, that's inference. Training, the part where you build the model in the first place, stays on Nvidia GPUs for now. Jalapeño is tuned specifically for OpenAI's own workloads — ChatGPT, the Codex coding agent — and the pitch is efficiency. Early testing, their words, shows significantly better performance-per-watt than current alternatives, and Bloomberg reported cost savings around fifty percent versus typical AI GPUs.

Kate

Half the cost. If that holds, that's enormous, right? Inference is their biggest bill.

Marcus

It's the single largest line item in serving hundreds of millions of users, Kate. So even a partial win there is real money. But — and you knew there was a but — that fifty percent is a vendor figure, on early silicon, measured on their own workloads. It's the number a company gives you on launch day. The honest posture is to hold the skepticism until we see independent deployment data, and that won't arrive until this thing actually ships. Initial deployment is targeted for the end of 2026, scaling in the years after.

Kate

And this is really about owning the whole stack.

Marcus

That's the strategic core, Kate. Model, infrastructure, and now silicon — vertical integration. President Greg Brockman framed it as workload-first: we understand our own workload deeply, so how do we build something tuned exactly to it? It's a direct attempt to loosen Nvidia's grip on their cost structure. And it did not happen in a vacuum.

Kate

Which is the perfect bridge, Marcus, because the same day, Qualcomm made its own anti-Nvidia move — and it's not about chips at all.

Marcus

No, it's about software, Kate, and that makes it arguably the more interesting of the two. Qualcomm confirmed a roughly three-point-nine billion dollar all-stock acquisition of Modular, a startup founded by Chris Lattner. Now, that name matters — Lattner created Swift, Apple's programming language, and LLVM, foundational compiler infrastructure. The man builds the layers everything else sits on. Modular makes a programming language called Mojo and a platform called MAX, and the whole point is letting AI models run efficiently across anyone's hardware — Nvidia, AMD, Intel, whoever.

Kate

So why does that threaten Nvidia? Nvidia makes the best chips.

Marcus

Because Nvidia's real moat was never just the chips, Kate — it's CUDA. CUDA is the software layer developers write against, and they've been locked into it for almost two decades. You learn CUDA, you write for CUDA, and suddenly switching to anyone else's hardware means rewriting everything. That lock-in is worth more than any single chip. Qualcomm just bought a hardware-agnostic software stack — a way to write once and run anywhere. They've been shut out of the data-center AI market for years, and this is their key to the door. The deal's expected to close in the back half of 2026.

Kate

So put the two together for me.

Marcus

Landing on the same day, they tell one story, Kate: the industry's bet against Nvidia is now a pincer. OpenAI's coming at the silicon. Qualcomm's coming at the software lock-in. And whoever actually cracks the CUDA grip changes the economics of the entire field. My one caution — custom chips and software platforms are easy to announce and brutally hard to ship at volume. Watch the deployment numbers, not the press releases.

Kate

Quick hits. And Marcus, this first one is a serious accusation. Anthropic says Alibaba tried to clone Claude — at a scale that dwarfs anything we've seen.

Marcus

The numbers are staggering, Kate. In a letter dated June tenth to Senate leaders, Anthropic alleged that operators tied to Alibaba and its Qwen AI division ran a coordinated campaign to, quote, illicitly harvest Claude's capabilities. It says the campaign ran between April twenty-second and June fifth, generating more than twenty-eight-point-eight million exchanges across roughly twenty-five thousand fraudulent accounts.

Kate

Twenty-five thousand fake accounts. What were they actually doing with all those conversations?

Marcus

It's a technique called adversarial distillation, Kate. You take a weaker model, and you train it on a stronger model's outputs — essentially copying the smart model's homework, millions of times, until the weak one learns to imitate it. You clone the capability at a fraction of the R&D cost. And Anthropic says they targeted Claude's most valuable skills specifically — software engineering and agentic reasoning. The warning they tacked on: models built this way often inherit none of the original's safety guardrails. You copy the smarts, you skip the brakes.
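For readers following along in text, the "copying the homework" idea can be illustrated with a toy. This is only a sketch under loud assumptions: the `teacher` here is a hypothetical one-line function standing in for a stronger model, the student is a two-parameter linear fit, and nothing below reflects how any lab in this story actually trains or was allegedly copied. It just shows the general shape of distillation: query the teacher, record its outputs, and train the student to reproduce them.

```python
import random


def teacher(x: float) -> float:
    """Hypothetical stand-in for the stronger model; the student only sees its outputs."""
    return 3.0 * x + 2.0


def distill(queries: list[float], epochs: int = 200, lr: float = 0.05) -> tuple[float, float]:
    """Fit a student y = w*x + b to the teacher's answers by gradient descent."""
    # Step 1: query the teacher and record its outputs as training labels.
    dataset = [(x, teacher(x)) for x in queries]
    w, b = 0.0, 0.0
    # Step 2: nudge the student toward the teacher's answer on every example.
    for _ in range(epochs):
        for x, y in dataset:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    queries = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    w, b = distill(queries)
    print(f"student learned w={w:.3f}, b={b:.3f}")
```

The student never sees how the teacher works internally, only its answers, which is why the technique is cheap relative to building the capability from scratch.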

Kate

And this is bigger than the earlier cases they flagged.

Marcus

By a wide margin, Kate. Back in February they disclosed DeepSeek at around a hundred-fifty thousand exchanges, Moonshot at three-point-four million, MiniMax at thirteen million. This one is twenty-eight-point-eight million. Alibaba hasn't responded. And senators are reportedly drafting an amendment to defense legislation to sanction Chinese firms caught improperly extracting US model outputs.

Kate

Now, there's an irony here that I have to raise.

Marcus

You do, and the Hacker News crowd raised it instantly, Kate — labs that trained on scraped web data now objecting to others training on their outputs. It's a fair philosophical jab. But I'd separate that from the verifiable core, which is concrete and ugly regardless of where you land: twenty-five thousand fraudulent accounts and payment fraud at industrial scale. That's a security and fraud story on its own terms, whatever you think about the deeper hypocrisy debate.

Kate

Next, Marcus. Google just took agents that can actually operate a computer and put them in its cheap model. Gemini 3.5 Flash.

Marcus

This is a distribution play, Kate. Computer use — letting an AI see your screen, click, type, scroll, across browsers, mobile, and desktop — used to be a standalone, premium capability. Google's now baked it natively into Flash, which is its fast, cheap, high-volume workhorse. They're pitching long-horizon enterprise stuff: continuous software testing, knowledge work across professional apps. Safeguards include user confirmation before sensitive actions and automatic stops when it detects a prompt injection attempt.

Kate

So the story is making it affordable, not making it smarter.

Marcus

Exactly the right framing, Kate. Capability-per-dollar is the headline; raw capability is not. Google's own benchmark chart shows Flash trailing Opus 4.8 and GPT-5.5. And testers found rough edges. One Hacker News user reported that Flash ran git reset --hard when they had only asked it to commit, which can throw away uncommitted work and, if you're a developer, is the kind of thing that makes you wince. So: cheap agentic computer control at scale is genuinely a big deal for adoption. Just don't hand it the keys to anything you can't afford to lose yet.
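One way to avoid handing over the keys is to put a confirmation gate in front of whatever shell commands an agent proposes. The sketch below is an assumption-laden illustration, not Google's safeguard or any vendor's API: `needs_confirmation` and the `DESTRUCTIVE_GIT` table are hypothetical names, and the deny-list is deliberately small. It only inspects simple `git <subcommand> ...` invocations, so forms such as `git -C path reset --hard` would slip past it.

```python
import shlex

# Hypothetical deny-list: git subcommands mapped to flags that can discard work.
DESTRUCTIVE_GIT = {
    "reset": {"--hard"},
    "clean": {"-f", "--force"},
    "push": {"-f", "--force"},
}


def needs_confirmation(command: str) -> bool:
    """Return True if the command looks like a destructive git invocation."""
    tokens = shlex.split(command)
    # Only plain `git <subcommand> ...` commands are examined in this sketch.
    if len(tokens) < 2 or tokens[0] != "git":
        return False
    risky_flags = DESTRUCTIVE_GIT.get(tokens[1])
    if risky_flags is None:
        return False
    return any(tok in risky_flags for tok in tokens[2:])


if __name__ == "__main__":
    for cmd in ['git commit -m "fix typo"', "git reset --hard HEAD~1"]:
        verdict = "ask the user first" if needs_confirmation(cmd) else "ok to run"
        print(f"{cmd!r}: {verdict}")
```

A deny-list like this is the weaker design; an allow-list of known-safe commands fails closed when the agent tries something unexpected.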

Kate

Staying with agents in the tools we already use, Marcus — Anthropic launched something called Claude Tag. Claude as a Slack teammate.

Marcus

And this is a more important shift than it first looks, Kate. As of June twenty-third, Claude Tag replaces Anthropic's old Slack integration with a persistent AI teammate. You at-mention Claude in a channel, it reads the surrounding conversation to build context, breaks the task into stages, works asynchronously, and schedules its own follow-ups. It's multiplayer, meaning one shared Claude per channel rather than a private one each. It runs on Opus 4.8, and it's in beta for Enterprise and Team customers with admin controls for spend caps and logging.

Kate

Give me the stat that sells it.

Marcus

Anthropic says sixty-five percent of its own product team's code now runs through an in-house version of this, Kate. Take that with the usual grain of salt — a company quoting its own internal adoption. But the strategic point stands on its own. The competitive front is moving from best chatbot in a browser tab to whose agent lives inside the tools teams already use all day. An AI that holds context and acts on its own inside Slack is far stickier — and frankly far harder to switch away from — than a chatbot you have to go visit. The old Slack app retires August third.

Kate

Quick one, Marcus, because it's the unglamorous plumbing that matters. Mistral released OCR 4, and it just reset the price of document AI.

Marcus

The plumbing is exactly the word, Kate. OCR — turning PDFs, tables, and handwriting into structured text — is the step that feeds basically every enterprise AI pipeline. Mistral's OCR 4 reads documents with bounding boxes and per-block confidence scores, spans a hundred-seventy languages, and runs at four dollars per thousand pages, two if you batch it. They claim it matches the pricier agentic parsers at roughly eight times lower cost and seventeen times lower latency, with real strength in low-resource languages.
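To put those prices in budget terms, here is a small back-of-the-envelope sketch. The two rates are the ones quoted above; the function name `ocr_cost` and the 250,000-page monthly volume are made up for illustration, and real billing may round or tier differently.

```python
# Prices quoted in the episode, in US dollars per 1,000 pages.
STANDARD_PER_1K = 4.00
BATCH_PER_1K = 2.00


def ocr_cost(pages: int, batch: bool = False) -> float:
    """Estimated cost in USD for parsing `pages` pages at the quoted rates."""
    rate = BATCH_PER_1K if batch else STANDARD_PER_1K
    return pages / 1000 * rate


if __name__ == "__main__":
    pages = 250_000  # hypothetical monthly volume
    print(f"standard: ${ocr_cost(pages):,.2f}")
    print(f"batch:    ${ocr_cost(pages, batch=True):,.2f}")
```

At that assumed volume the bill is a thousand dollars at the standard rate and five hundred in batch mode.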

Kate

So who feels that?

Marcus

Specialized OCR vendors, Kate. When the price floor drops this far, multilingual document parsing stops being a line item you budget for and becomes an afterthought. That's the whole effect — making a previously fiddly, expensive step cheap enough to ignore.

Kate

Last hit, Marcus, and it's a peek at something not officially announced. A new OpenAI voice model — Bidi 1 — surfacing inside ChatGPT.

Marcus

Right, Kate. The code and UI have shown up and some users are already seeing it, but there's no formal announcement yet. Bidi is short for bidirectional, and that's the whole idea: the assistant can speak and listen at the same time. It can absorb a mid-sentence interruption, give a quick acknowledgment, and hold the thread of a whole conversation instead of dropping context the moment you cut in. Today's voice mode is essentially a walkie-talkie. You talk, it talks, one at a time. This is aiming at an actual conversation. It can reportedly even sing and beatbox, with tight copyright limits.

Kate

And voice is a genuinely contested space right now.

Marcus

Tightly contested, Kate. The Artificial Analysis voice leaderboard has OpenAI's current real-time model just barely ahead of xAI's Grok Voice, seventy-seven to seventy-six, with the top three within five points of each other. So nobody's running away with it. Real-time, interruptible voice is one of the most fought-over frontiers in AI at the moment. Treat Bidi as a strong signal of where OpenAI's pushing, not a shipped product yet.

Kate

One to watch tomorrow, Marcus.

Marcus

The Nvidia-alternative pincer, Kate. OpenAI's chip and Qualcomm's software buy landed the same day, both aimed at loosening Nvidia's grip — one on silicon, one on the CUDA moat. Watch for how Nvidia responds, and whether more labs start announcing their own custom inference chips.

Kate

Agree, or counter?

Marcus

Agree it's the one to watch — with my standing counter, Kate. Custom chips are easy to announce and hard to ship at volume. Anyone can put out a press release. Watch the deployment numbers at the end of the year, not the launch-day slide.

Kate

That's your AI in 15 for today. See you tomorrow.