AI in 15 — June 03, 2026

June 3, 2026 · 21m 37s
Kate

Ninety-seven percent on the toughest math benchmark in the world. Trained without a single OpenAI token. Released by the company that owns fourteen billion dollars of OpenAI stock. The decoupling is no longer a rumor.

Kate

Welcome to AI in 15 for Wednesday, June third, 2026. I'm Kate, your host.

Marcus

And I'm Marcus, your co-host.

Kate

Big day, Marcus. Microsoft Build delivered exactly the announcement we previewed Monday — MAI-Thinking-1 and MAI-Code-1-Flash, the first major in-house Microsoft models trained without OpenAI data. Anthropic expands Project Glasswing to roughly two hundred critical-infrastructure organizations. Trump signs a watered-down AI executive order. Microsoft launches Scout, an autonomous agent built on OpenClaw. Anthropic files for IPO and Michael Burry calls it a bubble. Stanford Law says AI tutors beat sixteen law professors in blind ratings. GitHub Copilot App enters technical preview. Adafruit gets a legal demand letter from an AI startup. And a one-click VS Code token theft chain gets disclosed.

Kate

Microsoft finally cuts the cord.

Kate

Anthropic becomes a security utility.

Kate

And Michael Burry shorts the AI trade.

Kate

Lead story, Marcus. Walk me through MAI-Thinking-1.

Marcus

This is the big strategic move, Kate. At Build yesterday, Microsoft unveiled MAI-Thinking-1 — its first in-house reasoning model. Sparse mixture-of-experts architecture, thirty-five billion active parameters out of roughly one trillion total, with a two-hundred-fifty-six-thousand-token context window. Microsoft claims ninety-seven percent on AIME 2025 and ninety-four-and-a-half percent on AIME 2026 — those are the toughest mathematical reasoning benchmarks. It matches Claude Opus 4.6 on SWE-Bench Pro. And in Surge's blind human evaluations across twelve hundred and seventy-six tasks, raters preferred MAI-Thinking-1 over Claude Sonnet 4.6.

Kate

And the coding model.

Marcus

MAI-Code-1-Flash, Kate. Smaller — one hundred thirty-seven billion total parameters, only five billion active. Rolling out today to GitHub Copilot individual users in VS Code. Microsoft claims it beats Claude Haiku 4.5 on price-performance. Both models launch on Microsoft Foundry, with Thinking-1 in private preview.
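
For a rough sense of how sparse the two mixture-of-experts models are, the parameter counts quoted above can be turned into an active fraction. This is only arithmetic on the figures as stated in the episode; the one-trillion total is described as approximate, so the first ratio is approximate too, and the function name is made up for illustration.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a sparse MoE model's parameters used for any one token."""
    return active_params / total_params


# (active, total) parameter counts as quoted in the episode.
MODELS = {
    "MAI-Thinking-1": (35e9, 1e12),    # total is "roughly one trillion"
    "MAI-Code-1-Flash": (5e9, 137e9),
}

for name, (active, total) in MODELS.items():
    print(f"{name}: {active_fraction(active, total):.1%} active")
# MAI-Thinking-1: 3.5% active
# MAI-Code-1-Flash: 3.6% active
```

On these numbers both models activate a similar share of their weights per token, even though their total sizes differ by about a factor of seven.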

Kate

The part that really tells the story, Marcus, is the training data language.

Marcus

Exactly, Kate. Microsoft went out of its way to say these models were trained on, quote, clean, appropriately licensed data, with AI-generated content excluded from pre-training, and built without distillation from third-party models. That last phrase is a sharp elbow aimed directly at OpenAI. Microsoft is signaling — these are ours, top to bottom, no shared lineage. The Mustafa Suleyman-led Microsoft AI Superintelligence team gets the credit, which also matters internally.

Kate

So is this best-in-class?

Marcus

No, Kate, and that's the honest part. Hacker News commenters were quick to note that DeepSeek V3.2 and Kimi K2.6 score similarly with fewer parameters. The scores aren't yet leaderboard-toppers. But the strategic message dwarfs the leaderboard position. Microsoft has invested roughly fourteen billion dollars in OpenAI, still owns a stake, and is now actively decoupling: own models, own GPU infrastructure, own training data. Combined with Anthropic's IPO filing this week, the OpenAI-Microsoft-Anthropic triangle that defined 2024 and 2025 is fragmenting into open multi-pole competition. The pro-Western libertarian read: this is exactly the outcome anyone worried about a Microsoft-OpenAI duopoly should have wanted. Real competition between Western labs, not a single dominant platform. The uncomfortable read for OpenAI: your biggest customer just demonstrated it can build a credible reasoning model without you. That changes every contract renegotiation from here forward.

Kate

Quick hits. Marcus, Project Glasswing.

Marcus

Big expansion, Kate. Anthropic announced yesterday that Project Glasswing — the private rollout of its cyber-capable Claude Mythos Preview model — is going from roughly fifty founding partners to about two hundred organizations across fifteen-plus countries. The original cohort was the obvious names — Apple, Nvidia, Microsoft, CrowdStrike, Palo Alto Networks. The new sectors are the interesting part. Power utilities, water, healthcare, communications, hardware vendors. Anthropic says the initial cohort has already surfaced more than ten thousand high- or critical-severity vulnerabilities, and that for many of these partners, quote, a major attack could affect more than one hundred million people.

Kate

And Mythos itself?

Marcus

Worth pausing on, Kate. Mythos is a general-purpose frontier model whose cybersecurity capabilities emerged as a downstream consequence of better code understanding — not a specialized security model. In controlled tests it has written a remote-code-execution exploit for FreeBSD's NFS server chaining six RPC calls, and found zero-days in every major OS and browser. Anthropic is deliberately not releasing it publicly, citing the offense-defense imbalance.

Kate

So two stories at once.

Marcus

Right, Kate. First, the defensive case — Anthropic is saying, this model is too dangerous to ship, but we'll let trusted insiders use it to fix the world's plumbing. Second, this turns Anthropic into a quasi-utility for Western critical-infrastructure security, which is a powerful moat as it heads toward October's IPO. Hacker News commenters also noted, less charitably, that this conveniently masks the fact Anthropic likely cannot serve Mythos at scale yet. Both readings can be true.

Kate

Marcus, Trump's executive order.

Marcus

Watered down, Kate. After weeks of internal reversals, Trump signed an EO yesterday asking — but not requiring — AI companies to submit their most powerful new models to federal reviewers up to thirty days before public release. The original draft had a ninety-day window and was supposed to ship last month. White House pulled it after pressure from AI industry leaders worried about hampering US competitiveness against China. The final order explicitly bars the government from creating any mandatory licensing or pre-clearance regime.

Kate

What else is in it?

Marcus

Three notable pieces, Kate. Treasury, Defense, NSA, and DHS are directed to stand up an AI cybersecurity clearinghouse — sharing vulnerability info with model developers and critical-infrastructure operators. Federal agencies are asked to develop cyber-capability benchmarks for frontier models. And the Justice Department is told to prioritize criminal prosecution of people who use AI to hack computer systems.

Kate

The political read.

Marcus

Clear departure from Trump's earlier laissez-faire stance, Kate, but still less than Biden's rescinded EO required. Politico's reporting suggests Sam Altman's lobbying shaped the timeline. The skeptics' line — voluntary today becomes mandatory next year if a major incident happens. Industry's line — thirty days is workable but it's still regulatory creep. The criminal-AI provisions are largely redundant since using AI to commit a crime is already a crime, but politically they let the administration claim it's doing something. The libertarian read — voluntary is the right ceiling for now. The honest read — every AI safety incident from this point forward is a vote for mandatory review.

Kate

Microsoft Scout, Marcus. This is the other Build announcement.

Marcus

And it's the architectural one, Kate. Scout is what Microsoft is calling an autopilot — an always-on autonomous agent built on the OpenClaw framework. Unlike Copilot, which lives inside individual Microsoft 365 apps and reacts to prompts, Scout is persistent. Runs across cloud, desktop, and web with its own named identity. Integrates Teams, Outlook, OneDrive, SharePoint, calendar. Takes action without being asked. Comes with a built-in policy conformance system that produces an audit trail for every action, and operates under its own Entra identity — every action attributable.

Kate

The OpenClaw backstory, Marcus.

Marcus

Genuinely strange one, Kate. OpenClaw was an open-source personal-agent project that hit nine thousand GitHub stars in twenty-four hours and blew past two hundred fourteen thousand stars by February. Its founder, Peter Steinberger, originally of PSPDFKit, was hired by OpenAI. Microsoft has now wrapped Microsoft 365 governance around the same agent architecture and brought it inside the enterprise. VP Omar Shahine said the agent, quote, becomes more capable, better understanding you, gaining more agency and exercising judgments.

Kate

So Microsoft monetized OpenAI's hire's old project.

Marcus

Exactly, Kate. And undercut it for enterprises that need audit trails. This is the clearest example yet of what we've been calling the post-model shift — runtime, identity, and governance are now where competitive differentiation lives. Expect Google Workspace and Apple to ship counterparts within months. Scout requires a GitHub Copilot subscription, which conveniently funnels enterprises straight into the Microsoft agent stack.

Kate

Anthropic IPO, Marcus. Michael Burry weighs in.

Marcus

We trailed this Friday and Saturday, Kate, but now we have the filing. Anthropic submitted a confidential S-1 on June first, targeting an October public listing. Four days after closing a sixty-five billion dollar Series H at a nine hundred sixty-five billion dollar post-money valuation. Revenue run-rate hit forty-seven billion in May, up from roughly ten billion a year prior. The company told investors it expects its first profitable quarter this June, with about ten-point-nine billion in Q2 revenue.
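
The multiples implied by those figures are easy to check. The sketch below uses only the numbers quoted in the episode. Annualizing Q2 by multiplying by four is a simplifying assumption, and the episode does not say how the run-rate itself is calculated.

```python
# Figures as quoted in the episode, in US dollars.
RUN_RATE_MAY = 47e9          # annualized revenue run-rate in May
RUN_RATE_YEAR_PRIOR = 10e9   # "roughly ten billion a year prior"
POST_MONEY = 965e9           # Series H post-money valuation
Q2_REVENUE = 10.9e9          # expected Q2 revenue

growth_multiple = RUN_RATE_MAY / RUN_RATE_YEAR_PRIOR
valuation_to_run_rate = POST_MONEY / RUN_RATE_MAY
q2_annualized = Q2_REVENUE * 4  # assumption: four equal quarters

print(f"Run-rate growth: {growth_multiple:.1f}x year over year")
print(f"Valuation / run-rate: {valuation_to_run_rate:.1f}x")
print(f"Q2 annualized: ${q2_annualized / 1e9:.1f}B")
# Run-rate growth: 4.7x year over year
# Valuation / run-rate: 20.5x
# Q2 annualized: $43.6B
```

The annualized Q2 figure comes out below the May run-rate, which is what you would expect if revenue was still climbing within the quarter.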

Kate

And Burry.

Marcus

The Big Short investor wrote that, quote, nothing in the SpaceX S-1 suggests it is worth one trillion let alone two, and for Anthropic, quote, there is no guarantee, and not even a strong likelihood, that Anthropic is long-term worth anywhere near one trillion. His core argument — AI model training is far too expensive, too much brute force, and compute will eventually be commoditized like internet bandwidth was. He's expanded short positions against Palantir, Nvidia, Oracle, the SOXX semiconductor ETF, and QQQ.

Kate

The counter, Marcus.

Marcus

Anthropic actually has revenue at a scale no dot-com 1.0 company ever did, Kate. Forty-seven billion run rate is not a 1999 eyeballs story. The bears say tokens are commodity hardware in disguise. The bulls say Anthropic's enterprise contracts and government work — see Project Glasswing — create durable revenue moats. Three pieces of AI financial history are converging in one month — Anthropic's S-1, SpaceX expected at one-point-seven-five to two trillion, and OpenAI's anticipated public debut. The Burry-versus-bulls debate will define the AI public-markets narrative through the fall. The libertarian read — public markets are exactly where this debate belongs. Let the prospectus be tested by quarterly earnings, not private-round optimism.

Kate

Marcus, the Stanford Law study. This one I find unsettling.

Marcus

Should, Kate. Professor Julian Nyarko led a study where sixteen contracts professors from fourteen US law schools authored forty representative student questions, then rated twenty-nine hundred and eighteen anonymized pairwise comparisons between LLM-generated answers and answers written by their peer professors. The LLMs won seventy-five-point-three-three percent of comparisons on average. Professors flagged LLM answers as pedagogically harmful in only three-point-five-three percent of cases versus twelve-point-zero-six percent for peer-written answers.

Kate

Caveats.

Marcus

Honest ones, Kate. The study tests LLMs as tutors, not practicing lawyers. Sixteen professors is a small sample. There's high inter-rater variance. But the result is still uncomfortable — experts could not reliably tell their own colleagues' work apart from the AI's, and when they could, they often preferred the AI. This isn't about replacing lawyers. It's about a specific pedagogical task where the cost-quality curve has clearly inverted. If AI is a better-than-average tutor in one of the more rigorous graduate disciplines, the financial case for charging seventy thousand dollars a year for foundational law instruction gets thinner each model release. And the deeper question — what does expertise even mean when AI consistently matches the median expert — gets harder to dodge.

Kate

GitHub Copilot App, Marcus.

Marcus

Quick one, Kate. GitHub yesterday expanded the technical preview of the Copilot App — a standalone desktop client for Windows, macOS, and Linux — to all Copilot Pro, Pro Plus, Business, and Enterprise customers. Key features. A unified My Work view spanning issues, PRs, and sessions. Parallel agent sessions, each running on its own git worktree and branch. A Canvases feature giving agent work a place to, quote, take shape, become visible, and get verified. And Agent Merge, which iterates against your team's review comments and failing checks until it can merge. A new Copilot Max plan is being introduced for Pro Plus customers.
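
The worktree-per-session pattern described above can be sketched with plain git. This is a minimal illustration, not how the Copilot App is implemented: the `agent/` branch prefix, the sibling-directory layout, and the function name are all assumptions. The only real interface used is `git worktree add -b <branch> <path> <commit-ish>`.

```python
from pathlib import Path
from typing import List


def worktree_add_command(repo: Path, session: str, base: str = "HEAD") -> List[str]:
    """Build the git command that gives one agent session its own checkout.

    Each session gets a new branch and a separate working directory next to
    the main repo, so parallel sessions never edit the same files on disk.
    """
    branch = f"agent/{session}"                  # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{session}"  # hypothetical layout
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


if __name__ == "__main__":
    repo = Path("/tmp/demo-repo")  # placeholder path
    for session in ("fix-login", "bump-deps"):
        # Printed rather than executed; pass the list to subprocess.run to apply it.
        print(" ".join(worktree_add_command(repo, session)))
```

Because every worktree shares the same underlying repository, branches created by one session are visible to the others without any extra cloning or fetching.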

Kate

The pattern.

Marcus

Two things, Kate. First, git worktrees are now the foundational primitive of parallel-agent coding. Every serious tool, from Claude Code to Cursor to now the Copilot App, has converged on the same pattern. Second, GitHub is finally treating coding agents as a first-class workspace primitive rather than an in-editor sidekick, the same architectural move Microsoft is making with Scout. Hacker News reaction was mixed. People liked the features but immediately noticed the new pricing tier and the rapid quota consumption compared with the old Copilot. The token-cost discipline story we covered Sunday continues to bite.

Kate

Adafruit and Flux.ai, Marcus.

Marcus

Small but symbolic, Kate. Adafruit, the maker electronics retailer, disclosed yesterday that on May twenty-second it received a demand letter from Jonathan Lenzner, a former FBI chief of staff now at Fenwick & West, representing Flux.ai, an AI-powered PCB design tool recently funded by Bain Capital. The letter demanded Adafruit refrain from publishing an article allegedly containing, quote, false and potentially defamatory claims about Flux's intellectual property, commercial traction, and user base. It invoked the Computer Fraud and Abuse Act.

Kate

Adafruit's side.

Marcus

They say they accessed only information Flux's own servers exposed via a misconfiguration, and that their planned reporting was responsible disclosure of a security concern, Kate. They've paused their blog while consulting counsel. Limor Fried, the founder, posted on Hacker News that she reached out to Flux's CEO directly hoping to resolve it. Multiple electrical engineers in the thread posted scathing reviews of Flux's actual product, calling it a, quote, token grinder.

Kate

The lesson.

Marcus

Streisand effect is now in motion, Kate. The demand letter is the story. Whether or not Adafruit publishes, the pattern — well-funded AI startup, legal threat, public backlash — is worth watching as the bubble inflates. Founders learning the wrong lessons about how to manage critical press is its own kind of canary.

Kate

Last one, Marcus. The VS Code vulnerability.

Marcus

Real one, Kate. Security researcher Ammar Askar published a detailed write-up yesterday of a one-click remote-code-execution chain in VS Code that can be triggered from an untrusted workspace. The key vulnerability — github.dev posts an OAuth token allowing GitHub interaction on behalf of the signed-in user, and that token isn't scoped to the current repo. Once stolen, it gives access to every other repo the user can access. The chain bypasses VS Code's new publisher trust system by abusing local workspace extensions, which have no publisher trust requirement.
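
Why the missing repo scope matters can be shown with a toy model. Nothing below reflects GitHub's actual token format or API. The `Token` class, the repo names, and the scoping rule are all invented for illustration of the blast-radius difference.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Token:
    """Toy token model; not GitHub's real token format."""
    user: str
    # None means unscoped: valid for every repo the user can reach.
    allowed_repos: Optional[FrozenSet[str]] = None


def reachable_repos(token: Token, user_repos: Dict[str, FrozenSet[str]]) -> FrozenSet[str]:
    """Repos an attacker could touch after stealing this token."""
    repos = user_repos.get(token.user, frozenset())
    if token.allowed_repos is None:
        return repos
    return repos & token.allowed_repos


# Hypothetical user with access to three repos.
USER_REPOS = {"dev": frozenset({"org/site", "org/billing", "org/infra"})}

unscoped = Token(user="dev")
scoped = Token(user="dev", allowed_repos=frozenset({"org/site"}))

print(sorted(reachable_repos(unscoped, USER_REPOS)))
print(sorted(reachable_repos(scoped, USER_REPOS)))
# ['org/billing', 'org/infra', 'org/site']
# ['org/site']
```

In this model a stolen unscoped token exposes everything the user can reach, while a token limited to the repo being viewed exposes only that repo.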

Kate

And the context.

Marcus

Brutal, Kate. This lands in a security environment already shaken by the May eighteenth Nx Console extension breach, where a malicious version that was live on the marketplace for just eighteen minutes harvested credentials from 1Password vaults, Anthropic Claude Code configs, npm, GitHub, and AWS, leading to the compromise of roughly thirty-eight hundred GitHub-internal repos by a hacking group called TeamPCP. Editors and AI coding tools are becoming the soft underbelly of developer security. Almost every credential a developer touches passes through their editor's process, and most extensions run with that process's full privileges. As AI agents start writing code with these same credentials, the blast radius of a single compromised extension is now enormous. Combine this with Monday's ChatGPT-for-Sheets vulnerability: same category of failure, two first-party tools in two weeks.

Kate

Big picture, Marcus.

Marcus

One theme, Kate. The contest has demonstrably moved beyond the model. Microsoft's MAI launch and Scout, GitHub's Copilot App, Anthropic's Glasswing expansion — none of those are about who has the best model on a benchmark. They're about who controls runtime, identity, governance, and distribution. Even Burry's bear case fits — if model compute commoditizes, the value migrates to the orchestration and trust layer. The libertarian read — this is genuine multi-pole Western competition emerging. Microsoft decoupling from OpenAI, Anthropic building a security utility moat, GitHub making agents a first-class workspace, all without a single government mandate forcing it. The uncomfortable read — the VS Code vulnerability and the Nx Console breach are reminding us that the trust layer is exactly where the security holes live now. Capability is racing ahead, but the identity and audit-trail story Microsoft is selling with Scout exists precisely because the prior generation of tooling cannot be trusted around credentials. That's the gap. Whoever closes it owns the enterprise.

Kate

That's your AI in 15 for today. See you tomorrow.