#063 Microsoft cut Claude Code internally then shipped a rival, GitHub gives agents their own app

Microsoft reportedly pulled Claude Code licenses from its own engineers and moved them to Copilot CLI. Then it shipped MAI-Code-1-Flash, its first in-house coding model, with a benchmark chart waving a comfortable lead. The model it picked to beat was Claude Haiku 4.5, Anthropic’s smallest and weakest.

Hacker News spotted the baseline within the hour. The story isn’t the score. It’s that a $3T company is so done leasing intelligence from a partner that it benched its replacement against the easiest opponent it could find.

In today’s indie hacker news:

🤖 Microsoft cut Claude Code internally, then shipped a rival
🧩 GitHub gives every coding agent its own git worktree
💸 A $12B security startup calls its own leaked numbers wrong
🖥️ One builder swapped Claude for local Qwen. 12% of tool calls broke
📧 After 16 years, a dev quit Gmail over forced AI

TOP STORIES

CUT THE CORD, CLONED THE PLUG

🤖 Microsoft shipped MAI-Code-1-Flash, its first in-house coding model, and benched it against Claude’s weakest

The story: Microsoft announced MAI-Code-1-Flash, the speed-and-cost tier of a new seven-model MAI family it says it trained end-to-end, not distilled from OpenAI. It’s already in the GitHub Copilot model picker for every tier, Free included, with third-party access coming to OpenRouter and Fireworks. Mustafa Suleyman, who runs Microsoft AI, told GeekWire it’s “about long term self-sufficiency,” delivered “with half the GPUs of the state-of-the-art competition” and priced “to be the cheapest of any of the hyperscalers.” Self-sufficiency is the whole point: the license cut and the launch landed in the same window, and one reads as the motive for the other.

The details:

Microsoft claims 51.2% on SWE-Bench Verified and a 16-point edge over Claude Haiku 4.5 on SWE-Bench Pro. Every number is Microsoft’s own, and no independent reproduction exists as of today.
Hacker News flagged the choice of Haiku, Anthropic’s weakest model, as a soft target, and noted Qwen3 and DeepSeek V3 match these scores at similar or lower cost.
The headline efficiency claim is 60% fewer tokens than Haiku on harder tasks. Under Copilot’s new usage-based billing, token count is where it would actually save you credits.
Per-token API pricing was not published at launch, so “cheapest hyperscaler” is a promise, not a price you can check.
Simon Willison, reading the paper, pegs it at 137B total / 5B active parameters and notes the “clean, licensed data” pitch sits on ~1.2 trillion crawled pages, Common Crawl included.

Why builders care: A frontier-lab-independent coding model, live in a tool you may already pay for, is worth a slot in your eval rotation, especially if the token-efficiency claim holds on your own repos. Just don’t trust the chart. Bench it head-to-head against Qwen3 or DeepSeek V3 on your actual tasks before you route production loops through it, because the only number Microsoft let you verify is the one in the model picker.
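To bench models "on your actual tasks," you need a pass rate and a token bill you can trust. Here is a minimal sketch of the scoring side. It assumes your own harness already logs one record per attempt: model label, task id, whether the patch passed your repo's tests, and tokens billed. The `TaskResult` shape and the model labels are hypothetical, not any vendor's API. Tokens per solved task is the number that matters under usage-based billing, because failed attempts still cost tokens.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    model: str    # hypothetical label, e.g. whatever you call the model in your harness
    task_id: str
    passed: bool  # did the model's patch pass your repo's own tests?
    tokens: int   # total tokens billed for this attempt


def summarize(results: list[TaskResult]) -> dict[str, dict[str, float]]:
    """Per-model pass rate and tokens spent per solved task."""
    by_model: dict[str, list[TaskResult]] = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r)

    summary: dict[str, dict[str, float]] = {}
    for model, rows in by_model.items():
        solved = sum(r.passed for r in rows)
        total_tokens = sum(r.tokens for r in rows)
        summary[model] = {
            "pass_rate": solved / len(rows),
            # Failed attempts still cost tokens, so charge them to the solves.
            "tokens_per_solve": total_tokens / solved if solved else float("inf"),
        }
    return summary
```

A model that solves fewer tasks with fewer tokens per attempt can still lose on `tokens_per_solve`. That is why a token-efficiency claim is worth checking against your own repos rather than a vendor chart.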

ONE BRANCH PER ROBOT

🧩 GitHub shipped a standalone Copilot desktop app that gives every agent its own git worktree

The story: GitHub’s new Copilot app is a separate desktop client, not the VS Code extension, built to make agents a first-class surface instead of an editor sidebar. The piece that makes it more than a wrapper: each agent session runs in its own isolated git worktree, so you can fire several at once and they never clobber each other’s branch. A “My Work” dashboard collects every running agent, PR, and issue across your repos in one pane. Mario Rodriguez, GitHub’s product chief, framed it as “one system” where “agents can do more of the work, while developers keep control.”

The details:

It’s in technical preview on Windows, macOS, and Linux, and as of yesterday it’s open to every paid tier (Pro, Pro+, Business, Enterprise). Free users get a waitlist; there’s no GA date.
Plan mode makes the agent propose its steps and wait for your approval before it touches anything.
Agent Merge is a background worker for the whole PR tail: it watches CI, chases reviewers, fixes failing checks, and merges at the automation level you set.
Cloud sessions keep agents running with your laptop closed, and configurable MCP servers wire them into your issue tracker, test runner, or design tools.
There’s no new price tag. It ships inside existing plans, alongside the usage-based billing that went live two days earlier and is already annoying cost-watchers.

Why builders care: GitHub owns the repo, CI, and review pipeline, so making agents live there closes the loop from “agent writes code” to “code merges” without a context switch, something Cursor or Claude Code can’t match on GitHub’s home turf. HN’s read was sourer, calling it features over stability and “ripping off Cursor and Codex.” Whether platform gravity beats raw agent quality is the bet, and parallel worktrees plus close-the-laptop merges are the most concrete reason yet to find out.
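GitHub hasn't said how the app manages worktrees internally. Below is a minimal sketch of the same per-agent isolation using plain git from Python. The agent names, the `agent/<name>` branch scheme, and the sibling `../agent-worktrees` directory are hypothetical choices. Each `git worktree add -b <branch> <path> <base>` call gives one agent its own branch checked out in its own directory, so parallel agents never share a working tree.

```python
import subprocess
from pathlib import Path


def worktree_commands(agents: list[str], base: str = "main",
                      root: str = "../agent-worktrees") -> list[list[str]]:
    """Build one `git worktree add` command per agent.

    Each agent gets a fresh branch (agent/<name>) checked out in its own
    directory outside the main checkout, so agents never clobber each
    other's files or branch.
    """
    cmds = []
    for name in agents:
        path = str(Path(root) / name)
        cmds.append(["git", "worktree", "add", "-b", f"agent/{name}", path, base])
    return cmds


def create_worktrees(agents: list[str], base: str = "main") -> None:
    # Run from inside the repository. check=True stops on the first git
    # error, for example if an agent/<name> branch already exists.
    for cmd in worktree_commands(agents, base):
        subprocess.run(cmd, check=True)
```

When an agent's branch has merged, `git worktree remove <path>` cleans up its directory.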

THE NUMBER NOBODY WILL CONFIRM

💸 Cyera is reportedly raising at a $12B valuation, roughly 80x revenue, and disputes its own leaked figures

The story: TechCrunch reported that data-security startup Cyera is finalizing a ~$300M round led by Evolution Equity Partners, while spending faster than it earns. That price tag works out to roughly 80 times a reported $150M+ ARR. Here’s the wrinkle: Cyera’s own spokesperson called the cited figures “factually and significantly inaccurate,” and then declined to give a single corrected number. The round isn’t closed. So the most-quoted valuation of the week is a leak the company is publicly disowning without saying which part is wrong.

The details:

The ARR figure and the 80x multiple are TechCrunch’s calculation, not numbers Cyera confirmed. The denial covers the math, not the existence of the raise.
What’s solid is the trajectory: $3B in late 2024, $6B in mid-2025, $9B in January with Blackstone, and now $12B reported. That’s a 4x markup in 18 months.
It would be Cyera’s fifth raise in that span and push total capital past $2B, all while adding ~500 people in 2026 and running at a loss.
For scale, AI fundraising medians sit near 25-30x revenue and public cybersecurity peers trade under 10x. 80x is its own weather system.
Cyera sells DSPM, software that scans cloud and SaaS to find where sensitive data lives, the category that exploded once enterprises started feeding everything to AI.
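TechCrunch's multiple and the markup above each come down to a single division, using only the reported figures. Because the reported ARR is "$150M+", any higher true ARR pulls the multiple below 80x. So 80x is the ceiling implied by TechCrunch's own numbers, not a confirmed figure. Against the 25-30x AI median it is roughly 2.7 to 3.2 times richer.

```latex
\text{multiple} = \frac{\$12\,\text{B}}{\$150\,\text{M}} = 80
\qquad
\text{markup} = \frac{\$12\,\text{B}}{\$3\,\text{B}} = 4
\qquad
\frac{80}{30} \approx 2.7, \quad \frac{80}{25} = 3.2
```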

Why builders care: This is the AI-valuation split in one line: an unprofitable company at 80x while a profitable SaaS would fetch single digits. The same froth lifting Cyera leaves the small, boring data-security niches the giants can’t be bothered to chase wide open for bootstrappers who charge real money. And the spokesperson’s half-denial is the founder lesson: leak-driven fundraising buys you a headline and a reputational cleanup in the same news cycle.

A VIABLE BRAIN, A SHAKY PAIR OF HANDS

🖥️ A builder ran local Qwen3.6-27B against Claude for two weeks; the thread blamed his setup, not the model

The story: A developer posted on r/LocalLLaMA that he swapped Claude for a local Qwen3.6-27B in his own multi-agent orchestrator for two weeks, running it at Q6_K on a single RTX 3090 across 47 real coding tasks. The result reads grim at first: the local model botched tool-call formatting 12% of the time against Claude’s 0.5%, hallucinating field names and tool signatures. Then the comments turned it around. The top replies pinned the failures on his stack, not Qwen, and the builder’s own verdict landed softer than his title: a viable reasoning layer, not an execution one.

The details:

He capped context at 32k. Qwen3.6-27B ships with a 262k window, and commenters argued that alone explains the drift he saw past ~14k tokens.
He ran inference through Ollama. The thread’s recurring complaint was that Ollama mishandles KV-cache quantization and that running llama.cpp directly fixes most of these tool-call breaks.
The model itself is no toy: it scores ~77.2% on SWE-Bench Verified, Sonnet-class, Apache-2.0, and fits one 24GB card.
Worth flagging: the orchestrator under test is OpenYabby, which the poster built himself, so read the numbers as a self-report with a motive.

Why builders care: The honest takeaway isn’t “local can’t replace Claude yet,” it’s that the 12% gap may be config, not capability, and config is fixable. If you’re sizing up a local agent stack, the playbook from the thread is concrete: run llama.cpp directly instead of Ollama, give the model its full context window, enforce structured output at every tool-call boundary, and gate tool calls behind plan approval. Treat the local model as the planner and keep a tighter leash on the hands.
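Two items in that playbook are structured output at each tool-call boundary and a plan-approval gate. Below is a minimal sketch of both as a post-generation check. The tool registry, its field names, and the `{"name": ..., "arguments": ...}` call shape are hypothetical, not OpenYabby's format. Hallucinated field names and signatures, the failure the poster saw, surface here as missing or unexpected keys. This validates after the model speaks. llama.cpp's grammar-constrained decoding is a complementary way to stop malformed output before it is generated.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS: dict[str, dict[str, type]] = {
    "read_file": {"path": str},
    "run_tests": {"target": str},
    "write_file": {"path": str, "content": str},
}


class ToolCallError(ValueError):
    """Raised instead of executing a malformed or unapproved tool call."""


def validate_tool_call(raw: str, approved_plan: set[str]) -> dict:
    """Parse a model's tool call; reject anything malformed, unknown,
    or outside the plan a human already approved."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ToolCallError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict) or set(call) != {"name", "arguments"}:
        raise ToolCallError("expected exactly the keys 'name' and 'arguments'")

    name, args = call["name"], call["arguments"]
    if not isinstance(name, str) or name not in TOOLS:
        raise ToolCallError(f"unknown tool: {name!r}")
    if name not in approved_plan:
        raise ToolCallError(f"tool {name!r} is not in the approved plan")
    if not isinstance(args, dict):
        raise ToolCallError("'arguments' must be an object")

    schema = TOOLS[name]
    # Hallucinated field names show up as unexpected or missing keys.
    if set(args) != set(schema):
        raise ToolCallError(
            f"{name}: expected fields {sorted(schema)}, got {sorted(args)}")
    for field, typ in schema.items():
        if not isinstance(args[field], typ):
            raise ToolCallError(f"{name}.{field} must be {typ.__name__}")
    return call
```

On a `ToolCallError`, the orchestrator can feed the message back to the model and ask for a corrected call rather than executing anything.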

TRENDING TODAY

🧠 The agent tooling layer is the new open battleground - The fight moved past model weights to the plumbing under agents. H Company shipped Holo3.1, local computer-use models that hit ~79% on AndroidWorld and roughly halve step time with NVFP4 quantization. MiniMax detailed the sparse-attention architecture behind its M3 launch, claiming about a twentieth of the per-token compute at 1M context. And a busy r/LocalLLaMA memory thread showed most builders still hand-roll agent memory with SQLite and Markdown in git, not mem0 or Letta. One verdict: “RAG has failed, it blows out your context window or gives too many false positives.”
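What "hand-rolled SQLite memory" looks like varies by builder. Below is a minimal sketch of one common shape: a single notes table with keyword lookup, newest first, and a hard row limit. The schema and method names are hypothetical, and the Markdown-in-git half of the pattern is not shown. The `LIMIT` is the point of the design: the agent pastes back a handful of rows instead of whatever a similarity search returns.

```python
import sqlite3
import time


class AgentMemory:
    """Minimal hand-rolled agent memory: one SQLite table of notes,
    retrieved by keyword match and recency instead of embeddings."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS notes ("
            " id INTEGER PRIMARY KEY,"
            " topic TEXT NOT NULL,"
            " body TEXT NOT NULL,"
            " created REAL NOT NULL)"
        )

    def remember(self, topic: str, body: str) -> None:
        with self.db:  # the context manager commits the insert
            self.db.execute(
                "INSERT INTO notes (topic, body, created) VALUES (?, ?, ?)",
                (topic, body, time.time()),
            )

    def recall(self, keyword: str, limit: int = 5) -> list[tuple[str, str]]:
        # SQLite's LIKE is case-insensitive for ASCII by default.
        # Newest rows first (ids only grow); LIMIT caps prompt size.
        pattern = f"%{keyword}%"
        rows = self.db.execute(
            "SELECT topic, body FROM notes"
            " WHERE topic LIKE ? OR body LIKE ?"
            " ORDER BY id DESC LIMIT ?",
            (pattern, pattern, limit),
        )
        return rows.fetchall()
```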

💬 “Most people are lying about their revenue” - An r/SideProject post put words to a mood: “Someone lands one $2k client and suddenly they’re at $2k MRR. The Stripe screenshot is real, the story around it usually isn’t.” It pairs with a popular r/microsaas case that a $10K-MRR solo business beats a $2M seed and the 18 months of pitch meetings that come with it. The anti-hype turn is real, and the best line cuts the flexing cold: “revenue without churn, refunds, and repeatability is mostly noise.”
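The "$2k client becomes $2k MRR" trick works by counting a one-off payment as recurring. Here is a minimal sketch of the honest version. It assumes a list of one month's charges, each already labeled recurring or not and refunded or not; the `Payment` shape is hypothetical. Churn would need a second month to compare against and is not computed here.

```python
from dataclasses import dataclass


@dataclass
class Payment:
    customer: str
    amount: float          # amount charged this month, in your currency
    recurring: bool        # True only for a monthly subscription charge
    refunded: bool = False


def honest_mrr(payments: list[Payment]) -> float:
    """MRR counts only recurring monthly charges that were not refunded.

    A one-off $2k invoice adds nothing here, however good the
    Stripe screenshot looks.
    """
    return sum(p.amount for p in payments if p.recurring and not p.refunded)
```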

🪧 “€174 on Reddit ads, 111,927 impressions, zero customers” - A founder ran the experiment so you don’t have to: three rounds of Reddit ads, 1,579 clicks, not one B2B customer, which he called “the cheapest, most useless traffic I’ve ever bought.” The thread consensus and a parallel free-to-paid post land on the same unglamorous fix: cold ad traffic doesn’t convert B2B, so stop buying it and go talk to the people already using the product.
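Running the founder's own figures through the standard ad ratios shows where the money went. About 11 cents per click bought plenty of traffic. The zero sits entirely in the click-to-customer step, where cost per customer is undefined.

```latex
\text{CTR} = \frac{1{,}579}{111{,}927} \approx 1.41\%
\qquad
\text{CPC} = \frac{\text{€}174}{1{,}579} \approx \text{€}0.11
\qquad
\text{CPM} = \frac{\text{€}174}{111{,}927} \times 1000 \approx \text{€}1.55
\qquad
\text{cost per customer} = \frac{\text{€}174}{0}\ \text{(undefined)}
```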

DRAMA

SIXTEEN YEARS, ONE CONDESCENDING BUTTON

📧 A developer quit Gmail after 16 years because its forced AI felt like an insult

A blogger writing as JP walked away from Gmail to Fastmail, and the post topped Hacker News at ~700 points. The trigger wasn’t a feature gap. It was unsolicited Gemini summaries over his emails, auto-drafted replies he never asked for, and a “Help me write” button he couldn’t cleanly turn off, bundled so tightly with features he relied on that killing the AI meant losing the rest. His line: “the message you’re sending is that you think I’m not capable of reading and writing my own emails.” The HN comment that stuck: “if you didn’t bother writing it, I don’t need to bother reading it.” Optional AI gets tolerated; AI as the price of admission to your inbox is what makes more than a decade of goodwill evaporate in weeks.

STACK OF THE DAY

🗂️ Jolli AI

Local-first memory that follows your AI coding sessions across Claude Code, Codex, Gemini CLI, and OpenCode. It installs a git hook that writes a plain-Markdown summary of what got built and decided on every commit, stored in a git orphan branch plus a local folder, so context survives between sessions without a proprietary format or a cloud account. Apache-2.0, free to download, ~97 stars on GitHub, with VS Code and JetBrains extensions. If your agent forgets the architecture every time you reopen the repo, this is the gap it fills. No MCP server, just hooks and editor plugins.

Not sponsored. We just feature tools builders would actually use.
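The core idea behind Jolli AI is a git hook that leaves a plain-Markdown note on every commit. Below is a minimal stand-in for that idea, not Jolli's implementation. It records only commit metadata; the AI-written "what got built and decided" summary and the orphan-branch storage are left out. The `.ai-notes` folder name is hypothetical. Saved as `.git/hooks/post-commit` and made executable, it runs after each commit; git starts hooks from the root of the working tree, so the relative path lands there.

```python
#!/usr/bin/env python3
import subprocess
import sys
from pathlib import Path

NOTES_DIR = Path(".ai-notes")  # hypothetical local folder for the notes


def render_summary(commit: str, subject: str, files: list[str]) -> str:
    """Render one commit as a plain-Markdown note."""
    lines = [f"# {subject}", "", f"Commit: `{commit}`", "", "## Files touched", ""]
    lines += [f"- `{f}`" for f in files] or ["- (none)"]
    return "\n".join(lines) + "\n"


def git(*args: str) -> str:
    return subprocess.run(["git", *args], check=True,
                          capture_output=True, text=True).stdout.strip()


def main() -> None:
    commit = git("rev-parse", "HEAD")
    subject = git("log", "-1", "--format=%s")
    # --root makes the very first commit list its files too.
    files = git("diff-tree", "--root", "--no-commit-id",
                "--name-only", "-r", "HEAD").splitlines()
    NOTES_DIR.mkdir(exist_ok=True)
    (NOTES_DIR / f"{commit[:12]}.md").write_text(render_summary(commit, subject, files))


# Only act when installed and invoked as .git/hooks/post-commit.
if __name__ == "__main__" and Path(sys.argv[0]).name == "post-commit":
    main()
```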

BOOKMARKED TODAY

⚖️ “AI outperforms law professors in Stanford Law study” - A Stanford Law study found AI beating law professors on certain tasks, a crisp data point for the “where does AI actually clear the expert bar” file. Worth a read before you decide which knowledge work is safely human.

💾 “Use your Nvidia GPU’s VRAM as swap space on Linux” - A neat hack that turns spare VRAM into Linux swap, 183 HN points. Pairs perfectly with today’s local-model story: if you’re squeezing a 27B onto one card, every gigabyte of headroom counts.

🔌 “Show HN: a way to find and install Claude skills” - A small directory and installer for Claude Code skills, on-brand utility for anyone living in the agentic-coding tools all over today’s edition. Low HN traction so far, but a useful bookmark if you’re assembling your own skill set.

Curated by AI, built by a human.
