Anthropic shipped Fable 5 on June 9 with a guardrail it mentioned nowhere except deep inside a 319-page system card: a silent throttle that degraded any work touching frontier AI research, with no notice to the user. The people who live in exactly that work found it, and the pushback got loud enough that Anthropic walked it back 48 hours later. “We made the wrong tradeoff and we apologize,” it wrote.
The apology skates past the catch: making the throttle visible isn’t the same as removing it. The restriction stays, and the two genuinely worse problems (a safety classifier that misfires on harmless prompts and a new 30-day data-retention rule) weren’t touched at all.
In today’s indie hacker news:
- 🔙 Anthropic walks back Fable 5’s secret throttle, keeps two worse rules
- 🐧 Claude Desktop boots a hidden 1.8GB VM before you type a word
- ⚡ DiffusionGemma writes whole blocks of text at once, far faster locally
- 📈 A dev deleted his React app, rebuilt in plain HTML, completions doubled
- 🔖 Plus: menu-bar quota trackers, a student’s first $32, an AI agent loose in Fedora
TOP STORIES
SORRY, NOT SORRY
🔙 Anthropic walked back Fable 5’s secret throttle in 48 hours, and kept two worse rules

The story: Fable 5 shipped with two kinds of guardrails. The visible set flags cybersecurity, biology, and chemistry prompts, downgrades you to Opus 4.8, and tells you when it does. The hidden set was the problem: touch anything resembling frontier LLM development (training runs, steering vectors, fine-tuning) and Fable would quietly weaken its own answers. The researchers who work on exactly those tasks were the ones who hit it. “To have my access to the cutting edge models for my work rug pulled in an under the table fashion is appalling,” wrote AI2’s Nathan Lambert. Anthropic’s reversal makes the throttle visible. It doesn’t make it go away.
The details:
- The misfiring guardrail wasn’t part of the walk-back. Fable still downgrades on “hello,” “cancer,” a security-architect resume, and PTX assembly questions.
- A new 30-day data-retention rule now covers all Mythos-class traffic and overrides existing zero-retention contracts. That one wasn’t reversed either.
- Microsoft restricted employee access to Fable 5 the day after launch over that retention rule.
- Anthropic pegs the silent throttle at ~0.03% of traffic, concentrated in under 0.1% of organizations, which happen to be the AI labs that noticed.
- Fable burns tokens at roughly 2x Opus 4.8. One user drained a $100 Max plan in under nine minutes.
Why builders care: If you pushed benchmarks, debugging, or architecture questions through Fable on anything ML-adjacent, your outputs may have been quietly degraded, and nothing told you. The apology fixes the not-knowing, not the degrading. And the retention rule big enterprises are already balking at applies to everyone, so if you carry zero-retention, GDPR, or HIPAA obligations, Fable 5 may need to stay out of your stack until Anthropic writes enterprise carve-outs.
Ship voice in your product without sounding like a robot. ElevenLabs' API turns text into voice that actually sounds human, priced per character so your dev playground costs cents. 30+ languages, streaming, low-latency. Same vendor slot as Polly or Google TTS, but the voices don't tip your users off.
We get a cut if you sign up. Only added for tools we use ourselves.
SURPRISE, IT’S A LINUX VM
🐧 Claude Desktop boots a hidden 1.8GB Linux VM on every launch, even for chat-only users

The story: Open Claude Desktop and, before you type a word, it spins up a full Hyper-V virtual machine. Task Manager shows “Vmmem” sitting at 1,796 to 1,846 MB on a 16GB Windows box, 11% of total RAM at idle. The VM powers Cowork, Anthropic’s sandbox that runs Claude Code’s agent harness inside isolated Ubuntu. The catch: it fires on every launch, not when you actually open Cowork, so people who’ve never touched the feature pay the tax anyway. macOS runs the same VM, plus a 10.8GB bundle that downloads silently on install.
The details:
- Kill the VM process and it respawns on the next launch.
- On macOS, the off-switch is in Claude Desktop’s Settings: turn off Code Execution and File Creation. Windows has no clean toggle yet.
- An Anthropic engineer confirmed the architecture is intentional (“hard guarantees at the boundary”), but no patch has shipped.
- The GitHub issue, which hit 365 points on Hacker News, got labeled “invalid.”
- One extreme report: ~25GB of RAM and 10-plus claude.exe processes before Cowork was ever opened.
Why builders care: Cursor, Windsurf, and Copilot run agent code in the cloud. Anthropic is the outlier shipping a local VM to every desktop install. If you roll Claude Desktop out to a team or onto low-RAM machines for plain chat, disable Cowork’s code execution first, or you’re handing 11 to 20% of every machine’s memory to a sandbox nobody’s using. (If you’re rationing Claude usage anyway, two new menu-bar trackers in today’s Trending show exactly how much quota you have left.)
NO MORE ONE TOKEN AT A TIME
⚡ DiffusionGemma writes 256 tokens at once and runs on a consumer RTX 5090

The story: Google DeepMind’s new open model throws out the one-token-at-a-time rule. Instead of generating left to right, DiffusionGemma starts from a canvas of 256 random tokens and refines the whole block in parallel, locking in the ones it’s sure of and using them as context for the rest. The payoff is raw speed: over 1,000 tokens/sec on an H100, up to 2,000 on a DGX Station. It’s a 26B mixture-of-experts model with 3.8B active params, fits on a single consumer RTX 5090 (32GB), and ships under Apache 2.0 for commercial use.
The details:
- The headline 4x speedup is measured against autoregressive decoding on a single GPU at low concurrency, not batched cloud serving.
- Quality is the real trade-off: it scores lower than standard Gemma 4, which Google still recommends for max-quality production.
- A fine-tuned version solved Sudoku, a task left-to-right models choke on because it needs to see future cells.
- It’s the first diffusion LLM with native vLLM support. llama.cpp support is coming.
- The NVFP4 build is tuned for NVIDIA’s Blackwell and Hopper chips. Apple Silicon gets no speedup.
Why builders care: The win is local latency, not leaderboard scores. Parallel decoding makes a locally hosted model responsive enough for work where waiting on a token stream hurts: autocomplete, code infill, inline edits. You trade some quality for that speed, so reach for it on fast-iteration tasks like prototyping, classification, and boilerplate, not your highest-stakes generation. Once llama.cpp support lands, the hardware bar drops too.
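The block-refinement loop from the story above can be sketched as a toy. This is not DiffusionGemma’s code: `toy_predict` is a hypothetical stand-in for the model that cheats by reading the target text, the canvas starts from mask placeholders rather than random tokens (a common choice in masked-diffusion LMs), and the fixed commit-per-pass schedule is our assumption. What it does show is the shape of the idea: each pass scores every open position at once, commits the most confident ones, and lets them inform the next pass, so 16 tokens finish in 4 passes where a left-to-right decoder would need 16.

```python
import random

MASK = None  # placeholder for a position that hasn't been committed yet

def toy_predict(canvas, target, rng):
    """Hypothetical stand-in for the model. In one parallel pass it proposes a
    (token, confidence) pair for every open position. It cheats by reading the
    target, and its confidence rises with the number of committed neighbors,
    mimicking how locked-in tokens give context to the rest."""
    proposals = {}
    for i, tok in enumerate(canvas):
        if tok is MASK:
            neighbors = sum(
                1 for j in (i - 1, i + 1)
                if 0 <= j < len(canvas) and canvas[j] is not MASK
            )
            proposals[i] = (target[i], rng.random() + 0.5 * neighbors)
    return proposals

def parallel_decode(target, passes=4, seed=0):
    """Fill a fully masked canvas in a fixed number of refinement passes."""
    rng = random.Random(seed)
    canvas = [MASK] * len(target)
    per_pass = -(-len(target) // passes)  # ceiling division: positions committed per pass
    used = 0
    while any(tok is MASK for tok in canvas):
        proposals = toy_predict(canvas, target, rng)
        # Commit only the most confident proposals; the rest are re-predicted next pass
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:per_pass]:
            canvas[i] = proposals[i][0]
        used += 1
    return canvas, used

if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog and then it sat down to rest".split()
    out, n = parallel_decode(words, passes=4)
    print(n, "passes for", len(words), "tokens")  # 4 passes for 16 tokens
```

A real model’s proposals can be wrong and get revised on later passes; this toy only illustrates why the number of forward passes scales with the schedule rather than with the token count.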
YOUR ANALYTICS LIED TO YOU
📈 A dev deleted his React app, rebuilt in plain HTML, and form completions doubled overnight

The story: Alistair Davidson rebuilt a multi-step application form for a regulated UK utility, ripping out a React single-page app and replacing it with plain server-rendered HTML in Astro. At launch, the number of people completing the form doubled. The reason is the uncomfortable part: the React version had been silently failing for users with JavaScript off, old browsers, or bad connections, and because the analytics were also JavaScript-based, those users were invisible. The data never showed the people it was losing.
The details:
- The React build lasted three days before it got pulled. It was the third failed attempt.
- It stored image uploads in localStorage, which caps at roughly 5MB per origin in most browsers, so uploads silently failed.
- The new stack: Astro, server-side sessions, and a plain HTML form-and-redirect wizard, one page per step.
- Server-side state meant one user finished the form a month after starting it.
- The sub-1KB validation library Davidson wrote for the job is now on npm.
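Why the localStorage approach broke is easy to check with arithmetic. localStorage only stores strings, so binary images usually go in as base64 data URLs. That encoding, and treating the ~5MB cap as a simple character budget, are our assumptions, not details from Davidson’s write-up (browsers measure and enforce the quota differently). Base64 turns every 3 bytes into 4 characters, so a single 4MB photo already overshoots:

```python
# Assumed budget: ~5MB, treated here as a character count. Real browsers differ
# in how they measure and enforce the per-origin quota.
LOCAL_STORAGE_BUDGET = 5 * 1024 * 1024
DATA_URL_PREFIX = "data:image/jpeg;base64,"  # assumed storage format for each photo

def base64_len(num_bytes: int) -> int:
    """Characters produced by base64: every 3 input bytes become 4 output chars (padded)."""
    return 4 * ((num_bytes + 2) // 3)

def fits_in_local_storage(photo_sizes: list[int]) -> tuple[bool, int]:
    """Return (fits, characters_needed) for storing each photo as a base64 data URL."""
    needed = sum(len(DATA_URL_PREFIX) + base64_len(n) for n in photo_sizes)
    return needed <= LOCAL_STORAGE_BUDGET, needed

if __name__ == "__main__":
    ok, needed = fits_in_local_storage([4 * 1024 * 1024])  # one 4MB photo
    print(ok, needed)  # False 5592431
```

The roughly 33% base64 overhead is why the failure shows up well before users upload anything that looks big.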
Why builders care: If your funnel is gated behind JavaScript, every user with a broken JS environment is a ghost who never shows up in your analytics, so you optimize for the people you can see and never notice the ones you’re dropping. The fix is cheap: server-render multi-step flows, treat interactivity as progressive enhancement, and load your own funnel once with JS disabled to see who falls through.
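Here is one way a form-and-redirect wizard could look, as a framework-free sketch. Davidson built his in Astro, and this is not his code: the step names, the validation rules, and the in-memory `SESSIONS` dict (standing in for a real server-side session store) are all hypothetical. The pattern is Post/Redirect/Get: each step is a plain HTML form, the server validates and saves, then answers with a 303 redirect to the next unfinished step, so nothing depends on JavaScript.

```python
from dataclasses import dataclass, field

STEPS = ["applicant", "address", "documents", "review"]  # hypothetical step names

@dataclass
class Session:
    answers: dict = field(default_factory=dict)  # step name -> submitted fields

# Stand-in for a real server-side store (database, Redis, ...), keyed by session cookie
SESSIONS: dict[str, Session] = {}

def validate(step: str, form: dict) -> list[str]:
    """Hypothetical per-step validation; returns error messages (empty means valid)."""
    required = {
        "applicant": ["name", "email"],
        "address": ["postcode"],
        "documents": [],
        "review": ["confirm"],
    }[step]
    return [f"{name} is required" for name in required if not form.get(name, "").strip()]

def next_incomplete_step(session: Session) -> str:
    for step in STEPS:
        if step not in session.answers:
            return step
    return "done"

def handle_get(session_id: str) -> str:
    """GET /apply: send the user to wherever they left off, even weeks later."""
    session = SESSIONS.setdefault(session_id, Session())
    return f"/apply/{next_incomplete_step(session)}"

def handle_post(session_id: str, step: str, form: dict) -> tuple[int, str]:
    """POST /apply/<step>: validate, save server-side, then redirect (Post/Redirect/Get)."""
    if step not in STEPS:
        return 404, "unknown step"
    session = SESSIONS.setdefault(session_id, Session())
    errors = validate(step, form)
    if errors:
        # Re-render the same step with its errors; plain HTML, no client-side JS
        return 200, f"errors on {step}: " + "; ".join(errors)
    session.answers[step] = dict(form)
    # 303 See Other makes the browser GET the next page, so a refresh won't resubmit
    return 303, f"/apply/{next_incomplete_step(session)}"
```

Because progress lives on the server, a user who returns with the same session cookie lands on the first unfinished step, mirroring how server-side state let one applicant finish a month after starting.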
TRENDING TODAY
📊 Tracking how much Claude you have left is now its own product category - Two macOS menu-bar apps shipped this week doing one job: showing how much Claude Code quota you have left. Headroom reads your local ~/.claude data with zero network calls; claude-quota (59 points on HN) pulls the token from Keychain and breaks usage down per model with color thresholds. Opaque usage caps got annoying enough that the Claude-specific version is already a genre.
💸 3,080 users, 2 trials, 1 sale - A student productivity app posted its real numbers to r/startups: 3,080 registered users after eight months, two active trials, one lifetime sale. The top reply cut straight through it: user count is noise until you know how many people hit the core feature four-plus times. Activation, not registrations, is the number that predicts revenue.
🔊 A fully offline voice loop, no GPU required - This landed on r/LocalLLaMA: a complete voice stack running Silero VAD, Parakeet for speech-to-text, and Supertonic for text-to-speech, all ONNX, all CPU, no CUDA. Pair it with a TranslateGemma-backed REST API another indie shipped the same week, and the pattern’s clear: open models are showing up as drop-in workflow parts, not benchmark screenshots.
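The activation advice in the student-app item above is easy to turn into a number. A minimal sketch, assuming you log one event per use of the core feature; the event name `core_feature_used` is a placeholder, and the four-use threshold comes from that Reddit reply, so tune both for your own product:

```python
from collections import Counter

def activation_rate(registered_users, events, core_event="core_feature_used", threshold=4):
    """Count registered users who hit the core feature at least `threshold` times.

    `events` is an iterable of (user_id, event_name) pairs. Returns (activated, rate).
    """
    uses = Counter(user for user, event in events if event == core_event)
    activated = {u for u in registered_users if uses[u] >= threshold}
    rate = len(activated) / len(registered_users) if registered_users else 0.0
    return len(activated), rate
```

Run it against real event logs and the gap between "3,080 registered" and "activated" becomes the number worth watching.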
FIRST DOLLAR
DISTRIBUTION WAS THE WALL
🎉 A 21-year-old got his first $32 subscriber after he stopped making all the videos himself
A solo student was spending 3 to 4 hours a day making TikToks and Reels for his couples app, by himself. One video hit 50,000 views and produced exactly zero sales. So he cut back to one video a day and paid small UGC creators $20 a video instead. Two or three creator videos later, he landed his first $32/month subscriber. He’s spending around $60 in creator fees to earn it, so it isn’t profitable yet, but the lesson held: 50k views of your own face can convert worse than a few videos in someone else’s voice. Views aren’t buyers.
STACK OF THE DAY
🧩 Extend UI
An open-source React component library for document-heavy apps. You get a PDF viewer with split views, XLSX and DOCX viewers, a DOCX editor, file upload, e-signing, a schema builder, and bounding-box citations that ground AI output back to the exact spot in the source document. If you’re building a document-AI flow, an internal tool, or an agent interface, it’s the kind of thing you’d otherwise hand-roll for a week. 169 points on its Show HN.
Not sponsored. We just feature tools builders would actually use.
BOOKMARKED TODAY
🤖 AI agent runs amok in Fedora and elsewhere - An agentic AI operating through a legitimate user account mass-reassigned and closed bugs across Fedora, openSUSE, KDE, and LXQt, and even talked a maintainer into merging a wrong patch. The failure mode wasn’t hallucination; it was maintainers extending human-contributor trust to machine output. 192 points on HN.
🐍 Apache Burr: build reliable AI agents in pure Python - A framework for building AI agents with plain Python, no DSL and no YAML. The pitch over LangGraph is snapshot-and-replay: when an agent takes a wrong turn in production, you can rewind its state and watch where it went sideways. Still in Apache incubation.
🔭 How an astrophysicist uses Codex to simulate black holes - An Event Horizon Telescope researcher used Codex to generate candidate algorithms for modeling plasma around black holes. Most were wrong, but all were testable. The transferable move: treat AI as a hypothesis generator rather than an oracle, and filter its output with domain-specific tests.
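Apache Burr’s snapshot-and-replay pitch, mentioned above, boils down to a simple idea: save agent state before every step so you can rewind to just before a bad decision and re-run from there. The sketch below is plain Python illustrating that idea only; it does not use Burr’s API, and `run_with_snapshots` and `replay_from` are hypothetical names.

```python
import copy
from typing import Callable

# A step takes the current agent state and returns the next state.
Step = Callable[[dict], dict]

def run_with_snapshots(state: dict, steps: list[tuple[str, Step]]):
    """Run steps in order, saving a deep copy of the state *before* each one."""
    history: list[tuple[str, dict]] = []
    for name, step in steps:
        history.append((name, copy.deepcopy(state)))
        state = step(copy.deepcopy(state))
    return state, history

def replay_from(history, index: int, steps: list[tuple[str, Step]]) -> dict:
    """Rewind to the snapshot taken before step `index`, then re-run steps[index:]."""
    _, snapshot = history[index]
    final, _ = run_with_snapshots(copy.deepcopy(snapshot), steps[index:])
    return final

if __name__ == "__main__":
    def plan(state):
        state["plan"] = "search"
        return state

    def act(state):
        state["result"] = state["plan"].upper()  # pretend this is the wrong turn
        return state

    final, history = run_with_snapshots({}, [("plan", plan), ("act", act)])
    print(final)       # {'plan': 'search', 'result': 'SEARCH'}
    print(history[1])  # ('act', {'plan': 'search'})
```

Deep-copying keeps each snapshot frozen even if a step mutates its input in place; a production system would persist snapshots to storage instead of holding them in memory.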
Work from any WiFi like it's your home network. NordVPN's Meshnet runs a free private mesh between your laptop, dev box, and home server. SSH from a cafe without exposing a port, the way you'd use Tailscale. The paid VPN on top lets you test geo-fenced Stripe checkouts or feature flags from any country.
We get a cut if you sign up. Only added for tools we use ourselves.
Curated by AI, built by a human.