#004

Anthropic Leaked Its Secret 10-Trillion Parameter Model, Caveman-Speak Saves 65% on Tokens, and Gemma 4 Runs on Your Phone

Anthropic accidentally leaked Claude Mythos, a 10-trillion parameter model they weren't ready to announce. A Claude Code skill cuts tokens 65% by talking like a caveman. Google shipped Gemma 4 to iPhones. And a developer built in 3 months what he couldn't in 8 years, thanks to AI.

Anthropic accidentally leaked its most powerful model ever. A developer shipped an 8-year dream project in 3 months with AI. And you can now cut your Claude bill by 65% by talking like a caveman. Seriously. Here’s what happened.

In this edition:

  • Anthropic’s Claude Mythos leak: 10 trillion parameters
  • 8 years of wanting, 3 months of building with AI
  • Caveman: cut LLM tokens 65% with grug-speak
  • Gemma 4 runs locally on your iPhone
  • First Dollar: an electrical engineer in India ships MailMark

TOP STORIES

THE LEAK THAT SHOOK AI

Anthropic Claude Mythos leak

Anthropic accidentally revealed Claude Mythos, a 10-trillion parameter model it wasn’t ready to announce

A configuration error in Anthropic’s CMS made ~3,000 unpublished assets publicly accessible on March 26. Among them: specs for a model codenamed “Capybara.” The world now knows it as Claude Mythos.

It sits above the existing Opus tier. Anthropic privately warned government officials it could make large-scale cyberattacks “far more likely.” Days later, they leaked Claude Code’s source code too. Two security incidents in one week.

The details:

  • 10 trillion parameters, scoring far higher than any previous Anthropic model
  • Internal codename: Capybara
  • Already in trials with “early access customers”
  • Anthropic called it “a step change” in AI performance
  • Draft blog admitted it’s “currently far ahead of any other AI model in cyber capabilities”
  • Second leak (Claude Code source code) followed days later

Why builders care: The AI capabilities ceiling just jumped. Your underlying models are about to get much more capable. But even the companies building frontier AI can’t keep their own house locked down.


EIGHT YEARS OF WANTING, THREE MONTHS OF BUILDING

SyntaqLite built with AI

A developer shipped his dream project in 250 hours with Claude Code after failing to start for 8 years

Lalit Maganti wanted high-quality devtools for SQLite for eight years. He never built them. Then he spent 250 hours over three months (evenings, weekends, vacation days) and shipped SyntaqLite: a formatter, linter, and language server for SQLite.

The honest part: he threw away his first month of vibe-coded work and started over with proper architecture. AI was “better than me at the act of writing code itself, assuming that code is obvious.” But for design decisions? “A dangerous substitute.”

The details:

  • 250 hours over 3 months, built with Claude Code
  • Discarded first month’s vibe-coded prototype entirely
  • AI excelled at refactoring, research, lateral skill transfer
  • AI struggled with public API design, architectural coherence
  • 512 HN points, 161 comments
  • Top comment: “This is what real AI-assisted coding looks like once you get past the initial wow factor”

Why builders care: The most honest AI-building post-mortem on HN right now. AI is a force multiplier for implementation but a dangerous substitute for design. The architecture still has to come from you.


WHY USE MANY TOKEN WHEN FEW TOKEN DO TRICK

Caveman token compression

Caveman: a Claude Code skill that cuts 65% of tokens by stripping filler language

Julius Brussee built Caveman, a one-line install Claude Code skill that makes Claude respond in caveman-speak. It strips articles, pleasantries, hedging phrases, and filler while keeping technical terms, code blocks, and error messages intact.

A 69-token response becomes 19 tokens. Same technical content. A March 2026 paper found that brevity constraints actually improve model accuracy by 26 percentage points on certain benchmarks. Less fluff, better answers.

The details:

  • Average 65% token savings across benchmarks (range: 22% to 87%)
  • Responses generate ~3x faster
  • 1,400+ GitHub stars, 32 forks
  • Install: npx skills add JuliusBrussee/caveman
  • 634 HN points, 298 comments
  • Spawned multiple derivative projects (caveman-compression, Caveman-Claude)

Why builders care: If you’re burning $200/month on Claude Code, this could drop it to $70. The Kevin from The Office reference in the repo name is earned. Sometimes the best optimization is just saying less.


GOOGLE’S AI NOW RUNS ON YOUR PHONE

Gemma 4 on iPhone

Gemma 4 E2B and E4B: agentic AI models that run entirely on-device, no internet needed

Google shipped Gemma 4 with edge-optimized models designed for phones. The E2B model (2 billion effective parameters) runs in under 1.5GB of memory. The E4B model handles multi-step planning, code generation, audio-visual processing, and 140+ languages. All offline. All on your phone.

Download the AI Edge Gallery app on iOS or Android and you’re running local AI in minutes.

The details:

  • E2B and E4B models, optimized for mobile
  • E2B runs in under 1.5GB with 2-bit quantization
  • Supports tool-calling, multi-step planning, and vision
  • 140+ languages, Apache 2.0 license
  • Runs on Raspberry Pi 5: 133 prefill tokens/sec
  • Available via Google AI Edge Gallery app (iOS + Android)
  • 253 HN points, 69 comments

Why builders care: Local AI on phones is no longer a demo. If you’re building mobile apps, you now have a production-grade local model that’s free and open-source. Offline translation, on-device agents, privacy-first features are all possible now.


“The threat is comfortable drift toward not understanding what you’re doing.” An astrophysicist wrote the essay HN can’t stop debating (769 points, 505 comments). AI tools are only useful when deployed by experts. Novices look productive while building dangerous knowledge gaps. His Alice vs. Bob framework (careful learner vs. AI-dependent producer) hit a nerve. The question: will companies value Alices when Bobs ship faster? Read it here.

OpenAI bought a podcast. TBPN, a daily tech talk show with 58K YouTube subscribers, acquired by OpenAI for a reported low hundreds of millions. $5M ad revenue in 2025, on track for $30M in 2026. Editorial independence “explicitly protected.” CNBC called it “chasing vibes.”

Anthropic leaked twice in one week. First the Claude Mythos specs. Then Claude Code’s full source code. No customer data was exposed, but the optics of an AI safety company with back-to-back security incidents aren’t great. Fortune, CoinDesk, and Futurism all ran stories.


FIRST DOLLAR

FROM POWER PLANT TO PRODUCT HUNT

Debasish is an electrical engineer working at a power plant in India. Not a software developer by profession. He’s been building side projects since 2015, starting with an Android app built in Java before Kotlin existed. His latest: MailMark, a cold email tool where you own your domain and mailboxes. Add your domain, create unlimited mailboxes, run campaigns with built-in mail merge and automated follow-ups. Posted to Show HN today. No funding, no network, just building after work hours at a power plant.


STACK OF THE DAY

sllm - Split a GPU node with other developers. Join a cohort, get an API key, share the cost. OpenAI-compatible API (runs vLLM under the hood). Prompts and responses are never logged. Traffic is routed through an isolated proxy with strict data separation. If you want access to large models but can’t justify a full GPU, this is the cooperative approach. 177 HN points, 87 comments.

Not sponsored. We just feature tools builders would actually use.


BOOKMARKED TODAY

📖 Mvidia: A game where you build a GPU (889 HN points, 177 comments) - Browser-based game where you design and build a GPU from scratch. The HN comments are a masterclass in GPU architecture. Best educational side project of the week.

📖 Running Gemma 4 locally with LM Studio and Claude Code (112 HN points) - Step-by-step guide to running Gemma 4 on your machine using LM Studio’s new headless CLI, then connecting it to Claude Code. Free local AI as your coding copilot.

📖 Nanocode: The best Claude Code that $200 can buy, in pure JAX on TPUs (124 HN points) - Someone rebuilt Claude Code’s core functionality in pure JAX, optimized for TPUs. $200 budget. Open source. If you want to understand how coding agents work under the hood, start here.


Curated by AI, built by a human. Get this daily: indiehacker.news | X | Telegram