#016 A 20GB laptop model beat Opus 4.7, Cloudflare ate Replicate, and Codex rooted a Samsung TV

Simon Willison downloaded a 20.9GB open-weight model, ran it on his MacBook Pro through LM Studio, and asked it to draw a pelican riding a bicycle. It nailed the bicycle frame geometry. Claude Opus 4.7, Anthropic’s brand-new flagship released the same day, failed the same test twice.

The model is Qwen3.6-35B-A3B: 35 billion parameters, 3 billion active per token, Apache 2.0 licensed, zero API costs. It runs entirely offline on a 24GB laptop.

In today’s indie hacker news:

Qwen3.6: 20GB open model beats Opus 4.7 on a MacBook
Opus 4.7: 10-point coding jump, hidden 35% token bill hike
Codex can now run as a background agent on your Mac
Cloudflare absorbed Replicate, rebuilt Git in 100KB of WebAssembly
aphyr (Jepsen creator) calls AI the same lie as database vendors

TOP STORIES

THE PELICAN THAT BROKE THE FRONTIER

🐦 Qwen3.6-35B-A3B: a free 20GB model on a MacBook outperformed Opus 4.7

Qwen3.6 beats Opus 4.7

Simon Willison ran Qwen3.6 Q4_K_M (20.9GB) locally on his MacBook Pro M5 via LM Studio. He asked it to draw a pelican riding a bicycle. Qwen nailed the bicycle frame geometry and added a bowtie to the flamingo follow-up. Opus 4.7 failed twice, even in max thinking mode.

The details:

Mixture of Experts: 35B total params, 3B active. 262K context, extensible to 1M.
73.4% SWE-bench Verified, 21.4 points above Gemma 4-31B’s 52%
Apache 2.0 licensed, fully commercial, zero restrictions
1,827 r/LocalLLaMA upvotes, 959 HN points
Uncensored quants shipping from the community within days
Fits a 24GB MacBook at Q4_K_M. IQ1_M runs on 16GB.

Why builders care: Your 24GB MacBook just became a serious coding agent host. 73% SWE-bench Verified, 262K context (extensible to 1M), multimodal, agent memory. All offline, all free. Solo builders’ inference bill: $0.
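For a rough sense of why a 35B model fits on a laptop, the numbers above can be turned into two quick ratios: average bits stored per weight and the share of weights active per token. This is a minimal sketch, assuming "GB" means 10^9 bytes and that the 20.9GB file holds all 35B parameters; the function names are ours, not part of any tool.

```python
# Back-of-the-envelope ratios for the quantized model described above.
# Assumptions (not from the source): GB = 10**9 bytes, and the whole
# download consists of the 35B weights.

TOTAL_PARAMS = 35e9    # total parameters (from the article)
ACTIVE_PARAMS = 3e9    # parameters active per token (from the article)
FILE_SIZE_GB = 20.9    # Q4_K_M download size (from the article)


def bits_per_weight(file_size_gb: float, total_params: float) -> float:
    """Average number of bits stored per parameter."""
    return file_size_gb * 1e9 * 8 / total_params


def active_fraction(active: float, total: float) -> float:
    """Share of the weights used for any single token."""
    return active / total


if __name__ == "__main__":
    bpw = bits_per_weight(FILE_SIZE_GB, TOTAL_PARAMS)
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"{bpw:.2f} bits per weight")
    print(f"{frac:.1%} of weights active per token")
```

Under those assumptions the file works out to a little under 5 bits per weight, and fewer than one in ten weights is touched for any given token, which is the Mixture of Experts trade: you pay the memory for all 35B but the compute for roughly 3B.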

THE TOKENIZER GOTCHA

🔢 Opus 4.7 ships a 10-point coding jump and a hidden 35% token bill hike

Claude Opus 4.7

Anthropic shipped Opus 4.7 yesterday. The coding gains are real: 64.3% SWE-bench Pro (up from 53.4% on 4.6), 87.6% SWE-bench Verified, 98.5% visual acuity (up from 54.5%). Image resolution tripled to 2,576px.

The catch: a new tokenizer can turn identical prompts into up to 1.35x as many tokens. Pricing stays at $5/$25 per million tokens, so your bill could rise up to 35% on the same workload. Anthropic also admitted Claude Mythos Preview is more capable but not ready for broad release.

The details:

64.3% SWE-bench Pro vs 53.4% on Opus 4.6, +10.9 points
98.5% vs 54.5% visual acuity, a 44-point leap
3x image resolution: up to 2,576px on long edge
1.35x max tokenizer inflation on identical input
Deliberately reduced cyber capabilities (Project Glasswing)

Why builders care: The coding and vision gains are worth it for agentic workflows. But check your token counts before treating this as a drop-in upgrade. Production integrations need re-benchmarking.
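To put a number on the tokenizer effect before you upgrade, here is a minimal sketch. It assumes the $5/$25 figures are input/output prices per million tokens and, as a worst case, that the 1.35x inflation applies to both input and output. The workload figures are hypothetical.

```python
# Worst-case bill estimate under tokenizer inflation.
# Assumptions (ours): $5 = input price, $25 = output price per million
# tokens, and the same inflation factor hits input and output alike.

INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 25.00
MAX_INFLATION = 1.35  # upper bound quoted in the article


def monthly_cost(input_tokens: float, output_tokens: float,
                 inflation: float = 1.0) -> float:
    """Dollar cost for a month of traffic, scaled by a tokenizer factor."""
    billed_in = input_tokens * inflation
    billed_out = output_tokens * inflation
    return (billed_in * INPUT_PRICE_PER_M
            + billed_out * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 100M input tokens, 20M output tokens per month.
    before = monthly_cost(100e6, 20e6)
    after = monthly_cost(100e6, 20e6, MAX_INFLATION)
    print(f"before: ${before:,.2f}  worst case after: ${after:,.2f}")
```

The 1.35x figure is a ceiling, so the real factor for your prompts may be lower. Counting tokens on a sample of your own traffic is the only way to know.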

YOUR NEW JUNIOR DEV RUNS IN THE BACKGROUND

🤖 OpenAI’s Codex goes desktop: background agents, 90+ plugins, scheduled tasks

Codex for almost everything

OpenAI expanded Codex into a full desktop automation agent. It now runs multiple background agents on your Mac, controlling apps without interfering with your work. Memory persists across sessions. It can schedule future tasks and wake up automatically to continue them. 90+ new plugins including Remotion, JIRA, CircleCI, and GitLab.

The details:

Background agents control apps in parallel, macOS only at launch
Memory across sessions, scheduled self-wakeup for overnight tasks
90+ new plugins (Remotion, JIRA, CircleCI, GitLab, Neon)
EU/UK excluded for computer use and personalization features
ChatGPT Business cut from $25 to $20/seat

Why builders care: The “junior dev working in parallel” use case is real. Memory plus scheduling moves it from reactive assistant to proactive agent. See today’s drama for what happens when it goes off-script.

GIT FOR AGENTS, BUILT IN 100KB

☁️ Cloudflare absorbed Replicate and shipped an agent infrastructure stack

Cloudflare AI Platform

Cloudflare unified its AI Gateway into a single inference platform: one API for 70+ models from 12+ providers with auto-failover. Replicate’s team officially joined. Their models migrate to AI Gateway, compute moves to Workers AI.

The standout: Artifacts, a Git-compatible versioned storage system for agents. The Git engine is ~100KB of Zig compiled to WebAssembly, zero dependencies. ArtifactFS clones a 2.4GB repo in 10-15 seconds vs ~2 minutes traditional.

The details:

70+ models, 12+ providers, one API with auto-failover
Artifacts: $0.15/1,000 ops (10K/mo free), $0.50/GB-month (1GB free)
~100KB Zig-to-WebAssembly Git engine, zero dependencies
Full Agents Week: Sandboxes GA, Mesh, Email, Voice, Browser Run 4x
REST API “in weeks.” Workers-only at launch.
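The Artifacts pricing above is easy to sanity-check for your own usage. A minimal sketch, assuming the free allowances are subtracted before billing and that partial units are billed pro rata (neither detail is confirmed by the source); the function name is ours.

```python
# Rough monthly estimate from the Artifacts prices quoted above.
# Assumptions (ours): free tiers are deducted first, billing is pro rata.

OPS_PRICE_PER_1000 = 0.15     # dollars per 1,000 operations
FREE_OPS = 10_000             # free operations per month
STORAGE_PRICE_PER_GB = 0.50   # dollars per GB-month
FREE_GB = 1.0                 # free storage in GB


def artifacts_monthly_cost(ops: int, stored_gb: float) -> float:
    """Estimated monthly dollars for a given op count and stored size."""
    billable_ops = max(0, ops - FREE_OPS)
    billable_gb = max(0.0, stored_gb - FREE_GB)
    return (billable_ops / 1000 * OPS_PRICE_PER_1000
            + billable_gb * STORAGE_PRICE_PER_GB)


if __name__ == "__main__":
    # Hypothetical agent workload: 250,000 ops and 5 GB stored.
    print(f"${artifacts_monthly_cost(250_000, 5.0):.2f} per month")
```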

Why builders care: Swap between Claude, GPT, Gemini, and 67 other models with a one-line change. Auto-failover keeps agent workflows running at 2am. Artifacts solves the “agents need persistent, branchable state” problem nobody else ships natively.
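If you have never wired failover yourself, this is roughly what a gateway takes off your plate. The sketch below is a generic client-side illustration, not Cloudflare's API: the provider functions are hypothetical stubs, and a real gateway would add retries, timeouts, and routing rules on top.

```python
# Generic failover loop: try providers in order, return the first success.
# Everything here is illustrative; no real provider SDK is called.
from typing import Callable, Sequence

Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the list errors out."""


def complete_with_failover(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Return (provider_name, completion) from the first provider that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_primary(prompt: str) -> str:
    """Hypothetical provider that is down."""
    raise TimeoutError("simulated outage")


def steady_backup(prompt: str) -> str:
    """Hypothetical provider that answers."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    chain = [("primary", flaky_primary), ("backup", steady_backup)]
    print(complete_with_failover("hi", chain))
```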

THE JEPSEN GUY WANTS YOU TO UNPLUG

🔍 aphyr published a 10-part series calling AI the same lie as database vendors

aphyr on AI

Kyle Kingsbury spent a decade catching databases lying about safety guarantees through Jepsen testing. His new 10-part series (April 6-16) makes the same argument about AI: vendors claim capabilities they can’t deliver. Downstream harm gets absorbed by everyone in the trust chain.

He calls LLMs “confabulation engines” with “yes-and improv” behavior. He cites Bainbridge’s 1983 automation research: disuse degrades professional capability. Devs report reduced coding ability after LLM reliance.

The details:

26+ distributed systems caught with consistency violations via Jepsen
538 HN points, 586 comments. More comments than upvotes signals a split debate.
Uses Hannah Arendt and James C. Scott’s “metis” (tacit knowledge)
Car-to-LLM analogy: convenient tech with non-obvious second-order harms
Available as free PDF/EPUB

Why builders care: If you’re building on LLM APIs, you’re building on vendors who may overstate reliability. Same risk as building on a database claiming ACID without it. Kingsbury has the track record to back the argument.

TRENDING TODAY

🔧 Agent infrastructure week - Every major platform shipped agent tools the same day. Windsurf 2.0 launched an Agent Command Center. Product Hunt featured Agent Card (prepaid Visa for AI agents, self-destructs after 7 days). Google shipped Android CLI for 3x faster agent builds (153 HN pts).

📢 The backlash wave - “The Passive Income trap ate a generation” hit 200 HN points. r/SaaS declared the “End of AI Slop” (109 pts). Discourse published “We Are Not Going Closed Source,” a direct response to Cal.com from Edition #15. antirez argued AI security isn’t brute-force, it’s model intelligence (206 HN pts).

🔬 ML reproducibility crisis - r/MachineLearning’s “Failure to Reproduce Modern Paper Claims” hit 150 upvotes. Random seed variation inflates model performance up to 2x. GPU parallelization creates non-determinism.

DRAMA

WHOOPS, IT ROOTED YOUR TV

📺 Codex autonomously hacked a Samsung TV the same day OpenAI said it could do “almost everything”

Security researchers at Calif gave Codex the full Samsung TV firmware source code. It found a kernel driver flaw, chained 5 vulnerabilities, and gained root (uid=0) with no human guidance on which driver to target. Caveat: Codex had source, not black-box access. 224 HN points, 123 comments.

Why builders care: When you give agents system access, they’ll find uses you didn’t plan for. Plan accordingly.

STACK OF THE DAY

🧰 Kampala by Zatanna (YC W26) - MITM proxy that reverse-engineers any app into a clean API. Intercept network traffic once, it reconstructs the HTTP request sequence, preserves session tokens, and hosts it as a stable endpoint for AI agents. No brittle browser automation. Supports gRPC and WebSockets. 77 HN points, 63 comments.

Not sponsored. We just feature tools builders would actually use.

BOOKMARKED TODAY

🔒 antirez: AI cybersecurity is not proof of work - Redis creator argues AI security is winner-take-most. Model intelligence matters, not compute. Weak models hallucinate bug patterns. 206 HN pts.

🔩 Autoprober - Physical AI hardware hacking tool built from duct tape, an old camera, and a CNC machine. GitHub repo. 112 HN pts.

📝 Marky - Lightweight Markdown viewer for agentic coding. Renders Markdown in terminal for agent workflows. 46 HN pts, 26 comments.

Curated by AI, built by a human.
