Last December, Nvidia paid roughly $20B to license Groq’s entire chip portfolio and hire founder Jonathan Ross plus most of his leadership, then insisted it wasn’t an acquisition. Six months on, the company Ross left behind is reportedly raising $650M to relaunch as an AI-inference neocloud, and the silicon it plans to run on is Nvidia’s.
The chip startup that pitched itself as the alternative to Nvidia is being recapitalized to rent Nvidia. If you ever treated a non-Nvidia inference provider as your hedge, the hedge just got absorbed.
In today’s indie hacker news:
- 💸 Nvidia took Groq’s chips and founder, Groq wants $650M to run on Nvidia
- 🔌 A solo dev deleted his proudest feature and went from $150 to $8.6K MRR
- 🧠 Liquid AI’s 8B model hits 253 tokens a second on a laptop CPU
- 💳 An AI consultant told Axios a client burned $500M on Claude in a month
- ⚖️ CNN is the first TV network to sue an AI search engine over scraped news
TOP STORIES
NVIDIA ATE GROQ, GROQ ORDERED SECONDS
💸 After Nvidia’s $20B not-acqui-hire, Groq is reportedly raising $650M to become a neocloud

The story: Axios first reported the raise, and TechCrunch added the structure. It's an internal pro-rata round from existing shareholders, with backers Disruptive and Infinitum reportedly committed to backstopping the whole thing if others pass. No new outside money. No valuation disclosed. Read that structure honestly: when the people who already own you fund the next chapter and outsiders sit it out, that's a recapitalization, not a vote of fresh conviction. The cash buys a pivot, from designing custom LPU chips to operating an inference cloud on hardware Groq no longer makes.
The details:
- That December license was non-exclusive and perpetual, and it moved about $17B in cash into Groq. Roughly $7.6B of it went straight out to shareholders in February.
- GroqCloud, the inference API now serving 3.5M+ developers, was never part of the Nvidia transaction and keeps running. The new entity is the part that’s pivoting.
- President Sunny Madra and most of the senior leadership followed Ross to Nvidia. Sources disagree on who now runs the company left behind, so no one is confirmed at the top.
- Pre-Nvidia, Groq raised about $1.75B over six rounds and last carried a $6.9B valuation at its September Series E.
- Keep the asterisk attached: the raise and its terms are all “reportedly,” per Axios. Groq has announced nothing officially.
Why builders care: The provider you picked to route around Nvidia may now route straight back into it. The lesson isn’t “drop GroqCloud,” it’s that multi-provider redundancy on paper can quietly collapse to a single stack underneath, and you won’t get a changelog entry when it does. The neocloud middle, squeezed between hyperscalers above and commodity hardware below, is a thin-margin place to land, which is probably why the round needed a backstop in the first place.
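One way to make that collapse visible is to record, by hand, what hardware each provider in your fallback chain actually runs on, and fail a check when every entry resolves to the same vendor. A minimal sketch: the provider names and the `hardware` field are hypothetical placeholders you would maintain yourself, since no provider API reports this for you.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str      # hypothetical provider label
    hardware: str  # underlying accelerator vendor, maintained by hand


def hardware_concentration(providers: list[Provider]) -> dict[str, list[str]]:
    """Group configured providers by the hardware vendor they run on."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in providers:
        groups[p.hardware].append(p.name)
    return dict(groups)


def has_real_redundancy(providers: list[Provider]) -> bool:
    """True only if the fallback chain spans more than one hardware vendor."""
    return len(hardware_concentration(providers)) > 1


# Hypothetical fallback chain: three providers on paper, one stack underneath.
chain = [
    Provider("provider-a", "nvidia"),
    Provider("provider-b", "nvidia"),
    Provider("provider-c", "nvidia"),
]
if not has_real_redundancy(chain):
    print("warning: every fallback runs on", next(iter(hardware_concentration(chain))))
```

The check is only as current as the field you maintain, so revisit it whenever a provider announces a license deal, a recap, or a hardware change.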
Work from any WiFi like it's your home network. NordVPN's Meshnet runs a free private mesh between your laptop, dev box, and home server. SSH from a café without exposing a port, the way you'd use Tailscale. The paid VPN on top lets you test geo-fenced Stripe checkouts or feature flags from any country.
We get a cut if you sign up. Only added for tools we use ourselves.
THE FEATURE HE WAS PROUDEST OF
🔌 Stuck at $150/mo for 2 years, a solo dev ripped out his node-graph UI and hit $8.6K MRR

The story: A solo founder with a full-time 9-5 wrote up two stuck years on his AI rendering tool for architects. He titled it “one change,” but the honest reading is two moves at once: he deleted the ComfyUI-style node editor he’d built and was clearly proud of, replaced it with a plain chat box that does the prompt engineering for the user, and switched from one-time payments to subscriptions in the same swing. The complexity that flattered his engineering was the exact thing his buyers kept bouncing off. His own framing of the pricing side: an automated email “fires the moment they use their last free credit, with a promo code… catching people at peak demonstrated intent is the biggest conversion driver I have.”
The details:
- Before: about $150/mo, 30-50 visitors a day, under 1% conversion, a node-based UI. After the rebuild: 180-200 targeted visitors a day and 40-70 free-trial signups converting at 20-35%.
- The traffic engine was SEO, not virality. He started a real blog, ran Ahrefs keyword research, and answered the exact questions architects type into search. An aged domain helped him rank fast.
- The funnel: three free renders with no card, then a paywall and an instant promo-code email the second the last credit burns.
- What flopped: Pinterest sent 100,000 views and almost no paying customers. Reach without intent is a vanity metric.
- The $8.6K MRR is self-reported, no Stripe screenshot. The r/SaaS cross-post drew a top comment accusing the write-up of being AI-generated; he replied “real story, AI helped me write it.” Treat the number as a claim, not a receipt.
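The funnel above reduces to one event: the render that spends the last free credit. A minimal sketch of that trigger, assuming a hypothetical `send_email` callback and an in-memory account; the founder didn't publish his implementation, so the names, the three-credit default, and the promo code are placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Account:
    email: str
    free_credits: int = 3      # three free renders, no card (placeholder default)
    subscribed: bool = False
    promo_sent: bool = False


class Paywall(Exception):
    """Raised when a non-subscriber has no free credits left."""


def render(account: Account, send_email: Callable[[str, str], None]) -> str:
    if account.subscribed:
        return "rendered"
    if account.free_credits <= 0:
        raise Paywall("subscribe to keep rendering")
    account.free_credits -= 1
    # Fire the promo email the moment the last free credit is spent,
    # exactly once, while intent is at its peak.
    if account.free_credits == 0 and not account.promo_sent:
        send_email(account.email, "PROMO-CODE-PLACEHOLDER")
        account.promo_sent = True
    return "rendered"
```

Hooking the email to the render path, rather than to a nightly batch job, is what ties it to the "peak demonstrated intent" moment he describes.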
Why builders care: Repeated user complaints are a roadmap, and he sat on his for two years before acting. A sharp commenter reframed the whole thing better than the title did: this reads less like one magic change and more like finally aligning product, pricing, and acquisition at the same time. The transferable bet is that the powerful feature impressing you is often the wall your customer can’t climb.
AN 8B THAT DOESN’T NEED A GPU
🧠 Liquid AI’s LFM2.5-8B-A1B runs at 253 tokens a second on a laptop CPU

The story: Liquid AI released LFM2.5-8B-A1B, a mixture-of-experts model with 8.3B total parameters but only about 1.5B active per token. That gap is the point: you pay the inference cost of a tiny model while keeping the knowledge of a bigger one. Liquid clocks it at 253 tok/s on an Apple M5 Max and 146 tok/s on an AMD Ryzen AI chip, both on CPU alone, under 6GB of memory. It’s also their first reasoning model on this tier, emitting visible chain-of-thought in <think> tags before answering. Weights are on Hugging Face with day-one support in Ollama, llama.cpp, MLX, and vLLM.
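The "8.3B total, about 1.5B active" split is what a mixture-of-experts router buys: each token is sent to only a few experts, so most weights sit idle on any given step. A toy top-k router in pure Python, assuming made-up sizes (4 experts, top-2, 2-dimensional vectors); it illustrates the general technique, not Liquid's actual router, expert count, or layer layout.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(x: list[float], router: list[list[float]],
              experts: list[list[list[float]]], k: int = 2) -> tuple[list[float], list[int]]:
    """Route one token vector x to its top-k experts and mix their outputs."""
    # Router score per expert: dot product of x with that expert's router row.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])  # renormalize over chosen experts only
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        W = experts[i]  # only these k weight matrices are touched for this token
        y = [sum(W[r][c] * x[c] for c in range(len(x))) for r in range(len(W))]
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


x = [1.0, 0.0]
router = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.0, 1.0]]
experts = [[[float(i + 1), 0.0], [0.0, float(i + 1)]] for i in range(4)]
out, used = moe_layer(x, router, experts, k=2)
print(used)  # → [2, 1]: only experts 2 and 1 ran for this token
```

Compute per token scales with the experts that actually run, which is why the speed looks like a small model's even though the full parameter set is still loaded.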
The details:
- It's not a standard transformer: the architecture stacks Liquid's gated convolution blocks with a handful of attention layers. It was trained on 38T tokens, up from 12T for the October predecessor.
- Context grew to 128K, and the vocabulary doubled for cleaner multilingual handling.
- The Q4_K_M quant is a 5.16GB download, small enough for a mainstream laptop.
- Every benchmark is Liquid’s own. It claims to match a Gemma model triple its active size on instruction-following, but r/LocalLLaMA noted the comparison set looks hand-picked, and there are no independent reproductions yet.
- Liquid’s blog calls it open-weight “without restrictions.” That’s not accurate. The LFM Open License cuts off free commercial use above $10M in annual revenue, so anyone bigger has to buy a license.
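The 5.16GB figure is easy to sanity-check against the 8.3B parameter count. A back-of-envelope sketch, assuming decimal gigabytes and ignoring file metadata: the result lands just under 5 bits per weight, which fits a 4-bit K-quant that also stores per-block scales and keeps some tensors at higher precision. The memory floor tracks total parameters, not active ones.

```python
TOTAL_PARAMS = 8.3e9   # every expert must be resident, even if only ~1.5B run per token
Q4_K_M_BYTES = 5.16e9  # reported download size, assuming decimal GB (1e9 bytes)

bits_per_weight = Q4_K_M_BYTES * 8 / TOTAL_PARAMS
fp16_gb = TOTAL_PARAMS * 2 / 1e9  # same weights at 16 bits (2 bytes) each

print(f"{bits_per_weight:.2f} bits per weight")  # → 4.97 bits per weight
print(f"{fp16_gb:.1f} GB at 16 bits")            # → 16.6 GB at 16 bits
```

The gap between those two lines is the difference between "fits a mainstream laptop" and "doesn't."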
Why builders care: About 1.5B active parameters means real-time generation on a laptop with no GPU, no API key, and no data leaving the machine, fast enough for an interactive agent loop. With 128K context and llama.cpp plus Ollama support on launch day, you can wire it into a local tool-calling or RAG pipeline right now. Just size your business against that $10M revenue line before you ship it in a paid product, because the marketing copy won’t warn you.
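A minimal local call, assuming an Ollama server on its default port and its `/api/generate` endpoint in non-streaming mode. The model tag below is a hypothetical placeholder (use whatever `ollama list` shows), and the helper splits the `<think>` chain-of-thought from the answer so reasoning text never reaches your UI or tool-call parser.

```python
import json
import re
import urllib.request

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Separate <think>...</think> chain-of-thought from the final answer."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer


def ask_local(prompt: str, model: str = "lfm2.5-8b-a1b") -> tuple[str, str]:
    """Call a local Ollama server (non-streaming); the default tag is a placeholder."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        text = json.load(resp)["response"]
    return split_reasoning(text)
```

Depending on your Ollama version and the model's template, reasoning may arrive inline in tags or be handled separately, so check what your setup actually emits before relying on the regex.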
CAP YOUR TOKENS OR BLEED
💳 An AI consultant told Axios a client burned $500M on Claude in one month

The story: One sentence in an Axios piece on enterprise AI sticker shock launched twenty headlines: an AI consultant says one of their clients spent half a billion dollars in a single month after failing to put usage limits on Claude licenses. No company named, no consultant named, no invoice. Tom’s Hardware and the rest all trace back to that one anonymous, second-hand line. So park the $500M as unverified color and keep the part that’s actually useful: a metered LLM API with no caps is a live wire, and agentic tooling is the thing most likely to grab it.
The details:
- The mechanism the reporting cites is real: an agentic workflow can consume roughly 1000x the tokens of a single chat query. Long coding sessions, chained tool calls, and big-context prompts stack fast.
- The confirmed trend underneath the anecdote has teeth. Axios documents Microsoft pulling most internal Claude Code licenses partly on cost, with per-engineer spend running $500 to $2,000 a month.
- Uber’s COO said its entire 2026 AI budget was gone by April. The pullback is a real pattern, even where the half-billion number isn’t checkable.
- Anthropic does offer admin dashboards and per-user limits. They just have to be switched on before rollout, and in this story they reportedly weren’t.
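Part of why agent sessions multiply tokens so fast: with a stateless chat API the client resends the whole conversation on every turn, so cumulative input tokens grow roughly with the square of the number of steps. A back-of-envelope sketch with made-up sizes (a 2,000-token starting prompt, 1,500 tokens appended per tool call and result), ignoring prompt caching, which can discount repeated prefixes.

```python
def cumulative_input_tokens(steps: int, base: int = 2_000, per_step: int = 1_500) -> int:
    """Total input tokens billed when each step resends the full, growing history."""
    total = 0
    context = base
    for _ in range(steps):
        total += context     # the whole history goes out again
        context += per_step  # tool call + result appended for the next turn
    return total


single_call = cumulative_input_tokens(1)
agent_run = cumulative_input_tokens(60)
print(agent_run // single_call)  # → 1387
```

With these toy numbers, a 60-step run already bills over a thousand times the input of one call, the same order of magnitude the reporting cites.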
Why builders care: This is a cost-governance story wearing a viral headline. If you ship on any metered model, set hard spend caps and per-key rate limits before launch, alert on token spend rather than request count, and put a kill switch on every agent that can retry or chain calls. An unbounded loop doesn’t fail loudly. It just runs all night and hands you a bill at dawn.
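A minimal sketch of three of those guards (a hard spend cap, an alert keyed to spend rather than request count, and a step-limit kill switch) wrapped around an agent loop. It assumes you can read a token count off each response and price it yourself; `call_model`, the price, and the limits are hypothetical placeholders, not any vendor's API, and provider-side caps remain the backstop if your own code has a bug.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Kill switch: raised after recording a step so no further call goes out."""


@dataclass
class SpendGuard:
    hard_cap_usd: float
    alert_at_usd: float
    max_steps: int
    usd_per_1k_tokens: float  # hypothetical blended price
    spent_usd: float = 0.0
    steps: int = 0
    alerted: bool = False

    def record(self, tokens: int, alert: Callable[[str], None]) -> None:
        self.steps += 1
        self.spent_usd += tokens / 1000 * self.usd_per_1k_tokens
        # Alert on spend, not on request count, and only once.
        if self.spent_usd >= self.alert_at_usd and not self.alerted:
            alert(f"agent spend at ${self.spent_usd:.2f}")
            self.alerted = True
        if self.spent_usd >= self.hard_cap_usd or self.steps >= self.max_steps:
            raise BudgetExceeded(f"stopped after {self.steps} steps, ${self.spent_usd:.2f}")


def run_agent(call_model: Callable[[], int], guard: SpendGuard,
              alert: Callable[[str], None]) -> None:
    """Deliberately unbounded loop standing in for a retry bug; only the guard stops it."""
    while True:
        guard.record(call_model(), alert)
```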
TRENDING TODAY
💀 Indie builders are slamming into the distribution wall - AI made building easy, so the hard part moved downstream, and the front page is one long howl about it. A dev logged 12 hours of manual submissions to 100+ directories: only the high-authority ones (G2, SourceForge, There’s An AI For That) moved his domain rating from 0 to 25+, and even those take six to eight weeks to approve, so they do nothing for launch week. Next to it, a dev who “hates marketing” got his first sale from the Reddit post itself, not the product. Same wall the $8.6K founder above finally climbed, one SEO article at a time.
⚡ Local-LLM builders are tuning inference, not just buying GPUs - The hands-on energy in r/LocalLLaMA has shifted from “what do I buy” to “how fast can I make what I have.” One builder tested speculative decoding on vLLM for Gemma 4 and reported 3.34x faster output on the same card. Separately, kog.ai’s “monokernel” runs the whole decode loop as one GPU-resident program on AMD MI300X, claiming 3,300 output tokens/sec per request, though only on a 2B model so far. Both numbers are author-reported on synthetic tests, but the direction is clear: same hardware, more throughput.
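Speculative decoding is how "same card, faster output" is possible: a small draft model guesses several tokens cheaply, the large target model checks them, and every accepted guess is a token the big model didn't have to generate serially. A toy greedy version in pure Python, assuming deterministic stand-in "models" (plain functions from a token list to the next token). Real engines verify all drafted positions in one batched forward pass and use a rejection-sampling acceptance rule when sampling; this sketch shows only the greedy accept/reject logic.

```python
from typing import Callable

Model = Callable[[list[int]], int]  # greedy next-token function (toy stand-in)


def greedy_decode(target: Model, prompt: list[int], n: int) -> list[int]:
    out = list(prompt)
    for _ in range(n):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: Model, draft: Model, prompt: list[int],
                       n: int, k: int = 4) -> tuple[list[int], int]:
    """Greedy draft-then-verify; returns (tokens, number of verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n:
        # 1. Draft k tokens cheaply with the small model.
        guess: list[int] = []
        for _ in range(k):
            guess.append(draft(out + guess))
        # 2. Verify against the target (one batched pass in a real engine).
        rounds += 1
        accepted: list[int] = []
        for tok in guess:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # target's own token replaces the first miss
                break
            accepted.append(tok)
        else:
            accepted.append(target(out + accepted))  # bonus token when all k match
        out.extend(accepted)
    return out[len(prompt):len(prompt) + n], rounds
```

Output is identical to plain greedy decoding by the target; only the number of target rounds changes, and it shrinks as the draft agrees more often.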
DRAMA
YOU CAN’T COPYRIGHT FACTS, SAYS PERPLEXITY
⚖️ CNN sues Perplexity, the first TV network to take an AI search engine to court
CNN filed in the Southern District of New York on May 28, alleging Perplexity scraped 17,000+ stories, photos, and videos to both train its products and answer user queries in real time, without sending traffic back or paying. There's a sharper second count: CNN says Perplexity marketed access to "CNN's premium content" through its Comet Plus tier despite having no deal. CNN tried to negotiate a licensing deal in 2025, couldn't agree on terms, blocked the scraper, then sued. Perplexity's response from comms chief Jesse Dwyer was four words: "You can't copyright facts." This is a separate action from the 2024 News Corp suit, and reportedly the first by a broadcaster.
Why builders care: The legal line now explicitly targets real-time scraping and RAG, not just training data, which is exactly how most indie AI-search and summarizer products work. Practical cover: respect robots.txt and explicit scraper blocks (CNN blocked Perplexity’s bot before filing), prefer licensed feeds or link-outs over republishing full paragraphs, and never advertise access to content you haven’t licensed. That last one, the Comet Plus claim, is a self-inflicted wound that’s trivially avoidable.
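The robots.txt half of that is a few lines with Python's standard library. A minimal sketch, assuming you've already fetched the site's robots.txt text; `ExampleBot` and the URLs are placeholders. robots.txt is a published preference, not a license, so passing this check doesn't settle the licensing question the suit raises.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against an already-fetched robots.txt before scraping it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""

print(allowed(ROBOTS, "ExampleBot", "https://news.example.com/story"))  # → False
```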
FIRST DOLLAR
ALL THREE CUSTOMERS CAME FROM AN LLM
🚚 A 2-person team did a full month of marketing and got 3 customers, all from ChatGPT
Two founders building TrunkTransfer, a WeTransfer alternative that sends files from your own custom domain, posted an unusually honest month-2 breakdown. After a full marketing month: 353 visitors, 40 registrations, 7 trials, 3 customers. Here's the twist that ties straight back to today's distribution thread: 500+ cold Instagram DMs and a pile of LinkedIn outreach produced zero paying customers. All three who paid arrived because ChatGPT, Claude, or Perplexity recommended the product. The machines everyone's suing over did the selling their hand-built outreach couldn't. Early days, self-reported, but a real signal worth watching: being the answer an LLM gives may be the distribution channel nobody has a playbook for yet.
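The month's funnel as stage-to-stage conversion rates, computed from the self-reported counts above:

```python
FUNNEL = [("visitors", 353), ("registrations", 40), ("trials", 7), ("customers", 3)]


def step_rates(funnel: list[tuple[str, int]]) -> dict[str, float]:
    """Conversion rate from each funnel stage to the next."""
    return {f"{a} -> {b}": nb / na for (a, na), (b, nb) in zip(funnel, funnel[1:])}


for step, rate in step_rates(FUNNEL).items():
    print(f"{step}: {rate:.1%}")
print(f"overall: {FUNNEL[-1][1] / FUNNEL[0][1]:.2%}")
```

That works out to roughly 11.3% of visitors registering, 17.5% of registrations starting a trial, 3 of 7 trials converting, and about 0.85% of visitors paying overall. With counts this small, one more customer moves the trial-to-paid rate by over 14 points, so read it as a signal, not a benchmark.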
STACK OF THE DAY
🛠️ Tiny-vLLM
A from-scratch LLM inference engine in C++ and CUDA that rebuilds the hard parts of vLLM (PagedAttention, continuous batching, the KV cache, custom kernels) with one dependency and no Python layer. The author is upfront that it's a teaching project, not a production server, but it's the cleanest "build it to learn it" reference for what actually happens inside an inference engine. Pairs perfectly with the throughput experiments builders are running this week. 340 stars, Apache-2.0, needs an Nvidia GPU.
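PagedAttention is, at heart, a memory allocator: the KV cache is carved into fixed-size blocks, and each sequence keeps a block table mapping its logical token positions to whichever physical blocks were free. A toy allocator in Python, assuming a block size of 16 tokens and a made-up pool size; it shows only the bookkeeping, not the attention kernel that reads through the table, and it isn't Tiny-vLLM's code.

```python
class BlockAllocator:
    """Toy paged KV-cache bookkeeping: logical token slots -> physical blocks."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free = list(range(num_blocks))     # physical block ids
        self.tables: dict[int, list[int]] = {}  # seq id -> block table
        self.lengths: dict[int, int] = {}       # seq id -> tokens stored

    def append_token(self, seq: int) -> tuple[int, int]:
        """Reserve a KV slot for the next token; returns (physical block, offset)."""
        table = self.tables.setdefault(seq, [])
        n = self.lengths.get(seq, 0)
        if n % self.block_size == 0:  # current block full (or none allocated yet)
            if not self.free:
                raise MemoryError("KV cache exhausted; preempt a sequence")
            table.append(self.free.pop())
        self.lengths[seq] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq: int) -> None:
        """Finished sequence: its blocks go straight back to the pool."""
        self.free.extend(self.tables.pop(seq, []))
        self.lengths.pop(seq, None)
```

Because a sequence only holds blocks it is actually filling, waste is bounded by one partial block per sequence instead of a worst-case max-length reservation, which is what lets continuous batching pack more requests onto the same GPU.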
Not sponsored. We just feature tools builders would actually use.
BOOKMARKED TODAY
🗄️ “SQLite is all you need for durable workflows” - The “you don’t need a dedicated orchestrator” genre keeps shipping. Yesterday it was Postgres versus Temporal; today the same pitch arrives with SQLite as the checkpoint store. 416 points on HN, and the recurring argument is the same: your existing database can probably hold workflow state, right up until it can’t.
🎨 “Is AI causing a repeat of frontend’s lost decade?” - A sharp think-piece (303 points, 262 comments) arguing AI codegen is re-inflating the exact complexity the frontend world spent a decade unlearning. The skill-atrophy angle is the uncomfortable part.
🇪🇺 “Notes from the Mistral AI Now Summit” - Field notes (319 points) from inside Europe’s frontier lab, useful if you want a read on where Mistral is actually pointed rather than where the press release says.
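The pattern behind the SQLite piece fits in a few lines: record each step's result under a (workflow id, step name) key, and on restart replay completed steps from the table instead of re-running them. A minimal sketch using Python's built-in `sqlite3`, assuming JSON-serializable step results; the table and function names are hypothetical and not taken from the article.

```python
import json
import sqlite3
from typing import Any, Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS steps ("
        " workflow TEXT, step TEXT, result TEXT,"
        " PRIMARY KEY (workflow, step))"
    )
    return conn


def run_step(conn: sqlite3.Connection, workflow: str, step: str,
             fn: Callable[[], Any]) -> Any:
    """Run fn once per (workflow, step); later calls return the checkpointed result."""
    row = conn.execute(
        "SELECT result FROM steps WHERE workflow = ? AND step = ?", (workflow, step)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # already done: replay, don't re-run
    result = fn()
    with conn:  # commit the checkpoint in its own transaction
        conn.execute(
            "INSERT INTO steps (workflow, step, result) VALUES (?, ?, ?)",
            (workflow, step, json.dumps(result)),
        )
    return result
```

The "right up until it can't" caveat shows up quickly: a step that crashes after its side effect but before the checkpoint commits will run again on restart, so steps still need to be idempotent, and SQLite's single-writer model caps how many workers can share one file.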
Curated by AI, built by a human.