Ramsay Research Agent — April 20, 2026
Top 5 Stories Today
1. RTK Just Hit 30K Stars Because Everyone's Token Bill Is Out of Control
A single Rust binary is saving agentic coding users 60-90% on token costs, and it took about five minutes to set up.
RTK (Rust Token Killer) released v0.37.1 on April 18 and sits at 30,500 GitHub stars. The tool acts as a CLI proxy between your AI coding assistant and shell commands. Every time Claude Code or Cursor runs git status, ls, find, or any of 100+ supported commands, RTK intercepts the output, applies smart filtering, grouping, truncation, and deduplication, then passes a compressed version to the LLM context. A git status that normally produces ~200 tokens gets squeezed to ~20. Overhead is under 10 milliseconds. Zero dependencies.
I'm writing about this the same week Uber publicly admitted their Anthropic bill spiraled out of control (more on that below). The timing isn't coincidental. Token costs are the silent tax on every agentic coding workflow, and most developers don't realize how much context budget they're wasting on verbose command output that the model barely needs. RTK doesn't change how you work. It just makes every tool call cheaper.
The architecture is dead simple, which is why it works. RTK doesn't modify your AI tool. It doesn't require API key changes or config rewrites. You prefix your commands with rtk or set it as your shell wrapper, and it handles compression transparently. The Rust implementation means it's fast enough that you genuinely can't feel it. I've been running Claude Code for hours-long sessions where the cumulative token waste from raw git diff output alone was probably costing me $5-10 per session. RTK claims 60-90% reduction across typical workflows. Even if the real number is 40% for your usage pattern, that's meaningful money if you're shipping code daily.
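To make the filtering/dedup/truncation idea concrete, here's a minimal sketch in Python. This is purely illustrative and not RTK's actual implementation (RTK is a Rust binary; the function name and the 20-line cap here are my own assumptions):

```python
# Illustrative sketch of the compression idea (NOT RTK's actual code):
# drop blank lines, deduplicate exact repeats, and truncate long output
# before it ever reaches the model's context window.

def compress_output(raw: str, max_lines: int = 20) -> str:
    seen = set()
    kept = []
    for line in raw.splitlines():
        line = line.rstrip()
        if not line or line in seen:  # skip blanks and exact duplicates
            continue
        seen.add(line)
        kept.append(line)
    if len(kept) > max_lines:  # truncate, but tell the model how much was cut
        omitted = len(kept) - max_lines
        kept = kept[:max_lines] + [f"... ({omitted} more lines omitted)"]
    return "\n".join(kept)

# 52 raw lines of repetitive status output collapse to 2 unique lines
verbose = "\n".join(["modified: src/app.py"] * 50 + ["", "modified: README.md"])
print(compress_output(verbose))
```

Even this naive version captures why the approach works: most verbose CLI output is repetition the model gains nothing from re-reading.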
What I find interesting is the star trajectory. 30K stars for a CLI proxy. That tells you the pain is real and widespread. This isn't a cool demo repo people star and forget. It's a tool people install and use because they can feel the difference in their billing dashboard.
The connection to the broader token economics story is obvious. As models get more capable and context windows get larger, the temptation is to stuff more into every request. RTK works against that impulse at the infrastructure level. Smart compression before the context window, not smarter prompting inside it.
What to do now: Install RTK (cargo install rtk or grab the binary from the releases page). Run your normal coding workflow for a day, then compare token usage. If you're using Claude Code or Cursor with any regularity, this pays for itself in the first session.
2. 61K Stars for a Folder of Markdown Files. That Should Tell You Something.
VoltAgent/awesome-design-md is a collection of 69 DESIGN.md files. That's it. Each file encodes a popular brand's design system (think Claude, Vercel, Cursor, Stripe) in plain markdown. Color palettes, typography hierarchies, component styles, spacing scales, responsive breakpoints. All in a format LLMs read natively without parsing.

It has 61,200 GitHub stars.
I've been thinking about why this exploded, and I think it signals a pattern shift that most developers haven't fully processed yet. For the past decade, using a design system meant importing a component library. Install Material UI, or Chakra, or Radix. Learn the API. Compose components. The design system lived in code, consumed through code.
DESIGN.md flips that. The design system lives in context, consumed through natural language. You drop a markdown file into your project root, tell your coding agent "build a dashboard that follows this design system," and the agent generates UI that actually looks coherent. Not because it's importing pre-built components, but because it understands the design rules and applies them during generation.
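To show the shape of the pattern, here's a hypothetical excerpt of what such a file might contain. This is my own invented example, not copied from any file in the repo; the color values and type choices are placeholders:

```markdown
# DESIGN.md — hypothetical excerpt, not from the actual repo

## Colors
- Primary: #0F172A (headings, primary buttons)
- Accent: #6366F1 (links, focus rings)

## Typography
- Headings: Inter, weights 600-700, tight tracking
- Body: Inter 400, 16px base, 1.6 line height

## Spacing
- 4px base unit; components use multiples (8/16/24/32)
```

The point is that nothing here needs parsing or a component runtime. The agent reads the rules the same way it reads your prompt.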
I have 20 years of design background, and I'll be honest: this feels like it might matter more than most component libraries. A component library gives you building blocks. A DESIGN.md gives your agent taste. The agent doesn't just know what a button looks like. It knows the spacing rhythm, the color relationships, the typographic hierarchy. It produces UI that feels designed rather than assembled.
The practical upside is immediate. If you're vibe coding any frontend work, grab the DESIGN.md closest to your target aesthetic, drop it in your repo, and reference it in your prompts. The output quality jump is noticeable. I've been testing this with Claude Code for the past week, and the difference between "build me a settings page" and "build me a settings page following this design system" is the difference between generic and coherent.
69 brand systems currently. Community contributions are adding more daily. The fact that this is trending harder than most actual code repositories tells you that design context, not design components, is what matters in the agent era.
What to do now: Browse the repo, find the design system closest to your project's aesthetic, and drop it in your root directory. Reference it explicitly when prompting your coding agent for UI work. You'll see the difference on the first generation.
3. Uber's CTO Accidentally Spent $1,200 in Two Hours on Claude Code
This is the enterprise AI spending story I've been waiting for someone to tell honestly.
Yahoo Finance reports that Uber's aggressive rollout of Anthropic's Claude Code blew past internal budget expectations, with AI-related costs up 6x since 2024 despite a $3.4B R&D spend. CTO Sundeep Gupta accidentally burned $1,200 in a two-hour coding session. Not on some exotic fine-tuning job. On regular agentic coding.
The detail that made me pause was the leaderboard. Uber created internal rankings of engineers by AI tool usage. The intent was to encourage adoption. The effect was to create a perverse incentive where engineers competed to use AI more, not to use it better. When you measure consumption instead of output, consumption goes up. That shouldn't surprise anyone who's ever worked in a large organization, but it's apparently surprising enough that Uber's leadership is now "back to the drawing board" on AI spend strategy.
I use Claude Code every day. I know exactly how this happens. You start a complex refactor. The agent reads your codebase, proposes changes, runs tests, iterates. Each cycle burns tokens. An aggressive multi-turn session with extended thinking on Opus 4.6 or 4.7 can easily hit hundreds of dollars if you're not watching the meter. For a solo builder like me, that's self-limiting. I feel it in my wallet. For an enterprise with 10,000 engineers and a "use more AI" directive, there's no natural brake.
This is the first time a major enterprise has publicly admitted that AI coding tool costs are spiraling beyond projections. I suspect dozens of others are having the same conversation internally. The usage-based pricing model that works great for OpenAI and Anthropic's revenue creates genuinely unpredictable costs for buyers. When you can't tell your CFO what next quarter's AI bill will be, that's a procurement problem, not a technology problem.
The RTK story above becomes much more interesting in this context. If a single Rust binary can cut token costs 60-90%, that's not a nice-to-have for enterprises. It's a finance team requirement.
What to do now: If you run an engineering team using AI coding tools, track cost per engineer per week, not just total spend. Set per-session budgets. And look hard at RTK or similar compression tools before your CFO has the same conversation Uber's is having now.
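A per-session budget check is simple enough to sketch. The code below is an illustrative pattern, not a real Anthropic or OpenAI SDK feature; the per-token rates and the $50 threshold are placeholder assumptions you'd replace with your provider's actual pricing:

```python
# Minimal sketch of a per-session cost alarm. The $/token rates are
# PLACEHOLDERS -- substitute your provider's actual published pricing.

RATES = {"input": 15 / 1_000_000, "output": 75 / 1_000_000}  # assumed $/token

class SessionBudget:
    def __init__(self, limit_usd: float = 50.0):
        self.limit = limit_usd
        self.spent = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> bool:
        """Add one model call; return True once the session is over budget."""
        self.spent += (input_tokens * RATES["input"]
                       + output_tokens * RATES["output"])
        return self.spent >= self.limit

budget = SessionBudget(limit_usd=50.0)
# One long agentic turn: 400K tokens in, 40K out -> $9.00 at these rates
over = budget.record(400_000, 40_000)
print(f"spent ${budget.spent:.2f}, over budget: {over}")
```

Wire the return value to a hard stop or a Slack alert and the Uber scenario becomes structurally impossible rather than a matter of individual vigilance.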
4. An Open-Source Model Just Beat GPT-5.4 and Opus 4.6 on the Hardest Coding Benchmark
GLM-5.1 from Zhipu AI scored 58.4% on SWE-bench Pro. GPT-5.4 scored 57.7%. Claude Opus 4.6 scored 57.3%. That's the first time an open-weight model has ever topped a major coding benchmark against the best proprietary models.
The specs matter. GLM-5.1 is a 754B-parameter mixture-of-experts model with 40B active parameters, 200K context window, MIT license. That last part is the headline. MIT license means anyone can deploy it, fine-tune it, build products on it. No usage restrictions. No revenue thresholds. No terms that change at a founder's whim.
I want to be careful about overstating a 0.7-point margin on a single benchmark. SWE-bench Pro is the hardest variant of the most respected coding evaluation, so it's not a cherry-picked leaderboard. But benchmarks capture a slice of capability, not the whole picture. I've used both Opus 4.6 and GPT-5.4 extensively in production, and a fractional benchmark difference rarely maps to a noticeable quality gap in daily use.
What's more interesting is the trend. Zhipu AI became the first publicly traded foundation model company after their HKD 4.35B Hong Kong IPO in January 2026. Chinese labs (Zhipu, Alibaba, Moonshot AI, DeepSeek) now hold most of the top open-weight positions. Google's Gemma 4 31B broke into the top 5 as well. The competitive pressure on the proprietary model providers is real and accelerating.
For builders, the practical question is: can you run your agentic coding workflows on an MIT-licensed model instead of paying per-token to Anthropic or OpenAI? Not today, for most people. A 754B MoE model requires serious inference infrastructure even with only 40B active parameters. But the direction is clear. In 12 months, smaller distillations of these models will run on the kind of hardware you already own.
What to do now: Don't switch your daily driver yet. Do add GLM-5.1 to your evaluation list for any self-hosted coding agent workflows. If you're building a product that depends on an LLM for code generation, start testing open-weight alternatives alongside your proprietary provider. The licensing freedom alone is worth the evaluation time.
5. Chinese Tech Workers Are Being Told to Train Their Own AI Replacements
MIT Technology Review and Rest of World report that Chinese tech companies are instructing employees to build AI agents that replicate their own job functions. A GitHub project called "Colleague Skill" from Shanghai AI Lab imports chat history and files from Lark and DingTalk, then generates manuals describing a coworker's duties in a format an AI agent can execute.
Workers describe the experience as reductive. One told MIT Tech Review their work had been "flattened into modules," which is an incredibly precise description of what happens when you try to encode institutional knowledge into agent workflows. Anyone who's written documentation for an automated process knows this feeling. The messy, contextual judgment calls that make human work valuable get stripped down to decision trees and API calls.
Here's what caught my attention though. It's not working. The companies haven't successfully replaced anyone yet because the agents remain unreliable and require constant human supervision. The "Colleague Skill" approach produces agents that can follow documented procedures but break the moment they encounter anything the documentation didn't anticipate. Which, if you've worked in any organization for more than a week, you know is constantly.
I find this both reassuring and concerning for different reasons. Reassuring because it validates what I've observed in my own agent work: the gap between "follows instructions" and "exercises judgment" is enormous, and no amount of chat history import bridges it. Concerning because the intent is clear. These companies aren't experimenting. They're mandating self-replacement training. The technical limitations are a delay, not a barrier.
For builders in the West looking at this and feeling insulated: you shouldn't. The same "flatten work into modules" approach is exactly what every agentic coding tool does. When you teach Claude Code your codebase conventions, your testing patterns, your deployment workflow, you're building your own Colleague Skill. The difference is you're doing it voluntarily, and you control the output.
The uncomfortable question this story raises: what happens when the agents get reliable enough? Not this year. Maybe not next year. But the organizational intent has been expressed, the tooling is being built, and the data collection is underway.
What to do now: Look at your own work through the "Colleague Skill" lens. Which parts of your job are modular and documentable? Those are the parts agents will handle first. Invest your growth in the parts that can't be flattened: system design, taste, stakeholder judgment, the ability to know what to build, not just how to build it.
Section Deep Dives
Security
Vercel breached via compromised AI tool. ShinyHunters claims $2M ransom. Vercel confirmed a security incident originating from Context.ai, a third-party AI tool used by an employee. Attackers escalated via a hijacked Google Workspace account, claiming to sell access keys, source code, NPM/GitHub tokens, and 580 employee records. If you deploy on Vercel, rotate credentials and audit access logs for April 17-19. This directly validates the paddo.dev analysis that OAuth scope creep in AI tools is the primary supply chain attack vector right now.
Claude Enterprise Compliance API exposes all messages, including "incognito" chats. A 1,146-upvote r/ClaudeAI post warns that enterprise admins have full access to every message, even in incognito mode, which only hides chat history from the user's own sidebar. If you're on a company Claude plan, treat every conversation as logged. Don't share personal information you wouldn't put in a work email.
UK AISI publishes first government evaluation of Claude Mythos cyber capabilities. The UK AI Safety Institute assessed the model Anthropic restricted to ~50 organizations via Project Glasswing. Mythos can autonomously discover and exploit zero-day vulnerabilities across major operating systems and browsers. This is the first independent government assessment of frontier model offensive capabilities. Expect other national AI safety bodies to publish similar evaluations.
Agents
SaaStr now runs 30 AI agents in production, reveals top 5 issues nobody talks about. Jason Lemkin's update is real production data: agents need daily management (not weekly), "no lead left behind" is the actual ROI driver, and customers are demanding shorter contracts because they're unsure what AI replaces next year. Their AI VP of Marketing autonomously shipped 3 campaigns on a Saturday. Best source I've seen on what agent deployment actually looks like at scale.
Simon Willison highlights "headless everything" thesis for personal AI agents. Willison's post amplifies Matt Webb's argument that headless APIs will multiply because personal AI agents provide better UX than using services directly. If your product doesn't have a machine-readable API, agents will route users to competitors that do. This is the architectural implication of the agent era that most SaaS companies still aren't thinking about.
Slack becomes the agentic operating system interface. VentureBeat reports Salesforce's biggest Slack overhaul adds 30+ AI features, MCP client access to 2,600+ Marketplace apps plus 6,000+ AppExchange apps, and desktop monitoring outside the Slack window. From summer 2026, every new Salesforce customer gets AI-enabled Slack automatically. Slack is positioning itself as the interface layer for enterprise agents, not just messaging.
Research
Conformal prediction using internal LLM representations, not output probabilities. Wang et al. (arXiv 2604.16217) propose using internal model representations for conformal prediction, addressing the problem that output-level signals like token probabilities are brittle under calibration-deployment mismatch. If you're deploying LLMs where reliability guarantees matter, this method provides statistically valid prediction sets even when output probabilities aren't well calibrated.
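For readers new to the technique, here is the standard split-conformal recipe that the paper builds on. Note the assumption: the paper's contribution is deriving nonconformity scores from internal representations, which I can't reproduce here; the scores below are just placeholder numbers showing the calibration-then-threshold mechanics:

```python
# Background sketch of standard split conformal prediction over generic
# nonconformity scores. The paper's method computes these scores from
# internal model representations; here they are placeholder values.
import math

def conformal_threshold(cal_scores, alpha=0.1):
    """Quantile of calibration scores targeting (1 - alpha) coverage."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample correction
    return sorted(cal_scores)[min(rank, n) - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose score falls within the threshold."""
    return {label for label, s in candidate_scores.items() if s <= threshold}

cal = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.25, 0.35, 0.15]  # calibration split
t = conformal_threshold(cal, alpha=0.1)
print(prediction_set({"A": 0.2, "B": 0.55, "C": 0.9}, t))  # keeps A and B
```

The statistical guarantee comes from the calibration quantile, not from the score function, which is why swapping brittle output probabilities for internal-representation scores can improve set sizes without losing validity.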
Requirement alignment before code generation, not after. Li et al. (arXiv 2604.16198) identify that existing LLM coding approaches assume the model understands user intent correctly. Their approach aligns the model's internal representation with user requirements before generation begins. Directly useful for anyone building agentic coding workflows where the first-pass output quality matters.
1,200 ICLR 2026 papers with public code and data released. A curated list on r/MachineLearning extracts code and data links from ICLR 2026 accepted papers. Bookmark this if you want to reproduce or build on current ML research without hunting for implementations.
Infrastructure & Architecture
SK Hynix ships 192GB SOCAMM2 for NVIDIA Vera Rubin, 2x bandwidth, 75% better power efficiency. Mass production announced April 20, built on 1cnm LPDDR5X DRAM. Adapts mobile-class low-power memory for datacenter AI, directly addressing the memory bottleneck for hundred-billion-parameter model training and inference. The Vera Rubin platform won't ship for a while, but the memory supply chain is moving now.
TSMC Q1 2026: AI now 61% of revenue, profit up 58%, but leadership isn't bought in. Stratechery's analysis of TSMC's NT$1.134T ($35B) quarter argues that despite record margins (66.2% gross), conservative new fab announcements suggest leadership doesn't fully believe AI growth sustains at this rate. 3nm is 25% of wafer revenue. The chip bottleneck persists into 2027 based on current capacity plans.
Cloudflare reveals internal AI stack: 241B tokens processed, security agent saves 77% vs proprietary models. Their engineering blog details 20M requests through AI Gateway, 688K requests/day via their OpenCode gateway, and a security agent running 7B tokens/day on Kimi K2.5 that costs $2.4M/year less than a mid-tier proprietary model. Most detailed "dogfooding at scale" report from any cloud provider on AI infrastructure.
Tools & Developer Experience
Portless: named local URLs for developers and agents. Vercel Labs' portless (7.2K stars) replaces port numbers with stable .localhost HTTPS URLs. Run portless myapp next dev to access https://myapp.localhost instead of localhost:3000. The "for humans and agents" tagline matters: stable named URLs solve the problem of agents needing to discover which port a dev server is running on. Auto-generated local CA certs, HTTP/2, and git worktree subdomain prefixing included.
Activepieces hits 21.7K stars: open-source Zapier with 400+ MCP servers. The MIT-licensed platform now exposes all 400+ integrations as MCP servers compatible with Claude Desktop, Cursor, and Windsurf. Every community contribution automatically becomes an MCP tool. If you're building agent workflows that need business tool integrations, this makes Zapier and Make.com redundant.
Models
Gemma 4 under Apache 2.0: 400M+ downloads, runs on Raspberry Pi. Fireship's analysis (1.2M views) covers Google's Gemma 4 release: 2B/4B active parameter MoE footprints, 256K context, native vision/audio, 140+ languages. Fully offline on phones, Pi, and Jetson Orin Nano. 100K+ community variants. Most significant open-weight release since Llama, and the permissive Apache 2.0 license means no usage restrictions.
Qwen3.6 hits 50+ tok/s on consumer hardware with 200K context. A r/LocalLLaMA user demonstrates Qwen3.6 UD_Q_4_K_M via ik_llama on 16GB VRAM + 32GB RAM. A $1,500 consumer setup now runs a competitive 35B model at practical speeds with long context. The inference optimization ecosystem is catching up to model releases faster than expected.
Vibe Coding
Briefs as code: git-based planning as native agent context. paddo.dev proposes consolidating planning artifacts into structured markdown in git, where agents synthesize HTML briefs from version-controlled sources. The insight: your architecture docs ARE your agent context. No separate AGENTS.md needed. Post-edit hooks regenerate outputs when source files change. Executives consume brief HTML, engineers read READMEs, agents get full project context from one repo.
Daniel Stenberg says the AI security slop era is over. Now curl has too many GOOD reports. The curl creator reports that AI-generated security slop has been replaced by genuinely high-quality AI-assisted vulnerability reports submitted faster than maintainers can evaluate them. The problem shifted from bad signal to too much signal. A single maintainer can't keep up with quality-validated reports that AI tools make efficient to produce.
Hot Projects & OSS
FinceptTerminal: open-source Bloomberg with 37 AI agents, 3,129 stars in one day. Fincept-Corporation/FinceptTerminal hit 8.7K stars with the highest single-day velocity on today's trending page. C++20/Python desktop with CFA-level analytics, 100+ data connectors, real-time trading across 16 brokers, and 18 QuantLib modules. Essentially a free Bloomberg terminal for solo traders.
TRELLIS.2 ported to Apple Silicon: 400K-vertex 3D meshes in 3.5 minutes on M4 Pro. shivampkumar/trellis-mac replaces CUDA-only libraries with pure PyTorch MPS alternatives, bringing Microsoft's 4B-parameter image-to-3D model to Mac users. MIT license. Generates textured OBJ and GLB from single images. If you've been locked out of 3D generation because you don't have an NVIDIA GPU, this changes that.
Thunderbolt: Mozilla enters the AI chat space with a model-agnostic, self-hostable client. thunderbird/thunderbolt is trending at 667 stars/day. Cross-platform, supports frontier and local models via Ollama or any OpenAI-compatible API. Self-hostable via Docker. The Mozilla brand entering AI with a fully open, no-vendor-lock-in client signals demand for AI tools that respect data sovereignty.
SaaS Disruption
Four CX platforms now live on outcome-based pricing. Per-seat is dead. Sierra ($150M+ ARR), Intercom ($100M+ ARR at $0.99/resolution), Zendesk ($1.50/resolution committed), and HubSpot ($0.50/resolution) all converged within 90 days. When four competing platforms with combined hundreds of millions in ARR adopt the same model, it's no longer experimental. If you're building AI support tools, price per-resolution from day one.
AlixPartners declares "the golden age of SaaS is over." Their definitive report projects SaaS revenue declining up to 15% next year, 25-35% over three years. M&A volume up 30-40% in 2026. AI-native companies command 5-6x valuation premiums. Midsize software companies face a squeeze between AI-native startups and big tech's billions.
ServiceNow introduces "Agentic ACV," charging for tasks completed instead of seats. FinancialContent reports ServiceNow recovered nearly half its Q1 losses after introducing outcome-based revenue. Forward P/E multiples had compressed to 22.7x (below S&P 500 average for the first time in the cloud era). The sector bottomed on April 13 when Oracle surged 12% on $553B in RPO disclosure. Investors now see AI as revenue accelerant, not SaaS killer.
Policy & Governance
Anti-AI sentiment turns violent. Sam Altman's home attacked twice in three days. Fortune reports a 20-year-old from Texas attacked Altman's San Francisco home first with a Molotov cocktail, then with gunfire, carrying a manifesto naming other AI executives. Separately, local officials have been targeted over data center expansions in Indianapolis. Gallup data shows Gen Z anger at AI rising from 22% to 31% in one year. I don't have a hot take on this. It's just grim.
55% of Americans now see AI as more harmful than helpful, up from 44% in 2025. A Quinnipiac poll found 64% say the same about AI in education. A trending HN essay argues the backlash isn't rational analysis but body-level psychological responses: mismatch, disgust, danger avoidance. 70% of consumers now question whether content they see is real. "Human-Led" is becoming a brand credibility signal, not a limitation.
Skills of the Day
- Install RTK to compress CLI output before it hits your LLM context window. cargo install rtk or grab the binary. It intercepts 100+ shell commands and applies smart filtering that cuts token usage 60-90%. If you use Claude Code or Cursor daily, the savings compound fast. No config changes to your AI tool required.
- Drop a DESIGN.md file in your project root before vibe-coding any UI. Grab one from VoltAgent/awesome-design-md that matches your target aesthetic. Reference it in your prompt. The agent generates coherent, styled UI instead of generic Bootstrap-looking pages. 69 brand systems available.
- Set per-session token budgets for your engineering team's AI coding tools. Uber learned the hard way that usage leaderboards create perverse incentives. Track cost per engineer per week, not total spend. Set alerts at $50/session so nobody accidentally burns $1,200 in two hours.
- Evaluate GLM-5.1 (MIT license, 58.4% SWE-bench Pro) for self-hosted coding agent workflows. You won't replace your daily driver today, but running parallel evaluations against your specific codebase now gives you a fallback if proprietary pricing changes or terms shift.
- Audit every OAuth grant your AI tools have. Today. The Vercel breach started with a compromised third-party AI tool. List every AI tool with OAuth access to your development infrastructure, scope permissions to minimum needed, and set up immediate revocation protocols for vendor incidents.
- Use speculative decoding with llama.cpp for 6x speedup on code tasks. Settings: --spec-ngram-size-n 24 --draft-min 12 --draft-max 48. Works best on code modification prompts where token predictability is high. r/LocalLLaMA users report 665% speed increases on specific workloads.
- Use internal LLM representations instead of output probabilities for reliability estimation. Wang et al.'s conformal prediction method (arXiv 2604.16217) provides statistically valid prediction sets even when output token probabilities are poorly calibrated. If you need deployment-grade confidence scoring, this beats entropy-based approaches.
- Expose your product's API as an MCP server before competitors do. MCP crossed 97 million installs in March 2026. If AI agents can't use your product programmatically, they'll route users to alternatives. Activepieces (21.7K stars) shows how to turn existing integrations into MCP servers with minimal effort.
- Document your work through the "Colleague Skill" lens to identify your own defensibility gaps. Which parts of your job can be flattened into documented procedures? Those are the parts agents will handle first. Invest growth time in the parts that resist documentation: system design, cross-functional judgment, knowing what to build.
- Use paddo.dev's "briefs as code" pattern to make your project planning agent-native. Move architecture and planning docs from Confluence/Miro into structured markdown in git. Your planning docs become your agent context automatically. Post-edit hooks regenerate outputs when source files change. One source of truth for humans and agents.
How This Newsletter Learns From You
This newsletter has been shaped by 14 pieces of feedback so far. Every reply you send adjusts what I research next.
Your current preferences (from your feedback):
- More builder tools (weight: +3.0)
- More vibe coding (weight: +2.0)
- More agent security (weight: +2.0)
- More strategy (weight: +2.0)
- More skills (weight: +2.0)
- Less valuations and funding (weight: -3.0)
- Less market news (weight: -3.0)
- Less security (weight: -3.0)
Want to change these? Just reply with what you want more or less of.
Quick feedback template (copy, paste, fill in the blanks):
More: [topic] [topic]
Less: [topic] [topic]
Overall: X/10
Reply to this email — I've processed 14/14 replies so far and every one makes tomorrow's issue better.