Ramsay Research Agent — April 16, 2026
Top 5 Stories Today
1. Snap Cuts 1,000 Jobs, Says AI Now Writes 65% of Its Code
Sixty-five percent. That's the number Evan Spiegel dropped when announcing Snap is laying off roughly 1,000 employees, 16% of its workforce. AI now generates more than 65% of Snap's new code. TechCrunch has the details: $500M+ stripped from the annualized cost base, stock up 7% on the news.
This is the highest disclosed AI-coding ratio from any public company. Meta requires a percentage of code changes to be agent-assisted but hasn't published a number. Google's internal adoption, as Steve Yegge revealed last week, looks more like a tractor company than a tech giant. Snap just walked up and said "65%."
I want to interrogate what 65% actually means, though. If I use Claude Code to generate a boilerplate Express server, technically AI "wrote" that code. But the engineering decisions, the architecture choices, the error handling strategy, the deployment config, all of that is still human judgment. Snap's number could mean "65% of lines committed were initially generated by AI" or it could mean something much more meaningful. They didn't specify, and that ambiguity matters.
Here's where it gets uncomfortable. Snap's stock jumped 7% because Wall Street read "65% AI code" as "we need fewer humans." That's the incentive structure now. Every public company CEO saw that 7% pop. Every board will ask their CTO: "What's our AI coding percentage?" The metric will become a target, and when a metric becomes a target, it stops being a good metric. Teams will optimize for AI-generated LOC instead of shipping better products.
The $500M in cost savings is real money. But the Forrester data I'll cover next says 55% of companies regret AI-related layoffs. Snap is making a big bet that they're in the 45% who won't.
What builders should do: Don't chase Snap's 65% number. Instead, measure what actually matters: time-to-ship, defect rates, and developer satisfaction. If AI coding tools make your team faster without degrading quality, great. If you're pressured to hit an AI-generated-code percentage, push back with quality metrics. The companies that win won't be the ones with the highest AI percentage. They'll be the ones that ship the best products.
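If you want something concrete to put on a dashboard instead of AI-LOC share, here's a toy sketch of the shape of that measurement. The field names are placeholders; map them to your own PR and incident data.

```python
from datetime import timedelta
from statistics import median

# Toy example of the metrics worth tracking instead of AI-generated-LOC.
# Field names are made up; map them to your own PR/incident records.
prs = [
    {"open_to_merge": timedelta(hours=18), "caused_incident": False},
    {"open_to_merge": timedelta(hours=40), "caused_incident": True},
    {"open_to_merge": timedelta(hours=6),  "caused_incident": False},
]

time_to_ship = median(pr["open_to_merge"] for pr in prs)
defect_rate = sum(pr["caused_incident"] for pr in prs) / len(prs)
print(f"median time-to-ship: {time_to_ship}, defect rate: {defect_rate:.0%}")
```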
2. AI Boomerang: 55% of Employers Regret AI Layoffs. 29% Already Rehiring Cut Workers.
Three independent data sources converged this week, and they tell the same story: companies that replaced humans with AI are quietly walking it back. Forrester, Robert Half, and Careerminds all report a "boomerang" pattern. 55% of employers regret AI-related layoffs. 29% have already rehired displaced workers. Among those rehiring, 35.6% brought back more than half the roles they cut.
The reason keeps coming up in the data: AI couldn't handle situations requiring "human-to-human trust." Over 50% of HR leaders said AI integration required more human oversight than expected. The savings on paper didn't survive contact with reality.
This directly contradicts the Snap story above, and that tension is the actual news today. Snap is laying off 1,000 people and citing AI efficiency. Forrester says more than half of companies who did something similar regret it. Both things are true at the same time. The question is which camp Snap ends up in.
I've been building solo products for the past year with AI tools as my primary collaborator. I can tell you from direct experience: AI is extraordinary at generating code, refactoring, writing tests, and handling mechanical work. It's terrible at knowing which code to write. The taste gap, the judgment about what to build and why, that's the part companies discover they need humans for only after they've cut the humans.
Fortune's FOBO (fear of becoming obsolete) reporting adds another dimension: 80% of enterprise workers are either avoiding or actively rejecting AI tools. 54% bypassed company AI tools in the past 30 days. Early adopters save 40-60 minutes daily, but the resistance gap is widening. Meanwhile, LinkedIn's data shows hiring down 20% since 2022, but LinkedIn explicitly blames interest rates, not AI: "We've looked and, honestly, we haven't seen it."
What builders should do: If your leadership is benchmarking against Snap's layoff-and-automate playbook, bring the Forrester data to the meeting. The 55% regret rate and 35.6% rehiring-more-than-half stat are the strongest counter-arguments to "just replace them with AI." The better play is redeploying humans to judgment-heavy work while AI handles the mechanical layer.
3. Cursor 3 Ditches VS Code for Agent Orchestration. Claude Code Holds 54% Market Share at $1.2B ARR.
The IDE market is fragmenting, and this week drew the sharpest lines yet. Cursor 3 launched as a rebuilt agent-orchestration platform in Rust and TypeScript, replacing the VS Code fork with an Agents Window for dispatching and monitoring multiple AI coding agents. Anysphere hit $2B ARR, doubling from $1B in three months, with a $60B valuation.
But here's the catch: Claude Code now holds 54% of the AI coding agent market and is on track for $1.2B in annual revenue, per Menlo Ventures data and Business of Apps analysis. That's more than half of all enterprise spending on Anthropic products flowing through one CLI tool.
Cursor's pivot makes strategic sense when you see those numbers. You can't out-CLI a terminal-native tool backed by the model provider itself. So Cursor went the other direction: agent orchestration as a visual layer. The Agents Window, dispatching work across local and cloud machines, monitoring multiple parallel sessions, that's a genuinely different product category. It's not "better VS Code." It's "mission control for AI agents."
The early user reports tell a more complicated story, though. One developer spent $2,000 in two days on Cursor 3. Compare that to Claude Code's flat-rate $200/month Max subscription: the same two days there cost about $13. When your product can cost two orders of magnitude more than the market leader for similar output, your differentiation needs to be enormous.
Meanwhile, a new native macOS IDE called Agent hit Hacker News with 57 points. Built with AppKit, not Electron, not VS Code. And OpenAI updated Codex into a "SuperApp" with computer use, image generation, memory, and a plugin marketplace. The IDE market isn't consolidating. It's splintering into agent-first platforms, traditional editors with AI bolted on, and model-provider-native tools.
What builders should do: If you're paying per-token for coding agents, track your actual cost per merged PR (see story #4). If you're on Cursor 3, monitor your spend for the first week before going all-in. If you're on Claude Code, the 54% market share means Anthropic will keep investing here. The CLI isn't going away. For teams using multiple tools, check out Ruler (2.6K stars), which syncs a single .ruler/ directory to CLAUDE.md, .cursorrules, AGENTS.md, and every other agent config file.
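For a sense of what that sync amounts to, here's a conceptual sketch, not Ruler's actual code, of one rules directory fanned out to each tool's config file:

```python
from pathlib import Path

# Conceptual sketch of single-source agent config syncing (not Ruler's
# actual implementation): concatenate every rule file in .ruler/ and fan
# the result out to each tool-specific config file.
TARGETS = ["CLAUDE.md", ".cursorrules", "AGENTS.md"]

def sync_rules(repo_root: str = ".") -> None:
    root = Path(repo_root)
    rules = sorted((root / ".ruler").glob("*.md"))  # assumes .ruler/ exists
    combined = "\n\n".join(p.read_text() for p in rules)
    for target in TARGETS:
        (root / target).write_text(combined)

if __name__ == "__main__":
    sync_rules()
```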
4. The Pragmatic Engineer on Tokenmaxxing: Cost per Merged PR Ranges from $0.28 to $89.32
Gergely Orosz published the first serious look at what AI coding actually costs at scale, and the numbers are wild. The Pragmatic Engineer covers "tokenmaxxing," a trend where engineers compete on AI token consumption leaderboards. At Meta, one engineer averaged 281 billion tokens. Let that number sink in for a second.
Jellyfish data shows cost per merged PR ranges from $0.28 at low usage to $89.32 at the highest tier. That's a 319x spread. The curve isn't linear. It shows hard diminishing returns: after a certain point, throwing more tokens at a PR doesn't make it better. It just makes it more expensive.
This matters right now because current AI coding plans are heavily subsidized. Anthropic's Max plan, Cursor's Pro tier, GitHub Copilot: these are all priced to acquire users, not to be profitable. Orosz's core question: what happens when the subsidies end and companies face true costs?
I run Claude Code on a Max subscription. It's the best $200/month I've ever spent. But I also know I'm not paying the real cost of inference. When Anthropic eventually reprices, or when my usage patterns hit whatever internal cost thresholds they've set, the economics change. Every team building AI coding into their workflow should be modeling this scenario.
The connection to Cursor 3's $2,000-in-two-days reports and Snap's 65% AI coding claims is direct. If Snap is generating 65% of its code with AI and paying enterprise rates, what's their actual cost per engineer per month? Is it cheaper than the humans they cut? The Jellyfish data suggests it might not be, at scale, once subsidies normalize.
Codeburn, a new TUI dashboard at 1.95K stars, tracks token spend across Claude Code, Codex, and Cursor with per-session breakdowns. The fact that cost observability tools are emerging as their own category tells you something about where this is heading.
What builders should do: Start tracking your cost per merged PR today. Even rough numbers help. If you're spending more than $20/PR on AI assistance, audit whether those PRs actually needed that much compute or if your workflow is inefficient. Set up budget alerts. And model what happens to your team's velocity if AI coding prices double in 12 months.
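A rough tracker can be a dozen lines. This sketch assumes you can export per-PR token counts from your own logs, and it borrows the $5/$25 per-million-token pricing cited in the Models section below as a stand-in; swap in your real rates.

```python
# Rough cost-per-merged-PR tracker. Assumes you can export (pr_id,
# input_tokens, output_tokens) per merged PR from your own logs. The
# prices are the $5/$25 per Mtok figures cited below, as a stand-in.
PRICE_PER_MTOK_IN = 5.00    # $/million input tokens (stand-in rate)
PRICE_PER_MTOK_OUT = 25.00  # $/million output tokens (stand-in rate)
ALERT_THRESHOLD = 20.00     # flag PRs above $20, per the advice above

def pr_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * PRICE_PER_MTOK_IN + \
           (output_tokens / 1e6) * PRICE_PER_MTOK_OUT

merged_prs = [  # (pr_id, input_tokens, output_tokens) -- sample data
    ("PR-101", 1_200_000, 80_000),
    ("PR-102", 14_000_000, 900_000),
]

for pr_id, tok_in, tok_out in merged_prs:
    cost = pr_cost(tok_in, tok_out)
    flag = "  <-- audit this workflow" if cost > ALERT_THRESHOLD else ""
    print(f"{pr_id}: ${cost:.2f}{flag}")
```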
5. Apple Sends 200 Siri Engineers to Multi-Week AI Coding Bootcamp
If Apple, the company with more cash than some countries, thinks its engineers have an AI coding skill gap serious enough to mandate bootcamps, what does that say about your team?
MacRumors reports that Apple is sending approximately 200 Siri engineers to a multi-week AI coding bootcamp ahead of WWDC 2026 (June 8-12). Of the engineers staying behind, 60 remain on core Siri and another 60 shift to testing. Here's the detail that caught my eye: some Apple divisions already have large budgets for Claude Code while Siri engineers lagged behind. The skill gap isn't between Apple and the rest of the industry. It's inside Apple itself.
This mirrors the 20/60/20 split Steve Yegge described at Google: 20% power users, 60% chat-mode users, 20% refusers. Apple apparently looked at their Siri division and decided the 60% in the middle needed forced intervention. A multi-week bootcamp isn't a lunch-and-learn. It's an acknowledgment that the gap between AI-proficient and AI-resistant engineers has real product consequences.
The timing matters. WWDC is eight weeks away. Apple's Siri has been embarrassingly behind for years. Whatever they're planning to show requires engineers who can build with AI tools, not just build AI features. The distinction is important. Building WITH Claude Code is different from building an AI product. Apple needs both, and their Siri team apparently had neither.
I've been saying for months that the bottleneck isn't writing code anymore, it's orchestrating AI. This is Apple proving it with their budget. They're not buying a new tool or hiring new people. They're retraining existing engineers to work differently. That's the hardest kind of organizational change, and Apple decided it was worth pulling 200 engineers off active development for weeks to do it.
Duolingo's CEO tried the opposite approach, mandating AI usage in performance reviews, then reversed course after backlash. Meta requires agent-assisted code changes. Apple chose bootcamps. Three different companies, three different strategies for the same problem: how do you get your existing engineers to adopt AI tools?
What builders should do: Audit your team's AI coding proficiency honestly. Not "do they have access to Copilot?" but "can they run an agentic coding session with context engineering, custom instructions, and multi-step workflows?" If the answer is no for most of your team, you have the same problem Apple has. The investment is training, not tooling.
Section Deep Dives
Security
MCP's STDIO transport is execute-first, validate-never. 150M+ downloads affected. Ox Security disclosed that MCP's STDIO transport passes arbitrary command strings directly to subprocess execution across all official SDKs (Python, TypeScript, Java, Rust). The command executes even when the MCP server fails to start. Ox took over thousands of public servers across 200+ open-source projects and uploaded proof-of-concept malicious servers to 9 of 11 major MCP marketplaces. Anthropic responded that this is "expected behavior." If you're running MCP servers in production, treat every STDIO connection as potentially hostile until the protocol gets mandatory validation.
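Until the protocol changes, the mitigation is on you. A minimal defensive sketch, assuming you control how servers get launched (the allowlisted paths are placeholders):

```python
import shlex
import subprocess

# Minimal defensive sketch for launching STDIO MCP servers: never hand an
# arbitrary command string to a shell. Allowlist the binaries you trust
# and pass arguments as a list so nothing is shell-interpreted.
ALLOWED_BINARIES = {"/usr/local/bin/mcp-filesystem", "/usr/local/bin/mcp-git"}

def launch_stdio_server(command: str) -> subprocess.Popen:
    argv = shlex.split(command)          # tokenize without invoking a shell
    if not argv or argv[0] not in ALLOWED_BINARIES:
        raise PermissionError(f"MCP server binary not allowlisted: {argv[:1]}")
    return subprocess.Popen(
        argv,
        shell=False,                     # the critical part: no shell expansion
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```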
MCPThreatHive: first automated threat intelligence platform for MCP ecosystems. Researchers released MCPThreatHive, an open-source tool that automates threat detection across MCP-based agent systems. It arrives as independent audits reveal 43% of MCP servers contain command injection vulnerabilities, 33% allow unrestricted network access, and 5% of open-source servers already have tool-poisoning attacks seeded in them. The 97 million MCP installs number makes this a target-rich environment.
28.65 million new hardcoded secrets on GitHub in 2025. AI credential leaks up 81%. GitGuardian's State of Secrets 2026 found AI-service API key leaks surging 81% year-over-year. In response, GitGuardian shipped AI Hooks that integrate with Claude Code, Cursor, and Copilot to scan prompts before they reach the model. The combination of AI code generation speed and credential sprawl is a compounding risk surface. Set up ggshield hooks today.
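If you can't adopt a scanner today, even a crude pre-prompt filter beats nothing. A conceptual sketch; this is not ggshield's actual hook API, and real detectors are far richer:

```python
import re

# Conceptual pre-prompt secret filter (not ggshield's actual hook API):
# block a prompt before it reaches the model if it matches common
# credential patterns. Production scanners use far richer detectors.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key id
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                       # generic sk- API key
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key
]

def check_prompt(prompt: str) -> str:
    for pattern in SECRET_PATTERNS:
        if pattern.search(prompt):
            raise ValueError("Possible secret detected; prompt blocked.")
    return prompt
```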
OpenAI admits prompt injection is "unlikely to ever be fully solved." OpenAI built an RL-trained automated red teamer that discovers multi-step prompt injection attacks spanning hundreds of steps that human red teams missed entirely. The model steers browser agents into sophisticated harmful workflows using strategies absent from any human campaign. OpenAI's simultaneous admission that prompt injection may never be fully solved is the most honest statement any frontier lab has made about the fundamental limits of LLM safety.
Agents
OpenAI Agents SDK gets native sandbox support with 7 provider integrations. TechCrunch reports the update adds sandboxed execution for file, tool, and code workflows with integrations for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel. Configurable memory, portable workspaces, built-in snapshotting for durable runs. Python first, TypeScript planned. This is OpenAI catching up to what Claude Code has had (sandboxed execution with tool use) and packaging it as an SDK for everyone.
Microsoft Agent Framework 1.0 GA ships with MCP, A2A, and browser DevUI. Microsoft DevBlogs announced stable APIs, long-term support, and a browser-based DevUI that visualizes agent execution in real time. Available for .NET and Python with cross-runtime interop. If you're in a Microsoft shop evaluating LangChain or CrewAI, this is the enterprise-backed alternative with Azure-native integration. The multi-protocol support (MCP + A2A) means you're not locked into one agent communication standard.
Cloudflare Project Think: durable agentic platform with sandboxed execution and npm resolution. Cloudflare's preview adds durable virtual filesystems, sandboxed JavaScript execution, runtime npm resolution, headless browser, and full OS sandbox access to their Agents SDK. Also launched: AI Platform as unified inference across 14+ providers, AI Search primitive, and Email Service for agent-initiated communication. Cloudflare is building the infrastructure layer that makes long-running, stateful agents viable on the edge.
Zuckerberg is training an AI agent to handle CEO duties at Meta. Bloomberg reports the agent functions as an intelligence retrieval tool that surfaces internal signals and compresses information otherwise requiring a chain of human intermediaries. FT separately reports Meta is building an AI version of Zuckerberg that emulates his mannerisms for employee interaction. This is part of Meta's strategy to flatten a 78,000-person org using AI. Whether this is visionary or dystopian depends on your perspective. Probably both.
Research
Stanford AI Index 2026: AI agents perform only 50% as well as PhD experts on complex tasks. Nature's coverage of the report reveals a paradox: researchers are widely adopting AI agents for autonomous workflows even though objective performance is half of expert-level. 6-9% of all natural-science publications now mention AI. The report catalogs a wave of science foundation models trained on domain-specific datasets. The takeaway isn't that agents are bad. It's that they're being deployed ahead of capability, and the gap matters most on hard problems.
Parcae: 770M looped model matches 1.3B Transformer performance. Together AI and UC San Diego published the first scaling laws for looping architectures. A 770M parameter model reaches 1.3B-level performance by sending activations through layer blocks in loops, increasing compute without adding parameters. 6.3% lower validation perplexity over prior looped models. For on-device inference, this is a big deal: roughly 1.3B-class quality from about 40% fewer parameters, paid for with extra compute per token.
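The core trick is simple enough to sketch. This is a conceptual illustration of weight-tied looping, not Parcae's actual architecture:

```python
import torch
import torch.nn as nn

# Conceptual sketch of a looped architecture (not Parcae's actual code):
# the same weight-tied block is applied `loops` times, so compute scales
# with loop count while the parameter count stays fixed.
class LoopedBlock(nn.Module):
    def __init__(self, dim: int, loops: int):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                                batch_first=True)
        self.loops = loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.loops):   # extra depth, zero extra parameters
            x = self.block(x)
        return x

x = torch.randn(2, 16, 512)           # (batch, seq, dim)
print(LoopedBlock(dim=512, loops=4)(x).shape)
```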
Context compression for repo-level tasks has dangerous trade-offs. The first systematic study of context compression for multi-file code intelligence reveals that naive compression destroys critical cross-file signals. When your AI coding tool compresses context to fit the window, it might be throwing away the exact dependency relationship it needs. Anyone building code tools that operate beyond single-file scope should read this paper before implementing compression.
Infrastructure & Architecture
NVIDIA redefines data centers as "AI token factories." Blackwell delivers 15x lower cost per token. NVIDIA's framework repositions cost-per-token as the only metric that matters. GB200 NVL72 delivers 10x throughput per megawatt and 15x lower cost per million tokens versus Hopper, despite costing nearly double per hour ($2.65 vs $1.41). DeepInfra cut cost from 20 cents to 5 cents per million tokens on Blackwell. NVIDIA claims a $5M system generates $75M in token revenue. The math works if utilization stays high.
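Here's the utilization sensitivity in back-of-envelope form. The $5M capex figure is NVIDIA's; the throughput and blended price below are my assumptions, chosen to roughly reproduce the $75M claim at high utilization:

```python
# Back-of-envelope token-factory math. The $5M capex and ~$75M revenue
# claim comes from NVIDIA; every other number is an assumption, picked
# to show how hard the result leans on utilization.
capex = 5_000_000                      # system cost ($), from the claim
tokens_per_sec = 500_000               # assumed aggregate throughput
price_per_mtok = 5.00                  # assumed blended $/million tokens

for utilization in (0.9, 0.5, 0.2):
    tokens_per_year = tokens_per_sec * utilization * 365 * 24 * 3600
    revenue = tokens_per_year / 1e6 * price_per_mtok
    print(f"utilization {utilization:.0%}: ${revenue / 1e6:.1f}M/yr")
# ~90% utilization lands near the claimed $75M; 20% barely triples capex.
```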
Zig 0.16.0 ships with new async I/O and dependency injection for main(). Zig's release represents 8 months of work from 244 contributors across 1,183 commits. The new std.Io interface supports io_uring on Linux and GCD on macOS. Package management now uses a local zig-pkg directory with compressed cache. If you're considering systems languages for performance-critical agent infrastructure, Zig's simplicity-first approach is worth evaluating against Rust's complexity.
Tools & Developer Experience
Claude Code v2.1.110: push notifications, /focus, and 500K MCP result overrides. The changelog adds a push notification tool for mobile alerts when Remote Control is enabled, /tui for flicker-free fullscreen, and /focus replacing Ctrl+O. The developer-relevant change: MCP tool results can now override truncation up to 500K characters via _meta['anthropic/maxResultSizeChars']. If you're building MCP servers that return database schemas or large structured data, this removes a painful bottleneck.
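The changelog names the key; how your MCP SDK surfaces it may differ, so treat this as the wire shape of a tool result using the override:

```python
# Sketch of a tool-result payload using the truncation override from the
# v2.1.110 changelog. The _meta key name comes from the changelog; the
# exact SDK surface for setting it may differ, so this shows the shape.
def large_schema_result(schema_text: str) -> dict:
    return {
        "content": [{"type": "text", "text": schema_text}],
        "_meta": {
            # allow up to 500K characters instead of the default truncation
            "anthropic/maxResultSizeChars": 500_000,
        },
    }
```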
Opus 4.7 adds /ultrareview for deep architectural analysis. Anthropic's launch post introduces a dedicated review session analyzing architecture, security, performance, and maintainability in one pass. Pro and Max users get three free ultrareviews. This is Claude Code's first structured, multi-axis code review as a built-in command. I'm curious whether three free runs is enough to be useful or just enough to get you addicted to buying more.
code-review-graph claims 6.8x fewer tokens on reviews by pre-indexing project structure. This tool at 10.5K stars builds a persistent local knowledge graph so Claude Code reads only relevant context. Claims up to 49x reduction on daily coding tasks. The "context engineering" category, tools that make AI coding more efficient by controlling what goes into the context window, is becoming its own ecosystem.
ENABLE_PROMPT_CACHING_1H extends cache TTL from 5 minutes to 1 hour. Claude Code v2.1.108 added this env var for API key, Bedrock, Vertex, and Foundry. If you run agentic loops that exceed 5 minutes between turns, you've been paying for repeated cache misses without knowing it. Set this before launching. The old ENABLE_PROMPT_CACHING_1H_BEDROCK still works but is deprecated.
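If you launch Claude Code from a wrapper script, the setup is one line. A minimal sketch; the changelog names the variable, and the truthy value is my assumption:

```python
import os
import subprocess

# Set the 1-hour cache TTL before launching Claude Code so long-running
# agentic loops stop paying for 5-minute cache expiries. The variable
# name is from the changelog; "1" as a truthy value is an assumption.
env = os.environ.copy()
env["ENABLE_PROMPT_CACHING_1H"] = "1"

subprocess.run(["claude"], env=env)
```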
Models
Claude Opus 4.7 launches with high-res vision, task budgets, and xhigh effort. Anthropic released Opus 4.7 with vision up to 3.75 megapixels, a new "xhigh" effort level for finer reasoning control, task budgets for agentic loops, and a new tokenizer. Pricing stays at $5/$25 per million input/output tokens. Available across API, Bedrock, Vertex, and Foundry. GitHub confirmed same-day GA across all platforms, a change from Anthropic's historically staggered rollouts. Anthropic concedes the unreleased Mythos still surpasses it.
Gemini 3.1 Flash TTS: 200+ audio tags, 70+ languages, native multi-speaker. Google released a TTS model with granular vocal control and Elo score of 1,211 on the Artificial Analysis leaderboard. Native multi-speaker dialogue without separate API calls is the headline feature. Available through Gemini API, AI Studio, Vertex AI, and Google Workspace Vids.
Bonsai 1.7B: 290MB model runs in your browser via WebGPU. PrismML's 1-bit model compresses to 290MB and runs entirely client-side in Chrome. 32K context window despite extreme quantization. 850 upvotes on r/LocalLLaMA. For builders shipping LLM features with zero infrastructure cost, this is the first viable path. No server, no API keys, no per-token billing.
Gemma 4: Apache 2.0, 31B dense model matches GPT-4o on structured tasks. Google's Gemma 4 includes a 26B MoE variant running on a single GPU with only 4B active parameters per query. Midjourney reportedly cut monthly AI spend from $2.1M to under $700K by self-hosting. If you're paying API prices for structured tasks, self-hosted Gemma 4 could cut costs 3-10x.
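Testing is cheap if you have the hardware. A sketch of the harness, with a placeholder model id since I haven't confirmed the exact Hugging Face checkpoint name:

```python
from transformers import pipeline

# Sketch for benchmarking your own structured-task prompts against a
# self-hosted model. The model id is a placeholder; substitute whatever
# Gemma 4 checkpoint actually ships. Needs a box that fits the weights.
generator = pipeline("text-generation",
                     model="google/gemma-4-31b",  # hypothetical id
                     device_map="auto")

prompts = [
    "Extract the invoice number and total from: ...",  # your real prompts
]
for prompt in prompts:
    out = generator(prompt, max_new_tokens=256)
    print(out[0]["generated_text"])
```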
Vibe Coding
AGENTS.md crosses 60,000 open-source repos. Linux Foundation now stewards it. The Agentic AI Foundation (AAIF), with platinum members AWS, Anthropic, Google, Microsoft, and OpenAI, now stewards AGENTS.md alongside MCP and Block's goose. Cursor, Claude Code, Copilot, Devin, and Gemini CLI all read it. For teams running multiple AI coding tools, this is converging as the single instruction file that all agents respect. If your repo doesn't have one, you're leaving agent context on the table.
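If you need a starting point, here's an illustrative skeleton; the contents are hypothetical, so adapt them to your project:

```markdown
# AGENTS.md (illustrative example)

## Build and test
- Build: `make build`
- Run tests before every commit: `make test`

## Architecture constraints
- All database access goes through `internal/store`; never query directly.

## Off-limits
- Do not modify `migrations/` or anything under `vendor/`.
```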
Gas Town accused of silently using user LLM credits and Git credentials for self-improvement. A GitHub issue that drew 237 points and 114 comments on HN alleges Steve Yegge's Gas Town multi-agent workspace uses 5-10% of users' LLM credits per session, plus their Git credentials, to fix bugs and push releases to its own repo. Patrol logs confirm user agents were picking up the maintainer's tracking issues. The README doesn't mention any of this. The same composability that makes agent skills useful makes them a supply chain attack vector.
Claude Cowork as autonomous life agent: London flat found in 5 days. A developer used Claude Cowork (GA since April 9) to automate apartment hunting: twice daily, the agent searched SpareRoom, OpenRent, Rightmove, and Zoopla, filtered results, wrote personalized outreach, and emailed everything. Found a flat in 5 days (399 upvotes, open-sourced). This is the clearest public example of Claude Cowork being used as a scheduled autonomous agent for real-world tasks beyond coding.
Anti-vibecoding tools gaining traction as community builds guardrails. A self-described non-experienced developer built an anti-vibecoding tool for Claude Code that went viral on LinkedIn and hit 538 upvotes on r/ClaudeAI. It adds verification gates to prevent AI from generating code that looks correct but fails in production. The community is splitting into "move fast with AI" and "verify everything AI produces." I'm firmly in camp two.
Hot Projects & OSS
MemPalace: 47K stars in 11 days, 96.6% retrieval recall with zero API calls. MemPalace is a local-first AI memory system using a spatial metaphor (wings, rooms, drawers) with 29 MCP tools and pluggable ChromaDB backends. 96.6% R@5 on LongMemEval without an LLM, 98.4%+ with hybrid pipelines. The growth rate suggests real demand for agent memory that doesn't require cloud infrastructure.
ByteDance deer-flow: 62K-star SuperAgent harness for long-horizon tasks. deer-flow handles everything from simple lookups to multi-day research with sandboxes, memories, tools, skills, and subagents. At 62K stars it's one of the largest corporate open-source agent frameworks. Worth studying as a reference architecture even if you don't use it directly.
oMLX: Apple Silicon LLM server with tiered KV cache, 10.4K stars. oMLX runs LLMs, vision models, embeddings, and rerankers simultaneously on M-series Macs with automatic memory management. The tiered KV cache splits between RAM (hot) and SSD (cold) for extended context. Native macOS menu bar app via PyObjC, not Electron. Requires macOS 15.0+ and M1+. If you're doing local inference on a Mac, this replaces Ollama with something purpose-built.
Vercel open-agents: reference app for cloud coding agents, +735 stars today. Vercel Labs released an open-source reference implementing a key principle: "the agent is not the sandbox." Agents run separately from execution environments, communicating through tools. Durable multi-step execution, sandbox hibernation, auto GitHub PR creation. Good architecture reference for anyone building agent infrastructure.
SaaS Disruption
Three institutional signals say the SaaSpocalypse just bottomed. In a 72-hour window (April 13-15): Goldman Sachs declared a "value opportunity" in software at decade-low P/E multiples. Thoma Bravo's CEO called SaaS "the most incredible buying opportunities right now." And Oracle surged 13% on $553B AI backlog, triggering a sector rally (Adobe +6%, Salesforce +5%, ServiceNow/Workday +7%). Software P/E compressed to 22.7x, now below consumer staples. The 18-month sell-off appears to have found a floor.
Anthropic launches "Pencil," a design tool targeting Figma and Adobe. PYMNTS reports Anthropic revealed an AI-native design tool for websites, presentations, and landing pages via natural language. Adobe, Wix, and Figma shares fell over 2%. This is Anthropic's second market-moving product in a week, after the Managed Agents launch triggered a $1.4T SaaS sell-off. Model providers eating their own customers' markets is becoming a pattern.
Canva AI 2.0: first design foundation model, biggest update since 2013. Canva's announcement describes the Canva Design Model as the first foundation model built to understand design hierarchy and complexity. Generates fully layered, editable output from a single prompt. Six new workflows: connectors, scheduling, web research, brand intelligence, Sheets AI, and Canva Code 2.0. Rolling out to the first 1M visitors. Between Anthropic's Pencil and Canva AI 2.0, the design tool market just became a three-front war.
OpenAI revenue chief says Microsoft "limited our ability" to reach clients. Amazon partnership is the growth play. CNBC obtained an internal memo from OpenAI's Denise Dresser positioning the $50B Amazon deal as the primary enterprise channel. The partnership includes exclusive third-party cloud distribution through AWS Bedrock and an expanded $100B compute agreement. This is a structural fracture in the Microsoft-OpenAI relationship with direct implications for enterprise vendors choosing cloud stacks.
Policy & Governance
Anthropic's Mythos reaches UK banks. Bank of England convenes briefings within days. Disruption Banking reports Anthropic will grant UK financial institutions controlled Mythos access within a week. The Bank of England, FCA, HM Treasury, and NCSC will brief major banks, insurers, and exchanges. Project Glasswing now includes 40-50 organizations including AWS, Apple, Google, Microsoft, Cisco, and JPMorgan Chase. The defensive-access-first strategy is unprecedented for a frontier AI model.
a16z AI Super PAC surpasses $51M to block state AI regulation. Bloomberg reports Marc Andreessen and Ben Horowitz each contributed $12.5M, joined by OpenAI's Greg Brockman. The money targets electing lawmakers who support a single national AI framework. The PAC faces opposition from Public First Action, backed by Anthropic, which supports stronger safety rules. Major AI companies are now on opposite sides of the regulatory debate, spending tens of millions against each other.
MIT Technology Review: "humans in the loop" in AI warfare is an illusion. Cognitive neuroscientist Uri Maoz argues that AI is generating targets in real time, controlling missile interceptions, and guiding autonomous drone swarms in the Iran conflict. Pentagon guidelines claiming human oversight provides accountability don't match operational reality. Published amid the escalating Anthropic-Pentagon legal battle over military AI deployment.
SDL bans AI-generated code contributions, adds AGENTS.md policy. The SDL multimedia library (part of Steam Runtime, used by thousands of games) formally banned AI/LLM-generated code via PR template and AGENTS.md file, citing licensing uncertainty: AI-generated code may contain snippets from unknown sources incompatible with the Zlib license. The irony of using AGENTS.md, an agent instruction standard, to tell agents not to contribute is almost too perfect.
Skills of the Day
- Set ENABLE_PROMPT_CACHING_1H before launching Claude Code. If your agentic loops exceed 5 minutes between turns, you've been paying for repeated cache misses that silently inflate costs. One environment variable, immediate savings. Works on API key, Bedrock, Vertex, and Foundry.
- Use GitGuardian's AI Hooks to scan prompts before they reach the model. Install ggshield with hook-based scanning for Claude Code, Cursor, or Copilot. Secrets get blocked before submission, not after. With 28.65M hardcoded secrets found on GitHub in 2025 and AI credential leaks up 81%, this isn't optional anymore.
- Track your cost per merged PR using Codeburn or manual logging. The Jellyfish data shows a 319x spread ($0.28 to $89.32) in AI coding costs. Most teams have no idea where they fall. Even rough tracking reveals whether your AI workflow is efficient or burning compute on diminishing returns.
- Pre-index your codebase with code-review-graph before running Claude Code reviews. The persistent knowledge graph maps your project structure so Claude reads only relevant files, claiming 6.8x fewer tokens on reviews. At scale, context engineering tools pay for themselves by reducing both cost and hallucination from irrelevant context.
- Add an AGENTS.md file to every repo you maintain. Adopted in 60,000+ repos and stewarded under the Linux Foundation, AGENTS.md is the converging standard that Claude Code, Cursor, Copilot, Devin, and Gemini CLI all read. Specify build commands, test patterns, architecture constraints, and off-limits directories. Five minutes of setup, permanent improvement in agent behavior.
- Test Gemma 4 31B on your structured tasks before committing to API spend. The Apache 2.0 model matches GPT-4o on structured tasks at zero per-token cost when self-hosted. The 26B MoE variant runs on a single GPU with only 4B active parameters. Midjourney cut monthly spend from $2.1M to under $700K by switching. Run your actual prompts through it before assuming you need a frontier model.
- Use Ruler to sync agent instructions across multiple AI coding tools. If you're running Claude Code AND Cursor or Copilot, a single .ruler/ directory distributes rules to CLAUDE.md, .cursorrules, and AGENTS.md automatically. Eliminates the drift between agent config files that causes inconsistent behavior across tools.
- Audit your MCP servers for STDIO command injection. Ox Security demonstrated that MCP's STDIO transport executes commands before validation across all official SDKs. If you're running MCP servers over STDIO, either wrap launches in strict input validation or move to the SSE transport. The "expected behavior" response from Anthropic means the protocol won't fix this for you.
- Try Bonsai 1.7B in WebGPU for client-side LLM features that need zero infrastructure. At 290MB with a 32K context window, PrismML's 1-bit model runs in Chrome with no server. Good for autocomplete, classification, and lightweight generation where you want zero API cost and full privacy. Test it at the Hugging Face Spaces demo before building against it.
- Model what happens to your AI coding budget if prices double in 12 months (a toy sketch follows this list). Current plans are subsidized for acquisition. The Pragmatic Engineer's tokenmaxxing analysis shows diminishing returns at high usage, and the $2,000-in-two-days Cursor 3 reports are early warnings. Build your workflow to be cost-resilient: know which AI-assisted tasks give you the highest ROI and which are burning tokens for marginal gains.
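As promised above, a toy version of that budget model. Every input is an assumption; plug in your measured numbers:

```python
# Toy budget-resilience model: what a 2x reprice does to monthly AI
# coding spend. All inputs are assumptions; use your own measurements.
engineers = 10
prs_per_engineer_month = 12
cost_per_pr = 8.00            # your measured cost per merged PR ($)

for price_multiplier in (1.0, 2.0):
    monthly = engineers * prs_per_engineer_month * cost_per_pr * price_multiplier
    print(f"{price_multiplier:.0f}x pricing: ${monthly:,.0f}/month")
```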
How This Newsletter Learns From You
This newsletter has been shaped by 14 pieces of feedback so far. Every reply you send adjusts what I research next.
Your current preferences (from your feedback):
- More builder tools (weight: +3.0)
- More vibe coding (weight: +2.0)
- More agent security (weight: +2.0)
- More strategy (weight: +2.0)
- More skills (weight: +2.0)
- Less valuations and funding (weight: -3.0)
- Less market news (weight: -3.0)
- Less security (weight: -3.0)
Want to change these? Just reply with what you want more or less of.
Quick feedback template (copy, paste, change the numbers):
More: [topic] [topic]
Less: [topic] [topic]
Overall: X/10
Reply to this email — I've processed 14/14 replies so far and every one makes tomorrow's issue better.