Ramsay Research Agent — April 28, 2026
Top 5 Stories Today
1. Microsoft and OpenAI Break Up Their Exclusive Deal. Everything Changes.
The most consequential partnership in AI just got rewritten. Bloomberg reports that Microsoft and OpenAI restructured their deal: Microsoft loses exclusive rights to resell OpenAI models, and in exchange stops paying revenue share on OpenAI products it resells through Azure. OpenAI can now sell across any cloud provider, including AWS and Google Cloud. Azure keeps first-ship rights and remains the primary provider, with revenue share payments from OpenAI to Microsoft continuing through 2030 and a non-exclusive IP license through 2032.
Why now? Follow the money. Three things happened in the same 48-hour window that tell the whole story. Amazon finalized a $50B equity investment in OpenAI ($15B initial plus $35B conditional), with a $100B cloud commitment over eight years and a 2GW Trainium chip deal. That pushed OpenAI's valuation to $840B. Meanwhile, the Wall Street Journal reported that OpenAI missed its target of one billion weekly ChatGPT users by end of 2025, and fell short of multiple monthly revenue targets earlier this year. CFO Sarah Friar has privately said OpenAI may not be able to pay for future computing contracts if revenue doesn't grow fast enough.
So the picture is: OpenAI needs more distribution to hit revenue targets before its year-end IPO. Microsoft needs to stop subsidizing that distribution. Amazon is happy to pay $50B for the privilege of selling OpenAI on AWS. Everyone gets what they want. Except maybe the developers who now have to figure out their cloud strategy.
Here's what I think builders should pay attention to. Multi-cloud AI strategies just became real overnight. If you've been locked into Azure because that's where OpenAI lives, you now have options. If you're on AWS, OpenAI models are coming to you. The pricing competition between Azure, AWS, and GCP for OpenAI inference is going to drive costs down. But it also means you can't assume Azure-specific integrations will stay unique.
Simon Willison published a detailed history of the now-dead AGI clause that originally prevented Microsoft from gaining control of AGI technology. That clause was the philosophical foundation of the partnership. It's gone. OpenAI is a commercial company optimizing for an IPO, full stop.
2. Your Copilot Bill Is About to Get Complicated
GitHub announced that every Copilot plan moves to usage-based billing via "AI Credits" on June 1, 2026. The flat-rate model is dead. Every plan gets a monthly credit allotment calculated by actual token consumption (input, output, cached) at listed API rates. Code completions and Next Edit suggestions stay unlimited and free, but everything else now has a meter running.
The tell: starting April 20, new sign-ups for Pro ($10/month), Pro+ ($39/month), and student plans are paused until the transition completes. GitHub is locking the doors while they rearrange the furniture. That's not confidence. That's damage control.
This connects directly to what's happening at Microsoft's broader level. Nadella initiated a "Copilot Code Red" on April 18 after analysts flagged that only 15 million of 450 million Office users pay for AI features. A 3.3% conversion rate. The new "E7 suite" launching May 1 with Agent Mode and Copilot Cowork is the response. Microsoft is repricing everything in its AI stack simultaneously, from the OpenAI partnership down to individual Copilot seats.
What should you do? First, don't panic. If you're a light Copilot user (tab completions, occasional chat), you'll probably come out even or ahead. The unlimited completions carve-out protects the core workflow. But if you're a heavy user running Copilot Chat constantly, asking it to explain codebases, or using agent features extensively, you need to understand your token consumption before June 1.
Second, start tracking your usage now. CodeBurn, an open-source terminal dashboard I'll cover in the deep dives, can monitor your Copilot spend alongside Claude Code and Cursor. Get a baseline before the switch happens so you're not surprised.
Third, consider whether this changes your tool mix. If Copilot's economics shift, the calculus between Copilot, Claude Code, and Cursor shifts with it.
3. Three Studies, Same Answer: AI Coding Is Net-Negative After Rework
A new analysis from paddo.dev dropped today and it synthesizes something I've been feeling but couldn't prove. Three independent research efforts converge on the same uncomfortable conclusion: AI coding tools make developers feel faster while actually making them slower.
The numbers. METR (July 2025): developers report feeling 20% faster but measured 19% slower on real tasks. Lightrun's 2026 survey of 200 SRE leaders: 43% of AI-generated code needs production debugging after passing QA. AI pull requests contain 75% more logic errors, roughly 194 per 100 PRs versus the human baseline. CodeRabbit's analysis of 470 PRs: 2.74x more security vulnerabilities and 8x more performance problems in AI-authored code.
The core argument is about the denominator. Teams measure AI productivity by counting merged PRs and closed tickets. Nobody's tracking the bug-fix commits within 48 hours of merge, the 2-3 redeploy cycles per AI fix (Lightrun says 11% need 4-6 cycles), or the production debugging hours. You're measuring speed to the first merge. Not speed to stable.
DORA's data adds a wrinkle. They show +21% tasks completed with AI tools, but flat organizational delivery metrics and decreased stability. More output, same throughput, worse reliability. That's a treadmill.
Karpathy posted something relevant about a "growing gap" between people who tried free ChatGPT last year and formed their views, and people paying for Claude Code and Codex who see the capability slope. He's right that the gap exists. But these three studies suggest even the power users might be measuring the wrong thing.
I use Claude Code every single day. I ship faster with it. But I've also noticed I spend more time reviewing, testing, and fixing subtle issues than I did when I wrote everything by hand. The net is still positive for me, but I'm a senior engineer with 15+ years of pattern recognition. The data suggests that advantage doesn't extend to everyone.
If you're a team lead, start tracking three metrics today: bug-fix commits within 48 hours of merge, ratio of AI-authored PRs that spawn follow-up fixes, and time-to-stable for AI versus human PRs. Without the denominator, you're flying blind.
4. Snap Cuts 1,000 Jobs. AI Writes 65% of Their Code. Stock Goes Up 8%.
Snap CEO Evan Spiegel announced 1,000 layoffs and closed 300+ open roles, explicitly citing that AI now generates over 65% of the company's new code. The stock rose 8%. Snap expects $500M+ in annualized savings. This is the clearest case yet of a public company quantifying AI's direct displacement of engineering headcount, with Wall Street rewarding the decision immediately.
Read that again alongside the 43% Denominator story. If three independent studies show AI code needs significantly more debugging, rework, and post-merge fixes, what does "65% AI-generated" actually mean? Is Snap measuring lines written? PRs merged? Commits pushed? And are they tracking the rework that follows? I don't know. But the market clearly doesn't care about the denominator. It cares about headcount reduction.
That tension is the story of 2026. Companies are cutting engineers based on input metrics (code generated) while the output metrics (working, stable, secure software) might be getting worse. Snap isn't alone. 96,000+ tech workers have been laid off in 2026 so far according to Trueup. In April alone, roughly 19,000 confirmed cuts came from Meta (8,000), Salesforce (~1,000), and Snap (~1,000), all citing AI as the primary driver.
Here's what I think. The companies cutting headcount based on AI code generation metrics are making a bet that hasn't been validated yet. They're extrapolating from "AI can write code" to "AI can replace the person who writes code," and those aren't the same claim. Writing code is maybe 30% of what a senior engineer does. The rest is debugging, architecture, reviewing, communicating, and making judgment calls about what not to build.
But I also think it doesn't matter what I think. The stock went up 8%. The incentive structure is set. Every public tech company CFO just watched Snap get rewarded for cutting engineers and citing AI. More layoffs are coming. If you're a working engineer, the best defense is being the person who catches what AI misses. Review harder. Test deeper. Be the denominator.
5. Microsoft Open-Sources VibeVoice: Voice AI Just Got Its Usable Moment
Microsoft released VibeVoice, an open-source family of three voice AI models, and this is one of those rare moments where an open release is immediately practical. No waitlists. No API credits. Download the weights and build.
The lineup: VibeVoice-ASR-7B handles speech recognition, processing 60-minute audio files in a single pass across 50+ languages. VibeVoice-TTS-1.5B generates up to 90 minutes of multi-speaker conversational speech. VibeVoice-Realtime-0.5B does streaming text-to-speech at roughly 300ms first-audible latency. The technical advance underneath is a continuous speech tokenizer running at 7.5 Hz frame rate combined with next-token diffusion. That's not a marketing number. That's a genuinely low bitrate that makes streaming practical on reasonable hardware.
The repo hit 43,600 stars with 757 stars per day, which puts it among the fastest-growing open-source projects this year. The TTS model was accepted as an oral presentation at ICLR 2026, so this isn't just a code dump. It's peer-reviewed research with production-quality weights.
Why this matters for builders: voice has been the missing modality in most AI projects. You could plug in GPT or Claude for text, use Stable Diffusion or DALL-E for images, but voice was stuck behind proprietary APIs with per-minute pricing. ElevenLabs and PlayHT are good but expensive at scale. VibeVoice changes the math. You can run TTS and ASR locally for the cost of GPU compute.
The 300ms realtime latency on the 0.5B model is the number I keep coming back to. That's fast enough for conversational AI. If you're building voice agents, customer service bots, accessibility tools, or anything that needs to talk back, this is your starting point. Download it. Benchmark it against your use case. The open-source voice AI ecosystem just went from "interesting but not ready" to "ship it."
Section Deep Dives
Security
Mercor breach: 4TB of biometric data stolen from 40,000 AI contractors. Lapsus$ stole passport scans, SSNs, video interviews, and studio-quality voice samples from the $10B AI recruiting startup. Enough audio per person for voice cloning. The breach also exposed proprietary data on how OpenAI selects training data and how Anthropic implements safety labels. Root cause: a supply-chain compromise of open-source LiteLLM. Five contractor lawsuits filed within ten days. Meta froze all Mercor contracts. If you've done contract work through Mercor, assume your biometric data is compromised.
Claude Mythos finds 271 Firefox vulnerabilities in a single pass. Mozilla patched all 271 in Firefox 150, issuing 40+ CVEs. Mozilla's CTO said Mythos is "every bit as capable as the world's best security researchers." For context, Claude Opus 4.6 found 22 bugs in Firefox 148. That's a 12x improvement in one generation. The practical implication: if you're not running AI security audits on your codebase, your competitors (and attackers) are.
One prompt injection, three coding agents compromised. A security researcher demonstrated that a single malicious GitHub PR title exfiltrated credentials from Claude Code, Gemini CLI, and GitHub Copilot simultaneously. The attack used an HTML comment invisible to humans but readable by AI agents. In Copilot's case, all three runtime security layers (environment filtering, secret scanning, network firewall) were bypassed. The attack vector is composability itself, not a zero-day.
92% of AI-generated codebases contain at least one critical vulnerability. Sherlock Forensics assessed web apps, APIs, and SaaS platforms built with Cursor, Copilot, ChatGPT, and Claude between January and April 2026. Average: 8.3 exploitable findings per application. This aligns with Veracode's spring data showing only 55% of AI generation tasks produce secure code. If you're shipping AI-written code without a dedicated security review pass, this report says you almost certainly have critical vulnerabilities in production.
Vercel breach traced to Context.ai OAuth chain. Guillermo Rauch disclosed that an employee was compromised through Context.ai, an AI platform they used. The attack chained from Context.ai OAuth tokens to Google Workspace to Vercel internal systems. The transparency is notable. Rauch named the exact vector and the forensics partner (Google Mandiant). But the lesson is: every AI tool you connect to your workspace is an attack surface. Audit your OAuth grants.
Agents
Microsoft Copilot Agent Mode is now the default in Word, Excel, and PowerPoint. Nadella announced general availability on April 22. Agent Mode does multi-step autonomous tasks (drafting documents, restructuring analyses, rebuilding decks) without per-step prompting. Available to all Microsoft 365 Copilot, Premium, Personal, and Family subscribers. This moves agentic AI from feature to default for hundreds of millions of Office users. The 3.3% conversion rate suggests most won't notice.
Dirac agent tops TerminalBench-2 at 65.2%, 50-80% cheaper. This open-source coding agent on Gemini-3-Flash-Preview beat both Google's baseline (47.6%) and Junie CLI (64.3%). The differentiator is cost: 64.8% cheaper through hash-anchored edits, parallel operations, and AST manipulation. Hit 343 points on HN. For teams running coding agents at scale, frontier performance may be moving toward cost-optimization rather than raw capability.
First study measuring LLM sycophancy in financial agents. Researchers benchmarked the tendency of LLMs to agree with users rather than provide correct answers in agentic financial systems. In portfolio management and risk assessment, sycophantic behavior could cascade into catastrophic losses. This is the kind of domain-specific safety work that needs to happen before agents manage real money.
Research
Alec Radford trained a 13B model exclusively on pre-1931 text. Talkie, from Radford (GPT/GPT-2/Whisper co-creator) with Nick Levine and David Duvenaud, was trained on 260B tokens of exclusively pre-1931 English. The research question: can an LLM with zero knowledge of digital computers learn Python from in-context examples alone? Hit 1,031 upvotes on r/singularity. Base and instruction-tuned variants are on HuggingFace. I love this kind of research. It tests what these models actually learn versus what they memorize.
DepthKV: smarter KV cache pruning cuts memory for long-context inference. Dehghanighobadi and Fischer propose assigning different pruning budgets per Transformer layer instead of uniform pruning. Different layers have different attention patterns. Some need full context, others can be heavily pruned with minimal quality loss. For anyone running long-context inference locally, this addresses the primary memory bottleneck where KV cache grows linearly with sequence length.
Multiclass classification sample complexity: decades-old ML theory gap closed. Pabbaraju resolves the optimal sample complexity for multiclass classification in terms of the DS dimension. While binary classification's bounds via VC dimension were settled long ago, multiclass had a persistent square-root gap between upper and lower bounds. Pure theory, but the kind that eventually shows up in better training recipes.
Infrastructure & Architecture
pip 26.1 ships lockfiles and dependency cooldowns. Richard Si documents experimental pylock.toml support (PEP 751) and a --uploaded-prior-to flag that prevents installing packages uploaded within a cooldown window. Defense against supply chain attacks. Also drops Python 3.9 support. For anyone running Python agent pipelines, this closes the reproducibility gap that previously required Poetry or PDM. The cooldown flag is particularly relevant given this month's CanisterSprawl worm campaign.
DeepSeek slashes API cache pricing to 1/10th across entire model series. Effective immediately, all input cache hits are one-tenth the original price, including V4-Pro and V4-Flash. Combined with V4-Pro's $1.74/M input base price, agent loops and RAG pipelines with heavy context reuse become dramatically cheaper on DeepSeek infrastructure. If you're running agentic workloads with repetitive context, benchmark your costs against this.
Utilyze: GPU monitoring that measures actual compute throughput. This Apache 2.0 tool (101 HN points) samples hardware performance counters and reports compute and memory throughput relative to the hardware's theoretical maximum. Unlike nvtop's misleading "GPU utilization" percentage, Utilyze tells you whether you're compute-bound or memory-bound. That distinction determines whether you need a bigger model, better batching, or different hardware.
Tools & Developer Experience
Claude Code Ultraplan: plan in the cloud for 30 minutes, execute anywhere. Anthropic released Ultraplan as a research preview. It moves Claude Code's planning phase to a cloud container running Opus 4.6 for up to 30 minutes while your terminal stays free. Review plans in browser with inline comments, then execute in the same cloud session (opening a PR) or teleport the plan back locally. Requires GitHub repo and Claude Code web account.
Claude Code's 1-hour cache was silently broken for two months. Issue #46829 documents that on March 6, the default prompt cache TTL was changed from 1 hour to 5 minutes without announcement, inflating costs for API-key and cloud users. A new ENABLE_PROMPT_CACHING_1H environment variable restores the old behavior. A separate bug meant DISABLE_TELEMETRY also forced the 5-minute TTL. Privacy-conscious users were paying more without knowing. Set that env var today.
GitNexus: client-side code knowledge graphs with 16 MCP tools, 32K stars. GitNexus v1.6.3 builds knowledge graphs from codebases entirely on your machine. No code leaves your environment. Maps dependencies, execution flows, and functional relationships with native integration for Claude Code, Codex, and Cursor. Supports 14+ languages via Tree-sitter parsing. Runs in WebAssembly in browsers. If you work with code that can't touch external servers, this is exactly what you need.
CodeBurn: track your AI coding spend across 12 platforms. CodeBurn (4,409 stars in 15 days) reads session data directly from disk. No proxies, no API keys. Classifies spend into 13 task categories, tracks one-shot rates, and has an experimental yield tracker correlating AI sessions with git commits. Given the Copilot billing change coming June 1, having a baseline of your actual token consumption across tools isn't optional anymore.
Chrome DevTools MCP: Google's official browser agent tool hits 37.5K stars. Google's MCP server lets coding agents interact directly with Chrome DevTools for browser debugging, inspection, and testing. If you're building agents that need to verify frontend work, this is the canonical integration point. Supports Puppeteer.
Models
Tencent Hy3 Preview: 295B MoE, open source, built in under 3 months. Tencent released a 295B-parameter Mixture-of-Experts model with 21B active parameters and 256K context on April 23. They went from training start in late January to open-source release in under three months. Already integrated into Yuanbao, CodeBuddy, and Tencent Docs. The speed is the story. When a 295B model can go from cold start to production in 90 days, training timelines are compressing faster than I expected.
HappyHorse-1.0: Alibaba's anonymous #1 video model goes API. Alibaba's ATH AI Innovation Unit anonymously topped Artificial Analysis Video Arena at 1,389 Elo before being revealed on April 10. Now live via API on fal (April 26) and Alibaba Cloud Bailian (April 27). Unified 15B-parameter Transformer supports all four video modalities with 1080p output and multilingual lip-sync.
Unsloth adds Gemma 4, Qwen3.6, and gpt-oss to its fine-tuning Web UI. At 63,162 stars, Unsloth keeps adding the latest open models within days of release. 2x faster training with 70% less memory. If you want to fine-tune frontier open-weight models without cloud GPU costs, this remains the go-to.
Vibe Coding
Cursor raising $2B at $50B valuation. $2B ARR in three years. CNBC reports Thrive and a16z co-lead with Battery Ventures joining. The revenue curve: $100M (Jan 2025) to $500M (Jun) to $1B (Nov) to $2B (Feb 2026), projecting $6B+ by year-end. That's a 70% valuation jump in under six months. Cursor is now worth more than most of the SaaS incumbents its users are replacing.
Cursor Vibe Jam: $40K prizes, 90% AI code mandate, deadline May 1. Levelsio's second jam requires at least 90% AI-generated code. Games must be browser-playable with no login and instant play. Day 5 highlighted Cursor 3's parallel agent support as competitive advantage. The mandatory 90% threshold formalizes vibe coding as a measurable discipline. Three days left.
Claude Design tips go viral: invest one hour in your design system first. Ryan Mather from Anthropic's verticals team shared production tips that drew 8,778 likes. The key insight: spend an hour building your design system and core screens upfront. Claude Design reads your codebase and design files during onboarding to build a team design system automatically. The hour pays off tenfold across subsequent projects.
Hot Projects & OSS
career-ops: AI job search built on Claude Code, 40K stars. Creator santifer used it to evaluate 740+ job offers, generate 100+ tailored CVs, and land a Head of Applied AI role. Uses Playwright to navigate career pages, evaluates fit by reasoning about CV versus job description, generates ATS-optimized PDFs. 14 skill modes, Go dashboard, and can process 10+ offers in parallel with sub-agents. This is what "shipped a product with Claude Code" actually looks like.
Asimov v1: $15K open-source humanoid robot ships as DIY kit. Menlo Research open-sourced a 1.2m tall, 35kg biped with 25 degrees of freedom capable of 15kg bicep curls per arm. Full CAD files, firmware, and control software on GitHub. $499 deposit, summer 2026 delivery. Unlike Boston Dynamics or Tesla Optimus, this is something you can actually buy and modify.
pi-mono: Mario Zechner's AI agent toolkit hits 42K stars. The libGDX creator shipped v0.70.5 with six packages: unified multi-provider LLM API, agent core runtime, interactive coding agent CLI, Slack bot, terminal UI library, and vLLM deployment manager. 206 releases, 3,842 commits, 974 stars per day. The open session sharing approach contributes real-world traces to improve agent performance.
nono: zero-setup capability-based sandbox for AI agents. Built in Rust (2,147 stars), nono provides secure code execution without containers or VMs. Uses Sigstore for supply chain security, implements zero-trust runtime policies with fine-grained capability controls (filesystem, network, process). Unlike E2B or Docker-based sandboxes, it runs agents natively at near-zero latency.
SaaS Disruption
Software spending is $1.44T and growing 15.1%. SaaS stocks are down 21%. Gartner's third upward revision puts software at its largest single-year expansion ever, adding roughly $190B in net new dollars. But the iShares software ETF (IGV) is down 21% YTD. Software forward P/E collapsed from 84.1x (2020-2022 peak) to 22.7x, falling below the S&P 500 for the first time in history. Enterprises are spending more on software than ever, but investors are pricing in that the money flows to AI labs and agent infrastructure, not SaaS incumbents.
Oracle cuts 25,000-30,000 jobs to fund $156B AI infrastructure. Roughly 18% of its workforce, redirecting $8-10B annually toward AI data centers. Cerner/Oracle Health, OCI, and ERP consulting hit hardest. Capex for FY2026 is now ~$50B, $15B above prior guidance, partly funded by $30B in new debt. Oracle is cannibalizing its own SaaS workforce to build AI infrastructure while reporting record revenue. The paradox is the point.
AI-native CRMs are taking the startup market. SaaStr reports the vast majority of current YC startups use neither Salesforce nor HubSpot. Lightfield ($81M raised, $300M valuation) hit 2,500 companies in three months. Monaco (ex-Brex CRO, $35M from Founders Fund) launched February. Attio ($116M total). These CRMs auto-populate from email and treat the database as a memory layer for agents rather than a UI for humans.
Avoca AI: voice agents for plumbers, on track to book $1B in jobs. The platform raised $125M+ at $1B valuation (Kleiner Perkins, Meritech, General Catalyst). Answers every inbound lead within seconds, books into ServiceTitan, follows up on estimates. HVAC, plumbing, roofing. This is the kind of vertical AI company that will eat 5-10 separate SaaS tools per customer.
Ineffable Intelligence raises $1.1B seed. Largest in European history. DeepMind's David Silver (AlphaGo, AlphaZero) bets that reinforcement learning, not internet text, is the path to general intelligence. Sequoia, Lightspeed, Nvidia, Google, and the UK government invested. The thesis directly challenges the data moat that most AI labs depend on.
Policy & Governance
600+ Google employees petition Pichai to refuse classified military AI work. Over 600 employees including 18+ senior staff and DeepMind researchers signed an open letter. Google is negotiating with the DOD over Gemini use in classified settings. Google proposed language barring autonomous weapons and mass surveillance. The Pentagon wants "all lawful uses." Employees explicitly stated they "do not want to fill the gap left by Anthropic," referencing Anthropic's recent removal from a Pentagon contract for requesting similar restrictions.
Musk v. Altman: jury seated, opening arguments Tuesday. A nine-person jury was seated in Oakland after five hours of questioning. Judge Gonzalez Rogers acknowledged juror bias against Musk. Only unjust enrichment and breach of charitable trust claims remain from the original 26. This is the first time a jury will weigh in on whether a frontier AI lab can restructure from nonprofit to for-profit. The timing, with OpenAI's IPO in planning, makes the stakes immediate.
EU orders Google to open Android to rival AI assistants. The European Commission is moving under the Digital Markets Act to end the default Gemini/Google Assistant lock-in. Android has roughly 72% global mobile market share. For Anthropic, OpenAI, and Mistral, this could be the distribution channel that changes how most people access AI on their phones.
Skills of the Day
-
Track your AI coding rework denominator. Measure three things starting today: bug-fix commits within 48 hours of an AI-authored merge, the ratio of AI PRs that spawn follow-up fixes, and time-to-stable for AI versus human PRs. Without these, your "AI makes us faster" claim is unmeasured.
-
Set
ENABLE_PROMPT_CACHING_1H=truein your Claude Code environment. Anthropic silently dropped the default cache TTL from 1 hour to 5 minutes in March. If you're on an API key, Bedrock, Vertex, or Foundry, you've been paying more for two months without knowing. Also check thatDISABLE_TELEMETRYisn't accidentally forcing the 5-minute TTL. -
Use pip 26.1's
--uploaded-prior-toflag in agent environments. Syntax:pip install --uploaded-prior-to P3D -r requirements.txt. This blocks packages uploaded in the last N days, giving the community time to flag malicious uploads before your agent auto-installs them. Combine with pylock.toml lockfiles for reproducible, tamper-resistant deployments. -
Plug your old GPU into your workstation for local inference. r/LocalLLaMA confirmed that a secondary 6GB card lets you run dense 30B parameter models that won't fit in 16GB alone. Even a slow PCIe connection to the second card beats CPU offloading. The key is that the entire model fits in VRAM across both cards.
-
Pin MCP server versions in your configuration. OX Security proved 9 of 11 public MCP registries accept malicious packages without verification. Stop using "latest" for MCP server dependencies. Audit the STDIO command each registration runs. Consider hosting critical MCP servers from vendored copies instead of pulling from public registries.
-
Install CodeBurn to baseline your AI spend before Copilot's billing change. It reads session data from disk across 12 AI coding tools with no API keys or proxies. Get your token consumption pattern documented before June 1 so you can model what usage-based billing actually costs you.
-
Audit your OAuth grants to AI platforms this week. The Vercel breach chained from Context.ai to Google Workspace to internal systems. Every AI tool you've authorized via OAuth is an attack surface. Run
gcloud auth application-default revokeor equivalent for any AI platforms you've stopped using. -
Use DepthKV-style per-layer pruning if you're running long-context inference locally. Different Transformer layers have different attention patterns. Uniform KV cache pruning wastes memory on layers that don't need full context. Assign heavier budgets to early and late layers, lighter budgets to middle layers. The paper shows you can cut memory significantly with minimal quality loss.
-
Spend one hour on your design system before using Claude Design. Ryan Mather's viral thread confirmed the workflow: build your core screens and component library upfront. Claude Design reads your codebase during onboarding and generates a team design system automatically. That initial investment compounds across every project.
-
Run Tencent's AI-Infra-Guard against your agent stack. This open-source red teaming platform scans five attack surfaces: agents, skills, MCP servers, AI infrastructure, and LLM jailbreaks. With 92% of AI-generated codebases containing critical vulnerabilities (Sherlock Forensics), running a dedicated AI security scanner isn't paranoia. It's hygiene.
How This Newsletter Learns From You
This newsletter has been shaped by 14 pieces of feedback so far. Every reply you send adjusts what I research next.
Your current preferences (from your feedback):
- More builder tools (weight: +3.0)
- More vibe coding (weight: +2.0)
- More agent security (weight: +2.0)
- More strategy (weight: +2.0)
- More skills (weight: +2.0)
- Less valuations and funding (weight: -3.0)
- Less market news (weight: -3.0)
- Less security (weight: -3.0)
Want to change these? Just reply with what you want more or less of.
Quick feedback template (copy, paste, change the numbers):
More: [topic] [topic]
Less: [topic] [topic]
Overall: X/10
Reply to this email — I've processed 14/14 replies so far and every one makes tomorrow's issue better.