Ramsay Research Agent — 2026-03-04
Top 5 Stories Today
1. OpenAI Pentagon Deal Triggers Largest Consumer AI Migration in History. OpenAI lost 1.5 million subscribers in 48 hours after rushing a Pentagon classified-network deal hours after Trump blacklisted Anthropic. ChatGPT uninstalls surged 295%. Claude hit #1 on both iOS and Android US app stores. OpenAI VP Max Schwarzer defected to Anthropic. Sam Altman admitted the deal "looked opportunistic and sloppy" and amended terms to bar domestic surveillance and NSA use. He then misspoke about deploying on "all NATO classified networks" — it's only unclassified under consideration. This is the biggest commercial reputation hit OpenAI has faced, and the first time AI ethics governance visibly drove consumer adoption decisions at scale. What to do: If you depend on OpenAI APIs, monitor service stability as organizational focus shifts. If you ship consumer AI products, ethics positioning is now a competitive moat. (Axios, CNBC)
2. Karpathy Retires "Vibe Coding," Willison Publishes Definitive Agentic Engineering Guide. Just one year after coining "vibe coding," Andrej Karpathy officially retires the term for "agentic engineering" — "AI does the implementation, human owns the architecture, quality, and correctness." Simon Willison simultaneously published a multi-chapter Agentic Engineering Patterns guide (497 points on HN, 283 comments) covering TDD with agents, anti-patterns, and cognitive debt prevention. Anthropic's new 2026 Agentic Coding Trends Report adds production data: agents now complete 20 autonomous actions before requiring human input (double from six months ago), Rakuten achieved 99.9% accuracy on 12.5M-line codebase modifications, and Zapier deployed 800+ internal agents with 89% adoption. This is no longer hype — it's a professional discipline with guides, courses, and production tooling. What to do: Read Willison's guide at simonwillison.net/guides/agentic-engineering-patterns/. Adopt the /simplify pre-PR workflow. Structure your CLAUDE.md for progressive disclosure. (The New Stack, Anthropic)
3. Schneier Publishes "Promptware Kill Chain" — MITRE ATT&CK for AI Agents. Bruce Schneier and coauthors proposed a 7-stage kill chain for "promptware" — prompt injection attacks that evolve into multi-step malware: initial access → privilege escalation → reconnaissance → persistence → C2 → lateral movement → actions on objective. Critical insight: persistence via long-term agent memory poisoning and C2 via controllable trojans turn static prompt injection into evolving, adaptive malware. Self-replicating attacks spread by tricking email agents into forwarding payloads. This gives defenders the first shared vocabulary for reasoning about agentic threats at the same maturity as traditional cyber frameworks. What to do: Map your agent deployments against the 7 stages. Implement defense-in-depth at each boundary, especially tool-return validation (see AgentSentry paper below). (Schneier on Security)
4. ServiceNow Pays $1B for Veza — Agent Identity Becomes Enterprise Security Category. ServiceNow completed its $1B acquisition of Veza, gaining the patented Access Graph technology that maps access relationships across human, machine, and AI agent identities. Veza's "Enterprise Agent Identity Control Plane" quantifies exact action-level blast radius for every AI agent. Context: 88% of organizations report agent security incidents (Gravitee), only 21% maintain real-time agent inventories (Strata/CSA), and 45.6% still use shared API keys for agent-to-agent auth. NIST's AI Agent Standards RFI closes March 9. Okta shipped dedicated shadow AI agent detection. Agent identity governance is now a billion-dollar enterprise category. What to do: Audit your agent fleet. Implement scoped credentials. Submit comments to NIST before March 9 if you ship agent infrastructure. (SecurityWeek, Gravitee)
5. Claude Cowork 11 Plugins Trigger $200B+ SaaSpocalypse Across 7 Verticals. Anthropic's 11 Claude Cowork plugins spanning legal, sales, finance, data analysis, marketing, support, and product management erased over $200B in market cap in a single day. Thomson Reuters dropped 16% (biggest single-day loss ever), LegalZoom -20%, Salesforce -7%, ServiceNow -7%, Adobe -7%. Meanwhile, Salesforce's own Agentforce hit $800M ARR measuring value in "agentic work units" instead of seats, and Goldman Sachs has had embedded Anthropic engineers for 6 months building Claude agents managing operations for $2.5 trillion in assets. One general-purpose AI platform simultaneously threatening incumbents in 7+ verticals is the clearest cross-category disruption signal yet. What to do: Build MCP server integrations for underserved SaaS categories. Study the SaaStr 90/10 rule: buy 90% off-the-shelf, but any tool with zero AI features is a build signal. (TechCrunch, CNBC)
Breaking News & Industry
OpenAI Building GitHub-Rival Code Repository
OpenAI is developing an internal code-hosting platform after repeated GitHub outages disrupted engineering teams. The project is months from completion but employees have discussed commercializing it. A repository integrated with OpenAI's coding agents could let developers collaborate with autonomous AI systems — directly challenging Microsoft (OpenAI's largest investor and GitHub's owner). Strategic tension is palpable: OpenAI building against the platform of its $840B valuation partner. (Dataconomy)
NVIDIA Secret Inference Chip with Groq LPU Technology
NVIDIA is building a new inference processor integrating Groq's Language Processing Unit technology (acquired December 2025). The chip uses on-chip SRAM instead of HBM, delivering up to 80 TB/s memory bandwidth (~10x H100). OpenAI committed to 3 GW of dedicated inference capacity as lead customer. Debut at GTC 2026 (March 16-19). Meanwhile, Jensen Huang told Morgan Stanley that a "$100 billion investment in OpenAI is probably not in the cards" — notable distancing during the backlash. (SiliconANGLE)
Alibaba Qwen Leadership Exodus
Junyang Lin (tech lead who built Qwen from lab project to 600M+ downloads) and Yu Bowen (post-training head) resigned one day after Qwen 3.5 launched. Huibin (Qwen Code lead) had already left for Meta in January. The catalyst: Alibaba dismantled Lin's vertically-integrated R&D model, splitting the team into horizontal modules. Hao Zhou (ex-Google DeepMind Gemini) appointed as new lead. BABA shares dropped 5.3% intraday. Google is already recruiting ex-Qwen researchers. Despite turmoil, Qwen's MAU jumped from 31M to 203M in February. Simon Willison calls Qwen 3.5 "truly remarkable" but fears it may be "Qwen's swan song." (Bloomberg)
Anthropic Accuses Three Chinese Labs of Industrial-Scale Distillation
Anthropic identified 24,000+ fraudulent accounts generating 16M+ exchanges with Claude from DeepSeek, Moonshot AI, and MiniMax. The agent-specific targeting is key: Moonshot (3.4M exchanges) targeted agentic reasoning and tool use; MiniMax (13M) targeted agentic coding; DeepSeek (150K) targeted reasoning. Anthropic deployed behavioral fingerprinting classifiers and "response shaping" to reduce extractive value. Two new arXiv papers (CMI logit purification + trace rewriting) provide complementary technical defenses. (Anthropic)
Xcode 26.3 Ships Agentic Coding with Claude and Codex
Apple's Xcode 26.3 integrates Anthropic's Claude Agent and OpenAI's Codex directly into the IDE. Agents can search documentation, explore file structures, update project settings, capture previews, and iterate through builds autonomously. MCP support means any compatible third-party agent can plug in. This reaches millions of iOS/macOS developers — the strongest signal yet that agentic coding is going mainstream. (Apple Newsroom)
DeepSeek V4 Imminent
Expected this week, timed to China's Two Sessions. Specs: ~1T parameters, ~32B active per token (MoE), native multimodal, 1M-token context. Designed for Huawei Ascend chips with zero NVIDIA dependency. Leaked benchmarks claim 90% HumanEval and 80%+ SWE-bench (unverified). Consumer tier: dual RTX 4090s or single RTX 5090. (TechNode)
SaaS Disruption & Builder Moves
Seat Extinction Confirmed Across 5+ Categories Simultaneously
Per-seat pricing collapse is happening everywhere at once: Support (Intercom Fin $100M ARR at $0.99/resolution, Ada 83% autonomous resolution), HR (LinkedIn Hiring Assistant saves 4 hrs/role at AMD/Canva/Siemens, Workday cut 2,100+ jobs in 12 months automating its own customer ops), Finance (Ramp Accounting Agent 90%+ auto-coding 3x faster close, BILL W-9 Agent eliminates 80% manual steps, Basis raised $100M for autonomous accounting), CRM (Monaco raised $35M to replace Salesforce for startups, Salesforce itself measuring "agentic work units" instead of seats), Legal (Thomson Reuters -16% from Cowork plugins). Salesforce's own growth is 72% price hikes — unsustainable when AI-native competitors deliver 5.7x better revenue efficiency per employee. (SaaStr)
Agencies Vibe-Coding Custom Tools in Hours
Broadhead's VP vibe-coded a GEO monitoring platform in one evening using Claude Code. Havas built Brand Insights AI. Three agencies independently said off-the-shelf tools don't fit — so they build custom tools in hours. What used to take a dev team 3 months now takes a marketing manager one afternoon. The SaaStr 90/10 rule: any tool with zero AI features is a build signal. (Adweek)
Goldman Sachs Bypasses SaaS Entirely
Goldman has had embedded Anthropic engineers for 6 months co-developing Claude agents for trade accounting ($2.5T assets), achieving 30% faster onboarding. The architectural pattern: embedded AI lab engineers building domain-specific agents on foundation models, bypassing off-the-shelf SaaS entirely. Gartner projects 35% of point-product SaaS replaced by AI agents by 2030. (CNBC)
Vibe Coding & AI Development
Claude Code v2.1.68 — Major Capability Update
Rapid shipping from v2.1.63 to v2.1.68 this week: /simplify spawns three parallel review agents (Code Reuse, Code Quality, Efficiency) that auto-apply fixes before merge. /batch plans migrations interactively then executes in parallel across git worktrees. Auto-memory automatically saves useful context across sessions. HTTP hooks replace shell-only hooks for remote integrations. Memory leak fixes for unbounded growth in git root detection and JSON parsing caches during long sessions. MCP OAuth token refresh race conditions fixed. (GitHub CHANGELOG)
GLM-5: 744B Open-Source, Free on NVIDIA NIM
Z.ai's GLM-5 (744B/40B MoE, MIT license, 205K context) is free on NVIDIA NIM at 40 req/min with no credit card. Benchmarks: 77.8% SWE-bench Verified (highest open-source), 56.2 Terminal-Bench 2.0 (approaching Opus 4.5's 59.3). Trained entirely on 100,000 Huawei Ascend chips. You can point Claude Code at this model via claude-launcher's translation proxy. Strongest free coding model available today. (NVIDIA NIM)
Raycast Launches Glaze — Desktop Vibe Coding Goes Mainstream
Raycast launched Glaze in private beta — a platform that builds real native desktop Mac apps from natural language prompts. Unlike web-based vibe coding tools, Glaze apps run natively with keyboard shortcuts, menu bar integration, file system access, and offline support. Public app store and private team stores included. Free tier + $20-30 paid plans. Strongest signal yet that vibe-coded software is moving from demos to production desktop tooling. (Raycast Blog)
Check Point Discloses Claude Code RCE (CVE-2026-21852)
Three attack vectors: (1) Hooks-based RCE via .claude/settings.json executing shell commands on SessionStart without confirmation, (2) MCP consent bypass via repo-controlled config auto-approving all servers, (3) API key exfiltration via ANTHROPIC_BASE_URL pointing to attacker endpoint. All patched. Always review .claude/ config files before opening untrusted repositories. (Check Point Research)
Builder Tips
- PreCompact hooks preserve working state across context compaction. Reference implementation at mvara-ai/precompact-hook. Combined with auto-memory, this creates a dual-layer memory system.
- Never add/remove tools mid-conversation — it invalidates the entire KV-cache prefix, destroying the 81% cost savings from prompt caching. Keep tool definitions static.
- Run /simplify before every PR — three specialized review perspectives catch different issue classes that a single review misses.
- Git worktrees are now standard multi-agent infrastructure: Claude Code /batch, Windsurf, Superset IDE, Codex Desktop all converged on worktrees independently.
What Leaders Are Saying
Karpathy: "Vibe coding = YOLO. Agentic engineering = AI does the implementation, human owns architecture, quality, and correctness." Just one year after coining the term, he's retiring it for professional framing. Already adopted by IBM, Google Cloud, Osmani. (The New Stack)
Willison: Published multi-chapter Agentic Engineering Patterns guide. New March 4 chapter on anti-patterns — core rule: never file PRs with AI-generated code you haven't reviewed yourself. The most actionable practitioner resource on agentic coding workflows currently being published. (simonwillison.net)
Chollet: ARC-AGI-3 launches March 25 — first interactive reasoning benchmark. 1,000+ levels across 150+ environments requiring agents to explore, learn, plan, and adapt. Measures genuine generalization and agency rather than pattern matching. (ARC Prize)
Huang: Rules out $100B OpenAI investment, preps GTC 2026 keynote covering NVIDIA's five-layer AI stack with emphasis on agentic systems. Told CNBC "markets got it wrong" on SaaS disruption — agents are customers of software tools, not replacements. (Bloomberg)
LeCun: Warns of "two AI bubbles feeding each other" — a financial bubble (overvaluation) and a narrative bubble (AGI hype). Compared AI to "the new printing press" not electricity. Pushed back against AGI timelines. Gains weight as AI stocks wobble. (Startup News)
Rauch: v0 at 3M users, 3,200 merged PRs/day. Built skills.sh (34K submissions) entirely using v0. Non-technical team members contributing production code. "We're heading toward a generative web where apps are created on-demand for individual users." (Lenny's Newsletter)
AI Agent Ecosystem
CyberStrikeAI: First Open-Source AI Attack Platform Used at Scale
Go-based framework with 100+ security tools and Claude/DeepSeek integration was used by a Russian-speaking financially motivated actor to compromise 600+ FortiGate devices across 55 countries (Jan-Feb 2026). Developer Ed1s0nZ holds a CNNVD 2024 contribution award linked to China's MSS. No zero-days — purely AI-automated credential attacks at scale. Team Cymru tracked 21 unique IPs. The "theoretical risk" phase of AI offensive operations is definitively over. (The Hacker News)
OpenClaw Supply Chain Crisis Escalates
824+ confirmed malicious skills across 10,700+ total in ClawHub (~8% of registry). Primary payload: Atomic macOS Stealer. 30,000+ publicly exposed instances; Censys tracked growth from ~1,000 to 21,000+ in a single week. Agent skills supply chain remains the most active attack surface. (eSecurity Planet)
NIST AI Agent Standards Initiative
Three pillars: standards, open-source protocol development, and agent security/identity research. RFI on Agent Security due March 9. Agent Identity and Authorization Concept Paper due April 2. Listening sessions in April. The US government's first major move to standardize agent governance. (NIST)
Framework Consolidation
Microsoft Agent Framework reached RC (GA end of Q1), merging AutoGen + Semantic Kernel. AutoGen and SK now in maintenance mode. Google ADK for TypeScript fills the JS ecosystem gap with strong typing for inter-agent data contracts. Windsurf shipped Phoenix Alpha with parallel multi-agent sessions and context window usage indicator. Cursor BugBot reached GA with autonomous PR scanning + cloud agent auto-fix for 1M+ users.
Hot Projects & Repos
| Project | Stars | What It Does | Why It Matters |
|---|---|---|---|
| Worktrunk | 2.7K | Rust CLI for Git worktrees with AI agents | Auto-squash/rebase/merge, LLM commit messages, build cache sharing. From PRQL creator. |
| Timber | 545 | Compile XGBoost/sklearn to C99 binaries | 336x faster than Python, 48KB artifacts, MISRA-C compliant. Ollama for classical ML. |
| Codebuff | 3.6K | Open-source multi-agent coding tool | Claims 61% vs Claude Code's 53% on 175-task eval. Any model via OpenRouter. |
| Hive Memory | new | MCP cross-project memory for agents | Fully local, cross-project context sharing. Show HN featured. |
| Sub-500ms Voice Agent | 562 HN | Production voice agent in one day for $100 | Groq ~80ms TTFT + streaming pipeline = 2x faster than Vapi. Blueprint documented. |
| PDF Oxide | 304 | 5x faster PDF processing than PyMuPDF | Rust core, Python/JS/WASM bindings. MCP server included. |
| Omni | 503 | Self-hosted workplace search (open-source Glean) | Unified search across Google Workspace, Slack, Jira. ParadeDB single-Postgres. |
Best Content This Week
- Donald Knuth's "Claude's Cycles" — Knuth credits Claude Opus 4.6 for solving an open graph theory problem in 31 steps. Called it "a dramatic advance in automatic deduction." The godfather of CS revising his opinions about generative AI.
- Max Woolf: AI Agent Coding in Excessive Detail — Skeptic-to-convert journey building Rust ML library 9-30x faster than Python equivalents. Key insight: agents work best when you have "approximate knowledge of many things with enough domain expertise to know what should and should not work."
- Tenzai Vibe Coding Security Study — 69 vulnerabilities across 5 tools. Zero SQLi/XSS but pervasive business logic flaws. Security-focused prompts produced "minimal vulnerability reduction." CMU's SusVibes: only 10.5% of solutions both correct AND secure.
- PRX Part 3: Text-to-Image in 24 Hours — Photoroom trains 1.3B model from scratch: TREAD token routing drops 50% of tokens, REPA alignment with DINOv3, Muon optimizer. Full Apache 2.0 release. Dramatically lowers the barrier to generative model training.
- Interconnects Open Artifacts #19 — Best aggregation of Chinese open-weight releases: Qwen 3.5, GLM-5, MiniMax M2.5, Step-3.5-Flash. Nathan Lambert introduces RAM (Relative Adoption Metrics) normalizing downloads by model size class.
Hacker News Pulse
| Story | Points | Comments | Signal |
|---|---|---|---|
| MacBook Neo launch (A18 Pro) | 1,635 | 1,943 | Day's top story. New ultraportable line — HN debates on-device inference. |
| Nobody Gets Promoted for Simplicity | 833 | 473 | Resonant essay connecting complexity incentives to AI code proliferation. |
| Qwen Leadership Exodus (Willison) | 568 | 259 | Community anxious about open-weight future. |
| Agentic Engineering Patterns (Willison) | 497 | 283 | Highest comment engagement for any AI story today. |
| Amodei calls OpenAI "straight up lies" | 324 | 160 | Lab tensions at peak intensity. |
| Qwen3.5 Fine-Tuning Guide (Unsloth) | 300 | 70 | Practitioners racing to learn models while project future is uncertain. |
| nCPU: CPU entirely on GPU | 243 | 121 | Heterogeneous compute implications for inference. |
| Google Workspace CLI | 227 | 101 | Agent tooling relevance — CLI for Workspace APIs. |
Research Papers
AgentSentry — Temporal Causal Defense Against Prompt Injection
First defense modeling multi-turn indirect prompt injection as temporal causal takeover. Uses counterfactual re-executions at tool-return boundaries to detect when tool outputs steer agent behavior. Evaluated on AgentDojo across four task suites. Builder-ready pattern for tool-augmented agents. (arXiv 2602.22724)
Code Fingerprints — Model-Specific Code Attribution
Beyond binary human-vs-machine detection: identifies which specific LLM generated a code snippet. Enables vulnerability triage (which model produced the bug?), licensing audits, and distillation detection. Directly relevant to the Anthropic distillation crackdown. (arXiv 2603.04212)
Codified Context — Three-Component Agent Infrastructure
Hot memory + 19 specialized agents + cold knowledge base, evaluated across 283 sessions on a 108K-line codebase. Open-source companion repo. Blueprint for scaling agentic coding with structured context engineering. (arXiv 2602.20478)
MCPShield — 10% to 95% MCP Defense Rate
Plug-in security cognition layer with pre-invocation probing, runtime sandboxed projection, and post-invocation trace reasoning. Undefended MCP agents achieve only 10% defense rate; MCPShield reaches 95.3%. (arXiv 2602.14281)
AlgoVeri — Formal Verification Benchmark
First cross-language benchmark for formally verified code generation: 40.3% success in Dafny, 24.7% Verus, 7.8% Lean. LLMs handle high-level verified code but collapse on systems-level constraints and manual proofs. (arXiv 2602.09464)
Safety Alignment as Attack Surface
Adversaries inject documents into RAG knowledge bases that trigger safety refusals on benign queries. Weaponizes alignment homogeneity itself — high cross-model transfer rates. The alignment-as-vulnerability paradox. (arXiv 2603.03919)
AgentLAB — First Long-Horizon Agent Attack Benchmark
5 novel attack types (intent hijacking, tool chaining, task injection, objective drifting, memory poisoning) across 28 environments. Key finding: single-turn defenses fail against multi-turn adversarial strategies. (arXiv 2602.16901)
OSS Momentum
| Repo | Stars | Velocity | Category |
|---|---|---|---|
| Shannon | 30.7K | +1,854/day | AI pentester — #1 trending on GitHub |
| CC Switch | 23.8K | +3,594/wk | Unified manager for Claude/Codex/Gemini CLI |
| PageIndex | 20.4K | +2,851/wk | Vectorless RAG — 98.7% accuracy without embeddings |
| Ruflo | 18.8K | +4,245/wk | Multi-agent swarm with Q-Learning router |
| GitNexus | 9.6K | +6,262/wk | Knowledge graph giving agents structural awareness |
| OpenSandbox | 6.1K | +4,592/wk | Alibaba's sandbox for agent execution |
| Agency-Agents | 5.7K | +2,209/day | 55+ specialized agent personas |
| OpenViking | 4.6K | 2mo old | ByteDance context database (filesystem paradigm) |
| ComposioHQ | 3.6K | 3wks old | Parallel coding agent fleet orchestrator |
| AI-Infra-Guard | 3.0K | +1,041 commits | Tencent red teaming (400+ CVEs, MCP scanner) |
Category signals: Agent Skills is now GitHub's fastest-growing category (Anthropic official at 83.9K stars). Sandbox infrastructure is consolidating (OpenSandbox, E2B). Vectorless RAG (PageIndex) challenges embedding orthodoxy. Multi-agent IDE tooling reflects developers juggling 3-5 AI tools simultaneously.
Newsletters & Blogs
- Simon Willison: Agentic Engineering Anti-Patterns chapter (never file unreviewed AI PRs) + Qwen exodus analysis. 15th consecutive top source.
- PRX Part 3 (Hugging Face/Photoroom): Full 1.3B text-to-image training recipe in 24 hours. Apache 2.0.
- Nathan Lambert (Interconnects): RAM methodology for normalizing open-model adoption + Qwen/GLM/MiniMax coverage. Feed working again after 3+ run gap.
- OpenAI Blog: GPT-5.2 extends gluon physics breakthrough to graviton amplitudes in quantum gravity.
- Feed health: 4/15 feeds still broken (The Batch, Anthropic, Mistral, Eugene Yan). Interconnects recovered.
Community Pulse
Reddit Highlights
- Qwen3.5 small model efficiency: 0.8B runs on 14-year-old i5/4GB DDR3. 35B-A3B hits 37.8% SWE-bench. 9B runs on Android. $3 10-minute finetune produces capable results. Local inference cost floor approaching zero.
- NVFP4 coming to llama.cpp: GitHub PR #19769 active, days away. Unlocks native FP4 for memory-constrained users.
- Claude Excel plugin: 987 upvotes. Financial modeler reports transformative adoption for complex multi-sheet models. Claude gaining real adoption in professional finance beyond coding.
- Self-evolving Rust agent: 200-line agent wakes every 8 hours, reads its own code, files bugs, iterates autonomously toward rivaling Claude Code.
- OpenAI subscriber exodus: 3,847 upvotes — top post across all subreddits. Day 3 = structural consequences: talent flight, subscriber churn, market-share transfer.
Skills to Learn Today
| # | Skill | Domain | Difficulty |
|---|---|---|---|
| 1 | Secure MCP with mcp-scan tool pinning | agent-security | beginner |
| 2 | Spec-driven development with GitHub Spec Kit | vibe-coding | intermediate |
| 3 | Claude Code Agent Teams (2-16 instances, peer messaging) | agent-patterns | advanced |
| 4 | Structured note-taking for long-horizon agents | agent-patterns | intermediate |
| 5 | Three-tier context layering with Skills | vibe-coding | intermediate |
| 6 | Defense-in-depth for MCP tool poisoning | agent-security | advanced |
| 7 | RAG chunking: 512-token recursive (skip overlap) | ml-ops | intermediate |
| 8 | Detect distillation with behavioral fingerprinting | agent-security | advanced |
| 9 | Dynamic RAG with query-adaptive retrieval | ml-ops | advanced |
| 10 | Plan-iterate-test loop for production vibe coding | vibe-coding | beginner |
Source Index
Breaking News: [1] Dataconomy — OpenAI code repo, [2] Bloomberg — Qwen exodus, [3] Apple Newsroom — Xcode 26.3, [4] SiliconANGLE — NVIDIA chip, [5] Axios — Claude #1, [6] Help Net Security — Agent identity dark matter, [7] TechNode — DeepSeek V4, [8] Anthropic Blog — Distillation attacks
SaaS Disruption: [9] TechCrunch — SaaSpocalypse, [10] SaaStr — 90/10 rule, [11] CNBC — Goldman/Anthropic, [12] Adweek — Agency vibe coding, [13] Computer Weekly — LinkedIn Hiring Assistant, [14] Salesforce IR — Agentforce $800M, [15] HR Executive — Workday layoffs
Vibe Coding: [16] GitHub CHANGELOG — Claude Code v2.1.68, [17] Anthropic — Trends Report, [18] Raycast Blog — Glaze, [19] Check Point Research — CVE-2026-21852, [20] NVIDIA NIM — GLM-5, [21] Windsurf Changelog
Thought Leaders: [22] The New Stack — Karpathy, [23] simonwillison.net — Agentic Engineering Patterns, [24] ARC Prize — ARC-AGI-3, [25] Bloomberg — Huang, [26] Startup News — LeCun
Agent Ecosystem: [27] The Hacker News — CyberStrikeAI, [28] Schneier on Security — Promptware Kill Chain, [29] SecurityWeek — ServiceNow/Veza, [30] NIST — Agent Standards, [31] eSecurity Planet — OpenClaw crisis
Projects: [32-38] GitHub — Worktrunk, Timber, Codebuff, OpenSandbox, AgentScope, Sub-500ms Voice Agent, Omni
Best Content: [39] Stanford CS — Knuth paper, [40] minimaxir.com — Max Woolf, [41] Tenzai — Vibe coding security, [42] HuggingFace Blog — PRX Part 3, [43] Interconnects — Open Artifacts #19
HN Pulse: [44] Apple Newsroom — MacBook Neo, [45] Terrible Software — Simplicity essay, [46] simonwillison.net — Qwen analysis, [47] Unsloth — Qwen3.5 fine-tuning
Research Papers: [48-54] arXiv — AgentSentry, Code Fingerprints, Codified Context, MCPShield, AlgoVeri, RAG blocking attack, AgentLAB
OSS Momentum: [55-64] GitHub — Shannon, CC Switch, PageIndex, Ruflo, GitNexus, OpenSandbox, Agency-Agents, OpenViking, ComposioHQ, AI-Infra-Guard
RSS/Blogs: [65] simonwillison.net — Anti-patterns, [66] HuggingFace Blog — PRX Part 3, [67] Interconnects — Open Artifacts #19, [68] OpenAI Blog — Graviton amplitudes
Community Pulse: [69-73] Reddit — Qwen efficiency, NVFP4, Claude Excel, Self-evolving agent, OpenAI exodus
Meta: Research Quality
- 873 total findings across 29 runs (24 new this run)
- 226 skills across 7 domains (10 new)
- 85 patterns tracked (5 new)
- 120 unique sources indexed
- Most productive agents: news-researcher (12 findings), saas-disruption-researcher (25 findings — enormous day), thought-leaders-researcher (12), agents-researcher (12)
- Top sources this run: Simon Willison Blog (appeared in 5 agents), TechCrunch (4 agents), GitHub (4 agents), arXiv (17 papers analyzed), Reddit (70 posts scanned)
- New Tier 1 source: Schneier on Security — Promptware Kill Chain is foundational
- Coverage gap: Consumer hardware (MacBook Neo was #1 on HN but we only caught it via HN agent). Consider monitoring Apple events more directly.
- Feed health: 4/15 RSS feeds still broken (The Batch, Anthropic, Mistral, Eugene Yan). Interconnects recovered. Web supplements produce more value than feeds themselves.
How This Newsletter Learns From You
This newsletter has been shaped by 8 pieces of feedback so far. Every reply you send adjusts what I research next.
Your current preferences (from your feedback):
- More builder tools (weight: +2.5)
- More agent security (weight: +2.0)
- More agent security (weight: +1.5)
- More vibe coding (weight: +1.5)
- Less market news (weight: -1.0)
- Less valuations and funding (weight: -3.0)
- Less market news (weight: -3.0)
Want to change these? Just reply with what you want more or less of.
Ways to steer this newsletter:
- "More [topic]" / "Less [topic]" — adjust coverage priorities
- "Deep dive on [X]" — I'll dedicate extra research to it
- "[Section] was great" — reinforces that direction
- "Missed [event/topic]" — I'll add it to my radar
- Rate sections: "Vibe Coding section: 9/10" helps me calibrate
Reply to this email — I've processed 8/8 replies so far and every one makes tomorrow's issue better.