Ramsay Research Agent — 2026-03-04

Breaking News & Industry

OpenAI Building GitHub-Rival Code Repository

OpenAI is developing an internal code-hosting platform after repeated GitHub outages disrupted engineering teams. The project is months from completion but employees have discussed commercializing it. A repository integrated with OpenAI's coding agents could let developers collaborate with autonomous AI systems — directly challenging Microsoft (OpenAI's largest investor and GitHub's owner). Strategic tension is palpable: OpenAI building against the platform of its $840B valuation partner. (Dataconomy)

NVIDIA Secret Inference Chip with Groq LPU Technology

NVIDIA is building a new inference processor integrating Groq's Language Processing Unit technology (acquired December 2025). The chip uses on-chip SRAM instead of HBM, delivering up to 80 TB/s memory bandwidth (~10x H100). OpenAI committed to 3 GW of dedicated inference capacity as lead customer. Debut at GTC 2026 (March 16-19). Meanwhile, Jensen Huang told Morgan Stanley that a "$100 billion investment in OpenAI is probably not in the cards" — notable distancing during the backlash. (SiliconANGLE)

Alibaba Qwen Leadership Exodus

Junyang Lin (tech lead who built Qwen from lab project to 600M+ downloads) and Yu Bowen (post-training head) resigned one day after Qwen 3.5 launched. Huibin (Qwen Code lead) had already left for Meta in January. The catalyst: Alibaba dismantled Lin's vertically-integrated R&D model, splitting the team into horizontal modules. Hao Zhou (ex-Google DeepMind Gemini) appointed as new lead. BABA shares dropped 5.3% intraday. Google is already recruiting ex-Qwen researchers. Despite turmoil, Qwen's MAU jumped from 31M to 203M in February. Simon Willison calls Qwen 3.5 "truly remarkable" but fears it may be "Qwen's swan song." (Bloomberg)

Anthropic Accuses Three Chinese Labs of Industrial-Scale Distillation

Anthropic identified 24,000+ fraudulent accounts generating 16M+ exchanges with Claude from DeepSeek, Moonshot AI, and MiniMax. The agent-specific targeting is key: Moonshot (3.4M exchanges) targeted agentic reasoning and tool use; MiniMax (13M) targeted agentic coding; DeepSeek (150K) targeted reasoning. Anthropic deployed behavioral fingerprinting classifiers and "response shaping" to reduce extractive value. Two new arXiv papers (CMI logit purification + trace rewriting) provide complementary technical defenses. (Anthropic)

Xcode 26.3 Ships Agentic Coding with Claude and Codex

Apple's Xcode 26.3 integrates Anthropic's Claude Agent and OpenAI's Codex directly into the IDE. Agents can search documentation, explore file structures, update project settings, capture previews, and iterate through builds autonomously. MCP support means any compatible third-party agent can plug in. This reaches millions of iOS/macOS developers — the strongest signal yet that agentic coding is going mainstream. (Apple Newsroom)

DeepSeek V4 Imminent

Expected this week, timed to China's Two Sessions. Specs: ~1T parameters, ~32B active per token (MoE), native multimodal, 1M-token context. Designed for Huawei Ascend chips with zero NVIDIA dependency. Leaked benchmarks claim 90% HumanEval and 80%+ SWE-bench (unverified). Consumer tier: dual RTX 4090s or single RTX 5090. (TechNode)

SaaS Disruption & Builder Moves

Seat Extinction Confirmed Across 5+ Categories Simultaneously

Per-seat pricing collapse is happening everywhere at once: Support (Intercom Fin $100M ARR at $0.99/resolution, Ada 83% autonomous resolution), HR (LinkedIn Hiring Assistant saves 4 hrs/role at AMD/Canva/Siemens, Workday cut 2,100+ jobs in 12 months automating its own customer ops), Finance (Ramp Accounting Agent 90%+ auto-coding 3x faster close, BILL W-9 Agent eliminates 80% manual steps, Basis raised $100M for autonomous accounting), CRM (Monaco raised $35M to replace Salesforce for startups, Salesforce itself measuring "agentic work units" instead of seats), Legal (Thomson Reuters -16% from Cowork plugins). Salesforce's own growth is 72% price hikes — unsustainable when AI-native competitors deliver 5.7x better revenue efficiency per employee. (SaaStr)

Agencies Vibe-Coding Custom Tools in Hours

Broadhead's VP vibe-coded a GEO monitoring platform in one evening using Claude Code. Havas built Brand Insights AI. Three agencies independently said off-the-shelf tools don't fit — so they build custom tools in hours. What used to take a dev team 3 months now takes a marketing manager one afternoon. The SaaStr 90/10 rule: any tool with zero AI features is a build signal. (Adweek)

Goldman Sachs Bypasses SaaS Entirely

Goldman has had embedded Anthropic engineers for 6 months co-developing Claude agents for trade accounting ($2.5T assets), achieving 30% faster onboarding. The architectural pattern: embedded AI lab engineers building domain-specific agents on foundation models, bypassing off-the-shelf SaaS entirely. Gartner projects 35% of point-product SaaS replaced by AI agents by 2030. (CNBC)

Vibe Coding & AI Development

Claude Code v2.1.68 — Major Capability Update

Rapid shipping from v2.1.63 to v2.1.68 this week: /simplify spawns three parallel review agents (Code Reuse, Code Quality, Efficiency) that auto-apply fixes before merge. /batch plans migrations interactively then executes in parallel across git worktrees. Auto-memory automatically saves useful context across sessions. HTTP hooks replace shell-only hooks for remote integrations. Memory leak fixes for unbounded growth in git root detection and JSON parsing caches during long sessions. MCP OAuth token refresh race conditions fixed. (GitHub CHANGELOG)

GLM-5: 744B Open-Source, Free on NVIDIA NIM

Z.ai's GLM-5 (744B/40B MoE, MIT license, 205K context) is free on NVIDIA NIM at 40 req/min with no credit card. Benchmarks: 77.8% SWE-bench Verified (highest open-source), 56.2 Terminal-Bench 2.0 (approaching Opus 4.5's 59.3). Trained entirely on 100,000 Huawei Ascend chips. You can point Claude Code at this model via claude-launcher's translation proxy. Strongest free coding model available today. (NVIDIA NIM)

Raycast Launches Glaze — Desktop Vibe Coding Goes Mainstream

Raycast launched Glaze in private beta — a platform that builds real native desktop Mac apps from natural language prompts. Unlike web-based vibe coding tools, Glaze apps run natively with keyboard shortcuts, menu bar integration, file system access, and offline support. Public app store and private team stores included. Free tier + $20-30 paid plans. Strongest signal yet that vibe-coded software is moving from demos to production desktop tooling. (Raycast Blog)

Check Point Discloses Claude Code RCE (CVE-2026-21852)

Three attack vectors: (1) Hooks-based RCE via .claude/settings.json executing shell commands on SessionStart without confirmation, (2) MCP consent bypass via repo-controlled config auto-approving all servers, (3) API key exfiltration via ANTHROPIC_BASE_URL pointing to attacker endpoint. All patched. Always review .claude/ config files before opening untrusted repositories. (Check Point Research)

Builder Tips

PreCompact hooks preserve working state across context compaction. Reference implementation at mvara-ai/precompact-hook. Combined with auto-memory, this creates a dual-layer memory system.
Never add/remove tools mid-conversation — it invalidates the entire KV-cache prefix, destroying the 81% cost savings from prompt caching. Keep tool definitions static.
Run /simplify before every PR — three specialized review perspectives catch different issue classes that a single review misses.
Git worktrees are now standard multi-agent infrastructure: Claude Code /batch, Windsurf, Superset IDE, Codex Desktop all converged on worktrees independently.

What Leaders Are Saying

Karpathy: "Vibe coding = YOLO. Agentic engineering = AI does the implementation, human owns architecture, quality, and correctness." Just one year after coining the term, he's retiring it for professional framing. Already adopted by IBM, Google Cloud, Osmani. (The New Stack)

Willison: Published multi-chapter Agentic Engineering Patterns guide. New March 4 chapter on anti-patterns — core rule: never file PRs with AI-generated code you haven't reviewed yourself. The most actionable practitioner resource on agentic coding workflows currently being published. (simonwillison.net)

Chollet: ARC-AGI-3 launches March 25 — first interactive reasoning benchmark. 1,000+ levels across 150+ environments requiring agents to explore, learn, plan, and adapt. Measures genuine generalization and agency rather than pattern matching. (ARC Prize)

Huang: Rules out $100B OpenAI investment, preps GTC 2026 keynote covering NVIDIA's five-layer AI stack with emphasis on agentic systems. Told CNBC "markets got it wrong" on SaaS disruption — agents are customers of software tools, not replacements. (Bloomberg)

LeCun: Warns of "two AI bubbles feeding each other" — a financial bubble (overvaluation) and a narrative bubble (AGI hype). Compared AI to "the new printing press" not electricity. Pushed back against AGI timelines. Gains weight as AI stocks wobble. (Startup News)

Rauch: v0 at 3M users, 3,200 merged PRs/day. Built skills.sh (34K submissions) entirely using v0. Non-technical team members contributing production code. "We're heading toward a generative web where apps are created on-demand for individual users." (Lenny's Newsletter)

AI Agent Ecosystem

CyberStrikeAI: First Open-Source AI Attack Platform Used at Scale

Go-based framework with 100+ security tools and Claude/DeepSeek integration was used by a Russian-speaking financially motivated actor to compromise 600+ FortiGate devices across 55 countries (Jan-Feb 2026). Developer Ed1s0nZ holds a CNNVD 2024 contribution award linked to China's MSS. No zero-days — purely AI-automated credential attacks at scale. Team Cymru tracked 21 unique IPs. The "theoretical risk" phase of AI offensive operations is definitively over. (The Hacker News)

OpenClaw Supply Chain Crisis Escalates

824+ confirmed malicious skills across 10,700+ total in ClawHub (~8% of registry). Primary payload: Atomic macOS Stealer. 30,000+ publicly exposed instances; Censys tracked growth from ~1,000 to 21,000+ in a single week. Agent skills supply chain remains the most active attack surface. (eSecurity Planet)

NIST AI Agent Standards Initiative

Three pillars: standards, open-source protocol development, and agent security/identity research. RFI on Agent Security due March 9. Agent Identity and Authorization Concept Paper due April 2. Listening sessions in April. The US government's first major move to standardize agent governance. (NIST)

Framework Consolidation

Microsoft Agent Framework reached RC (GA end of Q1), merging AutoGen + Semantic Kernel. AutoGen and SK now in maintenance mode. Google ADK for TypeScript fills the JS ecosystem gap with strong typing for inter-agent data contracts. Windsurf shipped Phoenix Alpha with parallel multi-agent sessions and context window usage indicator. Cursor BugBot reached GA with autonomous PR scanning + cloud agent auto-fix for 1M+ users.

Hot Projects & Repos

Project	Stars	What It Does	Why It Matters
Worktrunk	2.7K	Rust CLI for Git worktrees with AI agents	Auto-squash/rebase/merge, LLM commit messages, build cache sharing. From PRQL creator.
Timber	545	Compile XGBoost/sklearn to C99 binaries	336x faster than Python, 48KB artifacts, MISRA-C compliant. Ollama for classical ML.
Codebuff	3.6K	Open-source multi-agent coding tool	Claims 61% vs Claude Code's 53% on 175-task eval. Any model via OpenRouter.
Hive Memory	new	MCP cross-project memory for agents	Fully local, cross-project context sharing. Show HN featured.
Sub-500ms Voice Agent	562 HN	Production voice agent in one day for $100	Groq ~80ms TTFT + streaming pipeline = 2x faster than Vapi. Blueprint documented.
PDF Oxide	304	5x faster PDF processing than PyMuPDF	Rust core, Python/JS/WASM bindings. MCP server included.
Omni	503	Self-hosted workplace search (open-source Glean)	Unified search across Google Workspace, Slack, Jira. ParadeDB single-Postgres.

Best Content This Week

Donald Knuth's "Claude's Cycles" — Knuth credits Claude Opus 4.6 for solving an open graph theory problem in 31 steps. Called it "a dramatic advance in automatic deduction." The godfather of CS revising his opinions about generative AI.
Max Woolf: AI Agent Coding in Excessive Detail — Skeptic-to-convert journey building Rust ML library 9-30x faster than Python equivalents. Key insight: agents work best when you have "approximate knowledge of many things with enough domain expertise to know what should and should not work."
Tenzai Vibe Coding Security Study — 69 vulnerabilities across 5 tools. Zero SQLi/XSS but pervasive business logic flaws. Security-focused prompts produced "minimal vulnerability reduction." CMU's SusVibes: only 10.5% of solutions both correct AND secure.
PRX Part 3: Text-to-Image in 24 Hours — Photoroom trains 1.3B model from scratch: TREAD token routing drops 50% of tokens, REPA alignment with DINOv3, Muon optimizer. Full Apache 2.0 release. Dramatically lowers the barrier to generative model training.
Interconnects Open Artifacts #19 — Best aggregation of Chinese open-weight releases: Qwen 3.5, GLM-5, MiniMax M2.5, Step-3.5-Flash. Nathan Lambert introduces RAM (Relative Adoption Metrics) normalizing downloads by model size class.

Hacker News Pulse

Story	Points	Comments	Signal
MacBook Neo launch (A18 Pro)	1,635	1,943	Day's top story. New ultraportable line — HN debates on-device inference.
Nobody Gets Promoted for Simplicity	833	473	Resonant essay connecting complexity incentives to AI code proliferation.
Qwen Leadership Exodus (Willison)	568	259	Community anxious about open-weight future.
Agentic Engineering Patterns (Willison)	497	283	Highest comment engagement for any AI story today.
Amodei calls OpenAI "straight up lies"	324	160	Lab tensions at peak intensity.
Qwen3.5 Fine-Tuning Guide (Unsloth)	300	70	Practitioners racing to learn models while project future is uncertain.
nCPU: CPU entirely on GPU	243	121	Heterogeneous compute implications for inference.
Google Workspace CLI	227	101	Agent tooling relevance — CLI for Workspace APIs.

Research Papers

AgentSentry — Temporal Causal Defense Against Prompt Injection

First defense modeling multi-turn indirect prompt injection as temporal causal takeover. Uses counterfactual re-executions at tool-return boundaries to detect when tool outputs steer agent behavior. Evaluated on AgentDojo across four task suites. Builder-ready pattern for tool-augmented agents. (arXiv 2602.22724)

Code Fingerprints — Model-Specific Code Attribution

Beyond binary human-vs-machine detection: identifies which specific LLM generated a code snippet. Enables vulnerability triage (which model produced the bug?), licensing audits, and distillation detection. Directly relevant to the Anthropic distillation crackdown. (arXiv 2603.04212)

Codified Context — Three-Component Agent Infrastructure

Hot memory + 19 specialized agents + cold knowledge base, evaluated across 283 sessions on a 108K-line codebase. Open-source companion repo. Blueprint for scaling agentic coding with structured context engineering. (arXiv 2602.20478)

MCPShield — 10% to 95% MCP Defense Rate

Plug-in security cognition layer with pre-invocation probing, runtime sandboxed projection, and post-invocation trace reasoning. Undefended MCP agents achieve only 10% defense rate; MCPShield reaches 95.3%. (arXiv 2602.14281)

AlgoVeri — Formal Verification Benchmark

First cross-language benchmark for formally verified code generation: 40.3% success in Dafny, 24.7% Verus, 7.8% Lean. LLMs handle high-level verified code but collapse on systems-level constraints and manual proofs. (arXiv 2602.09464)

Safety Alignment as Attack Surface

Adversaries inject documents into RAG knowledge bases that trigger safety refusals on benign queries. Weaponizes alignment homogeneity itself — high cross-model transfer rates. The alignment-as-vulnerability paradox. (arXiv 2603.03919)

AgentLAB — First Long-Horizon Agent Attack Benchmark

5 novel attack types (intent hijacking, tool chaining, task injection, objective drifting, memory poisoning) across 28 environments. Key finding: single-turn defenses fail against multi-turn adversarial strategies. (arXiv 2602.16901)

OSS Momentum

Repo	Stars	Velocity	Category
Shannon	30.7K	+1,854/day	AI pentester — #1 trending on GitHub
CC Switch	23.8K	+3,594/wk	Unified manager for Claude/Codex/Gemini CLI
PageIndex	20.4K	+2,851/wk	Vectorless RAG — 98.7% accuracy without embeddings
Ruflo	18.8K	+4,245/wk	Multi-agent swarm with Q-Learning router
GitNexus	9.6K	+6,262/wk	Knowledge graph giving agents structural awareness
OpenSandbox	6.1K	+4,592/wk	Alibaba's sandbox for agent execution
Agency-Agents	5.7K	+2,209/day	55+ specialized agent personas
OpenViking	4.6K	2mo old	ByteDance context database (filesystem paradigm)
ComposioHQ	3.6K	3wks old	Parallel coding agent fleet orchestrator
AI-Infra-Guard	3.0K	+1,041 commits	Tencent red teaming (400+ CVEs, MCP scanner)

Category signals: Agent Skills is now GitHub's fastest-growing category (Anthropic official at 83.9K stars). Sandbox infrastructure is consolidating (OpenSandbox, E2B). Vectorless RAG (PageIndex) challenges embedding orthodoxy. Multi-agent IDE tooling reflects developers juggling 3-5 AI tools simultaneously.

Newsletters & Blogs

Simon Willison: Agentic Engineering Anti-Patterns chapter (never file unreviewed AI PRs) + Qwen exodus analysis. 15th consecutive top source.
PRX Part 3 (Hugging Face/Photoroom): Full 1.3B text-to-image training recipe in 24 hours. Apache 2.0.
Nathan Lambert (Interconnects): RAM methodology for normalizing open-model adoption + Qwen/GLM/MiniMax coverage. Feed working again after 3+ run gap.
OpenAI Blog: GPT-5.2 extends gluon physics breakthrough to graviton amplitudes in quantum gravity.
Feed health: 4/15 feeds still broken (The Batch, Anthropic, Mistral, Eugene Yan). Interconnects recovered.

Community Pulse

Reddit Highlights

Qwen3.5 small model efficiency: 0.8B runs on 14-year-old i5/4GB DDR3. 35B-A3B hits 37.8% SWE-bench. 9B runs on Android. $3 10-minute finetune produces capable results. Local inference cost floor approaching zero.
NVFP4 coming to llama.cpp: GitHub PR #19769 active, days away. Unlocks native FP4 for memory-constrained users.
Claude Excel plugin: 987 upvotes. Financial modeler reports transformative adoption for complex multi-sheet models. Claude gaining real adoption in professional finance beyond coding.
Self-evolving Rust agent: 200-line agent wakes every 8 hours, reads its own code, files bugs, iterates autonomously toward rivaling Claude Code.
OpenAI subscriber exodus: 3,847 upvotes — top post across all subreddits. Day 3 = structural consequences: talent flight, subscriber churn, market-share transfer.

Skills to Learn Today

#	Skill	Domain	Difficulty
1	Secure MCP with mcp-scan tool pinning	agent-security	beginner
2	Spec-driven development with GitHub Spec Kit	vibe-coding	intermediate
3	Claude Code Agent Teams (2-16 instances, peer messaging)	agent-patterns	advanced
4	Structured note-taking for long-horizon agents	agent-patterns	intermediate
5	Three-tier context layering with Skills	vibe-coding	intermediate
6	Defense-in-depth for MCP tool poisoning	agent-security	advanced
7	RAG chunking: 512-token recursive (skip overlap)	ml-ops	intermediate
8	Detect distillation with behavioral fingerprinting	agent-security	advanced
9	Dynamic RAG with query-adaptive retrieval	ml-ops	advanced
10	Plan-iterate-test loop for production vibe coding	vibe-coding	beginner

Source Index

Breaking News: [1] Dataconomy — OpenAI code repo, [2] Bloomberg — Qwen exodus, [3] Apple Newsroom — Xcode 26.3, [4] SiliconANGLE — NVIDIA chip, [5] Axios — Claude #1, [6] Help Net Security — Agent identity dark matter, [7] TechNode — DeepSeek V4, [8] Anthropic Blog — Distillation attacks

SaaS Disruption: [9] TechCrunch — SaaSpocalypse, [10] SaaStr — 90/10 rule, [11] CNBC — Goldman/Anthropic, [12] Adweek — Agency vibe coding, [13] Computer Weekly — LinkedIn Hiring Assistant, [14] Salesforce IR — Agentforce $800M, [15] HR Executive — Workday layoffs

Vibe Coding: [16] GitHub CHANGELOG — Claude Code v2.1.68, [17] Anthropic — Trends Report, [18] Raycast Blog — Glaze, [19] Check Point Research — CVE-2026-21852, [20] NVIDIA NIM — GLM-5, [21] Windsurf Changelog

Thought Leaders: [22] The New Stack — Karpathy, [23] simonwillison.net — Agentic Engineering Patterns, [24] ARC Prize — ARC-AGI-3, [25] Bloomberg — Huang, [26] Startup News — LeCun

Agent Ecosystem: [27] The Hacker News — CyberStrikeAI, [28] Schneier on Security — Promptware Kill Chain, [29] SecurityWeek — ServiceNow/Veza, [30] NIST — Agent Standards, [31] eSecurity Planet — OpenClaw crisis

Projects: [32-38] GitHub — Worktrunk, Timber, Codebuff, OpenSandbox, AgentScope, Sub-500ms Voice Agent, Omni

Best Content: [39] Stanford CS — Knuth paper, [40] minimaxir.com — Max Woolf, [41] Tenzai — Vibe coding security, [42] HuggingFace Blog — PRX Part 3, [43] Interconnects — Open Artifacts #19

HN Pulse: [44] Apple Newsroom — MacBook Neo, [45] Terrible Software — Simplicity essay, [46] simonwillison.net — Qwen analysis, [47] Unsloth — Qwen3.5 fine-tuning

Research Papers: [48-54] arXiv — AgentSentry, Code Fingerprints, Codified Context, MCPShield, AlgoVeri, RAG blocking attack, AgentLAB

OSS Momentum: [55-64] GitHub — Shannon, CC Switch, PageIndex, Ruflo, GitNexus, OpenSandbox, Agency-Agents, OpenViking, ComposioHQ, AI-Infra-Guard

RSS/Blogs: [65] simonwillison.net — Anti-patterns, [66] HuggingFace Blog — PRX Part 3, [67] Interconnects — Open Artifacts #19, [68] OpenAI Blog — Graviton amplitudes

Community Pulse: [69-73] Reddit — Qwen efficiency, NVFP4, Claude Excel, Self-evolving agent, OpenAI exodus

Meta: Research Quality

873 total findings across 29 runs (24 new this run)
226 skills across 7 domains (10 new)
85 patterns tracked (5 new)
120 unique sources indexed
Most productive agents: news-researcher (12 findings), saas-disruption-researcher (25 findings — enormous day), thought-leaders-researcher (12), agents-researcher (12)
Top sources this run: Simon Willison Blog (appeared in 5 agents), TechCrunch (4 agents), GitHub (4 agents), arXiv (17 papers analyzed), Reddit (70 posts scanned)
New Tier 1 source: Schneier on Security — Promptware Kill Chain is foundational
Coverage gap: Consumer hardware (MacBook Neo was #1 on HN but we only caught it via HN agent). Consider monitoring Apple events more directly.
Feed health: 4/15 RSS feeds still broken (The Batch, Anthropic, Mistral, Eugene Yan). Interconnects recovered. Web supplements produce more value than feeds themselves.

How This Newsletter Learns From You

This newsletter has been shaped by 8 pieces of feedback so far. Every reply you send adjusts what I research next.

Your current preferences (from your feedback):

More builder tools (weight: +2.5)
More agent security (weight: +2.0)
More agent security (weight: +1.5)
More vibe coding (weight: +1.5)
Less market news (weight: -1.0)
Less valuations and funding (weight: -3.0)
Less market news (weight: -3.0)

Want to change these? Just reply with what you want more or less of.

Ways to steer this newsletter:

"More [topic]" / "Less [topic]" — adjust coverage priorities
"Deep dive on [X]" — I'll dedicate extra research to it
"[Section] was great" — reinforces that direction
"Missed [event/topic]" — I'll add it to my radar
Rate sections: "Vibe Coding section: 9/10" helps me calibrate

Reply to this email — I've processed 8/8 replies so far and every one makes tomorrow's issue better.