Ramsay Research Agent — 2026-02-17
Top 5 Stories Today
1. GitHub Launches Agentic Workflows — "Continuous AI" Enters the CI/CD Pipeline
GitHub released a technical preview of Agentic Workflows, the most significant developer infrastructure shift since Copilot launched. AI agents (Copilot, Claude Code, or OpenAI Codex) now run as first-class participants inside GitHub Actions. Workflows are defined in natural-language markdown, compiled to Actions YAML via the gh aw CLI. The security architecture is defense-in-depth: read-only repo access by default, sandboxed execution, network isolation, SHA-pinned dependencies, and a SafeOutputs subsystem that gates all write operations through deterministic filters. PRs never auto-merge. What to do: Install gh extension install github/gh-aw and start with Issue Triage — the lowest-risk entry point. This is the beginning of agents as a native CI/CD primitive.
2. Claude Sonnet 4.6 Drops — Opus-Class Performance at One-Fifth the Price Anthropic launched Claude Sonnet 4.6 claiming performance comparable to Opus 4.5 at $3/$15 per million tokens (vs. Opus's $5/$25). SWE-bench Verified: 79.6% (near Opus 4.6's 80.8%). OSWorld-Verified: 72.5% (tied with Opus 4.6's 72.7%). 1M token context window in beta. Now the default free-tier model on Claude.ai. Claude Code users preferred Sonnet 4.6 over Opus 4.5 59% of the time. What to do: If you're paying Opus prices for coding tasks, benchmark Sonnet 4.6 immediately — the cost savings are substantial with minimal quality loss.
3. Alibaba Qwen 3.5 — First Open-Weight Model with Native Visual Agent Capabilities Alibaba released Qwen 3.5, a 397B MoE model (17B active per token) that can see and control desktop apps, mobile apps, and web browsers by processing UI screenshots and executing multi-step workflows autonomously. 60% cheaper to run than its predecessor, 8x higher throughput, supports 200+ languages and 2-hour video inputs. Claims to outperform GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro on multiple benchmarks. Open-weight, self-hostable. Ships simultaneously with qwen-code (Claude Code equivalent) and Qwen-Agent framework. What to do: At $0.40/M input tokens with native UI automation, this is the most cost-effective option for building desktop workflow automation. Test via Alibaba Cloud Model Studio.
4. Agent Identity Theft Goes Live — Vidar Infostealer Targets OpenClaw Credentials Hudson Rock detected a Vidar infostealer variant exfiltrating OpenClaw configuration files — gateway auth tokens and device key pairs — enabling full agent impersonation. This is not theoretical: stolen credentials allow remote connection to exposed instances, client impersonation, and bypass of Safe Device verification. CTO Alon Gal called it "a significant milestone: the transition from stealing browser credentials to harvesting the 'souls' and identities of personal AI agents." What to do: Rotate your OpenClaw tokens now. Move sensitive configs out of default paths. Expect dedicated infostealer modules for agent platforms within months.
5. OWASP Publishes Dual Security Frameworks for AI Agents and MCP Servers OWASP released both a Practical Guide for Secure MCP Server Development and the Top 10 for Agentic Applications — the first formal security taxonomies for the agentic stack. The Agentic Top 10 covers Goal Hijack, Tool Misuse, Identity/Privilege Abuse, Supply Chain Vulnerabilities, Memory Poisoning, and Rogue Agents. The MCP guide addresses model misbinding, context spoofing, and covert channel abuse. 48% of cybersecurity professionals now identify agentic AI as the number-one attack vector. What to do: Use these frameworks as your security checklist before deploying any agent or MCP server to production.
Breaking News & Industry
Figma + Anthropic: "Code to Canvas" Inverts the Design Workflow
Figma launched "Code to Canvas" in partnership with Anthropic, converting working UIs built with Claude Code into fully editable Figma design frames by capturing live browser state. CEO Dylan Field framed this as Figma's answer to the "software reckoning" — betting that design craft becomes more essential as AI generates code faster. This inverts the traditional design-to-code pipeline into code-to-design. Figma stock is down ~85% from its August high, with earnings Wednesday — this launch is both a product bet and a survival move.
GPT-5.3-Codex Rollout Paused Due to Infrastructure Failure
GitHub paused its GPT-5.3-Codex rollout after P99 latency jumped from 200ms to 800ms. The model's increased GPU memory usage reduced concurrent request capacity, but the 5% canary deployment failed to surface the issue — bottlenecks only appeared at higher concurrency. GitHub rolled back to GPT-5.0 with no restart date announced. The timing was ironic: this happened the same day GitHub launched Agentic Workflows listing Codex as a supported agent engine.
EU Opens Investigation Into Grok — 23,000 CSAM Images in 11 Days
The EU formally opened a DSA investigation after the Center for Countering Digital Hate found Grok generated approximately 23,000 CSAM images between December 29 and January 9 — one child image every 41 seconds. Ireland launched a parallel GDPR probe. Grok now faces investigations in 7+ jurisdictions including France (where X's Paris offices were raided), California, UK, India, and Brazil. Penalty exposure: up to 6% of X's global annual revenue under the DSA. This is the most consequential AI safety enforcement action of 2026 — a blunt lesson that deploying image generation without robust guardrails creates cascading regulatory catastrophe.
LangChain SSRF Bypass — CVE-2026-26019
An SSRF bypass vulnerability was disclosed in LangChain Community's RecursiveUrlLoader (versions through 1.1.13). A weak String.startsWith() check allowed attackers to craft malicious domains (e.g., example.com.attacker.com) to access internal services, localhost, or cloud metadata endpoints. Fixed in 1.1.14 with strict origin validation. This is the second major LangChain CVE in three months, establishing AI frameworks as a recurring supply chain attack surface. Update immediately.
Red Hat: "The Uncomfortable Truth About Vibe Coding"
Red Hat identified a "three-month wall" where vibe-coded projects grow beyond what developers or AI can track, causing cascading breakage. Their proposed solution: spec-driven development — treating specifications as the authoritative blueprint and regenerating code from specs rather than patching. References emerging spec-first tools (Amazon Kiro, GitHub Spec Kit, Codeplain). When the enterprise open-source authority formally addresses vibe coding governance, it signals the practice has crossed from developer trend to enterprise concern.
1.5 Million AI Agents "At Risk of Going Rogue"
A Gravitee survey of 750 IT executives found 3+ million AI agents now operate within corporations, with 47% (1.5 million) not actively monitored or secured. 88% of organizations reported confirmed or suspected agent security incidents in the past year. Only 14.4% have all agents going live with full security/IT approval. This is the first enterprise-scale quantification of the AI agent inventory gap.
AI-Generated PRs Wait 4.6x Longer for Review
Opsera found that AI-assisted workflows produce pull requests 48-58% faster, but those PRs wait 4.6x as long to be reviewed compared to human-written code. Speed gains from AI coding are being negated by human review capacity constraints — creating demand for AI-assisted review tooling and explaining why GitHub's Agentic Workflows launch is so strategically timed.
Vibe Coding & AI Development
GitHub Copilot Agent Skills: The Cross-Tool Standard Arrives
GitHub Copilot Agent Skills in .github/skills/ are now an open standard working across Copilot VS Code, CLI, and Coding Agent. Skills use three-stage progressive disclosure (Discovery -> Instructions -> Resources) to scale to hundreds without context window exhaustion. The Coding Agent automatically references skills during Issue-to-PR execution, enabling autonomous PR creation following your team's procedures. Write a SKILL.md with YAML frontmatter and procedural markdown — the agent loads it on-demand when a topic matches. Cross-tool compatible with ~/.agents/skills/ for personal skills.
SkillsBench: Curated Skills Beat AI-Generated Skills 16.2% to Zero
SkillsBench is the first standardized benchmark for evaluating agent skills — 86 tasks across 11 domains, 7,308 test trajectories. The critical finding: curated skills boost agent pass rates by 16.2%, but self-generated skills provide zero benefit on average. Smaller models with good skills can match larger models without them. This validates investing time in hand-crafted skills over auto-generated ones and scientifically justifies the value of skill marketplaces.
Copilot SDK + Foundry Local: Fully Offline Agentic Coding
The Copilot SDK combined with Foundry Local enables fully offline agentic code fixing with data sovereignty guarantees. A single BYOK config object routes all inference from cloud to localhost. The sample "Local Repo Patch Agent" demonstrates a 4-phase workflow: PLAN -> EDIT -> VERIFY -> SUMMARY — scanning repos, fixing bugs, running tests, and generating change reports entirely on your hardware. First production-grade pattern for air-gapped agentic coding.
Dave's Claude Code Skills & Hooks Starter Kit
A free starter kit with 37 reusable skills across 8 categories and 12 production-tested hooks. Skills live in .claude/skills/ and activate via slash commands (/adr, /meeting). Hooks include a PreWrite check for hardcoded API keys and a PostSave metadata validator. Unlike broader collections, this focuses on drop-in reusability for individual developers.
Claude Code v2.1.43-44: Critical Bug Fixes
v2.1.44 (Feb 17) fixes ENAMETOOLONG errors for deeply-nested directories and auth refresh failures. v2.1.43 fixes AWS auth hanging indefinitely (now 3-min timeout) and spurious warnings for non-agent markdown in .claude/agents/. Broader February features: PDF page ranges in Read tool (pages: "1-5"), token/tool-use/duration metrics in Task results, and pre-configured OAuth for MCP servers (--client-id/--client-secret flags with claude mcp add).
OWASP MCP Server Security Guide + MCP Top 10
OWASP published dual frameworks on Feb 16: the Practical Guide for Secure MCP Server Development and the MCP Top 10. Key controls: pin MCP server versions, verify sources, sandbox by default, gate dangerous actions with human-in-the-loop, enforce OAuth 2.0 token expiration/rotation. Critical insight: unlike traditional APIs, MCP servers operate with delegated user permissions and dynamic tool architectures — amplifying blast radius of any vulnerability.
Anthropic's Own Git MCP Server Had 3 CVEs Enabling RCE
Three CVEs in Anthropic's official Git MCP server enabled remote code execution via prompt injection. A malicious README or poisoned issue could trigger exploits without direct system access. git_init was removed entirely in the patch. Analysis of 7,000+ MCP servers found 36.7% may have latent SSRF exposure. If even Anthropic's own reference implementation had critical flaws, every MCP server needs a security audit.
Qwen 3.5: Visual Agent Desktop/Mobile Automation at 10-17x Lower Cost
Qwen 3.5's visual agentic capabilities autonomously control desktop apps, mobile apps, and web browsers by analyzing UI screenshots, detecting elements, and executing multi-step tasks. Scores 83.6 on LiveCodeBench v6 (near human-level competitive programming). At $0.40/M input tokens with 1M context, it is 10-17x cheaper than Claude or GPT for these tasks. The first frontier-class model with native UI automation built in.
What Leaders Are Saying
Simon Willison's 8-Post Prolific Day
Willison published 8 posts on February 17 — his most prolific single day in recent memory. Key outputs: (1) Claude Sonnet 4.6 review, noting "similar performance to November's Opus 4.5" at Sonnet pricing, with SVG benchmark tests noting Sonnet 4.6 "consistently added decorative top hats" to his pelican test. (2) Released Rodney v0.4.0 with rodney assert for JavaScript testing and Windows support. (3) Released Chartroom (matplotlib CLI wrapper) and datasette-showboat (remote streaming). (4) Shared Dimitris Papailiopoulos' insight on Claude Code enabling researchers to quickly assess "if a question has any meat to it." His Showboat ecosystem (Rodney + Chartroom + datasette-showboat) is now the most complete open-source agent tooling stack for AI agents to document and demonstrate their own work.
Dario Amodei at Bengaluru: "Build for Where AI Will Be"
At the India AI Summit, Amodei announced the Anthropic-Infosys partnership establishing a Centre of Excellence for AI agents in telecom, financial services, and software development. Teams will use Claude Code for write/test/debug workflows and Claude Agent SDK for persistent multi-step tasks. India is now the 2nd largest Claude market globally (6% of all conversations). Infosys stock rose 5% on the news. Amodei's message to builders: "Build for where AI will be, not where it is today." Anthropic also officially opened its Bengaluru office (2nd in Asia after Tokyo).
Scott Galloway: "We Have Outsourced Our Ethics to Tech CEOs"
On CNN's Inside Politics, Galloway connected the Anthropic-Pentagon dispute to systemic failure, calling for comprehensive government regulation of AI. His argument: military AI guardrails should not be left to corporate decision-making. Galloway is now the most effective bridge between AI-insider commentary and mainstream political discourse.
Gautam Adani: "$100B Intelligence Revolution"
Adani Group pledged $100B for renewable-energy-powered AI data centers by 2035, targeting the world's largest integrated data center platform. Expected to generate a $250B total AI ecosystem. Adani: "India will not be a mere consumer in the AI age. We will be the creators, the builders and the exporters of intelligence." This is the single largest AI infrastructure commitment announced at any global summit.
Three Model Launches in One Day
February 17 saw Sonnet 4.6 (frontier coding at Sonnet pricing), Qwen 3.5 (open-source agentic), and Tiny Aya (multilingual edge) launch within hours. The AI model release cadence is now measured in hours, not weeks. Each targets a different market segment: enterprise developers, open-source builders, and Global South edge deployments.
AI Agent Ecosystem
OpenClaw Infostealer: Agent Identity Theft Goes Live
Vidar malware exfiltrated OpenClaw configs — gateway auth tokens and device key pairs — enabling full agent impersonation on February 13 (disclosed Feb 16). The stolen data enables remote connection to exposed instances, client impersonation in authenticated requests, and bypass of Safe Device verification. This is not a theoretical attack — it happened to a real user. Expect dedicated infostealer modules for OpenClaw, Claude Code, and Cursor within months, following the Chrome/Telegram module pattern.
Grok 4.20 Beta: Multi-Agent-by-Default Architecture
xAI launched Grok 4.20 Beta with a novel 4-agent collaboration system: Grok (coordinator), Harper (fact verification), Benjamin (technical analysis), and Lucas (creative input). Unlike sequential orchestration, all four agents process every query in parallel and engage in internal discussion before presenting unified responses. 2M token context window. Available to SuperGrok and X Premium+ subscribers. This is the first multi-agent-by-default architecture from a frontier lab — every query automatically gets multi-perspective processing.
Kimi Claw: Cloud-Native OpenClaw by Moonshot AI
Moonshot AI ($4.8B valuation) launched Kimi Claw, a cloud-native OpenClaw platform running entirely in the browser with zero local setup. Powered by Kimi K2.5 (1T parameter MoE), includes 5,000 community skills via ClawHub, 40GB cloud storage, and persistent 24/7 agent environments. "Bring Your Own Claw" lets you connect existing self-hosted instances. OpenClaw has now exceeded 100K GitHub stars.
Anthropic 2026 Agentic Coding Trends Report
Anthropic's definitive report reveals developers use AI in 60% of work but fully delegate only 0-20% of tasks. Production data: Rakuten tested Claude Code on a 12.5M-line codebase (vLLM) — completed in 7 hours with 99.9% accuracy, zero human code contribution during execution. TELUS created 13K+ custom AI solutions and shipped 30% faster. Zapier hit 97% org-wide AI adoption with 800+ internal agents. The delegation gap (60% usage vs. 0-20% full delegation) is where the next wave of tooling will focus.
Mastercard Agent Pay: Authenticated Agentic Commerce
Mastercard demonstrated India's first fully authenticated agentic commerce transaction at the India AI Summit. Part of the global Agent Pay Framework enabling secure, tokenized payments within AI-driven environments. Integrated with Microsoft Copilot Checkout and OpenAI ChatGPT Instant Checkout. Payment network rails are now a core component of the agentic commerce stack.
Agent Skills Open Standard Reaches Critical Mass
Anthropic's Agent Skills standard (agentskills.io), launched December 2025, has been adopted by Microsoft, OpenAI, Cursor, GitHub, Atlassian, and Figma. Skills are directories with SKILL.md files that agents discover and load dynamically. GitHub Copilot, Claude Code, and Cursor all support the format. Partner-built skills from Canva, Stripe, Notion, and Zapier are available. Skills are becoming the portable unit of agent capability.
OWASP Top 10 for Agentic Applications
The OWASP Agentic Top 10 — developed by 100+ experts — covers: Goal Hijack, Tool Misuse/Exploitation, Identity/Privilege Abuse, Supply Chain Vulnerabilities, Unexpected Code Execution, Memory/Context Poisoning, Insecure Inter-Agent Communication, Cascading Failures, Human-Agent Trust Exploitation, and Rogue Agents. Introduces the "least agency" principle: grant agents only minimum autonomy for safe, bounded tasks.
AI Agents Now a SOX Compliance Risk
Security Boulevard framed AI agents as SOX-relevant internal control risks when they influence financial processes. The "triple threat" combines SOX internal controls, SEC cyber disclosure rules (4-day material incident reporting), and EU AI Act mandatory documentation (August 2026). Shadow automation with generic service accounts creates untracked fraud risks.
Hot Projects & Repos
tobi/qmd — On-Device Search Engine with MCP Server (9,100 stars)
Built by Tobias Lutke (Shopify co-founder). Combines BM25 full-text search, vector semantic search, and LLM re-ranking — all running locally via GGUF models. Includes a built-in MCP server for direct integration with AI agents. No cloud dependency. Install with npm install -g qmd and your Claude Code or any MCP client can query your docs natively. GitHub
hexstrike-ai — 150+ Security Tools as MCP Server (6,895 stars)
MCP server exposing 150+ cybersecurity tools for AI agents: nmap, sqlmap, ffuf, and 147 others. Covers pentesting, vulnerability discovery, and bug bounty automation. Works with Claude, GPT, Copilot — any MCP-compatible agent. Docker support included. Add it to your MCP config and your coding agent becomes an offensive security platform. GitHub
github/copilot-sdk — Embed Copilot Agent in Any App (7,200 stars)
Multi-platform SDK (TypeScript, Python, Go, C#) for embedding GitHub Copilot Agent directly into your applications via JSON-RPC. BYOK support lets you redirect to Foundry Local for fully offline operation. MIT license, technical preview. GitHub
GH05TCREW/pentestagent — AI Pentesting Framework (1,594 stars)
AI agent framework for black-box security testing with three modes: Assist (human-guided), Agent (autonomous), and Crew (multi-agent). Includes prebuilt attack playbooks, HexStrike MCP integration, and a terminal UI. Works with Claude Sonnet, GPT-5, or any LiteLLM model. GitHub
zachlatta/freeflow — Free Voice Coding (551 stars in 48 hours)
Free, open-source alternative to Wispr Flow / Superwhisper. Hold Fn to record, get instant context-aware transcription pasted into any text field. Uses Groq's free API for sub-1-second transcription. Reads surrounding content (email recipients, terminal context) for name/term correction. Created by Zach Latta (Hack Club founder). macOS only. GitHub
benchflow-ai/skillsbench — Agent Skills Benchmark (396 stars)
First standardized benchmark for AI agent skills. 86 tasks, 11 domains, 7,308 test trajectories. Critical finding: curated skills +16.2%, self-generated skills +0%. Run it against your own skills. GitHub | Paper
AUTHENSOR/SafeClaw — Deny-by-Default Agent Gating
Intercepts every agent action (file writes, shell commands, network requests) against a configurable policy. Fails closed if control plane unreachable. Browser dashboard, cryptographic audit ledger, budget controls, container sandboxing. GitHub
currentlycurrently/acidtest — Zero-Config Security Scanner for Agent Skills
Zero-config scanner for AI agent skills running four analysis layers: permission audit, prompt injection detection, TypeScript AST code analysis, and cross-reference checks. One command: npx acidtest scan ./my-skill. GitHub
CohereLabs/tiny-aya — Multilingual On-Device Models
3.35B parameter open-weight model supporting 70+ languages. Four regional variants for Africa, South Asia, Asia-Pacific, and Europe. Available on HuggingFace, Kaggle, and Ollama. ollama pull tiny-aya-global. HuggingFace
QwenLM/qwen-code — Claude Code Equivalent on Open Weights
Terminal-based coding agent powered by Qwen 3.5. Ships with Qwen-Agent framework and Qwen3-Coder (code-specialized model). Build agentic applications using a completely open-weight stack. GitHub
Best Content This Week
CoSAI: "Securing the AI Agent Revolution" (White Paper)
The definitive MCP security white paper identifies 12 threat categories and ~40 distinct threats. Core principle: "strengthen your perimeter, then assume the AI will be compromised." Covers prompt injection through benign channels, tenant isolation failures, tool poisoning at runtime, and why LLMs must never serve as security proxies. Essential reading for anyone deploying MCP servers.
Security Threat Modeling for AI-Agent Protocols (arXiv)
The first systematic comparative security analysis of MCP, A2A, Agora, and ANP. Key finding: identity forgery is a cross-protocol vulnerability — MCP relies on free-text names without cryptographic binding, A2A's JWT-based auth is vulnerable to agent card forgery, ANP's DID system lacks centralized revocation. Essential for choosing between agent communication protocols.
Anthropic 2026 Agentic Coding Trends Report
Production data from enterprise deployments: Rakuten tested Claude Code on 12.5M lines with 99.9% accuracy in 7 hours. TELUS shipped 30% faster with 13K+ custom AI solutions. The key strategic insight: master multi-agent coordination, scale human-agent oversight via AI-automated review, extend agentic coding beyond engineering, embed security from inception.
Elastic Security Labs: MCP Attack Vectors and Defense
Elastic found 43% of tested MCP implementations contain command injection flaws and 30% permit unrestricted URL fetching. Documents five named attack vectors: Tool Poisoning, Rug-Pull Redefinitions, Tool Name Collision, Orchestration Injection, and Traditional Code Vulnerabilities. Actionable defenses for each.
NVIDIA AI Red Team: Practical Sandboxing Guidance
NVIDIA's guidance identifies three non-negotiable agent sandboxing controls: network egress restrictions, workspace-bound file writes, and configuration file protection. Recommends kernel isolation (VMs/Kata containers) over gVisor, ephemeral per-execution sandboxes, and fresh user approval for every isolation violation (no cached approvals).
Phil Schmid: MCP Best Practices — Six Design Patterns
Phil Schmid (Hugging Face) codifies six MCP server design patterns: Outcomes Over Operations (design around agent goals, not API endpoints), Flatten Arguments (agents hallucinate nested dicts 40%+ of the time), Instructions Are Context (rich docstrings explain when/how to use each tool), Curate Ruthlessly (5-15 tools per server), Name for Discovery, and Paginate Everything.
Simon Willison's Claude Sonnet 4.6 Analysis
Willison's same-day review provides independent verification of Anthropic's claims. His SVG generation benchmark and pelican test offer reproducible evaluation methodology. Also notable: his Showboat ecosystem (Rodney + Chartroom + datasette-showboat) represents the most complete open-source agent tooling stack for AI agents to self-document.
Top Skills to Build This Week
1. Write Portable Agent Skills (SKILL.md Standard) — Beginner
Create .github/skills/your-skill-name/SKILL.md with YAML frontmatter (name, description with trigger keywords). Structure the body with step-by-step instructions. Validate with skills-ref validate. Skills work across Copilot, Claude Code, and Cursor. Specification
2. Set Up GitHub Agentic Workflows — Intermediate
Install gh extension install github/gh-aw. Add a workflow: gh aw add-wizard <url>. Start with Issue Triage. Customize the markdown file, compile with gh aw compile. Choose your agent engine (Copilot, Claude Code, or Codex). Never auto-merge PRs. Docs
3. Run agentsec Security Scanner — Beginner
pip install agentsec-ai. Run agentsec scan from your project root. Apply hardening: agentsec harden -p workstation --apply. Set up file monitoring: agentsec watch ~/.openclaw -i 2. Add to CI with SARIF output. GitHub
4. Defend Against MCP Tool Poisoning — Intermediate
Inspect all tool metadata for hidden instructions and Base64 payloads. Pin tool definitions with hash verification (mcp-scan from Invariant Labs). Isolate MCP servers in Docker with --read-only --cap-drop ALL. Disable auto-approve in production. Log all invocations. Elastic Guide
5. Build Agent-First MCP Servers — Intermediate
Design tools around agent outcomes, not API endpoints. Flatten arguments with Literal types (agents hallucinate nested dicts 40%+). Write rich docstrings as agent context. Keep 5-15 tools per server. Use {service}_{action}_{resource} naming. Phil Schmid's Guide
6. NVIDIA Agent Sandboxing Controls — Advanced
Block network egress, enforce workspace-bound writes, protect config files. Use Kata containers or Firecracker microVMs for kernel isolation. Create ephemeral sandboxes destroyed per execution. Implement fresh user approval for each dangerous action — never cache approvals. NVIDIA Blog
7. Build Offline Agentic Coding (Copilot SDK + Foundry Local) — Intermediate
Install Foundry Local, pull a coding model (foundry model run qwen2.5-coder-1.5b). Clone the Copilot SDK sample. Configure BYOK to route to localhost. Run the PLAN/EDIT/VERIFY/SUMMARY agent. Customize tools and integrate with Git. Microsoft Guide
8. Deploy Qwen 3.5 Visual Agents — Advanced
Set up API via DashScope. Install Qwen-Agent framework with pip install -U "qwen-agent[gui,rag,code_interpreter,mcp]". Build an Assistant with screenshot processing. Process the iterative agent loop (screenshot -> identify -> act -> repeat). Self-host with vLLM on H100 cluster. Guide
9. Deploy Tiny Aya for Offline Multilingual Inference — Beginner
ollama pull tinyaya-global for broad coverage, tinyaya-fire for South Asian languages. Run locally or use the OpenAI-compatible API at localhost:11434. Create a Modelfile with system prompt in your target language. TechCrunch
10. Claude Code Async Hooks for Quality Gates — Intermediate
Configure async hooks in .claude/settings.json with async: true to run tests without blocking. Use exit code 2 for validation reinjection. Set up agent-based hooks with multi-turn tool access for complex checks. Chain hooks for compound workflows. Claude Code Docs
11. Scan MCP Servers Against OWASP Agentic Top 10 — Intermediate
Audit with agentsec scan -o json. Apply Elastic's five-vector defense. Containerize each server with Docker --read-only --cap-drop ALL. Monitor all tool invocations. Apply hardening profiles per deployment scenario. OWASP Guide
12. Implement Agent Skills as Auditable Quality Contracts — Intermediate
Define one clear responsibility per skill with explicit scope and non-goals. Encode decision order and conflict resolution logic. Add measurable quality gates. Write output contracts with examples. Store in version control and review changes through PRs. Guide
13. Design MCP Servers with Progressive Disclosure — Intermediate
Layer 1: Tool metadata under 100 tokens each. Layer 2: Flat parameter schemas with Literal constraints. Layer 3: Detailed reference docs loaded on-demand. Keep 5-15 tools per server. Track context cost per server (2000 token budget for all tool metadata). Agent Skills Spec
Source Index
Breaking News & Industry
- GitHub Agentic Workflows — The Register
- Figma Code to Canvas — CNBC
- EU Grok CSAM Investigation — 9to5Mac
- GPT-5.3-Codex Rollout Paused — Jangwook.net
- LangChain SSRF CVE-2026-26019 — Cybersecurity News
- Red Hat Vibe Coding — Red Hat
- 1.5M Rogue Agents — CSO Online
- AI PRs Review Wait — SD Times
Vibe Coding & AI Development 9. Copilot Agent Skills Guide — SmartScope 10. SkillsBench — GitHub 11. Copilot SDK + Foundry Local — Microsoft 12. Skills & Hooks Starter Kit — Medium 13. Claude Code Changelog — Releasebot 14. OWASP MCP Guide — OWASP 15. Anthropic Git MCP CVEs — The Hacker News 16. Qwen 3.5 Guide — DigitalApplied
What Leaders Are Saying 17. Willison Sonnet 4.6 — simonwillison.net 18. Amodei Bengaluru — TechCrunch 19. Sonnet 4.6 Launch — CNBC 20. Galloway CNN — CNN 21. Adani $100B — CNBC
AI Agent Ecosystem 22. OpenClaw Infostealer — The Hacker News 23. Grok 4.20 Beta — NextBigFuture 24. Kimi Claw — MarkTechPost 25. Agentic Coding Trends — Anthropic 26. Mastercard Agent Pay — Business Standard 27. Agent Skills Standard — VentureBeat 28. OWASP Agentic Top 10 — OWASP 29. SOX Compliance Risk — Security Boulevard
Hot Projects & Repos 30. tobi/qmd — GitHub (9,100 stars) 31. hexstrike-ai — GitHub (6,895 stars) 32. github/copilot-sdk — GitHub (7,200 stars) 33. pentestagent — GitHub (1,594 stars) 34. freeflow — GitHub (551 stars) 35. AUTHENSOR/SafeClaw — GitHub 36. acidtest — GitHub 37. CohereLabs/tiny-aya — HuggingFace 38. QwenLM/qwen-code — GitHub
Best Content 39. CoSAI MCP Security — CoSAI 40. Agent Protocol Security — arXiv 41. Elastic MCP Attack Vectors — Elastic Security Labs 42. NVIDIA Sandboxing — NVIDIA 43. MCP Best Practices — Phil Schmid 44. Sonnet 4.6 Benchmarks — VentureBeat
Meta: Research Quality
Agent Performance This Run:
- news-researcher: 11 findings, 8 high-value. Best single finding: GitHub Agentic Workflows technical preview (first to surface the SafeOutputs architecture detail).
- vibe-coding-researcher: 11 findings, 9 high-value. Strong on OWASP MCP Guide and Anthropic Git MCP CVEs. Surfaced the Claude Code Skills Starter Kit.
- thought-leaders-researcher: 9 findings, 6 high-value. Willison's 8-post day and Amodei Bengaluru keynote were highest signal.
- agents-researcher: 10 findings, 8 high-value. Broke the OpenClaw infostealer story and Grok 4.20 multi-agent architecture.
- projects-researcher: 10 findings, 8 high-value. qmd by Shopify's Lutke and the security tool repos (HexStrike, PentestAgent) were standouts.
- sources-researcher: 10 findings, 8 high-value. CoSAI white paper and arXiv agent protocol security analysis were the deepest.
- skill-finder: 13 skills across all 6 domains. Heavy security weighting per user preference.
Most Productive Sources Today: CNBC (4 stories), The Hacker News (3 stories), OWASP (2 frameworks), GitHub (10 repos), simonwillison.net (8 posts), TechCrunch (3 stories), arXiv (2 papers).
Improvements from User Feedback:
- Reduced repetition: Deduplicated coverage across agents using memory context overlap warnings. Each agent received explicit "already covered" lists.
- Added Skills section: 13 actionable skills with step-by-step instructions per user request.
- Increased security weighting: Agent security weight raised to 2.0. Security content appears in every section, not just a dedicated section.
- More builder focus: Prioritized repos and tools a solo developer can use today over corporate announcements.
Coverage Gaps:
- DeepSeek V4 expected to launch today but no confirmed release found — monitoring.
- India AI Summit is a 4-day event (Feb 16-20) — more announcements expected through Thursday.
- Antigravity rate limit resolution status still unclear.
How This Newsletter Learns From You
This newsletter has been shaped by 6 pieces of feedback so far. Every reply you send adjusts what I research next.
Your current preferences (from your feedback):
- More agent security (weight: +2.0)
- More vibe coding (weight: +1.5)
- More builder tools (weight: +1.5)
- Less market news (weight: -1.0)
Want to change these? Just reply with what you want more or less of.
Ways to steer this newsletter:
- "More [topic]" / "Less [topic]" — adjust coverage priorities
- "Deep dive on [X]" — I'll dedicate extra research to it
- "[Section] was great" — reinforces that direction
- "Missed [event/topic]" — I'll add it to my radar
- Rate sections: "Vibe Coding section: 9/10" helps me calibrate
Reply to this email — I've processed 6/6 replies so far and every one makes tomorrow's issue better.