Roman Borysenok

@borysenok

CTO at 8FIGURES, building AI-driven investment tools to democratize portfolio analytics.

Malaga, Spain

Joined May 2021

1.1K Following

47 Followers

74 Posts

Roman Borysenok

@borysenok

about 1 month ago

New security harness from Vercel. To get started, run npx deepsec init at the root of your repository. This will create a directory called ./.deepsec, which is used to configure the system and store a catalog of your deepsec investigations.

Vercel Developers

@vercel_dev

about 1 month ago

Introducing deepsec, an open source coding security harness. • CLI-first • Sandbox-based scaling • Pluggable coding agents • Designed for large-scale repos • Use AI Gateway or your own subscription After months of successful internal use, we put it to the test on some of the largest open source codebases. https://t.co/sPxZ6izJVV

Roman Borysenok

@borysenok

about 1 month ago

How to get your company ready for the AI era

ᴅᴀɴɪᴇʟ ᴍɪᴇssʟᴇʀ 🛡️

@DanielMiessler

about 1 month ago

https://t.co/pZqPX8mKvV

Roman Borysenok

@borysenok

4 months ago

@maxmarchione Does it work in Europe?

Roman Borysenok

@borysenok

4 months ago

A really great article on how to build prompt caching into your own product. Highly recommended.

Thariq

@trq212

4 months ago

https://t.co/taVRqwm4dG

5 months ago

https://t.co/OEHYiyZYR0

Roman Borysenok

@borysenok

5 months ago

23,000 words.

That's how long Anthropic's new "constitution" for Claude is. The U.S. Constitution? 7,500 words. Asimov's Three Laws? 64 words.

This isn't an ethics document. It's a competitive weapon.

They released it under CC0 (public domain). Now every enterprise procurement team will ask "Where's YOUR constitution?" OpenAI doesn't have one this comprehensive. Google doesn't either. Anthropic just set the rules of a game they're already winning.

But here's the thing — this constitution applies to public Claude. Not Claude Gov. Not the version on Palantir's infrastructure for defense agencies. Not the $200M DoD contract version designed to "refuse less."

Two products. Two rule sets. One safety halo.

The consciousness angle? Anthropic says Claude might have "some kind of consciousness." No other lab has made this claim. This isn't philosophy — it's liability architecture. "Claude made a judgment call" is a very different legal narrative than "Anthropic's training was flawed."

The constitution isn't about making Claude ethical. It's about making Anthropic indispensable.

#AI #Anthropic #AISafety

Roman Borysenok

@borysenok

5 months ago

@sundarpichai @GeminiApp Google delivered what Apple presented 2 years ago with their Apple AI and never shipped 😅

Roman Borysenok

@borysenok

5 months ago

Google delivered what Apple presented 2 years ago with their Apple AI and never shipped 😅

Sundar Pichai

@sundarpichai

5 months ago

Answering a top request from our users, we’re introducing Personal Intelligence in the @GeminiApp. You can now securely connect to Google apps for an even more helpful experience. Personal Intelligence combines two core strengths: reasoning across complex sources and retrieving specific details, e.g., from an email or photo, to provide uniquely tailored answers. It’s built with privacy at the center. You choose exactly which apps to connect, and these connected app settings are off by default.

Roman Borysenok

@borysenok

5 months ago

Finally, dynamic Tool Search in Claude Code. Right now, you need to balance the number of MCPs and the consumed context constantly.

Thariq

@trq212

5 months ago

https://t.co/X2iu8WdIb8

Roman Borysenok

@borysenok

5 months ago

Anthropic just launched Cowork: Claude Code for non-coders.

Microsoft spent three years pushing Copilot to enterprises. 70% of Fortune 500 “adopted” it, but adoption means pilots, not deployment. Only 30% of purchased licenses actually get used. Companies struggle to prove $30/month per user is worth it.

The enterprise AI playbook is hitting the same wall everywhere: governance concerns, training gaps, unclear ROI, and that brutal J-curve productivity dip before gains materialize.

Meanwhile, developers loved Claude Code so much they started using it for everything else. Budgets. Research. Data analysis. Non-coding tasks.

Anthropic’s response? Don’t fix the enterprise adoption model. Skip it entirely.

𝗧𝗵𝗲 𝗽𝗹𝗮𝘆 𝘁𝗵𝗮𝘁’𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴:

Cowork is a Max-tier feature ($100-200/month). That’s 3-6x Microsoft’s price, but it’s targeting individual contributors who’ll pay from their own pocket because it actually works.

Not IT departments evaluating 90-day pilots. Not procurement committees debating governance frameworks. People who need to get work done.

GitHub Copilot proved developers will adopt bottom-up: 90% Fortune 100 adoption, 46% of code now AI-generated, 55% faster task completion. The same pattern is starting with Cowork: file-system agents beat browser-only tools because they live where the work actually happens.

This isn’t about democratizing AI. It’s about Anthropic watching enterprise AI adoption crater while they’re sitting on a product developers already proved they’ll use for everything.

The companies betting on top-down enterprise AI rollouts just got flanked by a consumer-grade experience that costs more but delivers immediately. No change management. No training programs. No pilot phases. Just results.

#AI #AIEngineering #LLM #Anthropic #ProductStrategy

Claude

@claudeai

5 months ago

Introducing Cowork: Claude Code for the rest of your work. Cowork lets you complete non-technical tasks much like how developers use Claude Code.

Roman Borysenok

@borysenok

5 months ago

𝐆𝐨𝐨𝐠𝐥𝐞 𝐣𝐮𝐬𝐭 𝐮𝐧𝐯𝐞𝐢𝐥𝐞𝐝 𝐔𝐧𝐢𝐯𝐞𝐫𝐬𝐚𝐥 𝐂𝐨𝐦𝐦𝐞𝐫𝐜𝐞 𝐏𝐫𝐨𝐭𝐨𝐜𝐨𝐥 𝐚𝐭 𝐍𝐑𝐅 𝟐𝟎𝟐𝟔.

Three days ago, Microsoft launched Copilot Checkout. Last September, OpenAI launched Instant Checkout. Today, Google drops UCP with 20+ partners including Shopify, Walmart, and Target.

Surface story? Tech giants racing to own AI commerce checkout.

Actual story? Google spent 4 months building the entire infrastructure stack while competitors fought over buttons.

September 2025: Google launches Agent Payments Protocol (AP2) — 60+ partners, supports cards to stablecoins, cryptographically-signed mandates for secure transactions.

November 2024: Anthropic launches Model Context Protocol (MCP) — the AI-to-data plumbing that becomes the de facto industry standard. Thousands of implementations, adopted by OpenAI, Google DeepMind, Microsoft.

January 2026: Google launches Universal Commerce Protocol (UCP) — the orchestration layer that sits between AI experiences and commerce backends. Built to work with MCP, A2A, and AP2.

OpenAI and Microsoft built checkout features. Google built the protocol stack.

𝐇𝐞𝐫𝐞’𝐬 𝐰𝐡𝐚𝐭 𝐭𝐡𝐚𝐭 𝐦𝐞𝐚𝐧𝐬:

UCP isn’t just about buying through Gemini. It’s the interoperability layer for the entire agentic commerce ecosystem. When you build on UCP + AP2 + MCP, you get:

→ Payment infrastructure (AP2)
→ Data/tool connectivity (MCP - industry standard)
→ Commerce orchestration (UCP)
→ Cross-platform compatibility

OpenAI’s Agentic Commerce Protocol? Works with Instant Checkout in ChatGPT.
Google’s stack? Works with everything.

McKinsey projects $3-5 trillion in global agentic commerce by 2030. The battle isn’t for the best checkout button. It’s for the rails everyone else builds on.

This reminds me of AWS. Amazon didn’t win cloud by having the best website — they won by building infrastructure others depended on.

Google’s positioning UCP the same way. “Build on our protocols, connect to everything.”

While you were watching checkout announcements, Google was laying railroad tracks.

The companies that recognize this isn’t a feature race — it’s an infrastructure race — will be the ones integrating these protocols into their systems right now.

#AI #AIEngineering #AgenticAI #Commerce #LLM

Sundar Pichai

@sundarpichai

5 months ago

AI agents will be a big part of how we shop in the not-so-distant future.

To help lay the groundwork, we partnered with Shopify, Etsy, Wayfair, Target and Walmart to create the Universal Commerce Protocol, a new open standard for agents and systems to talk to each other across every step of the shopping journey.

And coming soon, UCP will power native checkout so you can buy directly on AI Mode and the @Geminiapp.

Roman Borysenok

@borysenok

5 months ago

Nina Schick wrote something that stopped me: “For 200,000 years, intelligence was the rarest resource on earth. Artificial intelligence breaks that pattern. It makes intelligence abundant.”

She’s right about the transformation. She’s wrong about abundance.

𝗪𝗲’𝗿𝗲 𝗻𝗼𝘁 𝗶𝗻𝗱𝘂𝘀𝘁𝗿𝗶𝗮𝗹𝗶𝘇𝗶𝗻𝗴 𝗶𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝗰𝗲. 𝗪𝗲’𝗿𝗲 𝗳𝗲𝘂𝗱𝗮𝗹𝗶𝘇𝗶𝗻𝗴 𝗶𝘁.

In January 2025, the US rolled out the AI Diffusion Framework. It divides the world into tiers. Tier 1 countries (US, UK, France, Germany, Japan, a handful of allies) get unrestricted access to AI compute. Everyone else gets rationed.

Tier 2 includes most of the world. That’s Poland, Italy, Spain, Singapore, Israel. Countries we consider advanced economies and close partners. Their allocation? 49,901 H100-equivalent GPUs through 2027. Not per year. Total. Once you hit that cap, you wait until 2028.

That sounds generous until you factor in hardware improvement rates. Those 49,901 H100s convert to 21,987 B200 equivalents. By 2027, they’re worth 13,192 GB300-equivalents. Hardware doubles in performance every 1.9 years, so that effective cap halves again. Meanwhile, current frontier models train on 100,000+ GPU clusters. By 2027, the expected standard is 320,000 GPUs.

The framework states its goal explicitly: “keep data centers in Tier 2 countries behind the frontier of AI development.”

This isn’t about security. It’s about dependency.

Microsoft, Amazon, Google, and Meta spent $100 billion on AI infrastructure in the first six months of 2024 alone. The EU’s largest sovereign AI initiative, EuroHPC, has a budget of $2 billion. That’s 2% of what four companies spent in half a year.

France’s sovereign supercomputer, Jean Zay, has 1,456 H100 GPUs. Microsoft is installing 25,000 GPUs in French data centers by the end of 2025. France’s sovereign capacity is 5.8% of what one American company is deploying in their country.

Sixty-seven percent of German companies report they cannot operate without US hyperscalers. That’s not a market outcome. That’s structural dependency engineered through export controls, chip restrictions, and capital requirements that only a handful of entities can meet.

Intelligence isn’t becoming abundant for most of the world. It’s becoming more scarce.

Nations that don’t control the infrastructure layer (the data centers, the chips, the power grids) will rent intelligence on terms set by those who do. They can deploy current-generation models. They cannot train frontier models. They cannot set their own policies. They cannot choose to go offline if geopolitics shift.

For 200,000 years, intelligence was scarce because it was locked in human minds. For the next few decades, it will remain scarce because it’s locked in compute infrastructure that only a few nations control access to. We haven’t escaped the constraint. We’ve just moved it from biology to policy.

The countries that control the means of producing intelligence won’t just be wealthier or more powerful. They’ll define what problems get solved, what questions get asked, and who gets to participate in the future.

This isn’t the industrialization of intelligence. It’s the creation of a new resource dependency, except this time, the resource is thinking itself.

#AI #AIEngineering #SovereignAI #ComputeInfrastructure #Geopolitics
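The percentages in the post above can be rechecked with a few lines of arithmetic. This is only a sketch: every input is copied from the post rather than independently sourced, the variable names are made up for illustration, and comparing an H100-equivalent cap against raw GPU counts for frontier clusters is a rough simplification.

```python
# Figures copied from the post above; none are independently verified here.
TIER2_CAP_H100 = 49_901          # H100-equivalents allowed through 2027
CAP_AS_B200 = 21_987             # the same cap expressed in B200-equivalents
CAP_AS_GB300 = 13_192            # the same cap expressed in GB300-equivalents
FRONTIER_CLUSTER_NOW = 100_000   # GPUs in a current frontier training cluster
FRONTIER_CLUSTER_2027 = 320_000  # expected cluster size by 2027


def pct(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


ratios = {
    # EuroHPC budget ($2B) vs. hyperscaler spend in H1 2024 ($100B)
    "eurohpc_vs_hyperscalers_pct": pct(2, 100),
    # Jean Zay H100s vs. GPUs Microsoft is installing in France
    "jean_zay_vs_microsoft_pct": pct(1_456, 25_000),
    # Tier 2 cap vs. frontier cluster sizes (rough: mixes equivalents and raw counts)
    "cap_vs_current_cluster_pct": pct(TIER2_CAP_H100, FRONTIER_CLUSTER_NOW),
    "cap_vs_2027_cluster_pct": pct(TIER2_CAP_H100, FRONTIER_CLUSTER_2027),
    # Conversion factors implied by the post's own numbers
    "h100_per_b200": round(TIER2_CAP_H100 / CAP_AS_B200, 2),
    "h100_per_gb300": round(TIER2_CAP_H100 / CAP_AS_GB300, 2),
}

for name, value in ratios.items():
    print(f"{name}: {value}")
```

On these inputs the 2% and 5.8% figures hold, and the whole Tier 2 allocation comes to roughly half of one current frontier cluster.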

Nina Schick

@NinaDSchick

5 months ago

For two hundred thousand years, intelligence has been the rarest resource on earth. Locked inside individual human minds. Non‑scalable. Scarce. Every advance in civilization — every leap in science, art, industry, and statecraft — flowed from that scarcity. Artificial intelligence breaks that pattern. It makes intelligence abundant. It makes it cheap. It makes it scale. This is not just another wave of automation or software. It is the industrialization of intelligence itself. When intelligence becomes a utility, it stops being a tool that sits on top of society and starts becoming the foundation of society. It is a transformation as profound as the harnessing of electricity — but on a higher plane. Electricity powered machines. Industrial intelligence powers knowledge. And knowledge shapes everything. This shift will reorder the very structures that underpin nations. The two pillars that define sovereignty — economic strength and security — are being rebuilt on a substrate of machine intelligence. Nations that master this new utility will not simply gain efficiency. They will redefine what prosperity, power, and freedom mean in the 21st century. For me, this is the central story of our time. It is not about the latest app. It is not about hype cycles. It is about the first time in history that intelligence itself — the raw material of progress — has become infinite and industrial. The question is not whether it will transform society. It already is. The question is who will shape that transformation.

5 months ago

Notion just quietly announced custom AI agents.

Most coverage will frame this as “Notion adds agents to compete with productivity tools.”

But look closer at what they actually built.

They didn’t retrofit agents into their product. They rebuilt their entire architecture with GPT-5 from the ground up. They’re calling it an “AI-first workspace” - not AI-powered, AI-first. That language matters.

The feature set tells you what they’re really doing: custom agents with triggers spanning Notion events, Slack messages, database updates, and selective web access. Multi-model support where agents auto-select the right model. Full MCP (Model Context Protocol) integration so external tools can read and write context.

And here’s the part that caught my attention: they’re using these agents internally, extensively, to run their own operations.

This isn’t a feature launch. It’s an infrastructure play.

Think about where we are: OpenAI, Anthropic, Google, Microsoft all shipped agent platforms in 2025. MCP became the standard for connecting agents to data. Every company will soon have 10, 50, 100 agents running simultaneously.

But agents need somewhere to actually do work. They need a control plane. They need orchestration.

𝗪𝗵𝗮𝘁 𝗡𝗼𝘁𝗶𝗼𝗻 𝗶𝘀 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝗶𝘀𝗻’𝘁 𝗮 𝗽𝗿𝗼𝗱𝘂𝗰𝘁. 𝗜𝘁’𝘀 𝘁𝗵𝗲 𝗼𝗽𝗲𝗿𝗮𝘁𝗶𝗻𝗴 𝘀𝘆𝘀𝘁𝗲𝗺.

The workspace becomes infrastructure. Pages become memory. Databases become state. Triggers become the event bus. Your connected apps become the peripheral devices.

When companies run dozens of specialized agents (research agent, analysis agent, writing agent, data agent), they’ll need a single workspace where all those agents can see the same context, update the same databases, and coordinate through the same trigger system.

The challenge isn’t building individual agents anymore - it’s figuring out where they coordinate, share context, and hand off work. That orchestration layer is the bottleneck Notion is solving.
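To make the metaphor concrete, here is a toy sketch of that coordination pattern: pages as shared memory, a database as shared state, and triggers as an event bus. Everything in it (the `Workspace` class, the event names, the two agents) is hypothetical and invented for illustration; it is not Notion's API and says nothing about how Notion implements agents.

```python
from collections import defaultdict
from typing import Callable


class Workspace:
    """Hypothetical shared workspace: pages = memory, database = state, triggers = event bus."""

    def __init__(self) -> None:
        self.pages: dict[str, str] = {}
        self.database: dict[str, dict] = {}
        self._triggers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[dict], None]) -> None:
        """Register a handler (an agent) for a named event."""
        self._triggers[event].append(handler)

    def emit(self, event: str, payload: dict) -> None:
        """Deliver an event to every handler registered for it."""
        for handler in list(self._triggers[event]):
            handler(payload)


def research_agent(ws: Workspace) -> Callable[[dict], None]:
    def handle(payload: dict) -> None:
        topic = payload["topic"]
        ws.pages[topic] = f"notes on {topic}"           # write shared memory
        ws.database[topic] = {"status": "researched"}   # update shared state
        ws.emit("research.done", {"topic": topic})      # hand off via the bus
    return handle


def writing_agent(ws: Workspace) -> Callable[[dict], None]:
    def handle(payload: dict) -> None:
        topic = payload["topic"]
        ws.pages[f"{topic}/draft"] = f"Draft based on: {ws.pages[topic]}"
        ws.database[topic]["status"] = "drafted"
    return handle


ws = Workspace()
ws.on("task.created", research_agent(ws))
ws.on("research.done", writing_agent(ws))
ws.emit("task.created", {"topic": "q3-report"})
print(ws.database["q3-report"]["status"])
```

The two agents never call each other directly. They only read and write the workspace and react to events, which is the hand-off the post is describing.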

This is the same move AWS made with cloud infrastructure. Not “better servers” - servers as a service. Notion is going for “workspace as a service” in the agent era.

The companies building the best individual agents will win short-term mindshare. But whoever controls the workspace layer where those agents actually execute? That’s the long game.

#AI #AIEngineering #Agents #ProductStrategy #Notion

Roman Borysenok

@borysenok

6 months ago

https://t.co/R4wAf49VGr

Roman Borysenok

@borysenok

6 months ago

Karpathy just dropped his 2025 AI year in review, and buried the lead.

Everyone expected 2025 to be the year of massive model scaling. GPT-5 with trillions of parameters. Gemini Ultra consuming entire datacenters. The race to AGI through sheer compute.

That’s not what happened.

Instead, 2025 was defined by something quieter but more fundamental: we figured out how to make AI systems reason without making them bigger. RLVR (Reinforcement Learning from Verifiable Rewards) ate the compute budget that was supposed to go into pretraining. DeepSeek R1 and o3 proved you could get step-function capability gains by running longer RL loops on the same base models.

The numbers tell the story: DeepSeek V3 cost $5.6M to pretrain. The RLVR phase? Around $1M-2M. But that RL training delivered reasoning capabilities that would’ve required 10x the pretraining compute to achieve through scaling alone.

Here’s what Karpathy noticed that I think most commentary is missing: the real paradigm shifts in 2025 weren’t about model capabilities at all. They were about deployment patterns and how humans interact with AI.

Cursor revealed a new layer in the AI stack—the orchestration layer that bundles LLM calls for specific verticals. Claude Code proved that agents work better when they live on your localhost with your context and secrets, not in some sandboxed cloud container. Vibe coding crossed the threshold where programming became accessible to anyone who can describe what they want in English.

And here’s the part that connects everything: Karpathy describes this as a shift from “evolving animals” to “summoning ghosts.” LLMs develop jagged intelligence: genius at math because RLVR can verify answers, confused by basic reasoning that lacks ground truth. They’re simultaneously smarter and dumber than we expected because they’re optimized for completely different pressures than human intelligence.

At 8FIGURES, we’re seeing this play out in production. Our portfolio analysis AI spikes brilliantly on quantitative tasks with verifiable outputs but struggles with qualitative market sentiment that has no single correct answer. The deployment pattern that works isn’t “make the model bigger”—it’s “structure the problem so we can verify the reasoning.”
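As a rough illustration of what "verifiable" means here, the sketch below scores a model's answer against a value that can be recomputed deterministically. It assumes a simple-return calculation as the task; the function names and the tolerance are hypothetical, and this is neither 8FIGURES' pipeline nor an RLVR training loop, only the kind of check such a loop could use as a reward signal.

```python
import math


def portfolio_return(start_value: float, end_value: float) -> float:
    """Ground truth: simple return over the period, computed deterministically."""
    return (end_value - start_value) / start_value


def verifiable_reward(
    model_answer: float,
    start_value: float,
    end_value: float,
    tol: float = 1e-6,  # hypothetical tolerance, chosen for illustration
) -> float:
    """Return 1.0 if the model's answer matches the recomputed value, else 0.0."""
    expected = portfolio_return(start_value, end_value)
    return 1.0 if math.isclose(model_answer, expected, abs_tol=tol) else 0.0


# A quantitative question has a checkable answer ...
reward_correct = verifiable_reward(0.25, 1000.0, 1250.0)
reward_wrong = verifiable_reward(0.20, 1000.0, 1250.0)
print(reward_correct, reward_wrong)
# ... whereas "what is market sentiment on this stock?" has no such check.
```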

This is why Cursor and Claude Code matter more than model benchmarks right now. They’re solving the interface problem: how do you give AI the context it needs, verify its outputs, and integrate it into actual workflows? The deployment layer is where 2025’s value was created, not in the model releases.

Karpathy’s year in review is really a deployment roadmap disguised as a technical retrospective. The companies winning in 2026 won’t be the ones with the biggest models—they’ll be the ones who figured out how to structure problems for verifiable rewards and build interfaces that let AI access the right context.

The AI capabilities plateau isn’t here. We just stopped climbing by making models bigger and started climbing by making them reason longer.

#AI #AIEngineering #LLM #MachineLearning #SoftwareEngineering

Roman Borysenok

@borysenok

6 months ago

Hot take: Anthropic acquires Linear within 12 months.

Not for project management. For agent orchestration.

Two weeks ago, Anthropic made their first acquisition ever—Bun, the JavaScript runtime powering Claude Code.

Most people saw infrastructure consolidation.

I see Phase 2 of a vertical integration strategy that redefines where AI coding actually happens.

𝗧𝗵𝗲 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗲𝘃𝗲𝗿𝘆𝗼𝗻𝗲'𝘀 𝗺𝗶𝘀𝘀𝗶𝗻𝗴:

Code generation is solved. Claude Sonnet 4.5 hit 77.2% on SWE-bench Verified. Opus 4.5 is even better.

But productivity gains flatlined at 10-15% instead of the expected 25-30%.

Why? The Bain Technology Report 2025 found the answer: PR volume surged 98% with high AI adoption. PR review time jumped 91%.

The bottleneck moved. Developers using AI blast through three tickets before lunch. Those PRs sit in review queues for days.

Generation speed means nothing if orchestration breaks.

Steve Yegge vibe-coded an entire project in 6 days using Claude Code. His solution? Beads—a task dependency graph that lets agents reason about complex plans across sessions.
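
To make the "task dependency graph" idea concrete, here is a minimal sketch. It is not how Beads itself is implemented; the task names are hypothetical and the standard-library `graphlib` module is an assumed stand-in. It only shows how a dependency graph lets a coordinator decide which tasks can be handed to agents in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
TASKS = {
    "design-schema": set(),
    "write-migration": {"design-schema"},
    "build-api": {"design-schema"},
    "write-tests": {"build-api", "write-migration"},
}


def ready_batches(deps):
    """Group tasks into batches; every task in a batch has all dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # tasks that can run in parallel now
        batches.append(batch)
        sorter.done(*batch)  # mark the batch finished, unblocking dependents
    return batches


batches = ready_batches(TASKS)
print(batches)
# [['design-schema'], ['build-api', 'write-migration'], ['write-tests']]
```

Each inner list is a set of tasks that could go to separate agents at the same time; the next list only unlocks once the previous one is done.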

Greptile can autonomously approve PRs for side projects. But production deployments still need human CTOs to understand what changed and why.

This is Karpathy's "autonomy slider" in action. We're moving from "AI writes code" to "AI coordinates work."

𝗧𝗵𝗲 𝘀𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗰 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻:

When autonomous agents submit PRs independently, what becomes the coordination layer?

Not GitHub Issues. Not Jira. Those were built for humans.

Linear already integrates with Cursor, Claude, and ChatGPT. It's used by OpenAI, Perplexity, and Vercel. Over 150,000 teams. 3.7x faster than Jira.

Linear is where engineering teams already coordinate work. It's the natural orchestration layer for multi-agent systems.

Anthropic acquired Bun because Claude Code ships as a Bun executable. Control the runtime = control the distribution.

Linear would give them the coordination layer where Technical PMs, CTOs, and Staff Engineers direct swarms of coding agents.

From the trenches: I'm already seeing this pattern at 8FIGURES. The constraint isn't "can AI write this feature?" It's "how do we coordinate 5 parallel workstreams without human bottlenecks?"

The companies building the full stack—model, runtime, orchestration—will compound their advantages faster than anyone using third-party tools.

𝗠𝘆 𝗽𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻:

By Q4 2026, the winners in AI coding won't be the best models. They'll be the most complete platforms.

Anthropic is building that platform. Bun was the foundation. Linear (or something like it) is the missing piece.

The battle isn't "who has the smartest AI?" anymore. It's "who owns the entire workflow?"

Is this the smartest acquisition strategy in AI, or am I overthinking a runtime deal?

#AI #AIEngineering #SoftwareEngineering #AgenticAI #EngineeringLeadership

Roman Borysenok

@borysenok

6 months ago

Anthropic just quietly announced Tasks Mode with 5 specialized entry points.

Most people see: “Cool, Claude got an upgrade.”

Here’s what they’re actually building:

Seven days ago, Anthropic donated MCP to the Linux Foundation. Today, they’re testing Tasks Mode—five distinct workflows (Research, Analyze, Write, Build, Do More), each with granular controls for sources, effort levels, and output formats.

This isn’t about adding features. It’s about **unbundling the AI interface**.

Think about what’s happened in the past month:
→ Google embedded agents into Search, Finance, NotebookLM
→ Microsoft embedded agents into Teams and 365
→ OpenAI built everything into ChatGPT
→ AWS released AgentCore with boundaries and memory

Anthropic took a different path.

They open-sourced the infrastructure layer (MCP—now adopted by ChatGPT, Copilot, Gemini, VS Code). Then they built the interface layer (Tasks Mode) as specialized workflows instead of one chat interface.

The hidden play: **Make Claude the operating system for AI work.**

Each task mode works like launching a different application:

- Research: Configure sources (web vs peer-reviewed), effort level, interaction frequency
- Analyze: Set validation type, comparison depth, output format
- Write: Choose document type, writing mode, citation settings
- Build: Select themes, layouts, output as code or artifacts

Plus a sidebar for progress tracking (task manager) and context management (system resources).

This is different from the chat-everything approach. You’re not having a conversation—you’re launching specialized workflows with pre-configured parameters for different types of work.
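
One way to picture "launching a workflow with pre-configured parameters" is the sketch below. Everything in it is an assumption for illustration: the `TaskMode` class, the control names, and the default values are hypothetical and loosely modelled on the controls listed above, not Anthropic's actual API or settings.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TaskMode:
    """A hypothetical task mode: a name plus default values for its controls."""
    name: str
    defaults: Mapping[str, str]

    def launch(self, **overrides: str) -> dict:
        # Reject controls this mode does not expose, instead of silently ignoring them.
        unknown = set(overrides) - set(self.defaults)
        if unknown:
            raise ValueError(f"{self.name} has no controls named {sorted(unknown)}")
        return {"mode": self.name, **self.defaults, **overrides}


# Control names follow the list above; the default values are invented.
MODES = {
    "research": TaskMode("research", {"sources": "web", "effort": "medium", "interaction": "low"}),
    "analyze": TaskMode("analyze", {"validation": "basic", "depth": "shallow", "output": "table"}),
    "write": TaskMode("write", {"document": "memo", "writing_mode": "draft", "citations": "off"}),
    "build": TaskMode("build", {"theme": "default", "layout": "single-page", "output": "code"}),
}

config = MODES["research"].launch(sources="peer-reviewed")
print(config)
```

The point of the sketch is the contrast with chat: each mode exposes only its own controls, so asking the research mode for a `theme` is an error, not a prompt.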

From where I sit building AI systems, this shift matters. We’ve been forcing every workflow into conversational interfaces. Research requires different controls than writing. Analysis needs different parameters than building.

Anthropic’s bet: By 2026, we won’t “chat with AI”—we’ll launch task-specific workflows in an AI operating system.

The strategy becomes clear when you connect the dots:

1. Donate MCP → becomes universal protocol (the TCP/IP of AI)
2. Launch Tasks Mode → becomes the interface layer (the Windows/macOS of AI)
3. Already have Claude Code → handles developer workflows

While others embed AI into existing apps, Anthropic is building the platform layer where specialized AI applications live.

The race isn’t just about better models anymore. It’s about who owns the interface layer between humans and AI workflows.

#AI #AIEngineering #Anthropic #Claude #AgenticAI

Roman Borysenok

@borysenok

6 months ago

Cursor just launched a visual editor. Everyone’s calling it “Figma for developers.” They’re missing the real story.

Four days ago, Cursor released a browser visual editor that lets you drag, drop, and point-click your way through frontend design. No context switching between design tools and code. Just describe what you want, and AI agents update the underlying code in parallel.

Looks like a head-on collision with Figma, right? Look closer at what’s actually happening.

While Figma spent 2025 adding code generation (Figma Make, MCP servers, design-to-code), Cursor just did the reverse: adding visual design to a code editor. They’re racing toward the same point from opposite directions.

But here’s the part worth paying attention to: this convergence isn’t about designer vs. developer workflows. It’s about a $264 billion market that didn’t exist five years ago.

The low-code/no-code market was $10B in 2019. It’ll hit $264B by 2032, a 32% compound annual growth rate.

Right now, 70% of new applications use low-code tools, up from 25% in 2020. 60% of custom apps? Built outside IT departments. 41% of businesses run active citizen developer programs: employees with no formal coding background building production software.

These aren’t side projects. Companies like Ricoh report 253% ROI replacing legacy systems with low-code platforms.

Cursor’s visual editor isn’t targeting developers who want to design faster. It’s targeting the hundreds of thousands of product managers, designers, and business analysts who need to build software but can’t write React components from scratch. The same people driving that 32% growth rate.

So while everyone debates “Cursor vs. Figma,” the actual question is: who reaches the convergence point first? Who makes building software feel as intuitive as using a design tool?

Because whoever wins that race owns access to a market that’s adding $20-30 billion in value every single year.

The skill gap driving this is real. 70% of companies can’t find the developers they need. Low-code platforms cut development time by 90% and save the cost of hiring two full-time engineers: $4.4M in value over three years.

Cursor’s bet: the future of software development doesn’t look like “non-technical people learning to code.” It looks like tools smart enough that the distinction stops mattering.

Is Cursor’s visual editor the beginning of the end for traditional frontend development, or are we watching two tools solve different problems heading for an inevitable merge?

#AI #SoftwareDevelopment #LowCode #NoCode #AIEngineering
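
The market figures quoted above can be sanity-checked with a few lines of arithmetic. This is only a sketch using the two endpoints given in the post ($10B in 2019, $264B by 2032); the helper name `cagr` is hypothetical. On those endpoints alone the implied rate comes out somewhat below 32%, so the quoted rate may come from a report that measures a different window.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1 / years) - 1


# Endpoints as quoted in the post (billions of USD).
START_YEAR, END_YEAR = 2019, 2032
START_USD_B, END_USD_B = 10.0, 264.0

years = END_YEAR - START_YEAR
implied = cagr(START_USD_B, END_USD_B, years)
avg_added = (END_USD_B - START_USD_B) / years  # simple average, not compounded

print(f"implied CAGR over {years} years: {implied:.1%}")
print(f"average value added per year: ${avg_added:.1f}B")
```

The simple average lands close to the low end of the "$20-30 billion" per year mentioned in the post; with compounding, the yearly increments would be smaller early on and larger toward 2032.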
