PM at JetBrains AI. Self-nominated docops evangelist. Memes as a service
🐘 @lananovikova@techhub.social
🦋 @lananovikova.bsky.social
Opinions are my own
The future of agents isn't one genius model. It's fleets of small, fast specialists doing the toil — cheaply.
Mellum2 is out: 12B-2.5B MoE, Apache 2.0, open weights on HF.
Built on a mixture-of-experts (MoE) architecture, Mellum2 delivers high-performance inference, often twice as fast as similar-sized models, while maintaining strong quality across code generation, science, math, and reasoning benchmarks.
Try Mellum2: https://t.co/Rf4w5gs9In
My favorite part of the Codex plugin for Claude Code is rescue.
Not because everyone will use it.
Because it’s brilliant technical marketing:
don’t fight the incumbent workflow, insert yourself at the moment of failure.
That’s a much smarter wedge than asking people to switch tools entirely.
Starting today you can use Codex in Claude Code 👀
/plugin marketplace add openai/codex-plugin-cc
Try it out today with:
/codex:review for a normal read-only Codex review
/codex:adversarial-review for a steerable challenge review
/codex:rescue to let codex rescue your code
Enjoy Codex-ing!
Food for thought from this HBR piece (https://t.co/rxXcHc6WXb): AI doesn’t always reduce work – it can intensify it. In my experience, that’s often exactly what happens.
Faster drafts, summaries, and debugging don’t automatically create slack. They often just raise expectations for speed, volume, and responsiveness instead.
And what is your experience?
For founders building agentic systems and dev tools:
People do not want to pay for raw markdown, and they shouldn't have to.
But they may pay for orchestration, hosting, updates, collaboration, portability, analytics, and managed execution.
These can be great businesses.
The current AI coding paradox:
Agents increase throughput only when tasks are scoped tightly enough to run unattended.
But if tasks are scoped that tightly, humans often become the bottleneck again.
Because by the time you've described every detail, another task you're running in parallel has probably finished and needs review. *thinking_dino*
That’s why I think AI vendors should invest more in open interfaces:
MCP for tools
A2A for agents
ACP for IDEs/editors
Interoperability is a product feature.
Vertical integration looks great on strategy slides. Real teams are messy. Mixed (AI) tools, legacy systems, fragmented workflows. That’s why openness often beats owning the whole stack.
@borshchguy Exactly. "Trust at 3am" usually comes down to very concrete things: previews, diffs, audit logs, permissions/guardrails, and rollback – much more than the agent loop itself.
An AI agent is not a product because it can act.
It becomes a product when a human can predict, inspect, and trust its actions.
The industry is converging on this: the real product is the harness around autonomy.
#agentorchestration #aiagent #agenticai
@getsome_air is also moving in this direction with Agentic review: a feature that reviews the changes an agent has made and leaves comments that you can then send to other agent sessions to address. Pro tip: use a different model for review than the one you use for code generation 😉
Even VS Code’s agent UX points in this direction: the value is not just that the agent can act, but that humans can review changes, inspect the session, and decide what gets applied.
That’s the harness around autonomy.
https://t.co/FFYupsezcg
I agree that good code review is context-heavy, since it combines code understanding with human intent.
But "needs a lot of context" and "needs a frontier model" are different claims.
Recent work on long-context and repo-level code tasks suggests that a lot depends on retrieving the right code/API context and filtering out noise, not on throwing the biggest model at a long diff. We have been experimenting with compacting long diffs for commit message completion, and medium-sized models (4B/7B) were enough. Frontier models still make sense for the hardest, most ambiguous cases, but I doubt they’re the right default forever.
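To make "compacting long diffs" concrete, here is a minimal Python sketch of the general idea. It is not the setup from the experiment above: the function name `compact_diff`, its `context` and `max_lines_per_file` parameters, and their defaults are hypothetical. It assumes `git diff` output, keeps file headers, hunk headers, and changed lines, trims unchanged context to a line or so around each change, and caps each file so one large refactor can't crowd out the rest of the diff.

```python
def compact_diff(diff: str, context: int = 1, max_lines_per_file: int = 40) -> str:
    """Shrink a `git diff` before handing it to a small model (hypothetical helper)."""
    lines = diff.splitlines()

    # Pass 1: classify every line so we know where the real changes are.
    kinds: list[str] = []
    in_hunk = False
    for line in lines:
        if line.startswith("diff --git"):
            in_hunk = False
            kinds.append("file")
        elif line.startswith("@@"):
            in_hunk = True
            kinds.append("hunk")
        elif not in_hunk:
            kinds.append("header")  # index, ---/+++ and mode lines
        elif line.startswith(("+", "-")):
            kinds.append("change")
        else:
            kinds.append("context")  # unchanged lines, "\ No newline" markers
    changed = {i for i, kind in enumerate(kinds) if kind == "change"}

    out: list[str] = []
    body: list[str] = []  # hunk lines of the file currently being processed

    def flush() -> None:
        # Emit the current file's hunks, truncated to the per-file budget.
        if len(body) > max_lines_per_file:
            out.extend(body[:max_lines_per_file])
            out.append(f"... [{len(body) - max_lines_per_file} lines omitted]")
        else:
            out.extend(body)
        body.clear()

    # Pass 2: keep structure and changes, drop context far from any change.
    for i, (line, kind) in enumerate(zip(lines, kinds)):
        if kind == "file":
            flush()
            out.append(line)
        elif kind == "header":
            out.append(line)
        elif kind in ("hunk", "change"):
            body.append(line)
        elif any(j in changed for j in range(i - context, i + context + 1)):
            body.append(line)
    flush()
    return "\n".join(out)
```

The model then sees what changed plus a sliver of surrounding code. How much context is enough, and whether to add retrieved API context on top, is something to tune per model.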
Half of AI product management isn't picking the best model. It's deciding what should never touch one – and what needs a much smaller one or even non-AI.
The PM skill isn't knowing AI. It's knowing which AI – and when "no AI" is the right call.
The most important trade-off is rarely intelligence. It's frequency × latency × cost × trust.
Frontier models are for ambiguity, reasoning, generation. Everything else is an engineering choice you're paying for with latency and trust.
Even high-hype features like AI code review could be done using smaller models.
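A toy Python sketch of "which AI, and when no AI" as a routing decision. The three routes, the `Task` fields, and the thresholds are illustrative assumptions, not a real product rule; trust is not modeled here.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    NO_AI = "no AI: rules, parsers, static analysis"
    SMALL_MODEL = "small, fast model"
    FRONTIER_MODEL = "frontier model"


@dataclass
class Task:
    deterministic: bool     # can a rule or parser solve it exactly?
    ambiguous: bool         # open-ended intent, reasoning, or generation?
    calls_per_day: int      # frequency
    latency_budget_ms: int  # how long the user is willing to wait


def route(task: Task, high_frequency: int = 10_000, interactive_ms: int = 500) -> Route:
    """Pick the cheapest option that can plausibly do the job (hypothetical thresholds)."""
    if task.deterministic:
        # Exact problems never need a model.
        return Route.NO_AI
    if not task.ambiguous:
        # Well-scoped tasks such as completion or classification.
        return Route.SMALL_MODEL
    if task.calls_per_day > high_frequency or task.latency_budget_ms < interactive_ms:
        # Ambiguous, but too frequent or too latency-sensitive for a frontier
        # model by default; escalation to a bigger model is left to the caller.
        return Route.SMALL_MODEL
    return Route.FRONTIER_MODEL
```

For example, `route(Task(deterministic=False, ambiguous=True, calls_per_day=100, latency_budget_ms=5000))` returns `Route.FRONTIER_MODEL`, while the same task at 50,000 calls a day routes to `Route.SMALL_MODEL`.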
AI’s next phase is not about better prompts. It’s about better operating models.
I use Air as a PM to orchestrate agent workflows across research, synthesis, and execution. The value isn’t just productivity; it’s visibility, parallelism, and staying hands-on while agents work.
Writing code isn't the hard part. Creating a productive workflow is.
Air offers parallel, isolated execution, full-project review, and support for Codex, Claude Agent, Gemini CLI, and Junie – all in one place.
Building with agents?
Download Air for free: https://t.co/mYHt6i4sa5
New Air update is live:
- Resume Codex tasks and continue where you left off
- Choose Codex thinking level for better control
- Word-level diff highlighting shows exactly what changed
Learn more and download Air at https://t.co/WSrG3yONPH