@sama I’d love to see OpenAI stay ahead with truly advanced and innovative moves, not just follow Anthropic’s playbook.
And please, keep it cheaper than Opus. lol
This is exactly why AI coding security needs to be treated as part of the development workflow, not as an afterthought.
As agents gain access to repos, terminals, files, and secrets, prompt injection and supply chain attacks become much more serious.
Teams need least privilege, secret isolation, dependency review, audit logs, and human checkpoints before sensitive actions.
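Here is a minimal Python sketch of two of these controls: least privilege, and a human checkpoint before sensitive actions. It also keeps a simple audit log. The action names, the `SENSITIVE_ACTIONS` set, and the `approve` callback are hypothetical illustrations, not the API of any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action names; a real agent runtime would define its own.
SENSITIVE_ACTIONS = {"read_secret", "push_to_main", "install_dependency"}


def deny_by_default(action: str, target: str) -> bool:
    """Default human-checkpoint callback: reject unless someone explicitly approves."""
    return False


@dataclass
class AgentPolicy:
    # Least privilege: only actions listed here may run at all.
    allowed: set[str] = field(default_factory=set)
    # Human checkpoint: asked before any sensitive action; True means approved.
    approve: Callable[[str, str], bool] = deny_by_default
    # Audit log of (action, target, decision) tuples.
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def check(self, action: str, target: str) -> bool:
        if action not in self.allowed:
            decision = "denied: not granted"
        elif action in SENSITIVE_ACTIONS and not self.approve(action, target):
            decision = "denied: human rejected"
        else:
            decision = "allowed"
        self.audit_log.append((action, target, decision))
        return decision == "allowed"


# Usage: the agent may read files and request installs, but installs need approval.
policy = AgentPolicy(allowed={"read_file", "install_dependency"})
policy.check("read_file", "src/app.py")        # allowed
policy.check("read_secret", ".env")            # denied: never granted
policy.check("install_dependency", "leftpad")  # denied: default approver rejects
```

Denying by default means a missing or forgotten approver fails closed rather than open.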
A unified agent is an important direction.
Keeping context across modules can reduce a lot of friction in real development workflows.
The challenge is state management and scope control:
when an agent understands more of the system, it can also affect more of the system.
For large codebases, visibility into what context was used and why a change was made becomes critical.
This looks like a very useful workshop.
The MCP layer is powerful because it gives agents real capabilities, not just text output.
I’d also pay close attention to the production boundaries:
permissions, audit logs, safe failure modes, rate limits, and when the agent should stop and ask for human review.
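One way to picture those production boundaries is a thin gateway that every tool call passes through. This Python sketch is only an illustration and is not part of the MCP specification. `ToolGateway`, its sliding-window rate limit, and the `needs_review` set are all hypothetical names chosen here.

```python
import time
from collections import deque
from typing import Any, Callable


class ToolGateway:
    """Hypothetical choke point between an agent and its tools."""

    def __init__(self, max_calls: int, per_seconds: float, needs_review: set[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls          # rate limit: executed calls allowed per window
        self.per_seconds = per_seconds      # window length in seconds
        self.needs_review = needs_review    # tools that always stop for a human
        self.clock = clock
        self.recent: deque = deque()        # timestamps of executed calls in the window
        self.audit: list = []               # one audit entry per attempted call

    def call(self, name: str, fn: Callable[..., Any], **kwargs: Any) -> dict:
        now = self.clock()
        # Sliding window: forget executed calls older than the window.
        while self.recent and now - self.recent[0] >= self.per_seconds:
            self.recent.popleft()
        if name in self.needs_review:
            result = {"status": "needs_human_review"}
        elif len(self.recent) >= self.max_calls:
            result = {"status": "rate_limited"}
        else:
            self.recent.append(now)
            try:
                result = {"status": "ok", "value": fn(**kwargs)}
            except Exception as exc:
                # Safe failure: return the error to the agent instead of crashing it.
                result = {"status": "error", "error": repr(exc)}
        self.audit.append({"time": now, "tool": name, "args": kwargs,
                           "status": result["status"]})
        return result


# Usage with hypothetical tool names.
gateway = ToolGateway(max_calls=30, per_seconds=60, needs_review={"delete_branch"})
gateway.call("search_docs", lambda query: f"results for {query}", query="rate limits")
```

Returning a status instead of raising keeps the agent loop alive and gives it something explicit to react to, such as pausing for human review.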
Tool access is where productivity and risk both increase.
Strongly agree with this.
Once AI coding moves beyond demos, the real problem becomes system reliability.
Multi-agent workflows can fail in ways that are hard to see from the final diff alone:
conflicting assumptions, hidden state, duplicated logic, and silent architecture drift.
Guardrails and observability are not optional anymore.
I agree. The biggest change is not just that the models are better.
It is that the workflow around them is changing:
planning, generating, reviewing, testing, and iterating with AI in the loop.
For real projects, the key question becomes how to keep speed without losing architecture, codebase coherence, and review discipline.
This is a fascinating direction.
AI “employees” make sense when tasks can be split, delegated, reviewed, and iterated quickly.
The hard part is the CEO layer:
setting priorities, resolving conflicts between agents, detecting failure modes, and deciding when human judgment must override the system.
The future may be less about replacing management and more about making orchestration a core skill.
This point about teams is important.
As AI makes it easier for anyone to build, the differentiator becomes less about generating the first version and more about how teams coordinate, review, and improve the system over time.
For AI workspaces, orchestration, shared context, permissions, and accountability may become just as important as model capability.
This is a strong example of how much AI can compress the path from idea to working prototype.
What I find interesting is the next phase after the demo:
maintenance, security updates, user feedback, edge cases, and keeping the architecture clean as the product evolves.
AI can make the first version much faster. The long-term engineering layer still matters.
This is a great way to understand what AI coding agents are actually doing under the hood.
Building a smaller version yourself helps clarify the real pieces:
context gathering, tool use, planning, execution, and feedback loops.
It also makes the production challenges more visible: permissions, error recovery, observability, and keeping changes coherent across a larger codebase.
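For anyone building that smaller version, here is a rough Python skeleton of the loop. It gathers context, plans, executes a tool, and feeds the observation back, with a step budget as the stop condition. `plan_next` stands in for a model call. The scripted planner and the tool names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    tool: str           # tool the planner wants to call
    arg: str            # single string argument, to keep the sketch small
    done: bool = False  # planner signals that the task is finished


def run_agent(task: str,
              plan_next: Callable[[str, list[str]], Step],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 10) -> list[str]:
    """Minimal loop: context -> plan -> execute -> feedback, bounded by max_steps."""
    history = [f"task: {task}"]                  # context gathered so far
    for _ in range(max_steps):
        step = plan_next(task, history)          # planning (normally a model call)
        if step.done:
            history.append("stopped: planner reported done")
            return history
        tool = tools.get(step.tool)
        if tool is None:
            # Error recovery: report the problem back instead of crashing.
            history.append(f"error: unknown tool {step.tool!r}")
            continue
        try:
            observation = tool(step.arg)         # execution
        except Exception as exc:
            observation = f"error: {exc!r}"
        history.append(f"{step.tool}({step.arg!r}) -> {observation}")  # feedback
    history.append("stopped: step budget exhausted")
    return history


def scripted_planner(task: str, history: list[str]) -> Step:
    """Hypothetical stand-in for a model: read a file, run tests, then stop."""
    script = [Step("read_file", "app.py"), Step("run_tests", "all"), Step("", "", done=True)]
    return script[len(history) - 1]


tools = {"read_file": lambda path: f"<contents of {path}>",
         "run_tests": lambda scope: "3 passed"}
for line in run_agent("fix the login bug", scripted_planner, tools):
    print(line)
```

The returned history doubles as a basic trace of what the agent did and why it stopped, which is the observability piece in miniature.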
This is a great step for making agentic workflows more accessible.
Auto Mode can be very powerful when the task is well-scoped and the feedback loop is clear.
I’d also love to see more visibility into long-running behavior: what decisions were made, which files changed, why the agent continued, and where it decided to stop.
I agree with this direction.
Multi-agent workflows make sense because real engineering work already has separate roles: research, implementation, review, testing, and coordination.
One additional challenge is orchestration quality. As soon as multiple agents work in parallel, teams need clear ownership, conflict handling, observability, and human checkpoints.
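Clear ownership and conflict handling can start as something as simple as a claim registry. In this hypothetical Python sketch, each agent claims files all-or-nothing, and any clash is recorded so a human can resolve it. Real orchestrators would likely also need locking, timeouts, and finer-grained scopes.

```python
from dataclasses import dataclass, field


@dataclass
class OwnershipRegistry:
    """Hypothetical coordinator: each file is owned by at most one agent at a time."""
    owners: dict[str, str] = field(default_factory=dict)
    # (path, current_owner, requesting_agent) for every rejected claim.
    conflicts: list[tuple[str, str, str]] = field(default_factory=list)

    def claim(self, agent: str, paths: list[str]) -> bool:
        """Claim all paths at once; on any clash, claim nothing and record it."""
        clashes = [(p, self.owners[p]) for p in paths
                   if p in self.owners and self.owners[p] != agent]
        if clashes:
            for path, owner in clashes:
                self.conflicts.append((path, owner, agent))  # surface for human review
            return False
        for p in paths:
            self.owners[p] = agent
        return True

    def release(self, agent: str) -> None:
        """Drop every claim held by this agent, e.g. after its change is merged."""
        self.owners = {p: a for p, a in self.owners.items() if a != agent}


# Usage with hypothetical agent roles and file paths.
registry = OwnershipRegistry()
registry.claim("implementer", ["src/api.py", "src/db.py"])
registry.claim("tester", ["tests/test_api.py", "src/api.py"])  # rejected: src/api.py is taken
```

The all-or-nothing claim avoids an agent ending up with half of the files it needs, which is one source of the silent drift mentioned above.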
This is an important angle.
AI coding tools are not only a productivity question anymore. At enterprise scale, they also become a cost predictability and governance question.
The real challenge may be less “can AI write code?” and more:
Can teams control usage, forecast spend, audit workflows, and decide where human review is still required?
I understand that reaction.
Flash models often optimize for speed and cost, so I’d be careful about judging the whole 3.5 line from Flash alone.
For Gemini 3.5 Pro, I’d want to test repeated workflows: coding, refactoring, long-context analysis, tool calls, and agent loops. That is where reliability becomes visible.
Good clarification.
A lot of the current discussion seems to mix Gemini 3.5 Flash, expected 3.5 Pro behavior, and rumored comparisons.
It is worth separating them clearly, because Flash performance does not necessarily tell us how Pro will behave in coding, reasoning, or agent workflows.
If Flash was the warmup, Gemini 3.5 Pro will be interesting to watch.
For developers, higher cost can be acceptable if it brings stronger reliability:
better coding, long-context stability, tool use, and fewer failures in agent workflows.
The key question is whether Pro is not only smarter, but more predictable.
That’s an interesting angle.
Gemma may be more exciting for some developers because open-weight, more controllable models can be easier to evaluate, fine-tune, and integrate into custom workflows.
Gemini 3.5 Pro may win on raw capability, but Gemma could be more attractive where control, cost, and deployment flexibility matter.
It would be exciting if Gemini 3.5 Pro delivers stronger reasoning and coding.
One thing I’d also like to see is stability across longer workflows.
For developers, the real test is not only solving one coding task, but handling refactoring, tests, tool use, context changes, and edge cases without drifting.