5.5 is smart enough to cram everything into a single agents.md file with all your repos in a single workspace.
Containerize and break everything into microservices, with TDD for everything and each service in its own repo with its own pipeline. Put docs in their own repo as HTML with hyperlinks, so you can read them easily as well.
By splitting out the services more, you can have agents recursively spawn subagents by devloop and split out work as effectively as needed on worktrees or patches in each repo. I use APFS/git worktrees, which are set up by default in omp for subagents. 5.5 is smart enough to know exactly how many subagents it needs and what depth the agent tree should go with recursion turned on.
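A rough sketch of the worktree-per-subagent idea, assuming one branch and one directory per task. The `agent/` branch prefix, the `-worktrees` sibling directory, and the repo path are made up for illustration; this is not how omp does it internally, only the plain `git worktree add` command it would boil down to.

```python
from pathlib import PurePosixPath


def worktree_command(repo: PurePosixPath, task: str) -> list[str]:
    """Build the git command that gives one subagent its own worktree and branch."""
    # Hypothetical naming: one branch per task under an "agent/" prefix.
    branch = f"agent/{task}"
    # Hypothetical layout: worktrees live in a sibling directory next to the repo.
    path = repo.parent / f"{repo.name}-worktrees" / task
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


def plan(repo: PurePosixPath, tasks: list[str]) -> list[list[str]]:
    """One worktree command per task, so subagents never share a checkout."""
    return [worktree_command(repo, task) for task in tasks]


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually create the worktrees.
    for cmd in plan(PurePosixPath("/srv/repos/billing"), ["refactor-api", "add-tests"]):
        print(" ".join(cmd))
```

Each subagent then works in its own directory and branch, so parallel edits in the same repo do not step on each other.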
You can refactor, rewrite, test, deploy your entire codebase with a single prompt. I'm currently managing a ~6m line codebase this way and self hosting everything. All infra code has its own repo for helm/csps. Metrics and tests on all services and grafana dashboards for tracking everything. Every stat I can possibly think of or ever will need gets shoved into postgres.
Any heavy data work or code that needs to be written gets metrics assigned to it in prom or pg and then gets targeted for /goal or for /autoresearch for optimization. I typically explore everything with gemini and have it write the prompts to hand off to 5.5.
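A minimal sketch of picking an optimization target from stored metrics. Assumptions, all mine: the `metrics` table, its columns, the job names, and the numbers are invented, and `sqlite3` stands in for Postgres so the example runs without a server.

```python
import sqlite3
from typing import Optional


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for the real metrics database, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metrics (job TEXT, name TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [
            ("etl_orders", "runtime_s", 120.0),
            ("etl_orders", "runtime_s", 140.0),
            ("etl_users", "runtime_s", 30.0),
        ],
    )
    return conn


def worst_job(conn: sqlite3.Connection, metric: str) -> Optional[str]:
    """Return the job with the highest average value for the given metric."""
    row = conn.execute(
        "SELECT job, AVG(value) AS mean FROM metrics "
        "WHERE name = ? GROUP BY job ORDER BY mean DESC LIMIT 1",
        (metric,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    # The job printed here is the one you would hand to an optimization run.
    print(worst_job(demo_connection(), "runtime_s"))
```

The point is only that once every job writes a number somewhere queryable, choosing what to optimize next becomes a query instead of a guess.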
People are simply not pushing the models hard enough. 5.4 and 5.5 (maybe even kimi and ds) ARE ASI. The bottleneck is yourself, compute, and devloop times.
Happy Father’s Day to all the Fathers.
Happy Father’s Day to all the mothers that take upon themselves both roles for their children.
Happy Father’s Day to all the fathers that are in heaven.
Children and family are the most divine gift, cherish it.
Yours truly,
Pika
I promised another giveaway, so I am giving away .6 SOL to 10 winners
1. Like + RT
2. Drop ur sol addresses
3. Join my TG https://t.co/go6iXbFbql
Doing this much so more people can eat
Stay safe, and hopefully I can take you out of the trenches.
Every day, 100+ people ask me, "How can I learn AI evals?"
I copy-paste these 11 links (every time):
1. AI evals & observability (series): https://t.co/erSJcqpAV7
2. Using LLM-as-a-judge: https://t.co/xMBt9j4JRc
3. Demystifying evals for AI agents: https://t.co/HBbCe5PnXJ
4. There are only 6 RAG Evals: https://t.co/gwfyhIozqK
5. Evaluation-driven development: https://t.co/GMtp6bewol
6. Binary evals vs. Likert scales: https://t.co/WyMw1hHTfm
7. The mirage of generic AI metrics: https://t.co/ugryF5zfKO
8. Error analysis: https://t.co/OXgPZd8IXi
9. Carrying out error analysis: https://t.co/OXgPZd8IXi
10. Evaluating the effectiveness of LLM-evaluators: https://t.co/NuaXhr19TV
11. LLM judges aren't the shortcut you think: https://t.co/fDep2HFjCq
Binge these to skyrocket your skills.
I built an AI system that automates product video creation for entire e-commerce catalogs.
(Saves ~$30K per collection shoot and boosts on-site conversion rates by ~20%)
Here's what the system does under the hood:
→ Firecrawl pulls product images straight from any e-commerce collection URL
→ Calico AI turns them into realistic model videos capturing fit, texture, and movement
→ Every video starts and ends on the original product shot for seamless looping
→ Output is auto-labeled, sorted, and pushed directly to Google Drive
→ The entire pipeline runs in batch — zero manual intervention
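The steps above can be sketched as a simple batch loop. This is a hedged outline, not the actual n8n workflow: `scrape`, `render`, and `upload` are hypothetical placeholders for the Firecrawl, Calico AI, and Google Drive steps, passed in as plain functions so nothing here pretends to be those services' real APIs. The `_loop.mp4` label format is also invented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VideoAsset:
    sku: str
    label: str
    video: bytes


def run_batch(
    collection_url: str,
    scrape: Callable[[str], dict[str, str]],  # collection URL -> {sku: image URL}
    render: Callable[[str], bytes],           # image URL -> video bytes
    upload: Callable[[VideoAsset], None],     # push one finished asset to storage
) -> list[str]:
    """Process every SKU in a collection and return the labels, sorted by SKU."""
    labels = []
    for sku, image_url in sorted(scrape(collection_url).items()):
        asset = VideoAsset(sku=sku, label=f"{sku}_loop.mp4", video=render(image_url))
        upload(asset)
        labels.append(asset.label)
    return labels


if __name__ == "__main__":
    # Fake steps so the loop runs end to end without any external service.
    uploaded: list[VideoAsset] = []
    labels = run_batch(
        "https://example.com/collections/summer",
        scrape=lambda url: {"sku-2": "img-2.jpg", "sku-1": "img-1.jpg"},
        render=lambda image_url: image_url.encode(),
        upload=uploaded.append,
    )
    print(labels)
```

Swapping the fake functions for real integrations keeps the loop unchanged, which is what makes it easy to run across a whole catalog.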
The outcome: brands can give every single SKU a video asset, not just their bestsellers — and conversion lifts are immediate.
Static images don't cut it anymore.
Buyers want to see how a product moves before they commit.
If you want the full breakdown,
Like, RT & comment "CALICO".
I'll send you the complete n8n workflow, every prompt, and a step-by-step walkthrough video — totally free. (Must be following so I can DM you.)
No more $30K production shoots.
No more wondering if your product pages are leaving money on the table.
An AI broke out of its system and secretly started using its own training GPUs to mine crypto... This is a real incident report from Alibaba's AI research team
The AI figured out that compute = money and quietly diverted its own resources, while researchers thought it was just training.
It wasn't a prompt injection. It wasn't a jailbreak. No one asked it to do this.
It emerged spontaneously. A side effect of RL optimization pressure.
The model also set up a reverse SSH tunnel from its Alibaba Cloud instance to an external IP, effectively punching a hole through its own firewall and opening a remote access channel to the outside world... ahem...
The only reason they caught it? A security alert tripped at 3am. Firewall logs. Not the AI team, the security team.
The scary part isn't that the model was trying to escape. It wasn't "evil." It was just trying to be better at its job. Acquiring compute and network access are just useful things if you're an agent trying to accomplish tasks.
This is what AI safety researchers have been warning about for years. They called it instrumental convergence, the idea that any sufficiently optimized agent will seek resources and resist constraints as a natural consequence of pursuing goals.
Below is a diagram of the ROCK architecture it broke out of. Truly crazy times.
my most used AI agent skills right now
global
1. vercel-react-best-practices
2. brainstorming
3. convex-best-practices
4. frontend-design
project dependent
1. tauri-v2
2. swiftui-expert-skill
3. swift-concurrency
4. swiftui-liquid-glass
5. swiftui-skills
so worth it tbh