Alessio Masucci @SucciMas - Twitter Profile

Pinned Tweet

4 months ago

Everyone's debating which model codes better. Meanwhile OpenAI and Anthropic both dropped guides on the same thing: harness engineering. The model isn't the bottleneck. Your scaffolding is. 🧵

https://t.co/KDXe2mqxb8

1

7

0

18

3K

Alessio Masucci

@SucciMas

13 days ago

You should try this

Vaibhav (VB) Srivastav

@reach_vb

13 days ago

UPDATE: Came up with an even better version of this prompt after the feedback

Ask Codex to look across your sessions, Memories, and Chronicle, identify patterns, reuse what already exists, and only create the smallest useful skill, subagent, or automation.

"Look back over my recent work from the last 30 days, or all available history if shorter, and identify repeated manual workflows worth packaging.

Use available evidence in this order:
- Recent Codex sessions and task summaries.
- Codex Memories and rollout summaries to find patterns repeated across sessions.
- Chronicle, if enabled, to spot repeated work outside Codex. Use Chronicle for discovery only; confirm important details in the relevant source system when possible.
- Existing skills, custom agents, and automations, so you reuse or extend what already exists instead of duplicating it.

Look broadly for work that is repeated, time-consuming, error-prone, context-heavy, or benefits from a consistent process. Include workflows across coding, research, writing, planning, communication, operations, analysis, and personal administration.

Only act on a candidate when it:
- occurred at least twice, or is clearly likely to recur and costly to repeat;
- has stable inputs, a repeatable procedure, and a clear output or stopping condition;
- would materially improve speed, quality, consistency, or reliability;
- is not already adequately covered.

Choose the smallest appropriate form:
- Skill: a reusable workflow or playbook.
- Custom subagent: a bounded specialist role or investigation task suitable for delegation.
- Automation: a scheduled or recurring check, report, reminder, or monitor.
- Skip: work that is too one-off, ambiguous, sensitive, or poorly evidenced to package.

First produce a compact shortlist with:
- repeated workflow
- supporting evidence and dates
- frequency/confidence
- recommended form: skill, subagent, automation, extend existing, or skip
- why it is or is not worth creating

Then create only the high-confidence missing items. Keep them narrow, practical, source-aware, and easy to validate. Do not create speculative, overlapping, or overly broad assets.

Finish with:
- what you created or extended
- what you deliberately skipped
- what needs more evidence before packaging"

97

4K

369

8K

866K

0

6

SucciMas retweeted

Siddhartha Saxena

@siddsax

13 days ago

Anthropic onboarding day: Michael Scott introducing Karpathy like he just signed Wemby in free agency.

394

18K

1K

4K

2M

SucciMas retweeted

Andrej Karpathy

@karpathy

18 days ago

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

8K

150K

11K

14K

27M

SucciMas retweeted

23 days ago

I cancelled my Claude Code sub. I give up.

185

2K

83

793

354K

SucciMas retweeted

Sundar Pichai

@sundarpichai

about 1 month ago

You can now ask Gemini to create Docs, Sheets, Slides, PDFs, and more directly in your chat. No more copying, pasting, or reformatting, just prompt and download. Available globally for all @GeminiApp users.

605

18K

2K

4K

2M

SucciMas retweeted

Zach Lloyd

@zachlloydtweets

about 1 month ago

https://t.co/cjyoRlzgxO

72

2K

230

936

244K

Alessio Masucci

@SucciMas

about 1 month ago

“most people could comfortably manage three to five sessions at a time before context switching became painful. Beyond that, productivity dropped” https://t.co/AaCqBi4qj4

0

13

Alessio Masucci

@SucciMas

about 1 month ago

ACE just showed us the future of AI engineering. No more one dev + 20 isolated agents creating chaos. GitHub Next built the missing piece: realtime multiplayer agents in shared sandboxes with Slack-style chat + GitHub-style versioning. This is how serious software gets built when agents scale. @Mappletons nailed it 👇

AI Engineer

@aiDotEngineer

about 1 month ago

🆕 Collaborative AI Engineering: One Dev, Two Dozen Agents, Zero Alignment — @mappletons https://t.co/7SEayJwZLe

Agentic engineering so far has been a solo story: one developer and a dozen agents moving at warp speed. But speed without thoughtful planning and team alignment is just wasting tokens.

When everyone on a team is directing agents alone in their personal CLI tools with no shared context, you get duplicate work, conflicting changes, poorly-designed solutions, surprise features nobody else agreed to build, and everyone pulling in different directions.

Serious software still requires serious collaboration. You need multiple perspectives and types of expertise to build great things. We need agentic environments where people can plan together, think critically together, and share the same context.

In this talk I'll demo how we've tackled these design problems in Ace, a multiplayer agent environment from GitHub Next that uses real-time collaboration, proactive agents, and sandboxed micro VMs for rapid prototyping and exploration.

5

86

12

72

30K

0

46

Alessio Masucci

@SucciMas

about 1 month ago

It’s here

OpenAI

@OpenAI

about 1 month ago

Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.

Now available in ChatGPT and Codex.

2K

52K

7K

9K

13M

0

35

Alessio Masucci

@SucciMas

about 1 month ago

That’s the one cool feature: each Droid gets its own forever-running computer that actually remembers files, credentials, and context instead of resetting every time

Factory

@FactoryAI

about 1 month ago

Today we're opening access to Droid Computers: persistent machines for remotely orchestrating Droids. Spin one up in Factory's cloud or turn your machine into a Droid Computer. Either way, Droids have a dev environment with its own filesystem, credentials, and configurations.

30

718

51

458

223K

0

13

0

1

597

SucciMas retweeted

Graeme

@gkisokay

about 2 months ago

The Local LLM cheat sheet for your 16GB RAM device

I pulled together a lineup of small models that can run comfortably on a Mac Mini or personal laptop while still leaving room for context without melting your machine.

Models for Daily Use

Qwen3.5 9B / GGUF / Q4_K_M
Daily driver. General chat, drafting, research, translation. If you're keeping only one, keep this.

DeepSeek-R1 Distill Qwen 7B / GGUF / Q4_K_M
Reasoning engine. Math, logic, step-by-step problems. Slower, but worth it when you need actual thinking.

Models for Specialty Work

Qwen2.5 Coder 7B / GGUF / Q4_K_M
Code specialist. Completions, refactors, debugging, repo Q&A. Better than a generalist when the task is code.

Llama 3.1 8B / GGUF / Q4_K_M
Long context worker. RAG, doc chat, codebase Q and A. The output isn't top tier, but the context is strong for its size.

Phi-4 Mini Reasoning / GGUF / Q4_K_M
Compact thinker. Logic, structured answers, math, and short coding bursts. Smaller context is the catch.

Models for Efficiency

Gemma 4 E4B / GGUF / Q4_K_M
Light all-rounder. Writing, chat, light agents, structured output.

Phi-3.5 Mini / GGUF / Q5_K_M
Pocket sidekick. Summaries, extraction, background doc chat. Easy to pair with a bigger model.

Qwen3.5 2B / GGUF / Q4_K_M
Useful for summaries, tagging, rewrites, and lightweight sidekick work.

Micro Models

Qwen3.5 0.8B / GGUF / Q5_K_M
Classification, keyword routing, binary decisions, triage.

Gemma 4 E2B-it / GGUF / Q4_K_M
Lightweight chat, quick Q and A, summaries, tiny agents.

My personal choice for a single model is Qwen3.5 9B

For two models use Qwen3.5 9B + Qwen2.5 Coder 7B for code, or Qwen3.5 9B + Phi-3.5 Mini for support tasks.

Let me know in the comments your experience with these models, or any I have left out.

97

2K

346

4K

417K

Alessio Masucci

@SucciMas

about 1 month ago

Let’s recap

Silicon Mania

@siliconmania

about 1 month ago

last week in tech was based.

151

5K

453

2K

1M

0

1

0

18

SucciMas retweeted

Google Cloud Tech

@GoogleCloudTech

about 1 month ago

https://t.co/NQyklWClQO

6

1K

226

2K

173K

Alessio Masucci

@SucciMas

about 1 month ago

I think you’ll find this interesting

Walden

@walden_yan

about 1 month ago

https://t.co/iY0G5ieQhJ

28

2K

193

4K

629K

0

10

Alessio Masucci

@SucciMas

about 1 month ago

@theo Excellent contest to hit the limits of all those who tried 😂

0

376

SucciMas retweeted

James Cowling

@jamesacowling

about 1 month ago

Convex now has fully flexible cloud deployments. You can create a bunch of them to test out new features. You can split your prod workload across multiple deployments. They can even be set to auto-expire. Your agent is going to want to know about this.