Scott Condron

Verified account

@_ScottCondron

Helping build AI dev tools at @weights_biases. I post about AI, data visualisation and the stuff I’m working on at wandb.

Dublin, Ireland

Joined April 2018

2K Following

5.7K Followers

3.1K Posts

Pinned Tweet

over 5 years ago

Here's an animation of a @PyTorch DataLoader. It turns your dataset into an iterator of shuffled, batched tensors. (This is my first animation using @manim_community, the community fork of @3blue1brown's manim) Here's a little summary of the different parts for those curious: 1/5
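The DataLoader behaviour described above can be sketched in plain Python: sample indices are shuffled, grouped into batches, and each batch is fetched from the dataset. This is a conceptual illustration, not PyTorch's implementation. `shuffled_batches` is a hypothetical helper, and it yields Python lists where the real `torch.utils.data.DataLoader(dataset, batch_size=..., shuffle=True)` would collate each batch into tensors.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shuffled_batches(
    dataset: Sequence[T],
    batch_size: int,
    seed: int = 0,
    drop_last: bool = False,
) -> Iterator[List[T]]:
    """Yield shuffled mini-batches from an indexable dataset.

    Hypothetical conceptual sketch of a DataLoader, not PyTorch's code.
    """
    # Sampler step: one index per sample in the dataset.
    indices = list(range(len(dataset)))
    # shuffle=True step: reorder the indices (seeded here for reproducibility).
    random.Random(seed).shuffle(indices)
    # Batch-sampler step: group consecutive shuffled indices into batches.
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        # drop_last=True discards a final batch smaller than batch_size.
        if drop_last and len(batch_idx) < batch_size:
            break
        # Fetch step: look up each sample. A real DataLoader would then
        # collate these samples into tensors; here we keep a plain list.
        yield [dataset[i] for i in batch_idx]


if __name__ == "__main__":
    for batch in shuffled_batches(list(range(10)), batch_size=4):
        print(batch)
```

With 10 samples and `batch_size=4`, the sketch yields batches of sizes 4, 4 and 2 (or just 4 and 4 with `drop_last=True`), and every sample appears exactly once per pass.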


_ScottCondron retweeted

CoreWeave @CoreWeave

about 13 hours ago

Models used to learn by reading. Now they learn by doing. The training loop for agentic AI requires execution. CoreWeave Sandboxes is the execution layer, on your existing compute or serverless through @wandb. 👏 Watch the demo from @deok_filho. https://t.co/WfIrLx8DWU


_ScottCondron retweeted

9 days ago

when I find myself training models... @wandb


_ScottCondron retweeted

Emmanuel Turlay

1 day ago

When it comes to observability for agents, regular application tracing doesn't cut it. We need tools that understand the specific semantics of agents (multi-turn sessions, tool calls, long context, etc.). This is what we built the new Weave for. Agent-first observability and automated detection of agent misbehavior via Signals. Public preview today, 7 integrations with popular harnesses and SDKs. Go build, control your agents.



1 day ago

I’m so excited to see this launch (while I’m on paternity leave 👶)!

Lots of people are moving away from agent frameworks and building on top of existing agent harnesses like Claude Code, etc.
This makes it easier to track and quickly navigate across these kinds of multi-turn agent sessions, debug errors, find patterns and build evaluations.

Now back to nappies for me. Sorry to my fellow Europeans for committing the sacrilege of working, however minor, while on leave 🙇‍♂️

Weights & Biases

1 day ago

A brand new W&B Weave is live! It watches production agents end to end, flags failure modes on its own, runs a full loop from inference to training, and blocks regressions. You can finally watch how your agent thinks across millions of traces instead of squinting at one. 🫡



_ScottCondron retweeted

Weights & Biases

1 day ago

A brand new W&B Weave is live! It watches production agents end to end, flags failure modes on its own, runs a full loop from inference to training, and blocks regressions. You can finally watch how your agent thinks across millions of traces instead of squinting at one. 🫡


_ScottCondron retweeted

2 days ago

Announcing a public preview of molab, our cloud-hosted marimo notebook workspace: now with GPUs! 🧵 https://t.co/8M0D047RTl


_ScottCondron retweeted

CoreWeave @CoreWeave

3 days ago

The rumors are true 👀 Proud of the CoreWeave engineering team and our partners (like @Dell) who are the first to have a fully working @nvidia Vera Rubin NVL72, achieving yet another major milestone in bringing next-generation AI infrastructure online. Stay tuned, there is so much more to come...



_ScottCondron retweeted

CoreWeave @CoreWeave

6 days ago

AI agents fail in production when training data misses real-world edge cases. We’re closing the loop. CoreWeave connects training & inference so agents continuously improve from live experience. Powered by Serverless RL (40% cost cut, 1.4x faster) & @wandb observability. https://t.co/BZdU0OqnwC


_ScottCondron retweeted

Weights & Biases

8 days ago

The W&B MCP server is officially LIVE! Coding agents could always read your code. Now they can read your experiments, monitor training, and drive their own research loops. 20 tools, hosted on every W&B deployment, plugs into Claude Code, Cursor, Codex, Gemini-CLI, and LeChat.


8 days ago

@morgymcg @AI_for_Science congratulations!


_ScottCondron retweeted

Dimitris Papailiopoulos

@DimitrisPapail

17 days ago

Very rarely you stumble on a method that's simple, obvious in hindsight, free, and touches on every problem you care about: CLI agents, continual learning, self-improvement, world models.

ECHO is one of those https://t.co/NK8XXNleX4


_ScottCondron retweeted

CoreWeave @CoreWeave

21 days ago

Introducing CoreWeave Sandboxes, now in public preview. It's the execution layer for RL, agent tool use, and model evaluation, on your own CKS clusters or serverless through @wandb. Get the details: https://t.co/Ha3jISs7Ob


_ScottCondron retweeted

21 days ago

Introducing: CoreWeave Sandboxes 🚀 Running RL and agents means executing model-generated code in isolated environments. Demand has exploded, leaving many scrambling for CPUs. The solution: Run tens of thousands of sandboxes on your own CoreWeave compute, even alongside SUNK!


_ScottCondron retweeted

@jxmnop

28 days ago

when my phd advisor asks me for a weekly update on my experiments https://t.co/E4EDmGGWcg


about 1 month ago

@altryne Congrats Alex!!


_ScottCondron retweeted

@altryne

about 1 month ago

Can confirm, @cursor_ai is the best harness we've tested on @WolfBenchAI so far!

@WolframRvnwlf tests Harness x Model, and Cursor (before the SDK) is the best one we've ever tested! https://t.co/YVorHrBZfD


_ScottCondron retweeted

@WolfBenchAI

about 1 month ago

For benchmarks, I keep agent versions stable so results stay comparable. But new models can expose agent-side bugs. Here, updating @openclaw from 2026.3.11 to 2026.4.23 lifted Kimi K2.6 from 4% to 60% on @WolfBenchAI due to crucial fixes in how the agent handles its tool calling. https://t.co/EYvKU1fNd1


_ScottCondron retweeted

@WolfBenchAI

about 1 month ago

GPT-5.5 takes over WolfBench! It’s now the #1 model, ahead of Claude Opus 4.7 and 4.6, GPT-5.4, Sonnet 4.6, Kimi K2.6, Gemini 3.1 Pro, and more.

Notable findings after 30 runs (40h runtime, >1.7B tokens, ~$3K cost):
- @OpenAI's GPT-5.5 is the best model we ever tested.
- @cursor_ai's Agent CLI (CA) is the best agent we ever tested.
- @NousResearch's Hermes Agent (HA) outperformed OpenClaw (OC).
- With Hermes, going from medium to xhigh reasoning only improved consistency, not capability.

Note: This is WolfBench, where we look at more than just the average score, because one metric is not enough. The golden ∅ score is the actual 5-run average, which most other benchmarks report as their only score. ★ shows the ceiling (what percentage of the full benchmark this model+agent combination solved at least once across all runs). ■ shows the solid base (what percentage of the full benchmark it solved consistently in every run).
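The three WolfBench numbers described above map onto simple set operations over repeated runs: ∅ is the mean per-run solve rate, ★ is the union of solved tasks, and ■ is the intersection. A minimal sketch, assuming each run is recorded as the set of task IDs it solved; `wolfbench_style_scores` is a hypothetical helper, not WolfBench's own scoring code.

```python
from typing import Dict, List, Set


def wolfbench_style_scores(runs: List[Set[str]], all_tasks: Set[str]) -> Dict[str, float]:
    """Average, ceiling and solid-base percentages over repeated benchmark runs.

    Hypothetical sketch; assumes `runs` is non-empty and each entry is the
    set of task IDs solved in that run.
    """
    n = len(all_tasks)
    # ∅ average: mean over runs of the percentage of tasks solved in that run.
    average = sum(len(solved & all_tasks) for solved in runs) / len(runs) / n * 100
    # ★ ceiling: tasks solved in at least one run (union of all runs).
    ceiling = len(set().union(*runs) & all_tasks) / n * 100
    # ■ solid base: tasks solved in every run (intersection of all runs).
    solid_base = len(set.intersection(*runs) & all_tasks) / n * 100
    return {"average": average, "ceiling": ceiling, "solid_base": solid_base}


if __name__ == "__main__":
    tasks = {"a", "b", "c", "d"}
    runs = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
    print(wolfbench_style_scores(runs, tasks))
```

With tasks {a, b, c, d} and three runs solving {a, b}, {a, c} and {a, b, c}, the average is about 58.3%, the ceiling 75% and the solid base 25%. The gap between ★ and ■ is what a single averaged score hides: it separates what a model+agent combination can solve from what it solves reliably.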


_ScottCondron retweeted

Weights & Biases

about 1 month ago

Still feels a little unreal that you can just upload a dataset, get a fine-tuned LoRA back, and have it auto-deployed for inference without touching a single GPU config. Serverless SFT is still in public preview and adapter training is free right now. Don't sleep on it.


_ScottCondron retweeted

Bowen Baker @bobabowen

about 1 month ago

Today we open sourced many of OpenAI's monitorability evaluations. We hope that the research community and other model developers can build upon them and use them to evaluate the monitorability of their own models. https://t.co/xFeZ0hbLZG

