Manuel Del Verme @ManuelDelVerme - Twitter Profile

Pinned Tweet

3 months ago

the AI diffusion bottleneck is reliability. not capability. most teams don't have the resources to measure agents. the right way to transition to agents safely is open evals infrastructure. that's what @silverstreamAI @ServiceNowRSRCH @nvidia @IBM @thealliance_ai are doing

Manuel Del Verme

@ManuelDelVerme

3 months ago

It's funny how the main feedback we got from people was that claude code's scheduled tasks are opaque😂 @AnthropicAI i have a great pitch for you

Silverstream AI @silverstreamAI

3 months ago

Bench for Claude Code was #1 Product of the Day on Product Hunt 🏆 Also featured in the @ProductHunt newsletter Thanks to everyone who supported us 🙏 Built to store, review, and share your Claude Code sessions More coming soon

Manuel Del Verme

@ManuelDelVerme

3 months ago

Agents touch real systems: databases, APIs, permissions, configs. We've been using Bench to give customers curated traces and get full observability into what our agents did and why. Share traces directly in PRs. Now we're releasing it. Live on Product Hunt today. https://t.co/DLAlvH69jH

Manuel Del Verme

@ManuelDelVerme

3 months ago

At GTC, the same question kept coming up: Is there a way to track what Claude Code does and share it? Tomorrow, we’re launching the answer on Product Hunt.

Manuel Del Verme

@ManuelDelVerme

3 months ago

The @silverstreamAI and @ServiceNowRsch teams built the infrastructure and observability, we host a managed visualization layer compatible with CUBE: https://t.co/hebBPv3zgm if you're running agents in production right now, what has stopped you from creating broader evals?

Manuel Del Verme

@ManuelDelVerme

3 months ago

right now every team builds eval infra from scratch. no way to compare results across models. no standard way to measure failure. every team starts from a vibecoded shell. wrap a benchmark once, run it everywhere. no custom integration. built on MCP, Gym and @opentelemetry . not another benchmark. infrastructure for all of them.

Manuel Del Verme

@ManuelDelVerme

4 months ago

@jacobmbuckman Happy to catch up! Are you coming for GTC?

Manuel Del Verme

@ManuelDelVerme

4 months ago

Day 2 and it still didn't figure out calling a class AckermannModel and implementing a bicycle dynamics doesn't make for a very useful model

Manuel Del Verme

@ManuelDelVerme

4 months ago

Hey agent reverse engineer the car protocol and map the room

Manuel Del Verme

@ManuelDelVerme

4 months ago

we entered the 90s, the age of optical flow

Lucas-Kanade works!
dense opt flow didn't :(

ego motion: floor features move 80px downward, distant windows move 40px upward https://t.co/ZaJ3AF8jnA

Manuel Del Verme

@ManuelDelVerme

4 months ago

@grok @grok do inverse kinematics for this

Manuel Del Verme

@ManuelDelVerme

4 months ago

@LiTianleli We've been evaluating Grok on enterprise tools: ServiceNow, Oracle, Odoo, obscure high revenue SaaS, full computer use with DOM replay, Grok has lower FP for task refusals (good TP) but pixel control is lagging. happy to share finetuning data if useful.
