Sahil Chaudhary

Presage Labs @presage_labs

9 months ago

Announcing Butter: An OpenAI-compliant LLM proxy, built to record and deterministically replay repetitive agent workflows https://t.co/oRz8IjxCUU

csahil28 retweeted

9 months ago

Introducing PrediBench - A live benchmark of AI models betting on prediction markets.

This benchmark answers the question “How well can AI predict the future?”

1 - Each day, 10 top trending real-world events are pulled from Polymarket, with questions like “Who will be the next mayor of NYC?”

2 - Each model browses the web in agentic mode to research the question, then allocates $1 in bets.

3 - As the events resolve in real-time, we score the model’s performance: Average returns, Sharpe ratio, Brier score.

▸ Visit it at https://t.co/1r4Xm09aD6

🧵[1/N]
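
The three scores named in step 3 can be sketched with their textbook definitions. This is only an illustration: the tweet does not say how PrediBench computes them, so the function names, the per-bet return format, the zero risk-free rate, and the lack of annualization are all assumptions.

```python
from statistics import mean, stdev


def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecast, and always answering 0.5
    scores 0.25.
    """
    if not probabilities or len(probabilities) != len(outcomes):
        raise ValueError("need equally sized, non-empty inputs")
    return mean((p - o) ** 2 for p, o in zip(probabilities, outcomes))


def average_return(returns):
    """Arithmetic mean of per-bet returns (0.10 means +10%)."""
    return mean(returns)


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation.

    Not annualized; the risk-free rate of 0.0 is an assumption.
    """
    excess = [r - risk_free for r in returns]
    spread = stdev(excess)  # requires at least two returns
    if spread == 0:
        raise ValueError("returns have zero variance")
    return mean(excess) / spread


if __name__ == "__main__":
    # Hypothetical forecasts and resolutions, for illustration only.
    print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))
    print(average_return([0.10, -0.05, 0.20]))
    print(sharpe_ratio([0.10, -0.05, 0.20]))
```

Brier score rewards calibrated probabilities, while the two return metrics reward profitable bets, so a model can rank differently on each.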

csahil28 retweeted

m_ric

@AymericRoucher

9 months ago

We're thrilled to introduce PrediBench, our first production at @presage_labs! PrediBench is a live benchmark that answers the question "could an AI model earn money on Polymarket?" TL;DR: Some models like Grok-4 or GPT-5 do beat the crowd of human bettors, and they turn a profit!

csahil28 retweeted

Jon Lai

@Tocelot

10 months ago

managing your psychology as a founder is incredibly challenging, yet is rarely ever talked about in public. in private, i've talked to many founders who feel overwhelmed or depressed even if things are actually going really well.

a few reasons i've seen up close:

1) you care a lot - the force of will that drives a founder to start a company often works against them psychologically when things don't go as planned. losing a customer or a talented employee often feel like personal faults. as founder you care the most about the company, and it can feel like you're on the hook for everything bad that happens

2) no all the time - founders are in situations where they are constantly being rejected - fundraising, recruiting, sales, etc. hearing "no" 20 times a day is just mentally and physically draining even for the grittiest entrepreneurs

3) most founders learn on the job - there's little training that prepares you to run a start-up except actually running a start-up. i've worked with founders who had to exit a cofounder for the first time, who had to deal with a PR hit-piece for the first time etc. there are lots of "firsts" which leads to stress and a feeling you don't know what you're doing

so what helps? our partner Ben Horowitz wrote a great blog on this years ago (link below) but summarizing a few tips:

- talk to friends. one of the things we strive to create with @speedrun is a community of founder friends to help with the psychological journey. while the job is still the same, the shared perspective can make it feel a bit less lonely. as others have made it through, so shall you!

- focus on the road ahead not the walls. there are a million things that can go wrong with a start-up and most of them you can't directly control. by focusing on the things you CAN control - your next move, shipping that product, making that hire, etc - you make progress one step at a time

the journey is long but there's a light at the end of the tunnel for those who persevere and most importantly, don't quit =)

csahil28 retweeted

naklecha

@naklecha

11 months ago

today, i'm excited to release cloudy -- a platform which enables you to sync & mount storage volumes on gpus sourced from popular cloud providers like lambdalabs, runpod, hyperbolic, nebius etc. save your team countless hours every month.

csahil28 retweeted

evan conrad

@evanjconrad

11 months ago

it's so fun when a company is doing far better than external perception and everyone who works there has the shared secret of knowing they are going to crush it

csahil28 retweeted

Sharif Shameem

@sharifshameem

about 1 year ago

Friendly reminder that it’s very much possible to outperform frontier reasoning models like o3 on narrowly defined tasks for your product

You don’t have to be limited by o3 and Sonnet, you can make your product much better! https://t.co/WzVSXrYKhi

csahil28 retweeted

about 1 year ago

Announcing: Muscle Mem 💪 Muscle Mem is a cache system for AI agents, allowing them to learn and efficiently replay complex behaviors. This allows expensive LLM calls to be entirely removed from the hot path, during repetitive tasks. https://t.co/xmMMRpqsY7
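
The idea of replaying learned behavior instead of calling the model again can be sketched as a small cache around an agent. This is a conceptual illustration only, not Muscle Mem's actual API: `ReplayCache`, `run`, `is_valid`, and the string-based action format are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

Action = str  # hypothetical: each recorded step is just a string


class ReplayCache:
    """Records the actions an agent produced for a task and replays them."""

    def __init__(self, agent: Callable[[str], List[Action]]):
        self._agent = agent  # the expensive, LLM-backed planner
        self._store: Dict[str, List[Action]] = {}
        self.agent_calls = 0  # how often the expensive path actually ran

    def run(
        self,
        task: str,
        is_valid: Callable[[List[Action]], bool] = lambda actions: True,
    ) -> List[Action]:
        cached = self._store.get(task)
        # Cache hit: replay only if the caller still trusts the recording.
        if cached is not None and is_valid(cached):
            return list(cached)
        # Cache miss or stale recording: call the agent and record the result.
        self.agent_calls += 1
        actions = self._agent(task)
        self._store[task] = list(actions)
        return actions


if __name__ == "__main__":
    def fake_agent(task: str) -> List[Action]:
        # Stand-in for an LLM call, used only for this demo.
        return ["open_form", f"submit:{task}"]

    cache = ReplayCache(fake_agent)
    cache.run("invoice-42")
    cache.run("invoice-42")  # replayed, no second agent call
    print(cache.agent_calls)
```

The `is_valid` hook stands in for whatever check decides that the environment still matches the recording; without such a check, a replay could act on stale assumptions.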

csahil28 retweeted

over 1 year ago

After weeks of talking to users and iterating, I'm excited to launch three new things:

- Pig Chat: drive your computer with a chat UI, like Operator
- Agent API: the same batteries-included chat agent, via API
- Open access

You can use it, today, at https://t.co/lfsQW7mTHI

csahil28 retweeted

Y Combinator

@ycombinator

over 1 year ago

🐷@PigDev_ is an API to operate Windows Apps with AI, making it easy to automate legacy applications across healthcare, manufacturing, finance, and more. It's like Operator, for Windows. https://t.co/cUOcYh2mA4 Congrats on the launch, @erikdunteman!

csahil28 retweeted

Nous Research

@NousResearch

over 1 year ago

Introducing DeepHermes-3 Preview, a new LLM that unifies reasoning and intuitive language model capabilities. https://t.co/YOqfE1Liae DeepHermes 3 is built from the Hermes 3 datamix, with new reasoning data, creating a model that can toggle on and off long chains of thought for improved accuracy at the cost of more test time compute!

csahil28 retweeted

over 1 year ago

Announcing @PigDev_ Windows Desktops for Agents

csahil28 retweeted

naklecha

@naklecha

over 1 year ago

today, i'm excited to release a reinforcement learning guide that carefully explains the intuition and implementation details behind every single fundamental algorithm in the field. enjoy :) https://t.co/cWJTM2Lr1P https://t.co/Pm485j5Igw

csahil28 retweeted

Guillermo Rauch

@rauchg

over 1 year ago

Some people are born American but in remote countries and under different nationalities. They don’t have a passport yet, but they’re inevitably drawn, usually by entrepreneurship or discovery. America is a mindset and a spiritual experience, not a country.

over 1 year ago

I’m releasing model weights, training data, scripts, and eval code to help reproduce benchmark scores.

Postmortem- https://t.co/kOWE1SsYs6
Weights- https://t.co/Kvb1CSq0mP
Eval code- https://t.co/OEMiQMFkjH
Training code- https://t.co/bmtFj8F277

@RickLamers has also put together a repo to reproduce the benchmark scores easily on gpu instances from https://t.co/lJHx706iV5 https://t.co/4vVZMQJ0ap

almost 2 years ago

@danielhanchen @mattshumer_ Should be fixed now

almost 2 years ago

@abacaj @ImSh4yy @mattshumer_ @GlaiveAI had some trouble with uploads so it's sharded into 2gb files, will upload fp16 with bigger shard size as well, for easier downloads
