Announcing Cloudy, a platform that seamlessly handles your training infrastructure.
You can rent a single H100 or a cluster of 1000 H100s, manage petabyte-scale storage volumes & go from running experiments to managing large-scale training runs, all from a single interface.
Introducing PrediBench - A live benchmark of AI models betting on prediction markets.
This benchmark answers the question “How well can AI predict the future?”
1 - Each day, 10 top trending real-world events are pulled from Polymarket, with questions like “Who will be the next mayor of NYC?”
2 - Each model browses the web in agentic mode to research the question, then allocates $1 in bets.
3 - As the events resolve in real time, we score each model’s performance: average returns, Sharpe ratio, and Brier score.
▸ Visit it at https://t.co/1r4Xm09aD6
🧵[1/N]
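For anyone who wants the three scores made concrete, here is a minimal Python sketch. It is not PrediBench's scoring code: the function names, the "buy YES shares at the market price" payout model, and the unannualized Sharpe ratio with a zero risk-free rate are all assumptions made for illustration.

```python
from statistics import mean, stdev


def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.
    Lower is better; always answering 0.5 scores 0.25."""
    return mean((p - o) ** 2 for p, o in zip(probs, outcomes))


def bet_return(stake, price, outcome):
    """Fractional return on a YES bet (assumed payout model, not PrediBench's).
    `price` is the cost of one share, and a share pays $1 if the event happens."""
    shares = stake / price
    payout = shares * outcome  # outcome is 1 if the event happened, else 0
    return (payout - stake) / stake


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation.
    Not annualized; risk_free=0.0 is an assumption."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    return mean(excess) / sd if sd > 0 else float("nan")


# Hypothetical example: four bets, each bought at a price of 0.5.
outcomes = [1, 0, 1, 1]
returns = [bet_return(1.0, 0.5, o) for o in outcomes]
average_return = mean(returns)
```

Average return and Sharpe ratio measure how the bets paid off, while the Brier score measures how well calibrated the stated probabilities were, so a model can do well on one and poorly on the other.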
We're thrilled to introduce PrediBench, our first production at @presage_labs!
PrediBench is a live benchmark that answers the question "could an AI model earn money on Polymarket?"
TL;DR: Some models like Grok-4 or GPT-5 do beat the crowd of human bettors, and they turn a profit!
managing your psychology as a founder is incredibly challenging, yet is rarely ever talked about in public. in private, i've talked to many founders who feel overwhelmed or depressed even if things are actually going really well. a few reasons i've seen up close:
1) you care a lot - the force of will that drives a founder to start a company often works against them psychologically when things don't go as planned. losing a customer or a talented employee often feels like a personal fault. as a founder you care the most about the company, and it can feel like you're on the hook for everything bad that happens
2) "no" all the time - founders are in situations where they are constantly being rejected - fundraising, recruiting, sales, etc. hearing "no" 20 times a day is just mentally and physically draining even for the grittiest entrepreneurs
3) most founders learn on the job - there's little training that prepares you to run a start-up except actually running a start-up. i've worked with founders who had to exit a cofounder for the first time, who had to deal with a PR hit-piece for the first time, etc. there are lots of "firsts", which lead to stress and a feeling that you don't know what you're doing
so what helps? our partner Ben Horowitz wrote a great blog on this years ago (link below) but summarizing a few tips:
- talk to friends. one of the things we strive to create with @speedrun is a community of founder friends to help with the psychological journey. while the job is still the same, the shared perspective can make it feel a bit less lonely. as others have made it through, so shall you!
- focus on the road ahead not the walls. there are a million things that can go wrong with a start-up and most of them you can't directly control. by focusing on the things you CAN control - your next move, shipping that product, making that hire, etc - you make progress one step at a time
the journey is long but there's a light at the end of the tunnel for those who persevere and most importantly, don't quit =)
today, i'm excited to release cloudy -- a platform which enables you to sync & mount storage volumes on gpus sourced from popular cloud providers like lambdalabs, runpod, hyperbolic, nebius etc.
save your team countless hours every month.
it's so fun when a company is doing far better than external perception and everyone who works there has the shared secret of knowing they are going to crush it
Friendly reminder that it’s very much possible to outperform frontier reasoning models like o3 on narrowly defined tasks for your product
You don’t have to be limited by o3 and Sonnet, you can make your product much better!
Announcing: Muscle Mem 💪
Muscle Mem is a cache system for AI agents, allowing them to learn and efficiently replay complex behaviors.
This allows expensive LLM calls to be removed entirely from the hot path during repetitive tasks.
https://t.co/xmMMRpqsY7
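The general idea of caching agent behavior can be sketched in a few lines of Python. This is not Muscle Mem's actual API: `BehaviorCache`, `still_valid`, `plan_with_llm`, and the string step format are hypothetical names chosen for illustration. The sketch records the steps an agent took for a task, then replays them without calling the LLM when a validity check says the environment still matches.

```python
from typing import Callable, Dict, List

Step = str  # a recorded action such as "click:submit" (hypothetical format)


class BehaviorCache:
    """Replay recorded steps for a task when it is safe, else ask the LLM."""

    def __init__(self, still_valid: Callable[[str, dict], bool]):
        # still_valid decides whether a cached trajectory may be replayed
        # in the current environment (assumed to be supplied by the caller).
        self.still_valid = still_valid
        self.trajectories: Dict[str, List[Step]] = {}

    def run(
        self,
        task: str,
        env: dict,
        plan_with_llm: Callable[[str, dict], List[Step]],
        execute: Callable[[Step], None],
    ) -> str:
        steps = self.trajectories.get(task)
        if steps is not None and self.still_valid(task, env):
            source = "cache"  # hot path: no LLM call at all
        else:
            steps = plan_with_llm(task, env)  # slow path: expensive call
            self.trajectories[task] = list(steps)
            source = "llm"
        for step in steps:
            execute(step)
        return source
```

The validity check is the important design choice here: without it, a cached trajectory would be replayed even after the environment changed, so a cache miss falls back to the LLM and re-records the steps.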
After weeks of talking to users and iterating, I'm excited to launch three new things:
- Pig Chat: drive your computer with a chat UI, like Operator
- Agent API: the same batteries-included chat agent, via API
- Open access: you can use it today at https://t.co/lfsQW7mTHI
🐷@PigDev_ is an API to operate Windows Apps with AI, making it easy to automate legacy applications across healthcare, manufacturing, finance, and more. It's like Operator, for Windows.
https://t.co/cUOcYh2mA4
Congrats on the launch, @erikdunteman!
Introducing DeepHermes-3 Preview, a new LLM that unifies reasoning and intuitive language model capabilities.
https://t.co/YOqfE1Liae
DeepHermes 3 is built from the Hermes 3 datamix, with new reasoning data, creating a model that can toggle on and off long chains of thought for improved accuracy at the cost of more test time compute!
today, i'm excited to release a reinforcement learning guide that carefully explains the intuition and implementation details behind every single fundamental algorithm in the field. enjoy :)
https://t.co/cWJTM2Lr1P
Some people are born American, but in remote countries and under different nationalities. They don’t have a passport yet, but they’re inevitably drawn here, usually by entrepreneurship or discovery. America is a mindset and a spiritual experience, not a country.
I’m releasing model weights, training data, scripts, and eval code to help reproduce benchmark scores.
Postmortem- https://t.co/kOWE1SsYs6
Weights- https://t.co/Kvb1CSq0mP
Eval code- https://t.co/OEMiQMFkjH
Training code- https://t.co/bmtFj8F277
@RickLamers has also put together a repo to reproduce the benchmark scores easily on gpu instances from https://t.co/lJHx706iV5
https://t.co/4vVZMQJ0ap
@abacaj @ImSh4yy @mattshumer_ @GlaiveAI had some trouble with uploads so it's sharded into 2GB files, will upload fp16 with bigger shard size as well, for easier downloads