Will Gorman

@wpgorman

Startup Mode

Winter Springs, Florida

Joined April 2009

736 Following

896 Followers

2.7K Posts

wpgorman retweeted

Ibragim

@ibragim_bad

10 days ago

🚨 SWE-rebench big March-May update! SWE-rebench is a live benchmark with fresh SWE tasks (issue+PR) from GitHub. updates: > we collected more fresh and complex tasks > we ran 110 tasks × 5 for each model / scaffold > the differences between models and scaffolds are now easier to distinguish > we will update the task set every two months, with model updates in between insights: > GPT-5.5 xhigh takes #1 with 62.7% resolved and 70.0% pass@5 > Cursor with Composer 2.5 is very cheap and strong: around 8× cheaper than Claude Code and Codex, and scores higher than open-weight models! @leerob Model updates will come in 1–2 weeks. Please send requests for models you want us to run! 🏆 Full leaderboard (check for price / tokens per problem, pass@5, scaffold params, etc): https://t.co/9jL4lt4UGl 👾 Join our Discord: https://t.co/cyWgceqqDa

ibragim_bad's tweet photo. 🚨 SWE-rebench big March-May update!

SWE-rebench is a live benchmark with fresh SWE tasks (issue+PR) from GitHub.

updates:
> we collected more fresh and complex tasks
> we ran 110 tasks × 5 for each model / scaffold
> the differences between models and scaffolds are now easier to distinguish
> we will update the task set every two months, with model updates in between

insights:
> GPT-5.5 xhigh takes #1 with 62.7% resolved and 70.0% pass@5
> Cursor with Composer 2.5 is very cheap and strong: around 8× cheaper than Claude Code and Codex, and scores higher than open-weight models! @leerob

Model updates will come in 1–2 weeks. Please send requests for models you want us to run!

🏆 Full leaderboard (check for price / tokens per problem, pass@5, scaffold params, etc):
https://t.co/9jL4lt4UGl

👾 Join our Discord:
https://t.co/cyWgceqqDa

232

114K

wpgorman retweeted

Genesis AI

@gs_ai_

10 days ago

We are back again :) After three weeks of quiet building. Introducing Genesis World 1.0, our latest simulation platform, the second release in our full-stack suite. Open-sourced. Robotics is still bottlenecked by the 1× speed of the physical world. Every model, checkpoint, and data recipe eventually needs to be tested on physical hardware, slowly, expensively, and with limited coverage. One hour in reality can become 100 days in simulation. That is how robotics model iteration moves from a wall-clock bottleneck to a compute problem. To make this work, simulation has to be both fast and trustworthy. Over the past year, we rebuilt the entire stack: a GPU-accelerated cross-platform compiler, penetration-free multi-physics contact solvers, unified rigid and deformable physics, and a photo-realistic renderer purpose-built for physical AI applications. We built Nyx, a high-performance path-traced rendering engine for robotics application. Genesis World 1.0 achieves near realtime performance with our latest development for penetration-free IPC solver, supporting various types of deformables beyond rigid bodies. It supports contact-rich, dexterous manipulation simulation across different embodiments: unitree, sharpa, wuji, genesis hand and various types of grippers. Under the hood is Quadrants, our effort in pushing forward cross-platform GPU-accelerated computation. Quadrants started as a fork of Taichi, and we rebuilt most of the critical parts for optimizing simulation workloads, giving 10x faster launch time and up to 4.6x runtime performance compared to the initial Genesis release. Together, they bring us to an unprecedentedly low sim-to-real gap, enabling zero-shot real-to-sim model evaluation and much faster iteration of GENE. All available today. Genesis World 1.0: https://t.co/aknCM3eqws Quadrants: https://t.co/uXqPNI4cb6 Nyx: https://t.co/R8j0djqGnV

966

174

539

272K

wpgorman retweeted

LeRobot

@LeRobotHF

16 days ago

We built a bipedal robot for about $2,500. A real, mostly 3D-printed robot you can build, repair, simulate, train, and control. Today we’re releasing LeRobot Humanoid: an open robot-learning platform with hardware, runtime, identification tools, and training environments. Blog post: https://t.co/zu2etb1NZo Repo: https://t.co/4myLRUtZ3W

179

481

197K

wpgorman retweeted

Yann LeCun

@ylecun

20 days ago

The salvation is Project Tapestry https://t.co/5MOgouVplV

217

84K

Who to follow

Julian Hyde

@julianhyde

On a mission to tame data. Author of Apache Calcite, Mondrian OLAP engine, and the Morel language. Formerly Looker/Google, now working on the next thing.

Brandon

@usbrandon

Christian, dad, enjoy tech, finance and business intelligence.

Matt Casters

@mattcasters

Senior Solution Architect, Apache Hop PMC

wpgorman retweeted

Paul Graham

@paulg

20 days ago

There is nothing more powerful than well-informed optimism. It has to be well-informed though. The "everything will be fine" type of optimism may also be somewhat useful, but it's not as useful as the "Hmm, what if we tried x?" kind.

210

317

619

148K

wpgorman retweeted

Marc Andreessen 🇺🇸

@pmarca

27 days ago

149

233

142K

wpgorman retweeted

akira

@realmcore_

28 days ago

Software engineering isn’t dead And in fact it might be more alive than ever You can actually do engineering now As opposed to being constrained by typing and a mythical man month

wpgorman retweeted

Beff (e/acc)

@beffjezos

29 days ago

We live in the Golden Age of building cool shit.

127

113

120K

wpgorman retweeted

Peter Steinberger 🦞

@steipete

about 1 month ago

closing what's fixed was the pre-clean. intent-based clustering is the second strike.

139

39K

wpgorman retweeted

Quintin Pope @QuintinPope5

about 2 months ago

As a percent of GDP, hyperscaling isn't that extreme. The railroads were goddamn crazy, though!

345

824

289K

wpgorman retweeted

Andrej Karpathy

@karpathy

2 months ago

LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating knowledge (stored as markdown and images). The latest LLMs are quite good at it. So: Data ingest: I index source documents (articles, papers, repos, datasets, images, etc.) into a raw/ directory, then I use an LLM to incrementally "compile" a wiki, which is just a collection of .md files in a directory structure. The wiki includes summaries of all the data in raw/, backlinks, and then it categorizes data into concepts, writes articles for them, and links them all. To convert web articles into .md files I like to use the Obsidian Web Clipper extension, and then I also use a hotkey to download all the related images to local so that my LLM can easily reference them. IDE: I use Obsidian as the IDE "frontend" where I can view the raw data, the the compiled wiki, and the derived visualizations. Important to note that the LLM writes and maintains all of the data of the wiki, I rarely touch it directly. I've played with a few Obsidian plugins to render and view data in other ways (e.g. Marp for slides). Q&A: Where things get interesting is that once your wiki is big enough (e.g. mine on some recent research is ~100 articles and ~400K words), you can ask your LLM agent all kinds of complex questions against the wiki, and it will go off, research the answers, etc. I thought I had to reach for fancy RAG, but the LLM has been pretty good about auto-maintaining index files and brief summaries of all the documents and it reads all the important related data fairly easily at this ~small scale. Output: Instead of getting answers in text/terminal, I like to have it render markdown files for me, or slide shows (Marp format), or matplotlib images, all of which I then view again in Obsidian. You can imagine many other visual output formats depending on the query. Often, I end up "filing" the outputs back into the wiki to enhance it for further queries. So my own explorations and queries always "add up" in the knowledge base. Linting: I've run some LLM "health checks" over the wiki to e.g. find inconsistent data, impute missing data (with web searchers), find interesting connections for new article candidates, etc., to incrementally clean up the wiki and enhance its overall data integrity. The LLMs are quite good at suggesting further questions to ask and look into. Extra tools: I find myself developing additional tools to process the data, e.g. I vibe coded a small and naive search engine over the wiki, which I both use directly (in a web ui), but more often I want to hand it off to an LLM via CLI as a tool for larger queries. Further explorations: As the repo grows, the natural desire is to also think about synthetic data generation + finetuning to have your LLM "know" the data in its weights instead of just context windows. TLDR: raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki, and all of it viewable in Obsidian. You rarely ever write or edit the wiki manually, it's the domain of the LLM. I think there is room here for an incredible new product instead of a hacky collection of scripts.

60K

107K

21M

wpgorman retweeted

Alvin Wang Graylin

@AGraylin

2 months ago

New Research💡: “The #Enterprise #AI Playbook — Lessons from 51 Successful Deployments” Excited to share new research from Stanford @DigEconLab, I co-authored with @erikbryn and Elisa Pereira . We spent 5 months interviewing executives across 41 organizations, 9 industries, and 7 countries — focusing exclusively on AI deployments that actually delivered measurable value. Not hype. Not predictions. What’s working right now, and why. A few findings that challenged even our assumptions: The hard part isn’t the AI. 77% of the toughest challenges were invisible costs — change management, data quality, process redesign. Technology was consistently described as the easiest part. Same use case, wildly different timelines. One company deployed AI customer support in weeks. Another took years. Same models. The difference was always the #organization — its #leadership, processes, and willingness to fail. #Agentic AI works — but most firms haven’t tried it yet. Only 20% of our cases were agentic, but they delivered 71% median gains vs. 40% for high-automation. This gap will widen fast. The model is increasingly a #commodity. For 42% of implementations, model choice was fully interchangeable. The durable advantage is in orchestration, data, and process — not the foundation model. With productivity increase, headcount #reduction is common (45%), but not the majority outcome. Redeployment, hiring avoidance, and acceleration strategies accounted for 55% of cases.🚨 The window for experimentation is closing. This is no longer a question of whether AI delivers value. It’s whether organizations can evolve fast enough to capture it — and whether leaders will take responsibility for smoothing the transition for workers and communities along the way. Full report (free): https://t.co/G9eRzG9ugD @StanfordHAI

AGraylin's tweet photo. New Research💡: “The #Enterprise #AI Playbook — Lessons from 51 Successful Deployments”
Excited to share new research from Stanford @DigEconLab, I co-authored with @erikbryn and Elisa Pereira .

We spent 5 months interviewing executives across 41 organizations, 9 industries, and 7 countries — focusing exclusively on AI deployments that actually delivered measurable value. Not hype. Not predictions. What’s working right now, and why.

A few findings that challenged even our assumptions:

The hard part isn’t the AI. 77% of the toughest challenges were invisible costs — change management, data quality, process redesign. Technology was consistently described as the easiest part.

Same use case, wildly different timelines. One company deployed AI customer support in weeks. Another took years. Same models. The difference was always the #organization — its #leadership, processes, and willingness to fail.

#Agentic AI works — but most firms haven’t tried it yet. Only 20% of our cases were agentic, but they delivered 71% median gains vs. 40% for high-automation. This gap will widen fast.

The model is increasingly a #commodity. For 42% of implementations, model choice was fully interchangeable. The durable advantage is in orchestration, data, and process — not the foundation model.

With productivity increase, headcount #reduction is common (45%), but not the majority outcome. Redeployment, hiring avoidance, and acceleration strategies accounted for 55% of cases.🚨

The window for experimentation is closing. This is no longer a question of whether AI delivers value. It’s whether organizations can evolve fast enough to capture it — and whether leaders will take responsibility for smoothing the transition for workers and communities along the way.

Full report (free): https://t.co/G9eRzG9ugD

@StanfordHAI

29K

wpgorman retweeted

Chroma

@trychroma

2 months ago

Introducing Chroma Context-1, a 20B parameter search agent. > pushes the pareto frontier of agentic search > order of magnitude faster > order of magnitude cheaper > Apache 2.0, open-source

141

403

wpgorman retweeted

Andrej Karpathy

@karpathy

2 months ago

Software horror: litellm PyPI supply chain attack. Simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords. LiteLLM itself has 97 million downloads per month which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwnd. Same for any other large project that depended on litellm. Afaict the poisoned version was up for only less than ~1 hour. The attack had a bug which led to its discovery - Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker didn't vibe code this attack it could have been undetected for many days or weeks. Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any depedency you could be pulling in a poisoned package anywhere deep inside its entire depedency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials that do get stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've been so growingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.

28K

14K

67M

wpgorman retweeted

Claude

@claudeai

2 months ago

You can now enable Claude to use your computer to complete tasks. It opens your apps, navigates your browser, fills in spreadsheets—anything you'd do sitting at your desk. Research preview in Claude Cowork and Claude Code, macOS only.

139K

14K

84K

78M

wpgorman retweeted

Rowan Cheung

@rowancheung

3 months ago

This robotic hand can be 3D printed by anyone and assembled in under 8 hours. Researchers at ETH Zurich created the Orca hand, fully open-sourced with artificial bones and tendons. For context, advanced robotic hands cost over $100,000 and require constant maintenance... Orca costs under $2,000. 50x less (!) A self-calibration system maps every motor to every joint, eliminating the manual tuning that tendon-driven hands usually need. Each fingertip has built-in tactile sensors covered by silicone skin. The hand can actually feel when it touches something, giving it feedback to grip objects without crushing them or letting them slip. It can hold over 20 lbs, learn tasks by watching human demonstrations, and transfer skills trained in simulation directly to the real world. The team proved its durability by having it pick up and place a cube over 2,000 times across 7 hours with no human intervention. The full design files and source code are open source, so any robotics lab in the world can start building one today.

109

410

213K

wpgorman retweeted

Sunday @sundayrobotics

3 months ago

We just raised $165M to end robotics demos and deploy the world's first autonomous home robots into households this year.

129

153

375

252K

wpgorman retweeted

Peter Steinberger 🦞

@steipete

3 months ago

This keeps impressing me. So many awesome builders! https://t.co/t52IGtWkML

626

163

202K

wpgorman retweeted

Bo Wang

@BoWang87

3 months ago

Prof. Donald Knuth opened his new paper with "Shock! Shock!" Claude Opus 4.6 had just solved an open problem he'd been working on for weeks — a graph decomposition conjecture from The Art of Computer Programming. He named the paper "Claude's Cycles." 31 explorations. ~1 hour. Knuth read the output, wrote the formal proof, and closed with: "It seems I'll have to revise my opinions about generative AI one of these days." The man who wrote the bible of computer science just said that. In a paper named after an AI. Paper: https://t.co/juSOmK9vOt