John Shaffer

@jdshaffer

Joined June 2023

810 Following

38 Followers

222 Posts

jdshaffer retweeted

Mitchell Hashimoto

@mitchellh

26 days ago

I strongly believe there are entire companies right now under heavy AI psychosis and it's impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out. I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now it's... the whole software development industry (maybe the whole world, really). It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "it's fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely. The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediate dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture. We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happen so fast that nobody notices the underlying architecture decaying. I worry.

511

15K
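The MTBF vs. MTTR trade-off in the post above can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). Below is a minimal sketch in Python. The two systems and their hour values are made-up assumptions for illustration, not figures from the post. Both systems report the same availability, yet one fails about 100 times as often as the other.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def failures_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected number of failure/repair cycles in one year (8760 hours)."""
    return 8760 / (mtbf_hours + mttr_hours)


# Hypothetical systems, chosen only to illustrate the point.
resilient = {"mtbf_hours": 990.0, "mttr_hours": 10.0}  # fails rarely, slow to fix
fast_fix = {"mtbf_hours": 9.9, "mttr_hours": 0.1}      # fails often, fixed quickly

for name, system in (("resilient", resilient), ("fast_fix", fast_fix)):
    print(
        f"{name}: availability={availability(**system):.3f}, "
        f"failures/year={failures_per_year(**system):.1f}"
    )
```

A single "local" metric such as availability cannot distinguish the two systems here; the failure count can, which is one way to read the post's warning about metrics that look healthy.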

jdshaffer retweeted

Jason Fried

@jasonfried

5 months ago

Is there a law for "the bigger a system is, the more likely any one part of it will be missed, ignored, not used, or underutilized"? If not, let's declare one and name it.

148

24K

jdshaffer retweeted

Pete Delkus

@wfaaweather

5 months ago

The motherlode arrives this evening!

265

83K

jdshaffer retweeted

John Carmack

@ID_AA_Carmack

7 months ago

When I started working in Python, I got lazy with “single assignment”, and I need to nudge myself about it. You should strive to never reassign or update a variable outside of true iterative calculations in loops. Having all the intermediate calculations still available is helpful in the debugger, and it avoids problems where you move a block of code and it silently uses a version of the variable that wasn’t what it originally had. In C/C++, making almost every variable const at initialization is good practice. I wish it was the default, and mutable was a keyword.

186

182

731

301K
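A small sketch of the single-assignment habit described in the post above, in Python. The function names and the pricing calculation are hypothetical, invented here for illustration. The first version rebinds one variable repeatedly; the second gives each intermediate value its own name, so every step stays visible in a debugger and moving a line cannot silently pick up a different value.

```python
def total_reassigned(price: float, quantity: int, tax_rate: float) -> float:
    # Style to avoid: `total` means something different on every line.
    total = price * quantity
    total = total * 0.9            # 10% discount
    total = total * (1 + tax_rate)
    return total


def total_single_assignment(price: float, quantity: int, tax_rate: float) -> float:
    # Each intermediate result gets its own name and is never rebound.
    subtotal = price * quantity
    discounted = subtotal * 0.9    # 10% discount
    with_tax = discounted * (1 + tax_rate)
    return with_tax


print(total_single_assignment(10.0, 3, 0.08))
```

Both functions return the same value; the difference is only in how much of the calculation is still inspectable after the fact.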

jdshaffer retweeted

Steve Yegge

@Steve_Yegge

9 months ago

This is correct. We had this same reaction in the 1980s & 1990s when compilers generated assembly for us. We hated it. Looking at the generated code made us puke. It got better.

205

117K

jdshaffer retweeted

NWS Storm Prediction Center

@NWSSPC

over 1 year ago

1:35pm CST #SPC Day2 #FireWX Extremely Critical: portions of south-central Texas and the Texas Hill Country. https://t.co/LEoXKVkNcs


244

296K

jdshaffer retweeted

Andrej Karpathy

@karpathy

over 1 year ago

GPT 4.5 + interactive comparison :) Today marks the release of GPT4.5 by OpenAI. I've been looking forward to this for ~2 years, ever since GPT4 was released, because this release offers a qualitative measurement of the slope of improvement you get out of scaling pretraining compute (i.e. simply training a bigger model). Each 0.5 in the version is roughly 10X pretraining compute. Now, recall that GPT1 barely generates coherent text. GPT2 was a confused toy. GPT2.5 was "skipped" straight into GPT3, which was even more interesting. GPT3.5 crossed the threshold where it was enough to actually ship as a product and sparked OpenAI's "ChatGPT moment". And GPT4 in turn also felt better, but I'll say that it definitely felt subtle. I remember being a part of a hackathon trying to find concrete prompts where GPT4 outperformed 3.5. They definitely existed, but clear and concrete "slam dunk" examples were difficult to find. It's that ... everything was just a little bit better but in a diffuse way. The word choice was a bit more creative. Understanding of nuance in the prompt was improved. Analogies made a bit more sense. The model was a little bit funnier. World knowledge and understanding was improved at the edges of rare domains. Hallucinations were a bit less frequent. The vibes were just a bit better. It felt like the water that rises all boats, where everything gets slightly improved by 20%. So it is with that expectation that I went into testing GPT4.5, which I had access to for a few days, and which saw 10X more pretraining compute than GPT4. And I feel like, once again, I'm in the same hackathon 2 years ago. Everything is a little bit better and it's awesome, but also not exactly in ways that are trivial to point to. Still, it is incredibly interesting and exciting as another qualitative measurement of a certain slope of capability that comes "for free" from just pretraining a bigger model.
Keep in mind that GPT4.5 was only trained with pretraining, supervised finetuning, and RLHF, so this is not yet a reasoning model. Therefore, this model release does not push forward model capability in cases where reasoning is critical (math, code, etc.). In these cases, training with RL and gaining thinking is incredibly important and works better, even if it is on top of an older base model (e.g. GPT4ish capability or so). The state of the art here remains the full o1. Presumably, OpenAI will now be looking to further train with Reinforcement Learning on top of the GPT4.5 model to allow it to think, and push model capability in these domains. HOWEVER. We do actually expect to see an improvement in tasks that are not reasoning heavy, and I would say those are tasks that are more EQ (as opposed to IQ) related and bottlenecked by e.g. world knowledge, creativity, analogy making, general understanding, humor, etc. So these are the tasks that I was most interested in during my vibe checks. So below, I thought it would be fun to highlight 5 funny/amusing prompts that test these capabilities, and to organize them into an interactive "LM Arena Lite" right here on X, using a combination of images and polls in a thread. Sadly X does not allow you to include both an image and a poll in a single post, so I have to alternate posts that give the image (showing the prompt, and two responses, one from 4 and one from 4.5), and the poll, where people can vote which one is better. After 8 hours, I'll reveal the identities of which model is which. Let's see what happens :)

175

628
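The rule of thumb in the post above (each 0.5 in the version number is roughly 10X pretraining compute) implies a simple exponential relationship. Below is a rough sketch in Python. The function name is made up here, and the result is only as good as the rule of thumb itself.

```python
def compute_multiplier(from_version: float, to_version: float) -> float:
    """Approximate pretraining-compute ratio, assuming 10x per 0.5 version step."""
    steps = (to_version - from_version) / 0.5
    return 10.0 ** steps


print(compute_multiplier(4.0, 4.5))  # one step
print(compute_multiplier(3.5, 4.5))  # two steps
print(compute_multiplier(1.0, 4.5))  # seven steps
```

Under this assumption, GPT4 to GPT4.5 is one step, which matches the post's statement that GPT4.5 saw 10X more pretraining compute than GPT4.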

jdshaffer retweeted

Graham Christensen

@grhmc

over 1 year ago

So, our GitHub Actions fetch dependencies (like nix-installer!) at run time. We put a lot of work into not failing because we messed it up. One of those things is fallback infra for fetching.


591
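The post above does not describe how its fallback works, so the following is only a generic sketch of the idea in Python: try a list of sources in order and return the first successful download. The URLs, the function name `fetch_with_fallback`, and the injectable `fetch` parameter are all hypothetical and are not taken from the nix-installer action.

```python
import urllib.request
from typing import Callable, Sequence


def http_get(url: str, timeout: float = 10.0) -> bytes:
    """Download a URL with the standard library; raises OSError on failure."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    urls: Sequence[str],
    fetch: Callable[[str], bytes] = http_get,
) -> bytes:
    """Return the body from the first URL that succeeds, trying them in order."""
    errors = []
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Hypothetical mirrors; replace with real primary and fallback locations.
MIRRORS = [
    "https://primary.example.invalid/installer",
    "https://fallback.example.invalid/installer",
]
```

Passing `fetch` as a parameter keeps the retry logic testable without network access: a test can supply a stub that fails for the first URL and succeeds for the second.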

jdshaffer retweeted

dax

@thdxr

over 1 year ago

OTEL is a standardization effort that lets you tap into a rich ecosystem of the worst tools you have ever seen

353

32K

jdshaffer retweeted

Peter Welinder

@npew

over 1 year ago

@altryne We tried it, and it hurt more than it helped for most use cases. Pixels is the most flexible interface. Using the DOM can be helpful, but it’s more of an optimization.

jdshaffer retweeted

Jonatán Iván

@nofreewill42

over 1 year ago

@DrJimFan o3’s answer is more correct than the expected output “solution”

324

28K

jdshaffer retweeted

Amjad Masad

@amasad

over 1 year ago

Paraphrasing the best advice @paulg gave me: "Ask yourself if this startup is your life's work. Knowing you're in it for the long haul lets you settle into a calmer, more focused rhythm despite the daily ups and downs, as you trust you'll show up and make it succeed over time."

332

294K

jdshaffer retweeted

François Chollet

@fchollet

over 1 year ago

Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% in high-compute mode (thousands of $ per task). It's very expensive, but it's not just brute -- these capabilities are new territory and they demand serious scientific attention.


202

jdshaffer retweeted

Srini Iyer

@sriniiyer88

over 1 year ago

New paper! Byte-Level models are finally competitive with tokenizer-based models with better inference efficiency and robustness! Dynamic patching is the answer! Read all about it here: https://t.co/GJSiFtugju (1/n)

19K

jdshaffer retweeted

Garry Tan

@garrytan

over 1 year ago

I guess “Designed by Claude” is just not going to be that good until codegen datasets also get trained on aesthetics. The rocks need to be able to draw, not just read and write. @bnj is working on this.

153

58K

jdshaffer retweeted

Windsurf Current

@WindsurfCurrent

over 1 year ago

The best part of making a product generally accessible (no waitlist!) is seeing other people's reactions. See how folks have interacted with the Windsurf Editor within the first 24 hours of launch 🧵

161

19K

jdshaffer retweeted

Arena.ai

@arena

over 1 year ago

@CopilotArena Copilot Arena link https://t.co/Zyc9iL3u9m

jdshaffer retweeted

Jeremy Daly

@jeremy_daly

over 1 year ago

Issue #305 of Off-by-none is out! This week, CloudFormation deployments get an x-ray timeline view, users report Bedrock might be on shaky ground, and we celebrate the real heroes of @awscloud! #offbynone https://t.co/coQX1atr8e


jdshaffer retweeted

DHH

@dhh

over 1 year ago

We're getting real with the realization that system tests have failed to be worth their weight. We're killing ALL the system tests (359 cases) in HEY, replacing them with a minimal set of smoke tests. Then leaning on controller integration tests instead. https://t.co/Dj7BFI62Os

617

140K

jdshaffer retweeted

braai engineer

@BraaiEngineer

over 1 year ago

there are now 4 interesting things in web development: 1. @ElectricClojure 2. Rama by @nathanmarz 3. Datomic by @cognitect 4. @cursor_ai. The models still hallucinate a lot, though, but it's the worst it'll ever be 🤷‍♂️
