A.J. Feather

@AJFeather

Did journalism-ish things in a previous life. Took the mob’s advice and learned to code. Subscribe to my newsletter 🧑‍💻:

New York, NY

Joined January 2008

1.4K Following

1.3K Followers

771 Posts

A.J. Feather @AJFeather

6 days ago

@ZaidJilani What percent of the money on that screenshot is not invested?


A.J. Feather @AJFeather

7 days ago

@LoganDobson Ben Thompson @stratechery was right. This is how UBI will happen. Data centers are just going to start sending checks to people in the towns where they build. Is it elegant? No. Is it fair? 🤷 But it’s apparently the way they get built.


A.J. Feather @AJFeather

about 1 month ago

@aarmlovi Her problem is that the riders are not R +30

A.J. Feather @AJFeather

about 2 months ago

@realEstateTrent I thought the whole point of investing in real estate was so you never have to live in a tent. What am I missing?



A.J. Feather @AJFeather

about 2 months ago

However this plays out, my kids will only ever learn to drive a car for fun. Incredible.

Gene Munster

@munster_gene

about 2 months ago

$GOOG Waymo, now over 500k rides a week, more than doubling year over year. Growth has been consistently doubling off of higher numbers. I estimate $TSLA Robotaxi around 50k rides per week, and still in a great position to catch Waymo over the next few years. The crazy part is that self driving ride sharing is still only 1.5% of total ride sharing miles.


A.J. Feather @AJFeather

about 2 months ago

The shamelessness required to push this right after @GavinNewsom literally tried and couldn’t make it work is astounding.

Tom Steyer

@TomSteyer

about 2 months ago

Bernie was right. I now support single-payer. Taking the profit model out of healthcare is the only way to bring costs down, so all Californians can afford care.


A.J. Feather @AJFeather

2 months ago

@GregoryWilken @sweatystartup In New Jersey, we’re not allowed to have paper either.

A.J. Feather @AJFeather

3 months ago

@billmurphy Huh, alright. Might have to finally give this a shot. I was worried I was just going to be running up massive token bills. This would be better.

A.J. Feather @AJFeather

3 months ago

I don’t understand how this is possible when I - and hundreds of thousands like me - pay NY state income tax and LIVE and spend 90% of our time in other states. Even when I was 100% remote at my last job, I paid New York State income tax from my basement in New Jersey!

Gregory Kennedy

@gregorykennedy

3 months ago

I didn’t believe it until I saw the video. The governor of NY is begging rich people to move back to fund her social programs. So, like, I guess there are consequences to over taxation after all?


A.J. Feather @AJFeather

3 months ago

Not as impressive as @karpathy but I had to write a fuzzy search function the other day. I told Copilot “write a function that finds the correct rows in this csv most of the time. Here’s some input and the expected output” Took it 10 minutes.

Andrej Karpathy

@karpathy

3 months ago


Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes, today I measured that the leaderboard's "Time to GPT-2" drops from 2.02 hours to 1.80 hours (~11% improvement), this will be the new leaderboard entry. So yes, these are real improvements and they make an actual difference. I am mildly surprised that my very first naive attempt already worked this well on top of what I thought was already a fairly manually well-tuned project.

This is a first for me because I am very used to doing the iterative optimization of neural network training manually. You come up with ideas, you implement them, you check if they work (better validation loss), you come up with new ideas based on that, you read some papers for inspiration, etc etc. This is the bread and butter of what I do daily for 2 decades. Seeing the agent do this entire workflow end-to-end and all by itself as it worked through approx. 700 changes autonomously is wild. It really looked at the sequence of results of experiments and used that to plan the next ones. It's not novel, ground-breaking "research" (yet), but all the adjustments are "real", I didn't find them manually previously, and they stack up and actually improved nanochat. Among the bigger things e.g.:

- It noticed an oversight that my parameterless QKnorm didn't have a scalar multiplier attached, so my attention was too diffuse. The agent found multipliers to sharpen it, pointing to future work.
- It found that the Value Embeddings really like regularization and I wasn't applying any (oops).
- It found that my banded attention was too conservative (I forgot to tune it).
- It found that AdamW betas were all messed up.
- It tuned the weight decay schedule.
- It tuned the network initialization.

This is on top of all the tuning I've already done over a good amount of time. The exact commit is here, from this "round 1" of autoresearch. I am going to kick off "round 2", and in parallel I am looking at how multiple agents can collaborate to unlock parallelism.
https://t.co/WAz8aIztKT

All LLM frontier labs will do this. It's the final boss battle. It's a lot more complex at scale of course - you don't just have a single train.py file to tune. But doing it is "just engineering" and it's going to work. You spin up a swarm of agents, you have them collaborate to tune smaller models, you promote the most promising ideas to increasingly larger scales, and humans (optionally) contribute on the edges.

And more generally, *any* metric you care about that is reasonably efficient to evaluate (or that has more efficient proxy metrics such as training a smaller network) can be autoresearched by an agent swarm. It's worth thinking about whether your problem falls into this bucket too.


A.J. Feather @AJFeather

4 months ago

@Austen It helps, for sure. I finish PRs much quicker. I pick up more. Could I do everyone else’s tickets too? No! I’d be launching more agents than I’d be able to review, and the bugs that slip through would slowly consume all of my time.

A.J. Feather @AJFeather

6 months ago

@nikitabier Turns out the coding and video editing weren’t the bottleneck.

A.J. Feather @AJFeather

6 months ago

@bcherny @palashkaria @karpathy Having a working integration test suite with existing examples has been a massive help for me. “Test the endpoint you just built by calling it after x but before y in this test.” Works 80% of the time, and works 100% of the time on the second try 😆


AJFeather retweeted

Thomas Chatterton Williams

@thomaschattwill

9 months ago

Speech is not violence. Violence is violence.


AJFeather retweeted

Ezra Klein

@ezraklein

9 months ago

In the last few years we've seen:

- The plot to kidnap Gretchen Whitmer
- The Storming of the Capitol and pipe bombs left at the RNC and DNC
- The break-in to kidnap Nancy Pelosi and the brutal attack on Paul Pelosi
- Multiple assassination attempts against Trump
- The assassination of Minnesota House Speaker Melissa Hortman and her husband and the shooting of State Senator John Hoffman and his wife
- Luigi Mangione's assassination of Brian Thompson
- The assassination of Charlie Kirk

Political violence is contagious. It is spreading. It is not confined to one side or belief system. It should terrify us all. The foundation of a free society is the ability to participate in it without fear of violence. Political violence is always an attack against us all. You have to be so blind not to see that.


AJFeather retweeted

U.S. Senator John Fetterman

@SenFettermanPA

9 months ago

I condemn this in the strongest terms. There is ZERO place in our great country for these horrendous acts of political violence. We must find a better way forward. May Charlie Kirk have a full and quick recovery.



A.J. Feather @AJFeather

10 months ago

The unintended consequences being housing that is affordable

Mayor Karen Bass

@MayorOfLA

10 months ago

Today I signed a City Council resolution opposing SB79 unless it is amended to exempt cities with a state-approved and compliant Housing Element. While I support the intent to accelerate housing development statewide, as written, this bill risks unintended consequences for LA.


A.J. Feather @AJFeather

12 months ago

@JamesCCortes @OneJerseySchorr It’s currently a 1% fee on all properties over $1 million, which also covers a lot of single-family homes, particularly in Northern Jersey. The proposal doubles it to 2%.

A.J. Feather @AJFeather

12 months ago

@craigarnzen @willscharf Well hopefully the dorms are nicer. Lol
