hung

almost 2 years ago

becoming uncommon amongst uncommons

about 2 hours ago

about 10 hours ago

Your Codex activity now has a home, and an easier way to share it. Codex profiles show your activity graph, streaks, lifetime tokens, peak daily tokens, and top features like plugins and /fast mode. Private by default. Share a card when you want to.

hungtran retweeted

thebes

@voooooogel

4 days ago

@QiaochuYuan there's something quite weird with how 4.8 has learned to 'push back' that seems related to this, too. like deliberate strawman counterarguments that are chosen to be easy to knock down, playing fake-high within low specifically to give User the chance to get a reversal and win

Tweeting reminders to myself

1 day ago

@julesrosenberg @rauchg very helpful!

1 day ago

@eastdakota how exactly do you determine a request is from a bot or human?

2 days ago

opus 4.8 has +1.11% gain, +12 almost-resolved tasks but is ~4.2x slower and ~5.2x more expensive compared to gpt-5.5. in other words, would you hire the software engineer who can deliver code with more edge cases covered but takes far longer and costs far more? there are many good decisions in the benchmark design that I learned about through implementing programbench on our site. check out the results and let us know your thoughts!

2 days ago

ProgramBench is now live on the Vals site! Opus 4.8 is the first model to fully solve 2 tasks, but this comes at an extremely high cost. https://t.co/Oq7tt1itmP

2 days ago

@Deepak_Reddy17 @ValsAI We updated the result. It stands #15 in Vibe Code Bench. Check it out here https://t.co/m6bv6jXaXu

hungtran retweeted

8 days ago

Anthropic just dropped another powerhouse model, Opus 4.8 and it’s the new SOTA on the Vals Index (70.2%) and Vals Multimodal (70.7%). Full results below. https://t.co/HIPy2VNsSE

8 days ago

@OfirPress curious to know your thoughts on this plot https://t.co/dVabvvak7k. would introducing compaction/truncation help smaller-context models?

Lisan al Gaib

@scaling01

8 days ago

this looks much better

10 days ago

new open-weight sota model on vals index

10 days ago

Qwen 3.7 Max is Alibaba's latest reasoning model ranking 5th on the Vals Index with a score of 57.3%. We ran it across our full benchmark suite. Full results below https://t.co/zcRWbj864x

10 days ago

apply if you want to shape the next evolution of ai evaluation with us!

george hotz archive @geohotarchive

10 days ago

Pitch us a benchmark or eval technique. We'll fund you to build it.

We're opening applications for the Vals Fellowship. 3–6 months working on the hardest open problems in AI evaluation, with the resources to actually solve them.

What you get:
- Unlimited API credits + budget capacity for GPUs and human data
- Vals’ evaluation infrastructure
- $1,000–2,500 / week stipend
- A network of evals researchers across frontier labs and academia

Location: Both remote / in-person in SF applications will be considered

hungtran retweeted

gabe

@allgarbled

about 1 year ago

To succeed on twitter you have to understand that everything is a matter of life and death. You don’t just post a picture of your sandwich and say “this was a good sandwich.” You post it and say “never kill yourself.” The sandwich represents life.

hungtran retweeted

Anthropic

@AnthropicAI

11 days ago

Anthropic co-founder Chris Olah was invited to speak at today's presentation of Pope Leo XIV's encyclical "Magnifica humanitas." Read the full text of his remarks: https://t.co/CoBfkVOVcy

hungtran retweeted

12 days ago

The Eternal Sloptember https://t.co/kFIW7LNhNd

14 days ago

@clayhaight huge!

hungtran retweeted

Franz von Holzhausen

@woodhaus2

15 days ago

After nearly 18 years I can stop working on Model S and X. We put so much love into these products, but will continue to pour that into the future products. Thanks to everyone who believed in and supported these cars through the years. We strived for the best and will never stop. Saying goodbye to something great and making room for something even greater!

hungtran retweeted

Dan Go

@CoachDanGo

16 days ago

Something I noticed when I visited China was public schools always started their days off with a run.

A school in Naperville, Illinois, did an experiment on this and called it "Zero Hour".

Before school, students would hit the gym at 7am and push their heart rates to 80% of their max. Then went on to do class.

The result? Reading scores doubled. Math scores jumped 20x.

On an international test, Naperville 8th graders finished 1st in science (beating Singapore) and 6th in math globally.

Some of my entrepreneur clients swear by doing cardio in the morning. They say it keeps their brain sharp. I don't disagree.

Cardio isn't just for your heart. It's brain fuel.

Exhaust the body to sharpen the mind.

hungtran retweeted

OpenAI

@OpenAI

16 days ago

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

hungtran retweeted

Dwarkesh Patel

@dwarkesh_sp

21 days ago

New blackboard lecture w @ericjang11

He walks through how to build AlphaGo from scratch, but with modern AI tools.

Sometimes you understand the future better by stepping backward. AlphaGo is still the cleanest worked example of the primitives of intelligence: search, learning from experience, and self-play. You have to go back to 2017 to get insight into how the more general AIs of the future might learn.

Once he explained how AlphaGo works, it gave us the context to have a discussion about how RL works in LLMs and how it could work better – naive policy gradient RL has to figure out which of the 100k+ tokens in your trajectory actually got you the right answer, while AlphaGo’s MCTS suggests a strictly better action every single move, giving you a training target that sidesteps the credit assignment problem. The way humans learn is surely closer to the second.

Eric also kickstarted an Autoresearch loop on his project. And it was very interesting to discuss which parts of AI research LLMs can already automate pretty well (implementing and running experiments, optimizing hyperparameters) and which they still struggle with (choosing the right question to investigate next, escaping research dead ends). Informative to all the recent discussion about when we should expect an intelligence explosion, and what it would look like from the inside.

Timestamps:
0:00:00 – Basics of Go
0:08:06 – Monte Carlo Tree Search
0:31:53 – What the neural network does
1:00:22 – Self-play
1:25:27 – Alternative RL approaches
1:45:36 – Why doesn’t MCTS work for LLMs
2:00:58 – Off-policy training
2:11:51 – RL is even more information inefficient than you thought
2:22:05 – Automated AI researchers

hungtran retweeted

Trung Phan

@TrungTPhan

26 days ago

Still incredible that the DeepMind documentary has footage of exact moment Demis is told that AlphaFold can “easily” predict all known (1-2B) protein sequences “in a month” and he says to do it. Then, it shows the moment AlphaFold is released to the world.

28 days ago

very close to moving away from my tmux setup