levanto ☀️ 🇰🇪 🇵🇸 @levanto_0 - Twitter Profile

5 days ago

Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

levanto_0 retweeted

Robert Kirk @_robertkirk

6 days ago

Where does evaluation awareness and evaluation gaming come from? 🔬📉New post from @arbdwj and me tracing evaluation awareness through OLMo 3 training! 🧵

https://t.co/MwV0yckL38

levanto_0 retweeted

Faith Kipyegon, EGH🇰🇪 @Kipyegon_Faith

9 months ago

🥹💜🇰🇪🙏

https://t.co/8mpqLxTssG

levanto_0 retweeted

Everest Today

@EverestToday

11 months ago

From top of the world to the heart of Africa, we are with you. Stay strong, Kenya!🇰🇪

levanto_0 retweeted

Tumezoza kwa mneti @felix_odhiambo_

12 months ago

"kwani watanichapa" ("what, are they going to beat me?") this mindset will take you places

levanto_0 retweeted

Minqi Jiang

@MinqiJiang

12 months ago

Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner.

How can we get a pulse check on whether current LLMs are capable of driving this kind of total self-improvement?

Well, we know humans are pretty good at improving LLMs. In the NanoGPT speedrun challenge, created by @kellerjordan0, human researchers iteratively improved @karpathy's GPT-2 replication, slashing the training time (to the same target validation loss) from 45 minutes to under 3 minutes in just under a year (!).

Surely, a necessary (but not sufficient) ability for an LLM that can automatically improve frontier techniques is the ability to *reproduce* known innovations on GPT-2, a tiny language model from over 5 years ago. 🤔

So we took several of the top models and combined them with various search scaffolds to create *LLM speedrunner agents*. We then asked these agents to reproduce each of the NanoGPT speedrun records, starting from the previous record, while providing them access to different forms of hints that revealed the exact changes needed to reach the next record.

The results were surprising, not because we thought these agents would ace the benchmark, but because even the best agent failed to recover even half of the speed-up of human innovators on average in the easiest hint mode, where we show the agent the full pseudocode of the changes to the next record.

We believe The Automated LLM Speedrunning Benchmark provides a simple eval for measuring the lower bound of LLM agents’ ability to reproduce scientific findings close to the frontier of ML.

Beyond scientific reproducibility, this benchmark can also be run without hints, transforming into an automated *scientific innovation* benchmark. When run in "innovation mode," this benchmark effectively extends the NanoGPT speedrun to AI participants!

While initial results here indicate that current agents seriously struggle to match human innovators beyond just a couple of records, benchmarks have a tendency to fall. This one is particularly exciting to watch, as new state-of-the-art here by definition implies a form of *superhuman innovation*.

levanto_0 retweeted

Farhiya ✨ @farhiiiyaaa___

12 months ago

KRA Portal looking at me struggling to log in because I waited till the last minute to file my returns

levanto_0 retweeted

Faith Kipyegon, EGH🇰🇪 @Kipyegon_Faith

12 months ago

Tonight we dream. 💜 #Breaking4

levanto_0 retweeted

World Athletics

@WorldAthletics

12 months ago

Chasing history, creating legacy 💜 Still the fastest mile run by a woman in history, @Kipyegon_Faith gives everything in her quest to break the 4️⃣ minute mile and stops the clock at 4:06.42. Thank you, Faith, for making us dream. Maybe not today, but soon…😤

https://t.co/Px9xTqNA9S

levanto_0 retweeted

Nike

@Nike

12 months ago

Faith Kipyegon had the audacity to dream of doing the impossible. The fastest woman to ever run the mile just ran it faster at #Breaking4, pushing the world closer to the 4-minute barrier. It’s not a matter of if a woman will break 4, it’s when.

https://t.co/DbQ5RNgwd4

levanto_0 retweeted

Larry Madowo

@LarryMadowo

12 months ago

Kenyan police cornered peaceful protesters in a blocked alley, beat them up, then teargassed them. "Larry ukienda tutauliwa," one of them said. "If you leave, we'll get killed." Police brutality during protests against police brutality

levanto_0 retweeted

dennis ombachi OLY

@ombachi13

12 months ago

Today is definitely bigger than last year’s, nothing can bring them back but we owe them our memories and justice to be served. This is Kisii town. #SiriNiNumbers

levanto_0 retweeted

Mr.Elmami @Mr_Elmami

12 months ago

So proud of each and everyone who came out to honour our heroes and fallen comrades. Umoja (unity) ✊🏽 Undugu (brotherhood) ❤️

levanto_0 retweeted

KASSIM OBEDE🇰🇪 @kassim_obede

12 months ago

🧵THREAD: How to Help an Unconscious Protester — The Right Way to Save a Life.

levanto_0 retweeted

John Doe

@StanleyMasinde_

12 months ago

BTW, if you're in your 20s, normalise growing around a community. Invite people over, eat out, do house warming, etc. Don't do life alone. A plus if all of you have shared values. Have a space where you can talk about your dreams, and someone would follow up weeks later with "BTW, how is your kitchen garden doing?", "Were you finally able to run 5k in under 30 minutes?" Don't do life alone. Life is beautiful with the family you create. Don't be so much into hustle culture and forget what really matters.

levanto_0 retweeted

Miss Calculated @justrubyv

12 months ago

It’s a heavy day. Still we rise. #RutoMustGo

levanto_0 retweeted

Kaka Ruto @kaka_ruto

12 months ago

No one else will do it for us #OccupyEverywhere

levanto_0 retweeted

Larry Madowo

@LarryMadowo

12 months ago

Ndo kudonjo. @KenyaAirways wamecheza kama wao

levanto_0 retweeted

Hanifa 🇵🇸 🇸🇩 🇨🇩 🇰🇪

@Honeyfarsafi

12 months ago

Today marks the first anniversary of our fallen comrades. We remember. We remember ✊🏿✊🏿✊🏿

levanto_0 retweeted

Hanifa 🇵🇸 🇸🇩 🇨🇩 🇰🇪

@Honeyfarsafi

12 months ago

If this tells you anything, it's that public participation in this country feels utterly pointless. Completely ineffective. The only difference now is that he's being open about it. Still, the arc of the moral universe is long but it will bend toward justice.
