Malthe Have Musaeus 👨‍💻

Verified account

@malthe8

I dissect neural networks for fun 🧠. Passionate about AI 🌵. Expect to see Tweets about real-world AI use cases 📃

Joined March 2012

270 Following

291 Followers

108 Posts

malthe8 retweeted

10 days ago

Great work by @Vtrivedy10 @nikogrupen et al. Great to see these results in Law, mirroring our experiments published yesterday in Medicine:

1. Batch grading reduces cost by ~1 OOM
2. Small models reduce cost by ~2 OOM

In non-verifiable RL, where judge latency blocks samples from reaching the trainer, judge selection is a crucial knob for training efficiency.

Medicine: https://t.co/CTQr1do8kn

malthe8 retweeted

17 days ago

We're releasing early results from training Kos-1 Experimental, a Kimi K2.5 checkpoint post-trained on the same medical RL data we used for Kos-1 Lite.

As clinical workloads become more agentic, we wanted a model that pairs medical domain knowledge with tool-calling know-how.

bertgodel's tweet photo: https://t.co/zRkJqoGxut

malthe8 retweeted

3 months ago

We’re announcing Kos-1 Lite, a medical model that achieves SOTA on HealthBench Hard at 46.6%. As a medium-sized language model (~100B parameters), it achieves these results at a fraction of the serving cost of frontier trillion-parameter models.

bertgodel's tweet photo: https://t.co/27sxAHPgZM

malthe8 retweeted

6 months ago

(1/5) New post: "Mismatch Praxis: Rollout Settings and IS Corrections". We pressure-tested solutions for inference/training mismatch.

Inference/training mismatch in modern RL frameworks creates a hidden off-policy problem. To resolve the mismatch, various engineering (e.g., FP16 unification, deterministic kernels) and algorithmic (e.g., importance sampling) fixes have been proposed. In this work, we examine how rollout settings (temp, top-p, and top-k) affect mismatch, and how importance sampling corrections bear out in practice.

We find that while Sequence-TIS is theoretically optimal, it can succumb to catastrophic variance in long-horizon contexts. Additionally, non-standard rollout settings create subtle mismatch patterns that require careful engineering fixes. Token-TIS with default rollout settings proved to be the most robust setting for long-horizon training.
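The two corrections named in the post differ in where the truncated importance-sampling (TIS) weight is computed. The sketch below uses the generic textbook form and is not the authors' implementation. Token-TIS caps each per-token ratio pi_train / pi_infer at a threshold C. Sequence-TIS caps the product of all ratios over the response. The function names and the cap C = 2.0 are illustrative assumptions. In a real trainer these weights would be treated as constants (no gradient) and multiplied into the policy-gradient loss.

```python
import math
from typing import List

# Hypothetical helpers illustrating truncated importance sampling (TIS).
# logp_train: per-token log-probs of the sampled tokens under the trainer's policy
# logp_infer: per-token log-probs of the same tokens under the inference engine
# cap: truncation threshold C (2.0 is an arbitrary illustrative choice)


def token_tis_weights(logp_train: List[float], logp_infer: List[float],
                      cap: float = 2.0) -> List[float]:
    """Token-TIS: one weight per token, min(pi_train / pi_infer, cap)."""
    return [min(math.exp(lt - li), cap) for lt, li in zip(logp_train, logp_infer)]


def sequence_tis_weight(logp_train: List[float], logp_infer: List[float],
                        cap: float = 2.0) -> float:
    """Sequence-TIS: one weight per sequence, min(prod_t ratio_t, cap).

    The product is accumulated in log space and compared against log(cap)
    before exponentiating, so long sequences cannot overflow math.exp.
    """
    log_ratio = sum(lt - li for lt, li in zip(logp_train, logp_infer))
    if log_ratio >= math.log(cap):
        return cap
    return math.exp(log_ratio)


# A small, constant per-token mismatch of 0.01 nats over a 1000-token rollout.
n = 1000
logp_infer = [-1.0] * n
logp_up = [-1.0 + 0.01] * n    # trainer slightly more confident than inference
logp_down = [-1.0 - 0.01] * n  # trainer slightly less confident than inference

print(token_tis_weights(logp_up, logp_infer)[0])   # each token weight is about 1.01
print(sequence_tis_weight(logp_up, logp_infer))    # exp(+10) is clipped to the cap
print(sequence_tis_weight(logp_down, logp_infer))  # exp(-10), about 4.5e-5
```

The constant mismatch is contrived, but it gives one intuition for the long-horizon variance problem the post describes. Per-token log-ratios add up across the sequence, so the Sequence-TIS weight swings between the cap and nearly zero, which effectively discards the sample. The Token-TIS weights stay close to 1.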

Malthe Have Musaeus 👨‍💻

about 2 years ago

@programmerByDay Take time to learn new stuff. Always a problem not knowing enough 😃

Malthe Have Musaeus 👨‍💻

about 2 years ago

@chiswanjo Good, thanks for asking! Building a small tool to help improve landing page conversion rates. But most importantly, I'm finishing up my thesis due in two weeks 🤓

Malthe Have Musaeus 👨‍💻

about 2 years ago

@chiswanjo That's awesome man!

Malthe Have Musaeus 👨‍💻

about 2 years ago

@SvenVD_Zee Because custom modals look better and styling is more consistent across browsers/devices

Malthe Have Musaeus 👨‍💻

about 2 years ago

@gabriel__xyz Hi! I'm a data science student currently writing my bachelor's thesis about systematically reducing parameters in LLMs. I tweet about cool stuff I find in the AI field and side projects I'm doing. Happy to connect!

Malthe Have Musaeus 👨‍💻

about 2 years ago

@programmerByDay @gabriel__xyz Hi Arman! Cool, I just got back from Australia after spending 4 months on the east coast studying. Loved it there. Would love to follow along your journey!

Malthe Have Musaeus 👨‍💻

about 2 years ago

@andraskindler @gabriel__xyz Sleek product. Would love to connect!

Malthe Have Musaeus 👨‍💻

about 2 years ago

@mohdkil Exciting, looking forward to following along 🫡

Malthe Have Musaeus 👨‍💻

about 2 years ago

When @huggingface is down 😢 Makes you realize how easy it is to hit bottlenecks when you rely on a few large components... #huggingface

Malthe Have Musaeus 👨‍💻

about 2 years ago

@SvenVD_Zee Cool idea. I'll be looking forward to following along!

Malthe Have Musaeus 👨‍💻

about 2 years ago

@_TripathiJi How about 4 more? 😁

malthe8's tweet photo: https://t.co/6TC8M93LXV

Malthe Have Musaeus 👨‍💻

about 2 years ago

@marclou Interesting! Is the free instance enough for your traffic?

Malthe Have Musaeus 👨‍💻

about 2 years ago

@chiswanjo That's the mindset! But of course feel free to feel happy after getting a sale

Malthe Have Musaeus 👨‍💻

about 2 years ago

@pmitu Uff thanks for sharing. This is great!

Malthe Have Musaeus 👨‍💻

about 2 years ago

Spain has some of the coolest trains. Not the regular commuter trains, but the high-speed ones.

This is a Frecciarossa 1000 that runs in Italy, France and Spain.

Barcelona -> Madrid is done in just 3.5 hours and you go from center to center while working comfortably.

malthe8's tweet photo: https://t.co/TQSkEgSIPb

Malthe Have Musaeus 👨‍💻

about 2 years ago

The beauty of training 12 TinyBERT models while cruising at 300 km/h across the Spanish countryside in a bullet train. Training in a train. Damn I love technology 🤖 🚆

#indiehackers #buildinpublic

malthe8's tweet photo: https://t.co/5UTM0AW3ns
