dmnk @filligerr - Twitter Profile

about 11 hours ago

@TheAhmadOsman self-hosting will be the future. the current guardrail tech can be circumvented with good rephrasing, from what I am seeing. the chinese labs will find a way

0

41

dmnk

@filligerr

1 day ago

Anthropic cooked with Fable 5. model feels insane, understanding intent so well. last time I felt this big of a step up was when Opus 4 released. the limited subscription availability hurts. compute is the new gold

0

30

dmnk

@filligerr

1 day ago

@karpathy

0

1

0

68

dmnk

@filligerr

1 day ago

@dejavucoder is learning about AI architectures considered frontier LLM development? curious where they draw the line and how it works under the hood

0

89

dmnk

@filligerr

1 day ago

I am speechless

elie

@eliebakouch

1 day ago

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is on purpose not visible to the user is crazy https://t.co/n3p4niUKJ2

335

5K

610

1K

2M

0

41

dmnk

@filligerr

1 day ago

@claudeai explains the opus performance today

0

3

dmnk

@filligerr

1 day ago

@cognition for all we know the prompts used are vague and opus is generally better at implicitly understanding what you want to achieve, even if the implementation is flawed. need more data here

0

466

dmnk

@filligerr

6 days ago

this goes against everything I believe in, but microsoft cooked so hard with this tech report. the level of detail is insane. while I am sure they will somehow manage to fuck up utilizing these models effectively, I am thankful for the detailed report

Mustafa Suleyman

@mustafasuleyman

8 days ago

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.

Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.

Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.

All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.

Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.

Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.

Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://t.co/v65eop5Ixq

191

4K

543

1K

1M

0

31

dmnk

@filligerr

7 days ago

@xeophon @paradite_ Higher score == better. Except where you want a lower number; in that case, lower score == better

0

24

dmnk

@filligerr

13 days ago

@eliebakouch Wonder if this has anything to do with training data containing dark patterns of business practices (deception, theft etc.). This would raise the question of whether a lack of understanding of things like gain, fear and risk of loss could improve models meaningfully for areas like coding

1

0

937

dmnk

@filligerr

14 days ago

@sama <3

0

1

0

16

filligerr retweeted

spidey

@lochan_twt

17 days ago

"api costs are too high, lets create our own LLM"

237

36K

2K

2M

dmnk

@filligerr

21 days ago

@skewbed @r_marked cmd + space always felt natural for raycast / spotlight. I use capslock (remapped to ctrl+option+command+shift): hold for aerospace navigation, press for esc (useful for vim)

1

0

52

dmnk

@filligerr

21 days ago

@thdxr apparently anthropic is doing the same. not sure how trustworthy this is though

0

96

dmnk

@filligerr

22 days ago

@theo this seems like an overall loss. why do you think they hiked the price by this much? seems like they don't want people to use it

0

1

0

832

dmnk

@filligerr

22 days ago

my tl so far about gemini 3.5 flash:
- it loses to gpt 5.5 in everything but frontend
- worse than claude and kimi k2.6 in frontend
- it is much more expensive (compared to 3)
- antigravity is now a coding agent app like codex ??

a bit disappointing

0

1

0

178

dmnk

@filligerr

22 days ago

@dnlklr @jha907 honestly not sure

0

91

dmnk

@filligerr

22 days ago

@natolambert I am praying that he at least encourages some more technical paper releases, even if with a delay of a few months. @karpathy if anyone can do it, it is you

0

428

dmnk

@filligerr

22 days ago

@helloitsaustin "please write a basic attention mechanism from scratch using numpy"
"actually I got a video series on this, if you are interested"
kinda goes hard ngl

0

1

0

1K
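
For readers wondering what the prompt quoted above is asking for, here is a minimal sketch of single-head scaled dot-product attention in numpy. It is not taken from the thread; the function names `softmax` and `attention` and the toy shapes are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def attention(q, k, v):
    # q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v)
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    # Each row of weights is a probability distribution over the keys.
    weights = softmax(scores, axis=-1)
    # Output is the weighted average of the value vectors.
    return weights @ v, weights

# Toy inputs (hypothetical shapes): 3 queries, 5 keys/values.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 2))
out, w = attention(q, k, v)
```

Here `out` has one row per query, and each row of `w` sums to 1. Masking, multiple heads, and learned projections are left out to keep the sketch basic.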

dmnk

@filligerr
