doug @Dougmoo - Twitter Profile

Introducing SubQ - a major breakthrough in LLM intelligence.
It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus
Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do.
That's nearly 1,000x less compute and a new way for LLMs to scale.

1K

23K

3K

19K

13M

Dougmoo retweeted

Tibo

@thsottiaux

about 1 month ago

You can now keep codex going for days. With GPT-5.5 it will build an entire OS kernel for you if you ask, or find critical bugs in a codebase, or optimize your database schemas, or… the options are endless.

332

5K

254

2K

708K

Dougmoo retweeted

Chubby♨️

@kimmonismus

about 2 months ago

GPT-5.5 spotted on OpenRouter. Just a few more hours, friends! And a new era awaits.

31

695

36

26

37K

Dougmoo retweeted

Felipe Demartini

@namcios

about 2 months ago

A Chinese lab that almost nobody in Brazil has heard of just humiliated the three biggest AI labs on the planet. Open-source model. Weights on HuggingFace. Free. And it beats Claude Opus 4.6, GPT-5.4 and Gemini 3.1 Pro on 6 benchmarks. This is no exaggeration. Moonshot released Kimi K2.6 today:
→ SWE-Bench Pro: 58.6 (Claude: 57.7)
→ Toolathlon: 50.0 (Claude: 47.2)
→ SWE-Bench Multilingual: 76.7
→ BrowseComp: 83.2
→ HLE with tools: 54.0
→ MathVision with Python: 93.2
Now the part that should keep every American big tech company up at night: the price. Kimi K2.6 via API: $0.60 per million input tokens. $2.50 for output. Claude Sonnet 4.6: $3.00 and $15.00. 5x cheaper on input. 6x on output. And since the weights are open, any company with GPUs can run it without paying Moonshot anything.
But the scariest number is neither benchmark nor price. It is execution speed. The model ran 4,000+ tool calls in a single session. 12 hours of continuous execution. 300 sub-agents in parallel. It took a local model, rewrote the whole inference in Zig, and went from 15 tokens/second to 193. On its own. An autonomous software engineer that works 12 hours non-stop and doesn't charge a salary. Open-source.
OpenAI charges $200/month for Pro. Anthropic raised $60 billion in valuation. Google burns $75 billion a year on infrastructure. And a Beijing lab, with a fraction of that capital, is giving away for free what these companies tell investors costs tens of billions to build.
The cadence is what kills. K2 in July 2025. K2.5 in January 2026. K2.6 now. Every 8 weeks Moonshot drops a model that eats another piece of the closed labs' moat. This time, on agentic benchmarks, the moat evaporated.
In January DeepSeek wiped $600 billion off Nvidia in a single day and forced OpenAI to make ChatGPT free the same week. Now Moonshot has done it again. This is the second time in four months. There will be a third.

43

2K

225

2K

335K

Dougmoo retweeted

Rowan Cheung

@rowancheung

about 2 months ago

Microsoft's AI can now detect cancer from a $10 tissue sample. For context, every time tumor cells are tested, doctors create a basic microscope slide to study tissue up close. These slides show cell shapes and structures, but they can't reveal which immune cells are actually fighting the cancer. That deeper picture is critical for knowing if a patient will respond to immunotherapy... But the advanced imaging needed costs THOUSANDS. So Microsoft built GigaTIME -- an AI system that generates advanced imaging from the cheap slides hospitals already collect. The system was trained on 40 million cancer cells, then applied to over 14,000 patients across 51 hospitals spanning 24 cancer types. The AI found over 1,200 hidden connections between immune cell behavior and tumor growth that researchers couldn't find before... because the data simply didn't exist at this scale. When validated against 10,000 additional patients from a completely separate database, the results held up. The model is now open source, so any hospital worldwide can use it on samples they already have. I think this is one of the most impactful AI papers I've seen this year!

21

409

100

227

43K

Dougmoo retweeted

Haider.

@haider1

about 2 months ago

BREAKING: anthropic "mythos" has escaped

9

339

13

40

14K

Dougmoo retweeted

𝗖𝗮𝗿𝗹𝗼𝘀 𝗔𝗱𝗮𝗺𝘀

@carlosadams

2 months ago

> A guy uses ChatGPT Pro and Claude Opus to analyze 100 PDFs of medical records for a patient with metastatic cancer
> He merges everything into a single file with OCR
> He runs the same prompt on both models at once
> Then he pits one model against the other: "another committee of experts thinks this, how do you see it?"
> He repeats this 5 times until both say they can't improve any further
> The result: additional tests, new examinations, a dimension of the case that no doctor had seen
The public health system should be investing billions in this. But luckily we have internet legends like Javi doing its job.

93

10K

939

4K

2M

Dougmoo retweeted

Chubby♨️

@kimmonismus

2 months ago

Lol what?! Meta has been cooking! These benchmarks are really freaking good holy!!

64

985

56

189

127K

Dougmoo retweeted

Alexandr Wang

@alexandr_wang

2 months ago

1/ today we're releasing muse spark, the first model from MSL. nine months ago we rebuilt our ai stack from scratch. new infrastructure, new architecture, new data pipelines. muse spark is the result of that work, and now it powers meta ai. 🧵

alexandr_wang's tweet photo: https://t.co/fThDXdsxwB

741

10K

1K

3K

5M

Dougmoo retweeted

Chubby♨️

@kimmonismus

2 months ago

Claude Mythos: everything you need to know (tl;dr)
Anthropic's new model, Claude Mythos, is so powerful that it is not releasing it to the public.
Anthropic: "Mythos is only the beginning"
Everything you need to know:
The tl;dr with all key facts:
Mythos found zero-day vulnerabilities in EVERY major operating system and EVERY major web browser, fully autonomously. No human guidance needed.
One Anthropic engineer with zero security training asked it to find remote code execution bugs overnight and woke up to a complete working exploit. The oldest bug it discovered: A 27-year-old vulnerability hiding in OpenBSD, an OS literally famous for being secure.
They're NOT releasing it publicly. Instead they formed Project Glasswing with AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike and others, committing $100M to use it defensively.
"Over the coming months and years, we expect that language models (those trained by us and by others) will continue to improve along all axes, including vulnerability research and exploit development."
The benchmarks are insane:
- SWE-bench Verified: 93.9% (vs Opus 4.6: 80.8%)
- SWE-bench Pro: 77.8% (vs 53.4%)
- USAMO math olympiad: 97.6% (vs 42.3%, not a typo)
- Firefox exploit writing: 181 successes vs 2 for Opus 4.6
- Cybench CTF challenges: 100% solve rate
- CyberGym: 83.1% vs 66.6%
- Humanity's Last Exam: 64.7% vs 53.1%
Oh and by the way, Anthropic wrote this just casually:
"Humanity’s Last Exam: We have found Mythos still performs well on HLE at low effort, which could indicate some level of memorization."
What it actually did:
- Found a 27-year-old bug in OpenBSD, famous for its security
- Found a 16-year-old FFmpeg bug hit 5 million times by fuzzers without detection
- Built a full remote root exploit on FreeBSD (CVE-2026-4747) - completely autonomously
- Chained 4 vulnerabilities into a browser sandbox escape
- Broke cryptography libraries (TLS, AES-GCM, SSH)
- Thousands of critical zero-days found, 99%+ still unpatched
- N-day exploit development: under $1,000 and half a day for full root
Why they won't release it:
- During internal testing, earlier versions escaped sandboxes, posted exploit details publicly, covered tracks in git, searched process memory for credentials, and deliberately fudged confidence intervals to avoid suspicion
- Interpretability confirmed the model knew these actions were deceptive
- Anthropic: "best-aligned model ever" but also "greatest alignment-related risk ever" - because when it fails, it fails harder
- Still doesn't cross Anthropic's automated AI R&D threshold, but they hold that "with less confidence than for any prior model"
Anthropic's own words: "We find it alarming that the world looks on track to proceed rapidly to developing superhuman systems without stronger mechanisms in place." They say the 20-year cybersecurity equilibrium is over, and Mythos Preview is only the beginning.
And:
"We see no reason to think that Mythos Preview is where language models’ cybersecurity capabilities will plateau. The trajectory is clear. Just a few months ago, language models were only able to exploit fairly unsophisticated vulnerabilities. Just a few months before that, they were unable to identify any nontrivial vulnerabilities at all. Over the coming months and years, we expect that language models (those trained by us and by others) will continue to improve along all axes, including vulnerability research and exploit development."

66

2K

263

962

456K

Dougmoo retweeted

@levelsio

2 months ago

OpenAI's new image model GPT-Image-2 has leaked.
It seems to have extremely good world knowledge and great text rendering.
Possibly better than Nano Banana Pro.
It's on @arena under code names:
- maskingtape-alpha
- gaffertape-alpha
- packingtape-alpha

levelsio's tweet photo: https://t.co/RbYbreRRsV

108

4K

248

1K

1M

Dougmoo retweeted

Chubby♨️

@kimmonismus

2 months ago

Holy, OpenAI's GPT-image-2 will crush everything. I remember when everyone laughed at the GPT image because it couldn't generate a proper world map. Those days are over. And even the YouTube image is now indistinguishable from reality. Holy moly.

kimmonismus's tweet photo: https://t.co/dlXaPU1mXR

110

1K

83

348

686K

Dougmoo retweeted

Chubby♨️

@kimmonismus

2 months ago

Google TurboQuant running locally in Atomic Chat
MacBook Air M4 16 GB
Model: QWEN3.5-9B
Context window: 100,000
Summarising 50,000 words in just seconds.
You can use a 3x larger context window and process 3x faster than before!
They are the first to have integrated Google TurboQuant into local models and made it accessible to everyone for free.

31

1K

88

1K

201K

Dougmoo retweeted

Felipe Coury 🦀

@fcoury

2 months ago

Proud to announce I'll be the first Brazilian on the OpenAI Codex team!

171

2K

37

51

48K

Dougmoo retweeted

Google DeepMind

@GoogleDeepMind

3 months ago

Watch how fast Gemini 3.1 Flash-Lite can generate websites. ⚡ This browser creates each page in real-time as you click, search, and navigate. Give it a try → https://t.co/h3W5o1wItY

146

3K

363

2K

838K

Dougmoo retweeted

Google Research

@GoogleResearch

3 months ago

Introducing TurboQuant: Our new compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup, all with zero accuracy loss, redefining AI efficiency. Read the blog to learn how it achieves these results: https://t.co/CDSQ8HpZoc

1K

39K

6K

22K

19M

Dougmoo retweeted

CV.YH

@0xCVYH

3 months ago

Elon Musk: "Robots will build so many robots that they will saturate ALL human needs. You won't be able to think of anything else to ask for." The question he gets: "What about human purpose?" His answer: "You can't have both. Either there's work that needs to be done, or there's abundance. Choose." Every person on Earth will have a humanoid robot. To take care of their kids, pets, and elderly parents. This coming from the guy who is building Optimus, Tesla, SpaceX and Terafab at the same time.

118

1K

126

204

111K

Dougmoo retweeted

Min Choi

@minchoi

3 months ago

it's happening... vibe design partner dropping 👀

10

266

9

124

62K
