Dana

@DanaBuidl

exploring and scaling the frontier. committed to being the change I wish to see. seeking pmf & deep self-awareness. views are my own. NFA.

Joined October 2018

1.4K Following

9.2K Followers

521 Posts

Dana @DanaBuidl

5 days ago

faster. cheaper. better. @Kimi_Moonshot delivers - it's as simple as that.

Kimi.ai @Kimi_Moonshot

6 days ago

🌘 Meet Kimi K2.7 Code HighSpeed! A high-speed mode of our latest open-source multimodal coding model, Kimi K2.7 Code. ⚡️ Up to 6× faster: Around 180 tok/s on coding tasks with median-length inputs, and up to 260 tok/s on shorter-context tasks. 🔷 Rolling out to Kimi Code Beta Program members, Kimi API developers, and Kimi Business users. (Access will remain limited for now due to capacity constraints.) 🔷 No invite needed. Anyone who joins the Beta Program has a chance to get access 👉 https://t.co/eKogsFGJt6 Open intelligence should be instant, affordable, and borderless. We'll continue improving the model and expanding access as more capacity becomes available! 🔗 Kimi Code: https://t.co/uvoSJKyGCY 🔗 API: https://t.co/mzWxjgGO1h

159

325

946

508K

334

Dana @DanaBuidl

5 days ago

“Kimi 2.7 is amazingly good.” 👏

Przemek Chojecki | PC

@prz_chojecki

8 days ago

Kimi 2.7 ranked 2nd after Fable 5 and before GPT-5 xhigh We have re-run our ErdosBench smoke test on 14 problems with Kimi 2.7, Qwen 3.7 Max, Grok 4.3 and compared it with the top performers from previous runs. Kimi 2.7 is amazingly good. More below.

prz_chojecki's tweet photo. Kimi 2.7 ranked 2nd after Fable 5 and before GPT-5 xhigh

We have re-run our ErdosBench smoke test on 14 problems with Kimi 2.7, Qwen 3.7 Max, Grok 4.3 and compared it with the top performers from previous runs.

Kimi 2.7 is amazingly good. More below. https://t.co/pD1EFRJbAy

170

559

249

Dana @DanaBuidl

5 days ago

“on ML engineering, the open model Kimi-K2.7-Code surpassed frontier models.” 👏

Zhengyao Jiang

@zhengyaojiang

7 days ago

We benchmarked 7 frontier models on 3 categories of autoresearch tasks: ML engineering, harness/prompt engineering, and algorithmic discovery. Fable-5 won overall even under cost constraint, but on ML engineering, the open model Kimi-K2.7-Code surpassed frontier models.🧵(1/5)

zhengyaojiang's tweet photo. We benchmarked 7 frontier models on 3 categories of autoresearch tasks: ML engineering, harness/prompt engineering, and algorithmic discovery.

Fable-5 won overall even under cost constraint, but on ML engineering, the open model Kimi-K2.7-Code surpassed frontier models.🧵(1/5) https://t.co/KzePspXd0Z

547

331

64K

336

DanaBuidl retweeted

Siyuan

@cyodyssey

27 days ago

In the OpenRouter Model Pricing section, you can see the cache hit rates offered by different providers. The one that shocked me last time was Kimi K2.5 — their official API can achieve a 93%+ cache hit rate. Especially in the era of Agent/long-context, cache hit is a crucial technical optimization point, whether in software or hardware. https://t.co/yIpv32E0Hg

cyodyssey's tweet photo. In the OpenRouter Model Pricing section, you can see the cache hit rates offered by different providers. The one that shocked me last time was Kimi K2.5 — their official API can achieve a 93%+ cache hit rate.

Especially in the era of Agent/long-context, cache hit is a crucial technical optimization point, whether in software or hardware.

https://t.co/yIpv32E0Hg

Who to follow

Here I am, bloodied but not beaten,Down but not out,From this point on I will be the master of my destiny.🏴‍☠️ Hustle Your Mind

about 2 months ago

👀

a16z crypto @a16zcrypto

about 2 months ago

The new stack for global finance, mapped 👇

169

369

388K

687

Dana @DanaBuidl

about 2 months ago

open *and great* is a different league. the "firsts" are already rolling in. expect more to come soon 👀

Notion

@NotionHQ

2 months ago

ICYMI: Kimi K2.6 is now in Notion. It’s the first open-weight model that plays like other top-tier models. It’s strong at tool use and carrying out tasks no matter how meticulous or ambiguous. Give it a spin and tell us what you think.

NotionHQ's tweet photo. ICYMI: Kimi K2.6 is now in Notion.

It’s the first open-weight model that plays like other top-tier models. It’s strong at tool use and carrying out tasks no matter how meticulous or ambiguous.

Give it a spin and tell us what you think. https://t.co/K7Vs2dYgDC

496

44K

528

DanaBuidl retweeted

Artificial Analysis

@ArtificialAnlys

about 2 months ago

DeepSeek is back among the leading open weights models with the release of DeepSeek V4 Pro and V4 Flash, with V4 Pro second only to Kimi K2.6 on the Artificial Analysis Intelligence Index @deepseek_ai has released DeepSeek V4 Pro and V4 Flash. V4 is the first new architecture from DeepSeek since V3. V4 introduces a new architecture with V4 Pro at 1.6T total / 49B active parameters and V4 Flash at 284B total / 13B active parameters, and is DeepSeek's first two-tier lineup, with Pro positioned for maximum capability and Flash for faster, lower-cost inference. Both models are hybrid thinking/non-thinking. We tested the reasoning variants in Max Effort and High Effort. A year ago, DeepSeek R1 and R1 0528 were the leading open weights reasoning models on the Intelligence Index. Since then, several other open weights labs have released strong reasoning models, and V4 Pro now enters as the #2 open weights reasoning model on the Artificial Analysis Intelligence Index, behind only Kimi K2.6 (54). V4 Pro and V4 Flash remain also text input and output only. Key takeaways: ➤ Large 10 point gain in Intelligence Index: DeepSeek V4 Pro (Max) scores 52 on the Artificial Analysis Intelligence Index, up from 42 for V3.2, making it the #2 open weights reasoning model behind Kimi K2.6. That said, if the weights to MiMo-V2.5-Pro are released as they have been for other Xiaomi models then it will place third. V4 Flash (Max) scores 47, behind V4 Pro but ahead of DeepSeek V3.2. This places it behind frontier models and at Claude Sonnet 4.6 (max) level intelligence. ➤ Leading agentic performance among open weights models: DeepSeek V4 Pro (Max) leads open weight models on agentic real-world work tasks, scoring 1554 on GDPval-AA. This places it ahead of Kimi K2.6 (1484), GLM-5.1 (1535), GLM-5 (1402), and MiniMax-M2.7 (1514). ➤ Gains in knowledge but an increase in hallucination rate: DeepSeek V4 Pro (Max) scores -10 on AA-Omniscience, an 11 point improvement over V3.2 (Reasoning, -21), driven primarily by higher accuracy. V4 Flash (Max) scores -23, broadly in line with V3.2. V4 Pro and V4 Flash both have a very high hallucination rate of 94% and 96% respectively meaning when they don’t know the answer they nearly always respond anyway. ➤ Flash materially behind Pro but well positioned for its size: DeepSeek V4 Flash (Max) scores 47 on the Artificial Analysis Intelligence Index, well below V4 Pro. However, at 284B parameters it is much smaller and is well positioned on the Intelligence vs Size frontier, sitting next to MiniMax-M2.7 ➤ Cheaper than frontier models but more expensive than other open weights models and a large increase over DeepSeek V3.2: DeepSeek V4 Pro costs $1,071 to run the Artificial Analysis Intelligence Index. This makes it more than 4x cheaper than Claude Opus 4.7 ($4,811), but it remains more expensive than several other open weights models, including Kimi K2.6 ($948), GLM-5.1 ($544), DeepSeek V3.2 ($71), and gpt-oss-120B ($67). DeepSeek V4 Flash is much cheaper at $113. ➤ High token usage: DeepSeek V4 Pro uses 190M output tokens to run the Artificial Analysis Intelligence Index, making it one of the most token-intensive models tested. DeepSeek V4 Flash is even higher at 240M output tokens. This high token usage helps explain why V4 Pro’s total cost remains relatively high versus other open weights models despite low per-token pricing. Key model details: ➤ Context window: 1M tokens, an 8x expansion on V3.2's 128K context window ➤ Modalities: Text input and output only, matching V3.2 ➤ Size: DeepSeek V4 Pro 1.6T total / 49B active; V4 Flash 284B total / 13B active ➤ License: MIT ➤ Availability: Available on DeepSeek's first-party API; we expect many third-party providers to host the models ➤ Pricing: DeepSeek V4 Pro $1.74 / $3.48 per 1M input/output tokens; V4 Flash $0.14 / $0.28 per 1M input/output tokens. Cache hit input token pricing is $0.145 for V4 Pro and $0.028 for V4 Flash per 1M tokens. V4 Pro is significantly more expensive than past DeepSeek R1 and V3 models

ArtificialAnlys's tweet photo. DeepSeek is back among the leading open weights models with the release of DeepSeek V4 Pro and V4 Flash, with V4 Pro second only to Kimi K2.6 on the Artificial Analysis Intelligence Index

@deepseek_ai has released DeepSeek V4 Pro and V4 Flash. V4 is the first new architecture from DeepSeek since V3. V4 introduces a new architecture with V4 Pro at 1.6T total / 49B active parameters and V4 Flash at 284B total / 13B active parameters, and is DeepSeek's first two-tier lineup, with Pro positioned for maximum capability and Flash for faster, lower-cost inference. Both models are hybrid thinking/non-thinking. We tested the reasoning variants in Max Effort and High Effort.

A year ago, DeepSeek R1 and R1 0528 were the leading open weights reasoning models on the Intelligence Index. Since then, several other open weights labs have released strong reasoning models, and V4 Pro now enters as the #2 open weights reasoning model on the Artificial Analysis Intelligence Index, behind only Kimi K2.6 (54). V4 Pro and V4 Flash remain also text input and output only.

Key takeaways:

➤ Large 10 point gain in Intelligence Index: DeepSeek V4 Pro (Max) scores 52 on the Artificial Analysis Intelligence Index, up from 42 for V3.2, making it the #2 open weights reasoning model behind Kimi K2.6. That said, if the weights to MiMo-V2.5-Pro are released as they have been for other Xiaomi models then it will place third. V4 Flash (Max) scores 47, behind V4 Pro but ahead of DeepSeek V3.2. This places it behind frontier models and at Claude Sonnet 4.6 (max) level intelligence.

➤ Leading agentic performance among open weights models: DeepSeek V4 Pro (Max) leads open weight models on agentic real-world work tasks, scoring 1554 on GDPval-AA. This places it ahead of Kimi K2.6 (1484), GLM-5.1 (1535), GLM-5 (1402), and MiniMax-M2.7 (1514).

➤ Gains in knowledge but an increase in hallucination rate: DeepSeek V4 Pro (Max) scores -10 on AA-Omniscience, an 11 point improvement over V3.2 (Reasoning, -21), driven primarily by higher accuracy. V4 Flash (Max) scores -23, broadly in line with V3.2. V4 Pro and V4 Flash both have a very high hallucination rate of 94% and 96% respectively meaning when they don’t know the answer they nearly always respond anyway.

➤ Flash materially behind Pro but well positioned for its size: DeepSeek V4 Flash (Max) scores 47 on the Artificial Analysis Intelligence Index, well below V4 Pro. However, at 284B parameters it is much smaller and is well positioned on the Intelligence vs Size frontier, sitting next to MiniMax-M2.7

➤ Cheaper than frontier models but more expensive than other open weights models and a large increase over DeepSeek V3.2: DeepSeek V4 Pro costs $1,071 to run the Artificial Analysis Intelligence Index. This makes it more than 4x cheaper than Claude Opus 4.7 ($4,811), but it remains more expensive than several other open weights models, including Kimi K2.6 ($948), GLM-5.1 ($544), DeepSeek V3.2 ($71), and gpt-oss-120B ($67). DeepSeek V4 Flash is much cheaper at $113.

➤ High token usage: DeepSeek V4 Pro uses 190M output tokens to run the Artificial Analysis Intelligence Index, making it one of the most token-intensive models tested. DeepSeek V4 Flash is even higher at 240M output tokens. This high token usage helps explain why V4 Pro’s total cost remains relatively high versus other open weights models despite low per-token pricing.

Key model details:
➤ Context window: 1M tokens, an 8x expansion on V3.2's 128K context window
➤ Modalities: Text input and output only, matching V3.2
➤ Size: DeepSeek V4 Pro 1.6T total / 49B active; V4 Flash 284B total / 13B active
➤ License: MIT
➤ Availability: Available on DeepSeek's first-party API; we expect many third-party providers to host the models
➤ Pricing: DeepSeek V4 Pro $1.74 / $3.48 per 1M input/output tokens; V4 Flash $0.14 / $0.28 per 1M input/output tokens. Cache hit input token pricing is $0.145 for V4 Pro and $0.028 for V4 Flash per 1M tokens. V4 Pro is significantly more expensive than past DeepSeek R1 and V3 models

614

92K

DanaBuidl retweeted

Chaofan Shou

@Fried_rice

about 2 months ago

Chinese LLMs can hack better than state-sponsored hackers with properly evolved harness - Kimi K2.5 managed to find and exploit 6 vulnerabilities in browsers: a single page view or an extension install by victims equal full system hijack. Check https://t.co/d0SZSf1KqF

Fried_rice's tweet photo. Chinese LLMs can hack better than state-sponsored hackers with properly evolved harness -

Kimi K2.5 managed to find and exploit 6 vulnerabilities in browsers: a single page view or an extension install by victims equal full system hijack.

Check https://t.co/d0SZSf1KqF https://t.co/UbD7QR45l0

413

189K

Dana @DanaBuidl

about 2 months ago

- super flat - resilient - intentionally understaffed - to attract and retain people who are essentially a highly intelligent base model + superior agentic capabilities with ample tools and skills sounds so familiar 🤭 @Kimi_Moonshot what do you think

Chamath Palihapitiya

@chamath

2 months ago

Spoke at the AI club at Stanford last night. 1000 people tried to attend. Seating was capped at 250. It was pandemonium at the end! If you’re a resilient, gritty engineer, PM, designer or GTM person, please consider working with us: - We have no org chart - everyone reports to me. We do this to minimize politics, titles and force natural leaders to self organize. - We are severely under manned for the work we have (by design) so you are forced to engineer your way out. Build solutions not orgs. - We will book nine figures this year and are growing very quickly. Our customers span all major parts of the US Economy. [email protected]

chamath's tweet photo. Spoke at the AI club at Stanford last night. 1000 people tried to attend. Seating was capped at 250.

It was pandemonium at the end! If you’re a resilient, gritty engineer, PM, designer or GTM person, please consider working with us:

- We have no org chart - everyone reports to me. We do this to minimize politics, titles and force natural leaders to self organize.
- We are severely under manned for the work we have (by design) so you are forced to engineer your way out. Build solutions not orgs.
- We will book nine figures this year and are growing very quickly. Our customers span all major parts of the US Economy.

Hiring@8090.ai

104

626

355K

547

DanaBuidl retweeted

Mario Zechner

@badlogicgames

about 2 months ago

been working with @Kimi_Moonshot K2.6 for the past 2 hours "accidentially" and ... it's really good!

831

100K

Dana @DanaBuidl

about 2 months ago

.@Kimi_Moonshot k2.6, once again, shows that open model can *actually deliver* such great aesthetic tastes

349

Dana @DanaBuidl

about 2 months ago

robinhood day 1 listing. probably nothing 🤭 @0xZergs what do you think? I think previous pivots were all worth it, and with that level of founder resilience, https://t.co/oYrD6rZiU3 team has a real shot at building something truly great.

USD.AI

@USDai_Official

2 months ago

Finance Progress. $CHIP is live on @RobinhoodApp.

187

31K

571

DanaBuidl retweeted

Artificial Analysis

@ArtificialAnlys

2 months ago

Moonshot’s Kimi K2.6 is the new leading open weights model. Kimi K2.6 lands at #4 on the Artificial Analysis Intelligence Index (54) behind only Anthropic, Google, and OpenAI (all 57) Key takeaways: ➤ Increase in performance on agentic tasks: @Kimi_Moonshot's Kimi K2.6 achieves an Elo of 1520 on our GDPval-AA evaluation, which is a marked improvement over Kimi K2.5’s Elo of 1309. GDPval-AA is our leading metric for general agentic performance, measuring the performance on knowledge work tasks such as preparing presentations and analysis. Models are given code execution and web browsing tools in an agentic loop via our open source reference agentic harness called Stirrup. This continues Kimi K2.6’s strength in tool use, maintaining a 96% score on τ²-Bench Telecom, placing it among other frontier models in this category. ➤ Low hallucination rate: Kimi K2.5 scores 6 on the AA-Omniscience Index, our knowledge evaluation measuring both accuracy and hallucination rate. This score is primarily driven by a comparatively low hallucination rate of 39% (reduced from Kimi K2.5’s 65%), indicating a greater capability to abstain rather than fabricate knowledge when the model is uncertain. Kimi K2.6’s low hallucination rate places it similarly to other models such as Claude Opus 4.7 (36%) and MiniMax-M2.7 (34%) ➤ High token usage: Kimi K2.6 demonstrates high token usage, but is in line with other frontier models in the same intelligence tier. To run the full Artificial Analysis Intelligence Index, Kimi K2.6 used ~160M reasoning tokens. This is slightly lower than Claude Sonnet 4.6 (~190M reasoning tokens) but much higher than GPT 5.4 (~110M reasoning tokens). ➤ Open weights: Kimi K2.6 is a Mixture-of-Experts (MoE) model with 1T total parameters and 32B active, same as the previous two generations of models Kimi K2 Thinking and Kimi K2.5. Kimi K2.6 again pushes the open weights frontier in intelligence. ➤ Third Party Access: Kimi K2.6 is accessible through Moonshot’s First Party API as well as third party API providers Novita, Baseten, Fireworks, and Parasail ➤ Multimodality: Kimi K2.6 supports Image and Video input and text output natively. The model’s max context length remains 256k. Further analysis in the threads below.

ArtificialAnlys's tweet photo. Moonshot’s Kimi K2.6 is the new leading open weights model. Kimi K2.6 lands at #4 on the Artificial Analysis Intelligence Index (54) behind only Anthropic, Google, and OpenAI (all 57)

Key takeaways:

➤ Increase in performance on agentic tasks: @Kimi_Moonshot's Kimi K2.6 achieves an Elo of 1520 on our GDPval-AA evaluation, which is a marked improvement over Kimi K2.5’s Elo of 1309. GDPval-AA is our leading metric for general agentic performance, measuring the performance on knowledge work tasks such as preparing presentations and analysis. Models are given code execution and web browsing tools in an agentic loop via our open source reference agentic harness called Stirrup. This continues Kimi K2.6’s strength in tool use, maintaining a 96% score on τ²-Bench Telecom, placing it among other frontier models in this category.

➤ Low hallucination rate: Kimi K2.5 scores 6 on the AA-Omniscience Index, our knowledge evaluation measuring both accuracy and hallucination rate. This score is primarily driven by a comparatively low hallucination rate of 39% (reduced from Kimi K2.5’s 65%), indicating a greater capability to abstain rather than fabricate knowledge when the model is uncertain. Kimi K2.6’s low hallucination rate places it similarly to other models such as Claude Opus 4.7 (36%) and MiniMax-M2.7 (34%)

➤ High token usage: Kimi K2.6 demonstrates high token usage, but is in line with other frontier models in the same intelligence tier. To run the full Artificial Analysis Intelligence Index, Kimi K2.6 used ~160M reasoning tokens. This is slightly lower than Claude Sonnet 4.6 (~190M reasoning tokens) but much higher than GPT 5.4 (~110M reasoning tokens).

➤ Open weights: Kimi K2.6 is a Mixture-of-Experts (MoE) model with 1T total parameters and 32B active, same as the previous two generations of models Kimi K2 Thinking and Kimi K2.5. Kimi K2.6 again pushes the open weights frontier in intelligence.

➤ Third Party Access: Kimi K2.6 is accessible through Moonshot’s First Party API as well as third party API providers Novita, Baseten, Fireworks, and Parasail

➤ Multimodality: Kimi K2.6 supports Image and Video input and text output natively. The model’s max context length remains 256k.

Further analysis in the threads below.

131

166

211K

DanaBuidl retweeted

Yutong

@gnotuy

2 months ago

We open sourced Kimi K2.6. The next frontier in test-time compute isn't bigger models. It's better organizations of intelligence. The hardest things were never built by one person. They require coordination. Different skills, different contexts, different minds arguing until something better emerges.

931

118

44K

DanaBuidl retweeted

Ejaaz

@cryptopunk7213

2 months ago

dear god lol - new kimi model is a fucking beast. GPT 5.4 level coding, 76% cheaper than opus 4.7 and 100% open source / free to use, i mean look at this: > kimi k2.6 can code continuously for 12 hours straight, run 300+ agents in parallel from a SINGLE prompt. how the fuck is china doing this? every 3 months theres a new open model that gets closer to the best claude/GPT models. deepseek also releasing this week. open source models have caught up and china of all countries are leading it!

162

205

584K

DanaBuidl retweeted

Yulun Du

@Yulun_Du

2 months ago

Kimi K2.6 helped us rewrite kernels; it worked like a charm :) https://t.co/JFPvd0FqGY

197

13K

DanaBuidl retweeted

Kai

@real_kai42

2 months ago

Artworks by K2.6, not the artist.

11K

DanaBuidl retweeted

elie

@eliebakouch

2 months ago

kimi K2.6 vs K2.5, mythos, opus 4.7, and cursor composer 2 (based on K2.5) on every benchmark i could find tl;dr: it's a really really good model

eliebakouch's tweet photo. kimi K2.6 vs K2.5, mythos, opus 4.7, and cursor composer 2 (based on K2.5) on every benchmark i could find

tl;dr: it's a really really good model https://t.co/7rtKCSfaNO

134

507

150K

DanaBuidl retweeted

Lisan al Gaib

@scaling01

2 months ago

Moonshot AI: "Our RL infra team used a K2.6-backed agent that operated autonomously for 5 days, managing monitoring, incident response, and system operations, demonstrating persistent context, multi-threaded task handling, and full-cycle execution from alert to resolution"

scaling01's tweet photo. Moonshot AI:

"Our RL infra team used a K2.6-backed agent that operated autonomously for 5 days, managing monitoring, incident response, and system operations, demonstrating persistent context, multi-threaded task handling, and full-cycle execution from alert to resolution" https://t.co/lQz70gOKd1

300

20K

DanaBuidl retweeted

Crystal

@crystalsssup

2 months ago

Zig is a result of generalization, because it's a highly niche programming language. When AI starts to generalize, it really creates that 'aha' moment

crystalsssup's tweet photo. Zig is a result of generalization, because it's a highly niche programming language. When AI starts to generalize, it really creates that 'aha' moment https://t.co/mgah8C1acA

705

193

199K

Dana

@DanaBuidl

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users