Chris Clark @cclark - Twitter Profile

Chris Clark

@cclark

7 days ago

@mmurph @Forbes Congrats Matt! Well deserved and we are lucky to have you involved at OpenRouter!

0

2

0

66

Chris Clark

@cclark

7 days ago

@HotAisle The 'open' originates from our belief that empirical usage is the best benchmark; since inception we have published rankings and data sharing how tokens are being routed across the platform to help builders make decisions about what models to use.

1

2

0

41

Chris Clark

@cclark

8 days ago

Companies are very excited to have managers with (thanks to AI) a zillion direct reports. Do we think that is actually *better*, or is it just *cheaper*?

0

3

0

121

Chris Clark

@cclark

15 days ago

got this on my LG smart fridge this morning (2027)

0

3

0

165

Chris Clark

@cclark

22 days ago

When I’m king, gate numbers will always correspond to how far away they are from the terminal entrance

0

3

0

160

Chris Clark

@cclark

26 days ago

This meeting is a waste of my tokens

0

2

0

138

cclark retweeted

roon

@tszzl

about 1 month ago

people are walking around with their laptops slightly ajar to keep their agents running

511

5K

198

269

747K

Chris Clark

@cclark

about 1 month ago

We built a great AI writing detector but unfortunately it’s not very scalable. @pingToven can only read so much :(

1

5

1

0

425

cclark retweeted

Artificial Analysis

@ArtificialAnlys

about 1 month ago

Moonshot’s Kimi K2.6 is the new leading open weights model. Kimi K2.6 lands at #4 on the Artificial Analysis Intelligence Index (54) behind only Anthropic, Google, and OpenAI (all 57)

Key takeaways:

➤ Increase in performance on agentic tasks: @Kimi_Moonshot's Kimi K2.6 achieves an Elo of 1520 on our GDPval-AA evaluation, which is a marked improvement over Kimi K2.5’s Elo of 1309. GDPval-AA is our leading metric for general agentic performance, measuring the performance on knowledge work tasks such as preparing presentations and analysis. Models are given code execution and web browsing tools in an agentic loop via our open source reference agentic harness called Stirrup. This continues Kimi K2.6’s strength in tool use, maintaining a 96% score on τ²-Bench Telecom, placing it among other frontier models in this category.

➤ Low hallucination rate: Kimi K2.6 scores 6 on the AA-Omniscience Index, our knowledge evaluation measuring both accuracy and hallucination rate. This score is primarily driven by a comparatively low hallucination rate of 39% (reduced from Kimi K2.5’s 65%), indicating a greater capability to abstain rather than fabricate knowledge when the model is uncertain. Kimi K2.6’s low hallucination rate places it similarly to other models such as Claude Opus 4.7 (36%) and MiniMax-M2.7 (34%).

➤ High token usage: Kimi K2.6 demonstrates high token usage, but is in line with other frontier models in the same intelligence tier. To run the full Artificial Analysis Intelligence Index, Kimi K2.6 used ~160M reasoning tokens. This is slightly lower than Claude Sonnet 4.6 (~190M reasoning tokens) but much higher than GPT 5.4 (~110M reasoning tokens).

➤ Open weights: Kimi K2.6 is a Mixture-of-Experts (MoE) model with 1T total parameters and 32B active, same as the previous two generations of models Kimi K2 Thinking and Kimi K2.5. Kimi K2.6 again pushes the open weights frontier in intelligence.

➤ Third Party Access: Kimi K2.6 is accessible through Moonshot’s First Party API as well as third party API providers Novita, Baseten, Fireworks, and Parasail.

➤ Multimodality: Kimi K2.6 supports Image and Video input and text output natively. The model’s max context length remains 256k.

Further analysis in the threads below.

31

1K

130

165

211K

Chris Clark

@cclark

about 2 months ago

See ya later calculator. In a while, numberphile.

0

2

0

166

Chris Clark

@cclark

2 months ago

@aviel Is this like an elaborate way of saying that I can see right through your bullshit?

1

0

78

Chris Clark

@cclark

2 months ago

Seems like when most people talk about ARR they really mean ARRR - annualized revenue run-rate. I propose we start using this metric more broadly, and that we distinguish between it and ARR by saying ARRR in a pirate voice.

1

2

0

487

Chris Clark

@cclark

2 months ago

The commercial relationship between the labs and clouds is also apples and oranges. The OpenAI/Azure relationship is an IP licensing agreement with a rev share, whereas the Anthropic/Hyperscaler relationships look nothing like that. Therefore they are not accounted for the same way.

0

1

0

308

Chris Clark

@cclark

3 months ago

@5rb6jj7wtx @deedydas For sure - but still interesting. The fact that it is written directly means eg you could chuck autoresearch at it 👀 @deedydas

1

0

137

Chris Clark

@cclark

3 months ago

thanks to coding agents it's never been easier to get started, and never been harder to get finished.

0

6

1

0

355

Chris Clark

@cclark

3 months ago

In a moment of frustration, I banned my 8-year-old from saying “I’m bored” and he now has to say “time to figure out a new activity” and it’s been weirdly effective. Also I’ve threatened to take away dessert if he says it. That also is def part of the success recipe.

0

3

0

160

Chris Clark

@cclark

3 months ago

Looks great! I have not read the Chinmayananda version, but I have the Easwaran translation of the Gita and it seems more approachable. Not sure if it's public domain though. Chinmayananda: What did the sons of Pandu and also my people do when, desirous to fight, they assembled together on the holy plain of Kurukshetra, O Sanjaya? Easwaran: O Sanjaya, tell me what happened at Kurukshetra, the field of dharma, where my family and the Pandavas gathered to fight.

1

3

0

1

538

Chris Clark

@cclark

3 months ago

If the east wing ballroom had been constructed by Obama, what kind of impact would that have had on the plot of White House Down?

0

2

0

134
