Kamba @therealkamba - Twitter Profile

Pinned Tweet

25 days ago

I just shipped Nexara, the capability layer for AI products that need to use real tools without turning every integration into a security project.

Give it a try!

https://t.co/hf2rXKbYsx
https://t.co/7eChw3fkoq

For the backstory:
https://t.co/twz1LZCPZM https://t.co/vwF4PUnlSC

deliberium @deliberiumai

25 days ago

Just open-sourced Nexara: our policy-bound capability runtime for AI products that actually need to use real tools.

Signed skill registries, strict runtime policy enforcement, remote discovery, redacted secrets, audit trails, and optional learning signals.

🧵 https://t.co/AGeZEFDECE

Kamba

@therealkamba

about 8 hours ago

@rezoundous @larreaio They switched focus to wide.

Kamba

@therealkamba

about 8 hours ago

Latest AI metrics.

Artificial Analysis

@ArtificialAnlys

about 12 hours ago

Announcing Artificial Analysis Intelligence Index v4.1: a shift toward agentic workloads, featuring upgraded benchmarks and new per-task metrics

The Artificial Analysis Intelligence Index is our synthesis metric for assessing model intelligence and tracking AI progress. v4.1 marks a broader shift toward agentic workloads, with three main changes:

1. Updated and reweighted evaluations toward agentic tasks:
We upgraded three evaluations, removed one, and reweighted the Intelligence Index:
➤ Upgraded Terminal-Bench Hard to Terminal-Bench 2.1 and τ²-Bench Telecom to τ³-Bench Banking. Both move to newer, more robust task sets with harder, more realistic agentic scenarios that better separate frontier models
➤ Upgraded GDPval-AA to GDPval-AA v2. The upgrade re-baselines Elo to human performance at 1000, introduces a rotating panel of frontier-model judges, and raises the turn limit from 100 to 250 for longer-horizon agent trajectories
➤ Removed IFBench due to saturation. The benchmark no longer distinguishes frontier models sufficiently, so we have removed it from the Intelligence Index. We will continue to run it and publish results on new model releases

2. Cost per Task, Time per Task, and Tokens per Task:
Three new per-task metrics, reported for every model and based on the Intelligence Index. We take the total cost, total time, and total output tokens for a model to run the Intelligence Index and divide by the number of tasks across its evaluations, giving the average cost, time, and output tokens to complete a single Intelligence Index task

3. Cached input token reporting:
We now report cached input tokens and their impact on cost, including the cost to run the Intelligence Index, to better reflect the real cost of running each model

Key Results:
➤ Leading models: Claude Fable 5 (with Opus 4.8 fallback, 60) leads the Artificial Analysis Intelligence Index v4.1 by four points but is currently unavailable, leaving Claude Opus 4.8 (max, 56) as the most intelligent available model, ahead of GPT-5.5 (xhigh, 55)
➤ Open weights leading models: Among open weights models, DeepSeek V4 Pro (max, 44) and MiniMax M3 (44) lead, followed by Kimi K2.6 (43) and MiMo-V2.5-Pro (42)
➤ Cost per Task: Claude Opus 4.8 (max) is the most expensive available model at $1.78 per task, with Claude Fable 5 the highest overall at $3.25. GPT-5.5 (xhigh) scores within a point of Opus 4.8 on the Intelligence Index at $0.99 per task. DeepSeek V4 Pro (max) stands out on the Intelligence vs Cost per Task chart at $0.04 per task, with other leading proprietary models costing 20x to 45x more
➤ Time per Task: Time per task (inference decode time) ranges from 1.5 minutes for Grok 4.3 (high) to 13.5 minutes for Claude Sonnet 4.6 (max), a roughly 9x spread. Claude Opus 4.8 (max) completes a task in 6.4 minutes and GPT-5.5 (xhigh) in 3.7 minutes, while Gemini 3.1 Pro Preview stands out on the Intelligence vs Time per Task chart at 1.6 minutes for a score of 46
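The per-task metrics in item 2 reduce to pooled averages: sum each total (cost, time, output tokens) across every evaluation in the index, then divide by the combined task count. A minimal Python sketch, assuming a hypothetical `EvalTotals` record and made-up illustrative numbers (this is not Artificial Analysis data or their actual pipeline):

```python
from dataclasses import dataclass


@dataclass
class EvalTotals:
    """Hypothetical totals for one evaluation; field names are illustrative."""
    name: str
    num_tasks: int
    total_cost_usd: float
    total_time_min: float
    total_output_tokens: int


def per_task_metrics(evals: list[EvalTotals]) -> dict[str, float]:
    """Pool totals across all evaluations, then divide by the total task count."""
    n = sum(e.num_tasks for e in evals)
    if n == 0:
        raise ValueError("no tasks to average over")
    return {
        "cost_per_task_usd": sum(e.total_cost_usd for e in evals) / n,
        "time_per_task_min": sum(e.total_time_min for e in evals) / n,
        "tokens_per_task": sum(e.total_output_tokens for e in evals) / n,
    }


if __name__ == "__main__":
    # Illustrative numbers only, not real benchmark results.
    evals = [
        EvalTotals("eval_a", 100, 50.0, 300.0, 2_000_000),
        EvalTotals("eval_b", 300, 150.0, 900.0, 6_000_000),
    ]
    print(per_task_metrics(evals))
    # {'cost_per_task_usd': 0.5, 'time_per_task_min': 3.0, 'tokens_per_task': 20000.0}
```

Pooling tasks this way weights each evaluation by its number of tasks; averaging the per-evaluation means instead would give every evaluation equal weight and could produce a different figure.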

Kamba

@therealkamba

about 8 hours ago

@TheAhmadOsman Will your lab be the one to create it?

about 8 hours ago

@rcanand @TheAhmadOsman Shameless plug? 😉

Kamba

@therealkamba

about 8 hours ago

@Crypto_Maximal @PolitlcsUK Hmm... You may have to delete this post once you supply your ID. 🤣

Kamba

@therealkamba

about 9 hours ago

@MsMelChen The cybersecurity attack surface for identity theft has just widened, and it grows with every country that adopts these laws. Not to mention that UK adults will need to temper their views. The social media black market will become the hottest new industry!

Kamba

@therealkamba

about 15 hours ago

Codex is spinning up GPU servers on Amazon AWS and running some interesting experiments I've prescribed. More on this soon...

Kamba

@therealkamba

about 20 hours ago

Open-source AI must win.

Kamba

@therealkamba

about 23 hours ago

@Teknium I tried to install the Hermes desktop app on my M1 Pro Mac running Tahoe, and the build step failed. Has anyone else reported issues getting it to work on M1 Macs?

therealkamba retweeted

OpenRouter

@OpenRouter

3 days ago

Introducing the Fusion API, the smartest compound model in the market.

Fusion achieves Fable-level intelligence at half the price.

How it works 👇 https://t.co/OTUQAdTQjU

Kamba

@therealkamba

3 days ago

@AnthropicAI @egorkabantsov @davidpattersonx @PeterDiamandis As I was saying... there will be no abundance and prosperity for all if this sort of thinking continues.

Kamba

@therealkamba

3 days ago

This is ridiculous. @deepseek_ai You must achieve SOTA.

Anthropic

@AnthropicAI

4 days ago

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: https://t.co/bwn0sximKZ

Kamba

@therealkamba

3 days ago

@AnthropicAI @egorkabantsov Is this a joke? This is why open-source AI has to achieve SOTA. @TheAhmadOsman

therealkamba retweeted

Mike Alfred

@mikealfred

4 days ago

In all seriousness, this is how it will go. SpaceX will finish the day up. All incentives are structurally aligned around channeling retail-oriented narratives into short-term price bumps. The banks will support the price. Then, months later, the stock will quietly trade lower.

therealkamba retweeted

Kimi.ai @Kimi_Moonshot

4 days ago

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!

🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite.
🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.
🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.

⚡️ 6x High-Speed Mode coming soon!
🔌 Available today via Kimi API and Kimi Code.

🔗 Kimi Code: https://t.co/uvoSJKyGCY
🔗 API: https://t.co/EOZkbOwCN4

therealkamba retweeted

Codex Releases

@CodexReleases

5 days ago

Codex app update: Codex app 26.609

What changed:
• Rate-limit reset banking added for Plus and Pro users, with one free reset at launch and referral invitations to earn more during the current promotion.
• Developer mode for Browser use adds controlled CDP access in Chrome and the in-app browser for performance profiling and deeper debugging.
• Browser use is up to 2x faster via CDP and DOM snapshot optimizations that reduce browser round trips.

Details in thread.

therealkamba retweeted

OpenAI

@OpenAI

5 days ago

We heard you wanted to use Codex rate limit resets on your own time. Starting today, we’re rolling out the ability to save rate limit resets to use later. We’re starting Go, Plus, Pro, and Business users with one free reset: