DZ | Metir AI @SuitToSweats - Twitter Profile

Pinned Tweet

8 days ago

Your Agentic OS for Knowledge Work Move away from chat and into execution with @MetirHQ No more siloed systems, separate subscriptions and infrastructure that is locked in.

SuitToSweats's tweet photo. Your Agentic OS for Knowledge Work

Move away from chat and into execution with @MetirHQ

No more siloed systems, separate subscriptions and infrastructure that is locked in. https://t.co/ektMLvHI81

1

0

22

DZ | Metir AI @SuitToSweats

5 days ago

@RalphEcom I have a skill that references other skills. Didn’t think we needed some new architecture..?

0

3

SuitToSweats retweeted

Polymarket

@Polymarket

6 days ago

BREAKING: The NSA confirms Mythos “broke into almost all of our classified systems, not in weeks, but in hours”

1K

34K

3K

5K

7M

DZ | Metir AI @SuitToSweats

6 days ago

@Siddhos @theo Scheduled post :)

1

0

173

DZ | Metir AI @SuitToSweats

7 days ago

@pmarca It’s not you… it’s me.

0

1

0

119

SuitToSweats retweeted

Marc Andreessen 🇺🇸

@pmarca

8 days ago

Interesting. (Founder of https://t.co/PY9jRPdR6L, creator of GLM AI models.)

206

6K

322

943

652K

DZ | Metir AI @SuitToSweats

8 days ago

SuitToSweats's tweet photo. https://t.co/BEBNfvW78m

0

13

DZ | Metir AI @SuitToSweats

8 days ago

Your Agentic OS for Knowledge Work Move away from chat and into execution with @MetirHQ No more siloed systems, separate subscriptions and infrastructure that is locked in.

1

0

22

SuitToSweats retweeted

Metir @MetirHQ

9 days ago

GLM 5.2 Max (https://t.co/tyLySGoqoM) ranks 3rd in the world on agentic ability — ahead of GPT 5.5 & Gemini, behind only Opus 4.8 & Fable 5. At a fraction of the cost. On metir that's 5× the usage of Claude, 2.6× of OpenAI. Live for everyone now → https://t.co/sTayUTJsa6

$MetirHQ's tweet photo. GLM 5.2 Max (https://t.co/tyLySGoqoM) ranks 3rd in the world on agentic ability — ahead of GPT 5.5 & Gemini, behind only Opus 4.8 & Fable 5. At a fraction of the cost. On metir that's 5× the usage of Claude, 2.6× of OpenAI. Live for everyone now → https://t.co/sTayUTJsa6 https://t.co/35RssWqGAV$

0

1

0

15

SuitToSweats retweeted

Metir @MetirHQ

9 days ago

GLM 5.2 Max (https://t.co/tyLySGoqoM) ranks 3rd in the world on agentic ability — ahead of GPT 5.5 & Gemini, behind only Opus 4.8 & Fable 5. At a fraction of the cost. On metir that's 5× the usage of Claude, 2.6× of OpenAI. Live for everyone now → https://t.co/sTayUTJsa6

$MetirHQ's tweet photo. GLM 5.2 Max (https://t.co/tyLySGoqoM) ranks 3rd in the world on agentic ability — ahead of GPT 5.5 & Gemini, behind only Opus 4.8 & Fable 5. At a fraction of the cost. On metir that's 5× the usage of Claude, 2.6× of OpenAI. Live for everyone now → https://t.co/sTayUTJsa6 https://t.co/BEoUKqlOxx$

0

1

0

20

DZ | Metir AI @SuitToSweats

10 days ago

Hands down the most important webpage on the internet as we build production agents that don’t bankrupt your customers

Artificial Analysis

@ArtificialAnlys

11 days ago

Announcing Artificial Analysis Intelligence Index v4.1: a shift toward agentic workloads, featuring upgraded benchmarks and new per-task metrics The Artificial Analysis Intelligence Index is our synthesis metric for assessing model intelligence and tracking AI progress. v4.1 marks a broader shift toward agentic workloads, with three main changes: Updated and reweighted evaluations toward agentic tasks: 1. We upgraded three evaluations, removed one, and reweighted the Intelligence Index: ➤ Upgraded Terminal-Bench Hard to Terminal-Bench 2.1 and τ²-Bench Telecom to τ³-Bench Banking. Both move to newer, more robust task sets with harder, more realistic agentic scenarios that better separate frontier models ➤ Upgraded GDPval-AA to GDPval-AA v2. The upgrade re-baselines Elo to human performance at 1000, introduces a rotating panel of frontier-model judges, and raises the turn limit from 100 to 250 for longer-horizon agent trajectories ➤ Removed IFBench due to saturation. The benchmark no longer distinguishes frontier models sufficiently, so we have removed it from the Intelligence Index. We will continue to run it and publish results on new model releases 2. Cost per Task, Time per Task, and Tokens per Task: Three new per-task metrics, reported for every model and based on the Intelligence Index. We take the total cost, total time, and total output tokens for a model to run the Intelligence Index and divide by the number of tasks across its evaluations, giving the average cost, time, and output tokens to complete a single Intelligence Index task 3. Cached input token reporting: We now report cached input tokens and their impact on cost, including the cost to run the Intelligence Index, to better reflect the real cost of running each model Key Results: ➤ Leading models: Claude Fable 5 (with Opus 4.8 fallback, 60) leads the Artificial Analysis Intelligence Index v4.1 by four points but is currently unavailable, leaving Claude Opus 4.8 (max, 56) as the most intelligent available model, ahead of GPT-5.5 (xhigh, 55) ➤ Open weights leading models: Among open weights models, DeepSeek V4 Pro (max, 44) and MiniMax M3 (44) lead, followed by Kimi K2.6 (43) and MiMo-V2.5-Pro (42) ➤Cost per Task: Claude Opus 4.8 (max) is the most expensive available model at $1.78 per task, with Claude Fable 5 the highest overall at $3.25. GPT-5.5 (xhigh) scores within a point of Opus 4.8 on the Intelligence Index at $0.99 per task. DeepSeek V4 Pro (max) stands out on the Intelligence vs Cost per Task chart at $0.04 per task, with other leading proprietary models costing 20x to 45x more ➤Time per Task: time per task (inference decode time) ranges from 1.5 minutes for Grok 4.3 (high) to 13.5 for Claude Sonnet 4.6 (max), a roughly 9x spread. Claude Opus 4.8 (max) completes a task in 6.4 minutes and GPT-5.5 (xhigh) in 3.7, while Gemini 3.1 Pro Preview stands out on the Intelligence vs Time per Task chart at 1.6 minutes for a score of 46

ArtificialAnlys's tweet photo. Announcing Artificial Analysis Intelligence Index v4.1: a shift toward agentic workloads, featuring upgraded benchmarks and new per-task metrics

The Artificial Analysis Intelligence Index is our synthesis metric for assessing model intelligence and tracking AI progress. v4.1 marks a broader shift toward agentic workloads, with three main changes:

Updated and reweighted evaluations toward agentic tasks:
1. We upgraded three evaluations, removed one, and reweighted the Intelligence Index:
➤ Upgraded Terminal-Bench Hard to Terminal-Bench 2.1 and τ²-Bench Telecom to τ³-Bench Banking. Both move to newer, more robust task sets with harder, more realistic agentic scenarios that better separate frontier models
➤ Upgraded GDPval-AA to GDPval-AA v2. The upgrade re-baselines Elo to human performance at 1000, introduces a rotating panel of frontier-model judges, and raises the turn limit from 100 to 250 for longer-horizon agent trajectories
➤ Removed IFBench due to saturation. The benchmark no longer distinguishes frontier models sufficiently, so we have removed it from the Intelligence Index. We will continue to run it and publish results on new model releases

2. Cost per Task, Time per Task, and Tokens per Task:
Three new per-task metrics, reported for every model and based on the Intelligence Index. We take the total cost, total time, and total output tokens for a model to run the Intelligence Index and divide by the number of tasks across its evaluations, giving the average cost, time, and output tokens to complete a single Intelligence Index task

3. Cached input token reporting:
We now report cached input tokens and their impact on cost, including the cost to run the Intelligence Index, to better reflect the real cost of running each model

Key Results:
➤ Leading models: Claude Fable 5 (with Opus 4.8 fallback, 60) leads the Artificial Analysis Intelligence Index v4.1 by four points but is currently unavailable, leaving Claude Opus 4.8 (max, 56) as the most intelligent available model, ahead of GPT-5.5 (xhigh, 55) ➤ Open weights leading models: Among open weights models, DeepSeek V4 Pro (max, 44) and MiniMax M3 (44) lead, followed by Kimi K2.6 (43) and MiMo-V2.5-Pro (42)
➤Cost per Task: Claude Opus 4.8 (max) is the most expensive available model at $1.78 per task, with Claude Fable 5 the highest overall at $3.25. GPT-5.5 (xhigh) scores within a point of Opus 4.8 on the Intelligence Index at $0.99 per task. DeepSeek V4 Pro (max) stands out on the Intelligence vs Cost per Task chart at $0.04 per task, with other leading proprietary models costing 20x to 45x more
➤Time per Task: time per task (inference decode time) ranges from 1.5 minutes for Grok 4.3 (high) to 13.5 for Claude Sonnet 4.6 (max), a roughly 9x spread. Claude Opus 4.8 (max) completes a task in 6.4 minutes and GPT-5.5 (xhigh) in 3.7, while Gemini 3.1 Pro Preview stands out on the Intelligence vs Time per Task chart at 1.6 minutes for a score of 46

97

2K

154

506

305K

0

1

0

40

SuitToSweats retweeted

tetsuo

@tetsuoai

14 days ago

History's first trillionaire is a guy who catches rockets out of the sky with chopsticks and beams internet to every dead zone on the planet. Same guy ships cars that drive themselves, humanoid robots for the factory floor, brain chips that let paralyzed people move a cursor with pure thought, and an AI running on a supercomputer his team stood up in months instead of years. And the people crashing out about his net worth are doing it on the app he owns. The same app governments spent years trying to censor. You cannot legislate a rocket into orbit.

2K

71K

12K

7K

2M

DZ | Metir AI @SuitToSweats

13 days ago

@mark_k @OfficialLoganK @GoogleAIStudio Feel like Google need a DOGE internally to consolidate and simplify. It’s the confusion they cause where they lose out

0

30

DZ | Metir AI @SuitToSweats

14 days ago

@Pavel_Asparagus 🤣🤣

0

20

SuitToSweats retweeted

DZ | Metir AI @SuitToSweats

14 days ago

@DoWCIODavies @POTUS @SecWar 1️⃣ “America First” bans chip exports. 2️⃣ China makes its own chips 3️⃣ US pulls back on export policy after lobbying from @NVIDIA and others 4️⃣ China says 🖕and sticks to its own chips 5️⃣ America gives up its lead Now history repeats

0

1

0

24

DZ | Metir AI @SuitToSweats

14 days ago

@DoWCIODavies @POTUS @SecWar 1️⃣ “America First” bans chip exports. 2️⃣ China makes its own chips 3️⃣ US pulls back on export policy after lobbying from @NVIDIA and others 4️⃣ China says 🖕and sticks to its own chips 5️⃣ America gives up its lead Now history repeats

0

1

0

24

DZ | Metir AI @SuitToSweats

14 days ago

@benjaminmmurphy The boy who cried wolf… and then complained when the government who knew nothing about wolves took away all the sheep.

0

7

0

3K

DZ | Metir AI @SuitToSweats

14 days ago

@AnthropicAI The boy who cried wolf… and then complained when the government who knew nothing about wolves took away all the sheep.

0

1

0

11

DZ | Metir AI @SuitToSweats

14 days ago

The boy who cried wolf… and then complained when the government who knew nothing about wolves took away all the sheep.

Anthropic

@AnthropicAI

14 days ago

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: https://t.co/bwn0sximKZ

13K

88K

26K

24K

92M

0

9

2

1

2K

DZ | Metir AI @SuitToSweats

15 days ago

@theo Every dealer gives the first few rounds of crack for free 🙃😐

0

80

DZ | Metir AI @SuitToSweats

15 days ago

@RalphEcom @tibo_maker Your cheap model is opus 4.8?

0

11

DZ | Metir AI

@SuitToSweats

Last Seen Users on Sotwe

Trends for you

Most Popular Users