Marc @MarcRibalta_ - Twitter Profile

8 days ago

arena's tweet photo. GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely matching Claude-Opus-4.8 (non-thinking) and is the #1 open model by a wide margin!

In Agent Arena, we measure models on millions of real-world, long-horizon agentic tasks from a global community of users. Models can access web search, filesystem, and terminal tools to complete complex workflows. The leaderboard measures model performance on outcomes relative to the average model using a causal tracing methodology.

Compared to 5.1, GLM-5.2 (Max) climbs from #13 to #10. Its clearest gains are confirmed task success, and user praise vs. complaint. Bash capabilities and tool hallucination remain stable. There is a tradeoff in steerability compared to the previous model (-6.0% vs. +1.2%).

GLM-5.2 remains the same price as GLM-5.1, $1.4/$4.4 per input/output MTokens. 1M context window.

Huge congrats @Zai_org for the incredible release!

See thread for details on how GLM-5.2 (Max) performs across 5 different signals.

31

760

80

134

687K
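For a sense of scale, the pricing quoted in the GLM-5.2 post above ($1.4 input / $4.4 output per million tokens) can be turned into a rough per-task cost estimate. This is a minimal Python sketch: the function name and the example token counts are hypothetical, and only the two prices come from the post.

```python
# Prices quoted in the post, in USD per million tokens (MTokens).
INPUT_USD_PER_MTOK = 1.4
OUTPUT_USD_PER_MTOK = 4.4


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted per-MToken prices."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical agentic task: 200K tokens read, 20K tokens generated.
cost = estimate_cost_usd(200_000, 20_000)
print(f"${cost:.3f}")  # prints $0.368
```

Input tokens dominate the count in this made-up example, but output tokens are priced roughly three times higher, so long generations can still drive the bill.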

Marc @MarcRibalta_

7 days ago

@eliebakouch @jxmnop

0

244

MarcRibalta_ retweeted

OpenCode

@opencode

about 1 month ago

OpenCode x Ring 2.6 1T - free for a limited time

256K context • reasoning • text only

Thanks to @AntLingAGI and @novita_labs for making the model available

45

2K

75

278

100K

MarcRibalta_ retweeted

Sandro

@pupposandro

about 2 months ago

Reminder that this is the future of humanity if open source AI doesn’t win

117

3K

229

179

1M

MarcRibalta_ retweeted

Unsloth AI

@UnslothAI

about 2 months ago

We collaborated with NVIDIA to teach you how we made LLM training ~25% faster! 🚀

Learn how 3 optimizations help your home GPU train models faster:
1. Packed-sequence metadata caching
2. Double-buffered checkpoint reloads
3. Faster MoE routing

Guide: https://t.co/nwvVfNC8XE https://t.co/j4NCke2F5o

22

941

159

649

61K

MarcRibalta_ retweeted

kache

@yacineMTB

about 2 months ago

this is what the future looks like

now that i've been unshackled from stinky, ugly macos

I can actually make my computer do what I want it to do https://t.co/Pnjy2NF5Pz

24

324

8

58

31K

MarcRibalta_ retweeted

Benjamin Marie

@bnjmn_marie

2 months ago

Qwen3.6 GGUF Evaluations

For the 27B:

Q2_K_XL is surprisingly recommendable.

IQ3_XXS performs very similarly, uses only +0.2 GB, and generates significantly fewer tokens. If you are memory-tight, pick this one.

Otherwise, if you can spare +2.5 GB, use Q3_K_XL: (almost) same accuracy and token efficiency as the original.

All the results, also for the 35B, here:

https://t.co/zbBCZ0Ty7a

More results are coming, probably Monday, covering other GGUF providers and some abliterated models.

38

697

75

630

84K

Marc @MarcRibalta_

10 months ago

@gabmfrl That used to happen to me, but it changes over time. Now by 11 p.m. my body is already asking for sleep.

0

1

0

55

MarcRibalta_ retweeted

Avi Chawla

@_avichawla

10 months ago

8 RAG architectures all AI Engineers should know:

17

2K

348

3K

228K

MarcRibalta_ retweeted

Skywork

@Skywork_ai

11 months ago

Matrix-Game 2.0 — The FIRST open-source, real-time, long-sequence interactive world model

Last week, DeepMind's Genie 3 shook the AI world with real-time interactive world models. But... it wasn't open-sourced.

Today, Matrix-Game 2.0 changed the game. 🚀

25FPS. Minutes-long interaction. Fully open-source.

48

2K

334

1K

1M

MarcRibalta_ retweeted

alex fazio

@alxfazio

11 months ago

62

5K

165

281

422K

MarcRibalta_ retweeted

Lewis Tunstall

@_lewtun

11 months ago

One line of code is all it takes to fine-tune the gpt-oss models from @OpenAI 🔥

> Support to target the MoE expert layers with PEFT
> Kernels for FlashAttention3 & MegaBlocks
> Fast inference with MXFP4 quantization format

In our testing, these models are extremely efficient to tune and can be adapted to new domains with just a few 100 samples 🤯

Download the models: https://t.co/3cOIB3tGVt
Training & inference recipes: https://t.co/aQaDzUGHXR

12

745

98

728

77K

MarcRibalta_ retweeted

Sundar Pichai

@sundarpichai

11 months ago

So many of you are loving turning your photos into short videos in the @Geminiapp and the Gemini API. Next up, we’ll be rolling this feature out to @YouTube Shorts and @GooglePhotos. And soon, Remix your Google Photos into comics, sketches + 3D animations.

131

1K

165

126

186K

MarcRibalta_ retweeted

elvis

@omarsar0

about 1 year ago

New Lens on RAG Systems

RAG systems are more brittle than you think, even when provided sufficient context.

Great work from Google and collaborators.

Good tips for devs included.

Here are my notes: https://t.co/ip4IoajAsp

33

1K

226

3K

190K

MarcRibalta_ retweeted

eric zakariasson

@ericzakariasson

about 1 year ago

we wrote a guide on how to work with documentation in @cursor_ai

includes some guidance on when to use which tool, a quick MCP server example for internal docs, and some prompting tips https://t.co/bqLI4bY6A9

32

2K

120

2K

172K

MarcRibalta_ retweeted

Akshay 🚀

@akshay_pachaar

about 1 year ago

Mixture of Experts (MoE) is a popular architecture that uses different "experts" to improve Transformer models. The visual below explains how they differ from Transformers. Let's dive in to learn more about MoE!

6

204

30

217

13K
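As a rough illustration of the idea in the MoE post above: in one common formulation, a small router scores every expert for each token, only the top-k experts actually run, and their outputs are mixed using softmax weights over the selected scores. The sketch below uses plain Python lists; the expert functions, router weights, and `top_k=2` are illustrative assumptions, not details taken from the post.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical stand-in for an expert feed-forward network:
    # it simply scales its input vector.
    return lambda x: [scale * v for v in x]


def moe_layer(x, router_weights, experts, top_k=2):
    """Route one token vector x through its top_k experts and mix the outputs."""
    # Router: one logit per expert (dot product of x with that expert's row).
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    # Keep only the top_k highest-scoring experts; the others are never run.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Gate weights: softmax over the selected logits only (one common variant).
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        expert_out = experts[i](x)
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out, chosen


# Four toy experts and a hand-written router (both assumptions).
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
output, chosen = moe_layer([1.0, 0.0], router_weights, experts, top_k=2)
print(chosen)  # prints [3, 2]
print(output)
```

Only two of the four experts do any work for this token, which is where the compute savings of sparse MoE layers over a single dense feed-forward block come from.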

MarcRibalta_ retweeted

Harrison Chase

@hwchase17

about 1 year ago

OpenAI recently released a guide on building agents which contains some misguided takes

There's a lot of FUD, confusion, hype, and noise around agents

I wrote a blog on how to think about agent frameworks. Includes:

Background Info
- What is an agent?
- What is hard about building agents?
- What is LangGraph?

Flavors of agentic frameworks
- “Agents” vs “workflows”
- Declarative vs non-declarative
- Agent abstractions
- Multi agent

Common Questions
- What is the value of a framework?
- As the models get better, will everything become agents instead of workflows?
- What did OpenAI get wrong in their take?
- How do all the agent frameworks compare?

88

919

120

1K

450K

MarcRibalta_ retweeted

Elias @Eliasfiz

over 1 year ago

Today, we’re launching Orpheus, an open-source TTS model that exceeds the capabilities of both open and closed-source models such as ElevenLabs and OpenAI! (1/6)