Oriol Pont @uripont_ - Twitter Profile

Pinned Tweet

Oriol Pont @uripont_

8 months ago

good morning


Oriol Pont @uripont_

about 6 hours ago

before all of our devices have dedicated inference accelerators, a mix of low-bit weights, KV cache quantization, speculative decoding, and some small variations on current architectures might be enough to bring ~GPT-5 capability and experience to edge devices

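The tweet above names low-bit weights and KV cache quantization as levers for edge inference. A rough way to see why is to compute memory footprints. This is back-of-the-envelope arithmetic under assumed dimensions: the 8B parameter count, 32 layers, 8 KV heads, head dimension 128, and 32K-token context are hypothetical values chosen for illustration, not any particular model, and the sketch ignores quantization metadata such as per-group scales.

```python
GiB = 1024 ** 3

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Storage for model weights at a given average bit width."""
    return n_params * bits_per_weight / 8

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bits_per_value: float) -> float:
    """Standard KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bits_per_value / 8

# Hypothetical 8B-parameter model with grouped-query attention (illustrative only).
N_PARAMS = 8e9
LAYERS, KV_HEADS, HEAD_DIM, CONTEXT = 32, 8, 128, 32_768

for bits in (16, 4, 2):
    print(f"weights @ {bits:>2}-bit: {weight_bytes(N_PARAMS, bits) / GiB:5.1f} GiB")

for bits in (16, 8, 4):
    kv = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, bits)
    print(f"KV cache @ {bits:>2}-bit, {CONTEXT} tokens: {kv / GiB:5.2f} GiB")
```

With these assumed dimensions, 16-bit weights take about 14.9 GiB versus about 3.7 GiB at 4 bits, and the 16-bit KV cache alone is exactly 4 GiB at 32,768 tokens, dropping to 1 GiB with 4-bit KV quantization.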

uripont_ retweeted

Ahmad @TheAhmadOsman

about 10 hours ago

I am betting big time on GLM 6. There are many recent papers with great pre-training optimizations (e.g. DSv4). Now, if Zhipu uses some of that (+ their own novel research) and tops it off with their current post-training regime, we're looking at an amazing SOTA in the making


Oriol Pont @uripont_

2 days ago

@0xSero nobody has yet cracked what @PrismML did to retrain models down to 1.125 bits


Oriol Pont @uripont_

2 days ago

@paulg 🤔


uripont_ retweeted

François Chollet @fchollet

3 days ago

The hardest problems are rarely solved by adding more complexity to the solution -- they are solved by reframing the question until a simpler, clearer answer reveals itself.


uripont_ retweeted

Xenova @xenovacom

8 days ago

I gave Fable 5 one job: write custom WebGPU kernels for Gemma 4 inference. It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible. Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s. The next day, access to Fable 5 was suspended globally.


Oriol Pont @uripont_

8 days ago

@initjean rivet mentioned?


Oriol Pont @uripont_

10 days ago

@SoloGen @khademinori clear and to the point


uripont_ retweeted

Ray Fernando @RayFernando1337

12 days ago

For WWDC I was hoping for realtime Siri AI. The UX pattern of holding a button or typing (who types in 2026??) and asking it to do stuff feels like a dying paradigm akin to the floppy disc era of Mac. I wanted a canvas to take over and go full immersion with voice and personalization. I didn’t want a running chatbot list view on my phone. I wanted the ability to use the SDK to keep user privacy in mind but build unique experiences across the OS. I want the concept of “apps” to die and for developers to focus on making things magical for users. We are entering a new era, where in the second half of this decade humanity will solve the most complex problems faster than ever. Things should feel like they are part of a bigger system rather than “download my app to fix cancer.” OpenAI GPT Realtime 2 is incredibly fast and extremely capable right now. I think it’s only a matter of time before people realize where the puck is going and push humanity in a new direction for our era of high intelligence.


Oriol Pont @uripont_

12 days ago

(and yet no one realized, not even the 1 month discrepancy. someday...)


Oriol Pont @uripont_

12 days ago

about 6 months ago

Oriol Pont @uripont_

8 months ago

for single-tenant use cases, instead of using multiple (different!) experts for each generated token, which in practice requires all experts to be loaded, it would be nice if we only needed a few experts for a whole response.

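The routing idea above can be illustrated with a toy simulation comparing how many distinct experts must be resident in memory. This is a sketch under stated assumptions: the router logits are random Gaussian noise rather than outputs of a trained model, the expert counts are illustrative, and the per-response rule (average the router scores over the tokens and pick the top-k experts once) is one hypothetical way to realize the idea, not any existing model's method.

```python
import random

def top_k(scores, k):
    """Indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:k]

def per_token_experts(router_logits, k):
    """Standard MoE routing: every token picks its own top-k; return the union
    of experts touched, i.e. what must be loaded to serve the whole response."""
    used = set()
    for token_logits in router_logits:
        used.update(top_k(token_logits, k))
    return used

def per_response_experts(router_logits, k):
    """Hypothetical variant: average router scores over all tokens, pick top-k
    once, and reuse those experts for the entire response."""
    n_experts = len(router_logits[0])
    mean = [sum(tok[e] for tok in router_logits) / len(router_logits)
            for e in range(n_experts)]
    return set(top_k(mean, k))

random.seed(0)
N_EXPERTS, TOP_K, N_TOKENS = 64, 2, 200  # illustrative sizes, not a specific model
logits = [[random.gauss(0, 1) for _ in range(N_EXPERTS)] for _ in range(N_TOKENS)]

print("per-token experts loaded:   ", len(per_token_experts(logits, TOP_K)))
print("per-response experts loaded:", len(per_response_experts(logits, TOP_K)))
```

Because standard top-k routing lets each token choose different experts, the union over a long response tends to cover nearly the whole expert pool, which is why all experts end up in memory; fixing the selection once per response caps the working set at k experts, at the cost of less specialized per-token routing.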

Oriol Pont @uripont_

12 days ago


Oriol Pont @uripont_

12 days ago

@aaronmahlke that this is starting to become real https://t.co/Nq78iYu1S2


Oriol Pont @uripont_

12 days ago

@mweinbach sounds familiar... https://t.co/izx6Cl2Oir


uripont_ retweeted