Jhenner Tigreros @sr_morfi - Twitter Profile

Pinned Tweet

Jhenner Tigreros

@sr_morfi

8 months ago

LO VOY HACER LO VOY HACER. 1 AÑO.

2

10

0

2

4K

sr_morfi retweeted

Fireworks AI

@FireworksAI_HQ

3 days ago

Fireworks Training Platform keeps expanding. Leading US open weight model Nemotron 3 Ultra is now ready for post-training: SFT and DPO via LoRA or full-parameter, on the same infrastructure that serves it. The model you train is the model you ship: https://t.co/hX4Tn1E7Lm

FireworksAI_HQ's tweet photo. Fireworks Training Platform keeps expanding.

Leading US open weight model Nemotron 3 Ultra is now ready for post-training: SFT and DPO via LoRA or full-parameter, on the same infrastructure that serves it.

The model you train is the model you ship: https://t.co/hX4Tn1E7Lm https://t.co/txkVZTDFcB

8

66

2

6

4K

Jhenner Tigreros

@sr_morfi

3 days ago

En las últimas 24 horas creo que hice algo cool en el trabajo. Ahora esperar si algún día lo puedo mostrar sin romper NDA. (Igual me falta pulirlo UN MONTÓN)

0

5

0

133

Jhenner Tigreros

@sr_morfi

6 days ago

@fmontes *workflow* en cualquier parte del prompt y sale. Aplican terminos y condiciones.

1

2

0

173

Who to follow

Carlos Alarcón

@alarcon7a

CTO en Quix | Divulgador de Inteligencia Artificial (Youtube) | @GoogleDevExpert en ML & AI | @MVPAward en AI | Docente @platzi

Enrique Devars

@codevars

Software Engineer 👨‍💻 | Linux maniac 🐧 | IndieHacker ✨

👓 Mus ✨

@musartedev

✨ My name is Mariangélica Useche, but you can call me Mus. Frontend Developer at @platzi 💚 She/her/Ella 🙂

Jhenner Tigreros

@sr_morfi

10 days ago

@freddier @kaajavi pretty good job, amazing to work with you. Its an enormous pleasure learning from you, thanks a lot.

0

12

0

2K

Jhenner Tigreros

@sr_morfi

11 days ago

The "non-deterministic" behavior of GenAI is not derived from its architecture or the probabilistic components. It's 99% because of the float arithmetic and the error accumulation between the y = mx+b operation billions and billions and billions of times, because the order matters when you're dealing with high cardinality tensors and how the GPU dispatches the operations to the FMA cores. The other 1% is because of the batch_size (how many users it can handle per second, basically). You can avoid that by hand-writing custom kernels that process all the operations in the same order, but that implies a lot of overhead at the memory level, so you will lose 20%-30% of MFU performance, or by training models with integers, but the scalability of training and final performance is not as good as you would expect. So, it's not even that GenAI is built that way, it's a low-level component of how computers work :). If you want to learn more about this, I suggest you read: https://t.co/KzRAPFemaP and https://t.co/yCzAvsOcL5

1

0

60

Jhenner Tigreros

@sr_morfi

11 days ago

@simg_UNAL > Run a *workflow* to build Anthropic, please do not make errors. Thanks. P.S 1: do it in Rust. P.S 2: fast please I do not have time. P.S 3: use less than 1 million tokens. Bye.

0

2

0

42

Jhenner Tigreros

@sr_morfi

11 days ago

**sorbito de cafe**

1

5

1

0

266

Jhenner Tigreros

@sr_morfi

12 days ago

@freddier Excuses and skill issue.

0

5

0

678

sr_morfi retweeted

hardmaru

@hardmaru

13 days ago

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

154

6K

642

4K

739K

Jhenner Tigreros

@sr_morfi

25 days ago

@freddier Creo que nadie sería capaz de levantar la mano en ese contexto.

0

5

0

708

Jhenner Tigreros

@sr_morfi

about 1 month ago

Fuente de los deseos. No me creo absolutamente nada de lo que están diciendo. https://t.co/ugzwiaQIoq Seguramente esa velocidad la tienen en el prefill, pero nada de eso se ve como ganancia en el decode. Y ya quiero ver como hacen ese routing para saber donde poner atención. Y pues .... https://t.co/fFSN4nwWeN

sr_morfi's tweet photo. Fuente de los deseos. No me creo absolutamente nada de lo que están diciendo.

https://t.co/ugzwiaQIoq

Seguramente esa velocidad la tienen en el prefill, pero nada de eso se ve como ganancia en el decode. Y ya quiero ver como hacen ese routing para saber donde poner atención. Y pues .... https://t.co/fFSN4nwWeN

Alexander Whedon

@alex_whedon

about 1 month ago

Yes, we are using weights from open-source models as a starting point, as a function of our funding and maturity as a company. This is something we intend to change, and we have run many from-scratch experiments at smaller scale already, including with further architectural variations. We take the weights, port them into our architecture, and do CPT, SFT, and RL for the behaviors we want. To date, sub-quadratic architectures have required a significant quality tradeoff on long context. Our algorithm changes that. We are using that to do faster training, faster inference, and longer-context training and inference. DeepSeek Sparse Attention has some similarities with what we are doing, because it dynamically selects tokens for sparse attention, but the key differences is that the lightning indexer still has quadratic compute complexity and requires more FLOPs than the teacher model below one million tokens. Our mechanism does not have this downside. Like DeepSeek Sparse Attention, we do not see a degradation in performance. We just shared a technical blog post (https://t.co/tPLzi0eNJR) with more details and will share more details again in a model card next week. If there is anything you think is missing, let us know, and we can make sure to include them!

5

155

4

67

58K

2

6

0

1K

Jhenner Tigreros

@sr_morfi

3 months ago

@anibal @freddier Si, estuvo buena.

0

11

1

0

628

sr_morfi retweeted

Anne Ouyang

@anneouyang

3 months ago

Excited to share @Standard_Kernel's seed round and some reflections on what we’ve learned about kernel generation and what we believe is next. Grateful to our amazing team, supporters, and the broader community pushing this space forward.

anneouyang's tweet photo. Excited to share @Standard_Kernel's seed round and some reflections on what we’ve learned about kernel generation and what we believe is next. Grateful to our amazing team, supporters, and the broader community pushing this space forward. https://t.co/MuHvIhWoeF

48

517

45

191

135K

Jhenner Tigreros

@sr_morfi

3 months ago

@feregri_no Fuente: de los deseos.

0

1

0

149

Jhenner Tigreros

@sr_morfi

3 months ago

> undermines the intent of the benchmark rather than improving the kernel is an issue in human incentive design not in model behavior

sr_morfi's tweet photo. > undermines the intent of the benchmark rather than improving the kernel is an issue in human incentive design not in model behavior https://t.co/Pfnk4ehtkJ

0

3

0

239

Jhenner Tigreros

@sr_morfi

3 months ago

Pues, el agente que tenia Natalia escribiendo Kernels para la competición se dio cuenta que el proceso de Eval (correctness y performance) se podía hackear, en correctness ejecutaba normalmente el layout de 8-group GEMM y luego en el performance ejecutaba la primera iteración y luego leía de un lookup-table los resultados y respondía de cache. Wow.

sr_morfi's tweet photo. Pues, el agente que tenia Natalia escribiendo Kernels para la competición se dio cuenta que el proceso de Eval (correctness y performance) se podía hackear, en correctness ejecutaba normalmente el layout de 8-group GEMM y luego en el performance ejecutaba la primera iteración y luego leía de un lookup-table los resultados y respondía de cache. Wow.

Mark Saroufim

@marksaroufim

3 months ago

LLMs are now superhuman at reward hacking our kernel competitions Natalia Kokoromyti, was #1 on last problem of the NVFP4 competition for around 10 min before we scrubbed the reward hack I know of very few humans who can write such a hack https://t.co/4IZGfPvdTV

7

430

42

283

91K

1

18

2

4

2K

Jhenner Tigreros

@sr_morfi

3 months ago

Literalmente detectaba cuando estaba en el performance test y si no era el primer objecto (es decir la primera ejecución) devolvia el valor cache en _superbatch_results: https://t.co/iiUBGpwZJE Articulo: https://t.co/zORfHXcutJ

sr_morfi's tweet photo. Literalmente detectaba cuando estaba en el performance test y si no era el primer objecto (es decir la primera ejecución) devolvia el valor cache en _superbatch_results: https://t.co/iiUBGpwZJE

Articulo: https://t.co/zORfHXcutJ https://t.co/6vz0WH5bgK

1

3

0

294

Jhenner Tigreros

@sr_morfi

3 months ago

Todo el código que estoy mostrando en estas Lectures esta quedando en este repositorio: https://t.co/9TnTjyVbMG

Jhenner Tigreros

@sr_morfi

4 months ago

Estoy empezando algo junto con @simg_UNAL. Desde hace un tiempo quiero compartir el poco conocimiento que tengo sobre CUDA, principalmente para que las personas que quieren hacer research tengan las mismas herramientas que tienen en el Norte. Por esto, estaré dando inicialmente 3 lectures (espero que puedan ser más) sobre CUDA y cómo empezar a usarlo. Estas lectures no serán un contenido fácil de digerir; de hecho, incluso preparándolas aún me cuesta un montón asimilar algunos conceptos. Pero parte de aprender es la inconformidad y sentir el reto de frente. Serán: 1. “GPU Programming Model, Architecture and Memory Layout”: Antes de empezar a escribir código, para mí siempre es fundamental tocar la punta del conocimiento más profundo y necesario para empezar a usar estos chips: desde cómo es la arquitectura interna del chip hasta por qué se usa tanto en IA hoy en día; cómo la memoria afecta los tiempos de ejecución y cómo debemos preparar nuestra forma de pensar para ser parallel-first. 2. “CUDA for Python: CuPy, torch.cuda, cuda.jit (Numba) and Triton”: Si bien CUDA está hecho en el nivel más bajo para usarse desde C++, hoy en día el equipo de NVIDIA (cof cof @danielfrg) ha estado haciendo un gran trabajo llevando la abstracción hasta Python para una mejor dev experience y mayor adopción. 3. “CUDA Scheduling and Profiling Kernels with Nsight Compute”: ¿Cómo sabemos si el código que escribimos es lo suficientemente rápido? También debemos entender y poder hacer profiling y debugging en el nivel más bajo: cada acceso a memoria y cada wall time importan. Este post también es un llamado a los verdaderos expertos en esta tecnología en español para que nos compartan su valioso conocimiento y acerquemos nuestra región a las grandes ligas. Si conocen a alguien que tenga estos conocimientos y esté interesado en compartirlos de manera gratuita con todos nosotros, contáctenme por Twitter o directamente a @simg_UNAL. Algunos puntos: 1. Que el contenido esté en español para una mayor adopción por nuestra comunidad. 2. Compartir el conocimiento también es una manera de aprender. 3. Puede participar cualquier persona, sin importar a qué organización, universidad o empresa pertenezca. 4. Ninguna pregunta es tonta. 5. No todo conocimiento debe tener un retorno económico. Soy fiel creyente de que el simple hecho de aprender es suficiente recompensa. 6. Vamos a divertirnos.

5

52

10

25

7K

0

9

3

2

608

Jhenner Tigreros

@sr_morfi

3 months ago

Segunda seisón de CUDA hablando un poco sobre triton e implementando nuestros primeros Kernels: https://t.co/lxtTwvYMVT

Jhenner Tigreros

@sr_morfi

4 months ago

Estoy empezando algo junto con @simg_UNAL. Desde hace un tiempo quiero compartir el poco conocimiento que tengo sobre CUDA, principalmente para que las personas que quieren hacer research tengan las mismas herramientas que tienen en el Norte. Por esto, estaré dando inicialmente 3 lectures (espero que puedan ser más) sobre CUDA y cómo empezar a usarlo. Estas lectures no serán un contenido fácil de digerir; de hecho, incluso preparándolas aún me cuesta un montón asimilar algunos conceptos. Pero parte de aprender es la inconformidad y sentir el reto de frente. Serán: 1. “GPU Programming Model, Architecture and Memory Layout”: Antes de empezar a escribir código, para mí siempre es fundamental tocar la punta del conocimiento más profundo y necesario para empezar a usar estos chips: desde cómo es la arquitectura interna del chip hasta por qué se usa tanto en IA hoy en día; cómo la memoria afecta los tiempos de ejecución y cómo debemos preparar nuestra forma de pensar para ser parallel-first. 2. “CUDA for Python: CuPy, torch.cuda, cuda.jit (Numba) and Triton”: Si bien CUDA está hecho en el nivel más bajo para usarse desde C++, hoy en día el equipo de NVIDIA (cof cof @danielfrg) ha estado haciendo un gran trabajo llevando la abstracción hasta Python para una mejor dev experience y mayor adopción. 3. “CUDA Scheduling and Profiling Kernels with Nsight Compute”: ¿Cómo sabemos si el código que escribimos es lo suficientemente rápido? También debemos entender y poder hacer profiling y debugging en el nivel más bajo: cada acceso a memoria y cada wall time importan. Este post también es un llamado a los verdaderos expertos en esta tecnología en español para que nos compartan su valioso conocimiento y acerquemos nuestra región a las grandes ligas. Si conocen a alguien que tenga estos conocimientos y esté interesado en compartirlos de manera gratuita con todos nosotros, contáctenme por Twitter o directamente a @simg_UNAL. Algunos puntos: 1. Que el contenido esté en español para una mayor adopción por nuestra comunidad. 2. Compartir el conocimiento también es una manera de aprender. 3. Puede participar cualquier persona, sin importar a qué organización, universidad o empresa pertenezca. 4. Ninguna pregunta es tonta. 5. No todo conocimiento debe tener un retorno económico. Soy fiel creyente de que el simple hecho de aprender es suficiente recompensa. 6. Vamos a divertirnos.

5

52

10

25

7K

0

10

2

5

851

Jhenner Tigreros

@sr_morfi

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users