Alvaro Somoza

@OzzyGT

ML Engineer @HuggingFace

Chile

Joined March 2009

188 Following

283 Followers

121 Posts

OzzyGT retweeted

about 2 months ago

1/ we are excited to release ERNIE-Image, after 3 months of building from scratch. an 8b text-to-image model from baidu's ernie image team. honestly, we didn't expect an 8b dit to get this far, this fast. strong instruction following. best-in-class text rendering. runs on a 24gb gpu. huge thanks to the ERNIE-Image team, this wouldn't exist without an incredibly talented group of people who shipped fast and cared deeply. thread below. 👇 👇

Alvaro Somoza @OzzyGT

about 2 months ago

Ernie-Image is pretty impressive, it can do a lot of things that other open source models couldn't. You can now use it with diffusers.

Alvaro Somoza @OzzyGT

3 months ago

you don't need an AI agent when you have a cat.

OzzyGT retweeted

3 months ago

Introducing Modular Diffusers 🔥 The `DiffusionPipeline` abstraction in Diffusers has established a standard in the community. But it has also limited flexibility. Modular Diffusers breaks those shackles & enables the next gen. of creative user workflows 🧨 Details ⬇️

OzzyGT retweeted

3 months ago

YouTube Video https://t.co/kiwjwrjeEN

Alvaro Somoza @OzzyGT

4 months ago

@NoonienStar for that we need to wait for the condition pipeline to be merged. But I2V and control will lower the number of frames by a lot. This is just text-to-video; with image-to-video or video-to-video under those constraints, it will probably be 8-10s at that resolution.

Alvaro Somoza @OzzyGT

4 months ago

Finally got some time to play with LTX2. With diffusers, you can generate 20-second videos with 24 GB of VRAM and 10-second videos with 16 GB GPUs, both with less than 32 GB of RAM. Here are some recipes to suit your needs: https://t.co/kmJAXlydlK
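
As a rough illustration of the figures in the post above, the sketch below maps available VRAM to the longest clip length it mentions. The name `max_ltx2_clip_seconds` and the simple threshold lookup are assumptions made for illustration; only the 24 GB / 20-second and 16 GB / 10-second pairs come from the post, and real limits will depend on the resolution and the recipe used.

```python
from typing import Optional

# (minimum VRAM in GB, clip length in seconds) pairs quoted in the post above.
# Anything outside these two data points is not covered by the post.
LTX2_VRAM_TO_SECONDS = [(24, 20), (16, 10)]


def max_ltx2_clip_seconds(vram_gb: float) -> Optional[int]:
    """Return the longest clip length quoted for this much VRAM, or None."""
    for min_vram_gb, seconds in LTX2_VRAM_TO_SECONDS:
        if vram_gb >= min_vram_gb:
            return seconds
    return None  # the post gives no figure below 16 GB
```

A 20 GB card falls back to the 16 GB figure here, which is a conservative reading of the post rather than a measured result.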

Alvaro Somoza @OzzyGT

6 months ago

@KristjanRetter thanks, I'll add those too

Alvaro Somoza @OzzyGT

6 months ago

I've created a new repository with diffusers recipes, starting with Z-Image. It has easy copy & paste code with benchmarks (RAM, VRAM, inference time), so you can choose the best optimization for your environment: https://t.co/pfofFby4Av

OzzyGT retweeted

6 months ago

Christmas came early in the Diffusers bandwagon 🎄 It's out folks! Go, check it 🔥

Alvaro Somoza @OzzyGT

6 months ago

While also testing Z-Image-Turbo, I tried by mistake a prompt with a hexadecimal color and it worked, so I tested it more and it also understands colors and gradients!
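
For readers unfamiliar with the notation, a hexadecimal color packs the red, green, and blue channels into six hex digits. The sketch below decodes one and drops it into a prompt string. `hex_to_rgb`, `color_prompt`, and the prompt wording are hypothetical; they are not taken from the post or from any Z-Image API.

```python
import re
from typing import Tuple

# Matches colors written as "#RRGGBB" (six hex digits, either case).
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Decode "#RRGGBB" into integer (red, green, blue) channels."""
    if not HEX_COLOR.fullmatch(color):
        raise ValueError(f"not a #RRGGBB color: {color!r}")
    red, green, blue = (int(color[i:i + 2], 16) for i in (1, 3, 5))
    return red, green, blue


def color_prompt(subject: str, color: str) -> str:
    """Build an illustrative prompt that names a hex color (wording is made up)."""
    hex_to_rgb(color)  # validate the color before using it in the prompt
    return f"{subject}, solid {color} background"


print(hex_to_rgb("#FF8800"))            # (255, 136, 0)
print(color_prompt("a cat", "#00ff00"))  # a cat, solid #00ff00 background
```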

Alvaro Somoza @OzzyGT

6 months ago

I was reading the Z Image Turbo report and saw that it can understand multiple languages, so I tried Spanish, which is my native language, and it delivered everything I asked for.

Alvaro Somoza @OzzyGT

6 months ago

If you want to use the new FLUX.2-dev with 8–12 GB GPUs, you can do it with Diffusers. You'll need to use the remote text encoder and this script: https://t.co/KRpEDSHWEK In case you're wondering, the remote text encoder is free to use, you just need a hf token.

Alvaro Somoza @OzzyGT

9 months ago

@xhinker @RisingSayak that's true, and you're not the first one to ask for this. I'm thinking of opening a repo with examples and best practices for the popular models so people can just browse it. There are also some other efforts we're making to bring a better experience to users.

Alvaro Somoza @OzzyGT

9 months ago

@xhinker @RisingSayak I'll do some tests, but my experience wasn't the same. I tested WAN2.1 and saw a quality drop, and for Flux I tested with a 24GB GPU and it was slower than using group offloading. It has been a while since then, so I'll run some new benchmarks and see if anything changed.

Alvaro Somoza @OzzyGT

10 months ago

@KristjanRetter sorry, I thought that was implied in my answer. We don't have a limit on how many LoRAs you can load, so yes, you can use any other LoRAs you want, and if one fails you can open an issue for it.

Alvaro Somoza @OzzyGT

10 months ago

Qwen-Image-Lightning 8-Steps runs in 22s using less than 16GB on a 3090. You can find the models and the code to test it here: https://t.co/HZGuSpSMrT

Alvaro Somoza @OzzyGT

10 months ago

@KristjanRetter yes, in fact, you can now just load the Lightning LoRA without needing these models: https://t.co/dAigR1HvFQ

Alvaro Somoza @OzzyGT

10 months ago

@pelolisu Qwen2_5_VLForConditionalGeneration

Alvaro Somoza @OzzyGT

10 months ago

so it turns out we don't need to do anything special for the text encoder, this is with both the transformer and text encoder using bitsandbytes with 4-bit quantization, using under 16GB of VRAM and in ~1m40s with a 3090
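
A minimal sketch of what loading both components in 4-bit might look like, under stated assumptions. The post does not name the model; the default `"Qwen/Qwen-Image"` checkpoint, the NF4 settings, the helper name `load_4bit_pipeline`, and the call to `enable_model_cpu_offload()` are all assumptions, not the author's script. It needs `diffusers`, `transformers`, `bitsandbytes`, and a CUDA GPU; the heavy imports sit inside the function so that defining it has no side effects.

```python
def load_4bit_pipeline(model_id: str = "Qwen/Qwen-Image"):
    """Load a pipeline with transformer and text encoder in 4-bit (assumed setup)."""
    import torch
    from diffusers import BitsAndBytesConfig as DiffusersBnbConfig
    from diffusers import QwenImagePipeline, QwenImageTransformer2DModel
    from transformers import BitsAndBytesConfig as TransformersBnbConfig
    from transformers import Qwen2_5_VLForConditionalGeneration

    # Assumed quantization settings: NF4 weights with bfloat16 compute.
    bnb_kwargs = dict(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    # The diffusion transformer is quantized with the diffusers config class.
    transformer = QwenImageTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=DiffusersBnbConfig(**bnb_kwargs),
        torch_dtype=torch.bfloat16,
    )

    # The text encoder is a transformers model, so it uses the transformers config class.
    text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id,
        subfolder="text_encoder",
        quantization_config=TransformersBnbConfig(**bnb_kwargs),
        torch_dtype=torch.bfloat16,
    )

    pipe = QwenImagePipeline.from_pretrained(
        model_id,
        transformer=transformer,
        text_encoder=text_encoder,
        torch_dtype=torch.bfloat16,
    )
    # Assumption: offload idle components to CPU to keep peak VRAM down.
    pipe.enable_model_cpu_offload()
    return pipe
```

Typical use would be `image = load_4bit_pipeline()(prompt="a cat").images[0]`; memory use and timing will vary with library versions and hardware.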
