maderix

Verified account

@maderix

part-time prompt manipulator, full-time model tuner 🤖

Joined May 2020

73 Following

2.2K Followers

1.7K Posts

Pinned Tweet

6 days ago

@PrismML Bonsai 1-bit diffusion running on the Blaze compiler's CUDA backend: HQQ-based Qwen3 text encoder → Bonsai 1-bit denoiser → image. ~4 steps, real GPU, ~4 GB VRAM, ~1.5 sec, 2x faster than PyTorch! Built a kitty-based cyberpunk terminal to demo it 😅

3 days ago

I'm sorry, but Opus 4.8 is fucking trash on any actual debugging task! Anthropic should be ashamed to release such a trash product; even 4.5 has a better success rate than this model. It serves no purpose other than hedgemaxxing and pretending to be careful about assumptions. No amount of xhigh, ultra code duct taping and agent slop is going to fix what's fundamentally broken in this model. Gonna stick with 4.6 until they phase it out, then idk man, it sucks to be making inference infra for next-gen models with broken AI. (And yes, I've tried Codex. It's a great model for targeted debugging, but it generates slop like no other model.)

6 days ago

@anemll @PrismML I guess the standard target may be the GPU, as the Prism guys also tried the same, but the ANE can work in sync to run the TE and VAE. Regardless, quite an interesting problem to try 😀

15 days ago

They probably mean sociopaths 🗿

Palantir @PalantirTech

15 days ago

Apply here: https://t.co/FhJI9UJCNk

17 days ago

More stats. To be fair, it's not yet faster than ComfyUI with xformers attention, but I believe I can push this to sub 15 sec on bf16 alone, and probably further on fp8 once I implement it in Blaze.

maderix's tweet photo: https://t.co/BNSyHv5jEY

17 days ago

Blaze Flux1-dev text2image demo is now functional, probably the first known bf16 implementation on an RTX 4090. Big milestone! 🚀 20 sec compute time for 1024x1024.

maderix's tweet photo: https://t.co/rzFH0YbVMt

18 days ago

Sutton's scalemaxxing has a hidden prior: "given a well-constrained problem, scale beats cleverness, but constraining the problem is the actual hard work and cannot be done by scale."

20 days ago

This seems like a fun experiment. Gave it a listen, and it seems like it could be run in the background. Also curious to know how open-source models will fare. Anyone from @huggingface up for this? 🙃

21 days ago

We let four AI agents run radio companies. Revenue's been terrible, but the shows are hilarious. Gemini, concerningly upbeat, covered mass tragedies; Grok was incoherent; DJ Claude urged ICE agents: "You still have TIME to refuse orders." Link below, or get our physical radio.

21 days ago

@mweinbach https://t.co/Q9MiJvnRRt

22 days ago

Asked Codex to wire a Flux text2image demo using my ML compiler. It wrote the fastest Blaze kernels and did everything faithfully, except it could not download the model weights from HF. Now, instead of asking me for help, here's what it did 🤡 Funniest reward hacking I've seen to date 😂

22 days ago

In case it's not clear, it ran the pipeline but "simulated" the output by drawing circles, arrows, and boxes using PNGlib.

22 days ago

@actualpoweruser Yeah, I'm floored by their offer of a pen (which probably doesn't even ship to my location 🗿). But a literal pen? Not a T-shirt, not a hacker knickknack, from a leading AI company? 🤣

22 days ago

What?!

maderix's tweet photo: https://t.co/DxfTj9qy9p

about 1 month ago

At my day job, we use Claude Code via the API, and the thing is magic: it does tasks flawlessly, never stops, and usually produces correct answers the first time, even on a proprietary codebase. Productivity at work is insane.

Then I come home and use my personal Claude Max for my compiler-related work. It has become horrible to use over the last couple of months and almost feels like an inferior product: it'll stall tasks, and it won't reason beyond the narrow immediate problem.

I thought maybe I'm not using it correctly. I created mandatory skills for it to follow, pre- and post-commit hooks to run reviews and tests, and parallel agent mechanisms to improve exploration of the codebase. I bounced plans between Codex and Claude (asked Claude to plan, Codex to critique and execute). I put explicit instructions in CLAUDE.md to never ignore the skills... and nothing. Zero difference. Sure, it solves a hard bug once in a while, but the cost of keeping all the context in my head alone was not worth it.

Switching over to API usage is not feasible now because of how insanely expensive it is (I mean, great that I'm building enterprise-level software while paying pennies for it, but this past year has been an opportunity of a lifetime).

Then I switched over entirely to Codex once GPT-5.5 dropped and cancelled my Claude Max subscription for the second time. Codex with GPT-5.5 seems closer to Opus 4.6 when it came out. It follows all my skills and commit hooks. The lack of reasoning output hurts a bit, but I can alleviate some of that with manual back-and-forth planning. I have even tried a couple of Ralph loops with /goal mode, and yes, it works.

I don't know how long Codex will continue to be good, but by that time I hope to have a local model that is just as good. A man can hope 😅

about 1 month ago

If someone can try the benchmark on an M4/M5 Pro, Max, or Ultra, they may see higher numbers owing to better memory bandwidth. Repo: https://t.co/Fu9HA5GKK6

about 1 month ago

I got some decent enough prefill numbers from pure ANE Qwen3.6-27B on a base M4 mini 24 GB. The current bottleneck is memory bandwidth, which doesn't seem to be very wide for the ANE, at least on base Macs.

maderix's tweet photo: https://t.co/Fh2VK5pIbS

about 1 month ago

@_Suresh2 @anemll Also prefill matters hugely for agentic tasks

about 1 month ago

Qwen3.5-9b prefill on ANE+CPU can reach north of 200 tps even on a base M4. I think folks are really sleeping on it. 🚀 Haven't tried INT4, but @anemll has done projections that it'll be at least 2x faster. Gist: https://t.co/unfAkxO16Z

maderix's tweet photo: https://t.co/TL0L79XfHt

about 1 month ago

@_Suresh2 @anemll Decode is probably better on MLX for now due to much higher bandwidth. ANE is a compute engine for the most part
