Listening to Tchaikovsky's first piano concerto and within the first few mins it becomes inescapably clear: this dude was *definitely* gay. Only a gay man could have composed this. Looked it up and yep, clocked him
The interesting part is that QAT outputs are not simply BF16 models limited to 4-bit precision, right? Presumably the weights can still use the full 16 bits of precision, but they were just trained to reduce well to 4 bits, right? So that’s where my question is coming from.
Google published the BF16 for the QAT’d Gemmas. I wonder, do all the various quant methods properly interact with QAT? That is, will gguf, autoround, awq, gptq etc all recover similar levels? Or does QAT lend itself more to some strategies than others?
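To make the question concrete, here's a toy sketch in plain Python (not any real quant library, and not how Google actually trained the Gemmas). It assumes symmetric round-to-nearest 4-bit quantization with one scale per group, and fakes "QAT" by nudging full-precision weights close to that grid. The function names, group sizes, and noise level are all made up for illustration.

```python
import random

def quantize_int4(weights, group_size=32):
    """Symmetric round-to-nearest 4-bit quantization with one scale per group.

    Returns the dequantized values so they can be compared to the input.
    """
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group onto the integer level 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        for w in group:
            q = max(-7, min(7, round(w / scale)))
            out.append(q * scale)
    return out

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

random.seed(0)

# "Ordinary" full-precision weights: no relationship to any 4-bit grid.
plain = [random.gauss(0.0, 1.0) for _ in range(1024)]

# Stand-in for QAT (an assumption, not real training): weights that sit
# very close to the group_size=32 grid but are still full-precision floats.
qat_like = [w + 0.01 * random.gauss(0.0, 1.0) for w in quantize_int4(plain)]

# Quantizing with the grid the weights were "trained" for loses very little.
err_plain = mean_abs_error(plain, quantize_int4(plain))
err_qat = mean_abs_error(qat_like, quantize_int4(qat_like))

# Quantizing the same weights with a different grid (here, a larger group
# size) does not get the same benefit in this toy setup.
err_other = mean_abs_error(qat_like, quantize_int4(qat_like, group_size=128))

print(f"plain weights, matching grid:  {err_plain:.4f}")
print(f"QAT-like weights, same grid:   {err_qat:.4f}")
print(f"QAT-like weights, other grid:  {err_other:.4f}")
```

In this toy the QAT-like weights only round cleanly when the quantizer's grid matches the one they were shaped for, which is the intuition behind wondering whether gguf, autoround, awq, gptq etc. all benefit equally. Whether real QAT checkpoints behave this way with each method is exactly the open question; this sketch doesn't answer it.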
It seems crazy, but if this doesn’t happen, local llm devs will be better off with either a 6-year-old nvidia card (3090) or an Intel B70, simply because they are supported when the R9700 seemingly is not.
@AnushElangovan can the R9700/rdna4/gfx1201 get some love from AITER etc etc? It’s really close to being *the* best choice for builders and tinkerers, and support is partly there, but it still feels like an afterthought.
I should be able to launch vllm and get native performance out of the box by default. I shouldn’t have to rely on community patches and hacks and workarounds to make my “AI Pro” card actually usable for AI.
@soft_fox_lad In the public’s imagination, having a data center near your home is like living near a radioactive dump. Somehow this pervades both the extremely-liberal (treehugger type) and extremely conservative (git yer liberal tech away from my property values)
I’ve been using the 4bit mlx of lfm-2.5-vl-1.6 for an iPhone project, and it’s amazing!! Will def try this extract version, since of course I’m asking it to output json anyway. I suppose decoding with a json grammar would achieve similar outcomes on the surface, but I expect having it trained specifically on this task will be better overall
I thought my intern would be supercharged by coding agents, but what I’m realizing is it puts them even more behind. If you never had experience doing stuff at 5mph, then being handed a jetpack doesn’t really make you faster, it just makes you feel even more lost
Here's our video of the explosion at Launch Complex 36. It happened about 9 pm ET (0100 UTC) as Blue Origin was beginning a static fire test of its New Glenn rocket.
Watch live views: https://t.co/tm2wZQmAVD
Working on a project that uses on-device vlm on iPhone mlx, and woooow Liquid LFM2.5-1.6B is soooo good and so fast. Hoping I finish soon and have something worthwhile to share
@natolambert @demishassabis I recall lots of Qwen researchers resigning following 3.5? I’m curious if this impacted the velocity of their work? The incredible jump in Qwen 3.6 seems to indicate no, but then again maybe it’s just too soon to tell?
what's really crazy about this whole datacenter psychosis thing is that they literally do the opposite. as you can see, using nothing but a datacenter, I can purify their drinking water.
@giffmana Rockstar realizing they could just never release GTA VI and instead make more money by selling it as a world model / simulator to autonomous vehicle startups