im really glad chatgpt makes insane images when pushed slightly off the rails thus proving machine creativity is real and probably very nerfed in daily operation
Hot take: Universities charge $300,000 for a degree that teaches you skills any LLM can do for free. At some point we need to have an honest conversation about whether higher education is the greatest individual misallocation of capital in recent history.
watch gemma 4 12b q8 dancing on a single rtx 3090 at 33 tokens a second average.
google dropped this two days ago and it's the kind of thing that quietly moves the floor. a fully multimodal model, text, image, and audio in one net, 256k context, apache licensed, running entirely on one consumer gpu, no one metering your tokens.
what you're watching is the whole loop live: the server streaming tokens top left, the gpu pegged bottom left, the answer landing on the right. all local, all mine.
a year ago this needed someone else's datacenter. today it's a card you can buy. open source isn't catching up anymore, it's setting the pace.
how fast does yours run?
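if you want to answer that with a number, here's a rough sketch for timing a token stream. `measure_tps` is a made-up helper name, and it assumes your local server hands you tokens as a plain Python iterable, which may not match your setup.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_tps(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[int, float]:
    """Consume a token stream and return (token_count, average tokens per second).

    `clock` is injectable so the function can be checked without real waiting.
    """
    start = clock()
    count = 0
    for _ in tokens:  # each item is assumed to be one generated token
        count += 1
    elapsed = clock() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate
```

the timer starts before the first token arrives, so prompt processing is included and the average will read a bit lower than the pure decode speed.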
Compelling essay by sci-fi writer Ted Chiang on why LLMs are nowhere near consciousness, but why it serves the interests of LLM companies to constantly suggest that they might be.
I've pulled one quote below, but the whole article is worth reading.
how to be good at your job
- realize this one thing is actually made up of two separate things
- realize instead of solving the direct problem you can solve a broader problem
- instead of implementing thing, implement other thing that makes it easier to implement thing
this is an interesting point in the new ted chiang piece – no one really claims that alphafold is conscious, or that sora or midjourney or dall-e are conscious
Playing around with a Kent Beck-inspired prompt today:
Before implementation, look for opportunities to prefactor the code to make the implementation easier. "Make the change easy, then make the easy change."
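A tiny illustration of what that prompt is asking for, as a sketch only. The report functions are invented for this example; the point is the order of the steps, not the code itself.

```python
# Step 0: the original code. Adding a second output format means editing the loop.
def report_v0(rows):
    out = []
    for name, qty in rows:
        out.append(f"{name},{qty}")
    return "\n".join(out)


# Step 1 (prefactor): pull the per-row formatting out. Default behavior is unchanged.
def format_csv(name, qty):
    return f"{name},{qty}"


def report(rows, format_row=format_csv):
    return "\n".join(format_row(name, qty) for name, qty in rows)


# Step 2 (the easy change): the new format is just another small function.
def format_tsv(name, qty):
    return f"{name}\t{qty}"
```

Step 1 can be checked on its own against the old behavior before step 2 touches anything.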
Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B — the #1 trending model on HuggingFace — as its eyes, and the two small models get it done together.
(The test: place each element at the right pixel position on a blank form image, not type into a field.)
Setup:
> Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool).
> I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height.
> The blue boxes on the screen are its detections. Look how tight they are — it nails every field.
Result:
> Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct.
> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas.
> Character-box alignment still a touch loose, but every value is where it belongs.
> 9m10s, 224.5k input, 24.3k output, 21 turns.
Why it matters:
> Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing (locate) and suddenly it can.
> A combination of small models can do the work of a single large one.
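A rough sketch of the shape of that "eyes" tool, to make the setup concrete. Everything here is hypothetical: `Box`, `locate`, `text_anchor`, and the hard-coded detections are stand-ins for illustration, not LocateAnything's actual interface or the code from the test.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    width: int
    height: int


# Stand-in for the helper model. A real setup would run the locator on the
# form image; these coordinates are made up.
FAKE_DETECTIONS: Dict[str, Box] = {
    "email": Box(x=120, y=340, width=300, height=28),
    "phone": Box(x=120, y=380, width=300, height=28),
}


def locate(query: str) -> Box:
    """Hypothetical tool: answer "where's the email field?" with a box."""
    for label, box in FAKE_DETECTIONS.items():
        if label in query.lower():
            return box
    raise KeyError(f"no detection for query: {query!r}")


def text_anchor(box: Box, padding: int = 4) -> Tuple[int, int]:
    """Pick a pixel inside the box (left edge plus padding, vertical middle)."""
    return (box.x + padding, box.y + box.height // 2)
```

The main model only ever sees the question and the returned numbers, which is what lets a text-first model place values at pixel positions.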
LOVON: Legged Open Vocabulary Object Navigator
https://t.co/wScONLKUlF
LOVON, a novel framework that integrates large language models (LLMs) for hierarchical task planning with open-vocabulary visual detection models, tailored for effective long-range object navigation in dynamic, unstructured environments.
TIL: You can use GEPA to automatically optimize the prompts of any agent (CLI).
GEPA accepts any `(str) -> str` callable, so it works with your own custom CLI, local models, or API agents. Wrap your agent in a Python function and let it self-optimize.
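A minimal sketch of the wrapping step, assuming the agent is a CLI that reads the prompt on stdin and prints its answer to stdout. `make_cli_agent` is a made-up helper name, and the GEPA call itself is left out because its exact signature isn't given here.

```python
import subprocess
import sys
from typing import Callable, Sequence


def make_cli_agent(command: Sequence[str], timeout: float = 60.0) -> Callable[[str], str]:
    """Turn a command line into a plain (str) -> str function."""

    def agent(prompt: str) -> str:
        result = subprocess.run(
            list(command),
            input=prompt,         # prompt goes in on stdin
            capture_output=True,  # answer comes back on stdout
            text=True,
            timeout=timeout,
            check=True,           # raise if the CLI exits non-zero
        )
        return result.stdout.strip()

    return agent


# Stand-in "agent" for demonstration: a one-liner that upper-cases its input.
echo_agent = make_cli_agent(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
)
```

Swap the stand-in command for your real CLI. If it takes the prompt as an argument instead of stdin, build the command inside `agent`.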
I can’t sleep at night because my mind races with all the cool shit I could be building. AI has turned my workdays into 24-hour grind sessions. I code until I literally collapse from exhaustion, 7 days a week.