New Paper: mmWave Radar Aware Dual-Conditioned GAN for Speech reconstruction of Signals with Low SNR. 🧵(1/4)
Demo Website with audios and spectrograms: https://t.co/v7X1EobN8y
Paper: https://t.co/DXTi4Wb0CD
@sdianahu Grading reasoning traces is probably one of the hardest things to do considering how much traces vary across model providers and even releases across the same model provider. You'd just have to keep coming up with new measures/evals even if there's a small weight change.
most of what is considered "taste" (read: design) is in the realm of zero sum signaling games
your tasteslop will just be the next AI slop
and then your anti-tasteslop will become the next tasteslop
taste is defined in terms of slop and therefore can never transcend slop.
Too many developers don't understand what "compounding slop" is.
A loop that prompts agents is a great way to automate slop creation. Constrain the state-action space so the loop can't drift, then automate inside it.
Human-in-the-loop = feature, not bottleneck.
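A rough sketch of what "constrain the state-action space, then automate inside it" could look like. Everything here is hypothetical and made up for illustration: the `Action` set, `run_loop`, and the `propose` / `approve` callbacks are not from any real agent framework. The idea is only that the loop accepts a fixed whitelist of actions, drops anything else, and asks a human before the risky one.

```python
from enum import Enum
from typing import Callable, List


class Action(Enum):
    """The entire action space the loop may use (hypothetical example set)."""
    RUN_TESTS = "run_tests"
    EDIT_FILE = "edit_file"
    STOP = "stop"


def run_loop(
    propose: Callable[[List[str]], str],
    approve: Callable[[Action], bool],
    max_steps: int = 5,
) -> List[str]:
    """Run a bounded loop; `propose` stands in for the agent, `approve` for the human."""
    history: List[str] = []
    for _ in range(max_steps):  # hard step cap so the loop cannot run away
        raw = propose(history)
        try:
            action = Action(raw)  # anything outside the whitelist raises ValueError
        except ValueError:
            history.append(f"rejected:{raw}")
            continue
        if action is Action.STOP:
            break
        # Human-in-the-loop gate on the one action that changes state.
        if action is Action.EDIT_FILE and not approve(action):
            history.append(f"denied:{action.value}")
            continue
        history.append(f"ran:{action.value}")
    return history
```

In this sketch the agent can say whatever it wants, but only three things can ever happen, and edits wait for a yes.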
Anthropic has seriously shattered my AI psychosis, so now I feel extremely anxious about the sudden lack of perceived hyperproductivity. I don't trust that any of my agents, Claude or otherwise, will get a single thing even partially correct. Even for vibe-coding fun, the magic is gone.
This Meta + Stanford + Illinois survey paper argues that AI agents work better when code becomes their main working layer.
The problem is that an LLM by itself is mostly a text predictor, so long tasks can lose state, hide mistakes, and turn plans into actions in fragile ways.
The real advance is not “AI writes code,” but “AI uses code as the environment it thinks inside.”
The authors call the surrounding system an agent harness, meaning the tools, memory, sandboxes, checks, and feedback loops that turn a model into an agent.
Their core idea is that code should sit at the center of that harness, because code can be run, inspected, checked, saved, edited, and shared.
Tests become sensors.
Repositories become memory.
Logs become history.
Sandboxes become boundaries.
A generated script is no longer merely an answer; it is a handle the system can run, check, revise, share, and roll back.
The main finding is a pattern across many fields: code helps agents reason through executable steps, act through tool calls or control programs, and model environments through tests, traces, logs, repositories, and simulators.
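A minimal sketch of the "tests become sensors, logs become history" idea, not the paper's implementation. The names `run_candidate` and `harness` are hypothetical, and a temporary directory is only a stand-in for a real sandbox (it gives no actual isolation). Each generated script is executed, a check on its output acts as the sensor, every attempt is logged, and the last passing version is kept so a bad revision never replaces it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple


def run_candidate(source: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Execute a generated script in a throwaway directory and capture its output."""
    with tempfile.TemporaryDirectory() as workdir:  # boundary stand-in, NOT a real sandbox
        script = Path(workdir) / "candidate.py"
        script.write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        return proc.returncode == 0, proc.stdout.strip()


def harness(
    candidates: Iterable[str],
    check: Callable[[str], bool],
) -> Tuple[Optional[str], List[Tuple[str, bool]]]:
    """Keep the last candidate that runs cleanly and passes `check`; log every attempt."""
    accepted: Optional[str] = None
    log: List[Tuple[str, bool]] = []  # history of (source, passed)
    for source in candidates:
        ok, output = run_candidate(source)
        passed = ok and check(output)  # the test is the sensor
        log.append((source, passed))
        if passed:
            accepted = source  # failing revisions never overwrite this
    return accepted, log
```

Here the script is a handle: it gets run, checked, and either accepted or ignored, and the log records what was tried.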
----
Paper Link: arxiv.org/abs/2605.18747
Paper Title: "Code as Agent Harness"
Sloppification is going to be a large-scale problem soon enough, and whoever builds something that balances abstraction without losing the nuance that actually solves issues in production-grade code wins, no doubt.
I do think long term the health of repos everywhere will be near dogshit. The instant gratification from seeing the sloppy code "work" makes most devs completely blind to anything long term. Moreover, the increased expectations in terms of output don't help either.
"let's build a review agent" sure, that'll report 100 bugs. Will YOU fix them? No.
You feed the report back to an agent that does the same thing all over again.