(this is not to hate on laurie please i am just giving my opinion here)
when i was a broke college student my computer was literally made of parts scraped off facebook marketplace and i used to run games sometimes with lowspecgamer-tier cfgs
but at least i OWNED it lol
People like this fundamentally don't understand input latency. You can even explain it to them and they still won't get it. They just don't play video games. It's an unsolvable problem because the physical distance is specifically important. That's why this will never happen
AI agents are tackling more and more "human work"
But are they benchmarked on the work people actually do?
tl;dr: Not really
Most benchmarks focus on math & coding, while most human labor and capital lie elsewhere.
📒 We built a database linking agent benchmarks & real-world work
Submit new tasks + agent trajectories today 🧵
@andimarafioti @mervenoyann @HKydlicek Just don't have access unfortunately, easier for user to provide recordings in my case. Definitely agree if I just had the text it would be easier, it's fine if there isn't an efficient automated solution here too
@vivnat @GoogleDeepMind Shot in the dark, any chance a psychologist could get access? I'm quite interested in this use case applied to the social sciences, especially since ground zero is whether an AI co-scientist can be taught to think about psychometrics and latent constructs.
Is this perspective a regression to reading nature's secrets? The same idea goes with the DeepMind paper and other research applications on automating scientific inquiry, turning over these processes to agnostic language models.
Tangent but it's interesting how this evokes a natural history/history of science perspective on "reading the secrets of nature" that predates formal experimentalism. It'd be quite interesting to revisit Eamon in the context of recent technology that is so language focused.
Fei-Fei Li (@drfeifei) on limitations of LLMs.
"There's no language out there in nature. You don't go out in nature and there's words written in the sky for you... There is a 3D world that follows laws of physics."
Language is purely generated signal.
@willccbb Encountering some of this as the domain expert asked to work with ML people, figuring out what's the right thing to evaluate and how to do it is complex and ill-defined. I've got some good resources but also it seems like some answers just don't exist yet!
Built a lightweight trace viewer to speed up LLM evals—heavily inspired by lessons from @sh_reya and @HamelHusain's evals course. Kept it simple: FastAPI + vanilla HTML/JS.
Features: failure banner, execution-flow timeline (LLM ↔ tools), keyboard shortcuts, and an annotation panel (pass/fail/defer + tags). We’re already using it internally to review a small agentic loop over GitHub activity.
🎥 2-min demo video link in the thread below
#LLMOps #Evals #Agents #FastAPI
I went for dinner with a “what-if” guy last night.
You know the type. Smart, curious, intense.
We sat down, ordered wine, and for a while it was normal. Work, travel, you know.
I told him I worked in R&D for 20 years.
Then he leaned across the table and asked, dead serious: 1/
For those interested, here is a link to the final typeset power paper with @TheYiFeng:
Hancock, G. R., & Feng, Y. (2026). nmax and the quest to restore caution, integrity, and practicality to the sample size planning process. Psychological Methods.
https://t.co/7YnzIbDElE
📝 "Combining Psychology with Artificial Intelligence: What could possibly go wrong?" https://t.co/34YvWuOs2D
— Brief review paper by @o_guest and me, highlighting traps to avoid when combining Psych with AI, and why this is so important. Check out our proposed way forward! 🌟💡
I keep telling these guys they should never accept any medical treatment made possible by NIH-funded research.
But such an act of noble public service is beyond them