@iScienceLuvr In retrospect, the leak mostly exposed well-designed scaffolding rather than anything revolutionary in agent design. The constraint still seems to be high-quality, long-running agents at scale, which needs a crazy amount of inference compute and isn’t trivial for most people or small teams.
@AlexiGlad In many ways autoresearcher and agentic loop systems are meta learners, just not with model parameters. It’s working well to generate AI slop right now😁.
@agtx007 @yunamorichan IMO Trill is good, but overpriced for what it is. You can get a good burger and steak anywhere. HTX has an amazing mix of TX and the east. Places like Blood Bros BBQ (mix of Chinese and Texas BBQ) and Crawfish & Noodles/any Viet-Cajun place are extremely unique to the area.
What if instead of autoresearch reflecting incremental progress from a single person, it reflected *hundreds* of researchers’ progress, updated live, with every run’s data point being interactive and reproducible? You could survey the full space of explored ideas, see which changes actually move the benchmark, and fold the most promising ideas into your next run.
Labless (https://t.co/B00yoWPOK3) is my attempt at this. It can support thousands of simultaneous training runs across contributors around the world, each learning & improving together to hillclimb on a fixed benchmark, validated using a standardized codebase scaffold.
Importantly, all valid training runs auto-submit to Labless, so failed experiments are as visible and reproducible as the successes. We have an agentic API that lets you or your coding agent study every run submitted so far so you don't waste time investigating something someone already tried, and you can source novel ideas from other contributors.
Labless already hosts our Nanopath challenge to build the best pathology foundation model that trains in 1 hour on a single GPU. We are hoping to expand the platform further and support a suite of open-source research challenges.
If this resonates with you, if you think the path to research innovation lies in open-source collaborative hill-climbing, please reach out.
And unfortunately, lower prices are partly achieved because suppliers charge other retailers more to compensate for the steep discounts granted to Walmart. Thus independent and smaller shops, lacking Walmart’s volume leverage, cannot get similar discounts and are forced to buy at higher prices from suppliers.
@Polymarket One of my biggest wonders is how people have 5 kids and live in SF Bay Area xhigh COL areas and earn <100k/year. My hunch is welfare and/or extreme budgeting. But maybe those who are claiming they’re living in poverty with 180k/year are living well beyond their means.
@sincethestudy @iScienceLuvr I’m bearish on the /goal tool in codex right now. I’ve given it a bunch of different problems, and it seems to have a low success rate due to the low diversity of solutions it tries out. It might be great for hyperparameter tuning, but certainly not for anything open-ended.
As more people start working with AI, the bar for what's an acceptable work product (slides, spreadsheets, documents, etc.) will rise. It will be even less acceptable to present poorly thought-through results. People who present sub-AI quality will have a hard time keeping their jobs. It just hasn't happened broadly yet because most managers still don't know how to use AI well.
Just like in chess the top people will get even better.
@mattshu04 @boknilev They also tend to be quite jargon-heavy, often combining terms that were never explained into sentences that look like sentences but make absolutely no sense. Good prompting goes a long way.
@mirandanover Everyone’s prompting in combo with loops. The goal is to write skills and then arm agents with these skills. One agent does research, another is an expert in a subfield, another is a coder, another is a verifier, and one manages. Orchestration is key to unlocking new frontiers.
@gabriel1 Intelligence is also multidimensional. Someone in the humanities might not be a math or ML superstar, but they bring interdisciplinary thoughts, values, and ideas to hard problems IRL. Diversity of ideas is just as important. Anthropic is a good example of this.
I was wrong about the Midjourney ultrasound scanner.
Well, maybe not wrong, but at a minimum I missed something obvious because I was thinking like a doctor who's been practicing for 25 years.
And I didn't explain my point well.
First, where I was wrong:
All the historical precedent showing that widespread screening imaging is net neutral or harmful involved imaging that was expensive, inconvenient, gated by physicians, and couldn't practically be repeated frequently in the short term.
If the Midjourney ultrasound is high resolution, harmless, inexpensive and convenient, people can get an initial scan, then if there are abnormalities concerning for cancer, they can get weekly follow-up scans to see if it's growing/changing, and if it's not, they can leave it alone.
In retrospect, that is obvious but it never occurred to me.
Now, you'd assume that that approach would have to lead to it being useful and saving lives, and it probably will. But we won't really know it does until we have a couple years of data. Lots of things that seem obvious in medicine end up being wrong once we collect data.
Second, what I didn't explain well:
It's not that I think non-doctors are 'too dumb' to use the results effectively.
It's that historically it was literally impossible to use the results effectively, and that is super, super counterintuitive. It seems obvious that finding stuff early is beneficial, but experience has shown that it isn't.
Here's why:
The vast majority of abnormalities (i.e. possible cancers) aren't cancer. Over 90% of them end up being harmless, something that your body could have handled on its own.
But the only way to find out was to have invasive, risky procedures to biopsy or remove what was found.
And overall, the side effects from all the risky, invasive procedures to track down the over 90% of stuff that was harmless equal or outweigh the benefit from removing the less than 10% of stuff that wasn't harmless.
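To make that trade-off concrete, here is a rough back-of-the-envelope sketch. Every number in it is hypothetical and in arbitrary units (it is not clinical data), and the function name and parameters are made up for illustration. It only shows how the math can come out to zero when every finding gets an invasive workup, and how it shifts if fewer findings need one.

```python
def net_benefit(cancers, workups, benefit_per_cancer, harm_per_workup):
    """Net benefit in arbitrary units: gain from treating real cancers
    minus the total harm from invasive workups (biopsy/removal)."""
    return cancers * benefit_per_cancer - workups * harm_per_workup


# Hypothetical scenario: 1000 abnormal findings, 100 of them (10%) are cancer.
# Assumed values: treating a cancer is worth 10 units, each workup costs 1 unit.

# Historical approach: every one of the 1000 findings gets an invasive workup.
historical = net_benefit(cancers=100, workups=1000,
                         benefit_per_cancer=10, harm_per_workup=1)

# Hypothetical alternative: cheap repeat scans rule out most harmless findings,
# so only 150 findings (the 100 cancers plus 50 false alarms) get a workup.
with_repeat_scans = net_benefit(cancers=100, workups=150,
                                benefit_per_cancer=10, harm_per_workup=1)

print(historical)         # 0
print(with_repeat_scans)  # 850
```

With these made-up numbers the historical approach breaks even, and anything that cuts down the number of invasive workups without missing cancers tips the balance. Whether real-world numbers look like this is exactly what we'd need data for.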
If the Midjourney device can be repeated frequently, like weekly, at a low cost and is harmless, it could negate the need for the risky, invasive procedures.
Not saying it will, but it seems like it could, and I confidently posted yesterday that it was a bad idea.
I was wrong to confidently post that.
@paradite_ Aren’t novel ideas interpolations of existing ideas? LLMs traverse the space of ideas, and tricks like harnesses and agents are used to guide them.
You'll see a lot of doctors come out "against" this kind of broad screening system. They can even get quite agitated about it. This resistance stems from a well-established clinical consensus: traditional population-level imaging fails to improve health outcomes because false positives and invasive follow-ups do more harm than good. But this view suffers from an obvious blind spot. Existing studies rely on static data and completely ignore time-series imaging. And time series is ignored because we haven't been able to afford high-frequency imaging at population scale. Clearly, a time series is going to be immensely more valuable than a single image. If you drop costs, value can go from 0 -> 1. On a more fundamental level, the argument against screening rests on an obviously false premise, "More information is bad," which is just clearly untrue. More information is better, you just have to interpret it correctly.