The bitter lesson in 26 words:
Don’t be distracted by human knowledge, as AI has been historically.
Instead focus on methods for creating knowledge that scale with computation, like search and learning.
Introducing HRM-Text.
An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure.
Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models.
The kicker? The full model trains in roughly one day on a $1,000 budget.
This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game.
Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.
This works really well btw, at the end of your query ask your LLM to "structure your response as HTML", then view the generated file in your browser. I've also had some success asking the LLM to present its output as slideshows, etc.
More generally, imo audio is the human-preferred input to AIs but vision (images/animations/video) is the preferred output from them. Around a ~third of our brains are a massively parallel processor dedicated to vision, it is the 10-lane superhighway of information into brain. As AI improves, I think we'll see a progression that takes advantage:
1) raw text (hard/effortful to read)
2) markdown (bold, italic, headings, tables, a bit easier on the eyes) <-- current default
3) HTML (still procedural with underlying code, but a lot more flexibility on the graphics, layout, even interactivity) <-- early but forming new good default
...4,5,6,...
n) interactive neural videos/simulations
Imo the extrapolation (though the technology doesn't exist just yet) ends in some kind of interactive videos generated directly by a diffusion neural net. Many open questions as to how exact/procedural "Software 1.0" artifacts (e.g. interactive simulations) may be woven together with neural artifacts (diffusion grids), but generally something in the direction of the recently viral https://t.co/z21CP5iQfu
There are also improvements necessary and pending at the input. Audio nor text nor video alone are not enough, e.g. I feel a need to point/gesture to things on the screen, similar to all the things you would do with a person physically next to you and your computer screen.
TLDR The input/output mind meld between humans and AIs is ongoing and there is a lot of work to do and significant progress to be made, way before jumping all the way into neuralink-esque BCIs and all that. For what's worth exploring at the current stage, hot tip try ask for HTML.
Bill Burr NAILED it about the LA fires on Jimmy Kimmel last night:
“I think they did a great job, unlike the internet.”
“Mismanaged…like some person on the internet knows how to manage the worst fire in LA, sitting there in his underwear.”
Watch this.
Ring around the galaxy… Here’s Webb’s stunning new mid-infrared image of M104.
This bright core of the galaxy is dim in this view, revealing a smooth inner disk as well as details of how the clumpy gas in the outer ring is distributed. https://t.co/wQSE9xGTXX
Surge Cam 2 is in place in the main traffic circle on St. Armands Key, Florida at 11’ MSL. Video & Weather data now live streaming. Check the link below.
https://t.co/1hRB6QXCAi
#FLwx#Milton
Can @AnthropicAI Claude 3.5 sonnet outperform @OpenAI o1 in reasoning? Combining Dynamic Chain of Thoughts, reflection, and verbal reinforcement, existing LLMs like Claude 3.5 Sonnet can be prompted to increase test-time compute and match reasoning strong models like OpenAI o1. 👀
TL;DR:
🧠 Combines Dynamic Chain of thoughts + reflection + verbal reinforcement prompting
📊 Benchmarked against tough academic tests (JEE Advanced, UPSC, IMO, Putnam)
🏆 Claude 3.5 Sonnet outperformes GPT-4 and matched O1 models
🔍 LLMs can create internal simulations and take 50+ reasoning steps for complex problems
📚 Works for smaller, open models like Llama 3.1 8B +10% (Llama 3.1 8B 33/48 vs GPT-4o 36/48)
❌ Didn’t benchmark like MMLU, MMLU pro, or GPQA due to computing and budget constraints
📈 High token usage - Claude Sonnet 3.5 used around 1 million tokens for just 7 questions
Let’s see if we can crowdsource a robust definition of “agent” (with respect to AI and LLMs) that fits in a <=280 character tweet
Reply to this with your best attempt, then scroll through the replies and fave the ones that makes sense to you
OK, here is my best guess on the state of LLMs:
- The scale increase between gpt-3 and gpt-4 was 100x
- Doing that for the next model is going to be very hard
- We're nearly out of general language tokens. So let's say we can 2x that. And perhaps get more proprietary tokens and get to 3-4x. And do a lot of data cleaning and get to 6-7x.
- A 100x training run also requires a Gigawatt datacenter which we don't have yet
- Synthetic data is great, but it's not clear how that can be used for general language. I suspect this is why both OAI and Anthropic are focusing on math and code which can be improved via various "synthetic" compute methods (simulated data, or recursive self improvement of some sort)
- In the meantime, there is focus on getting more learnings from the same data. Perhaps there is a breakthrough there but I've not heard of it
- Planning can be pushed to inference in some domains (e.g. coding) which we're starting to hear about. But again, not clear how much this buys.
- Moronic policies like SB 1047 are threatening to slow all this down.
So tl;dr I don't see where the 100x jump will come from for general language reasoning. This is why we're seeing a focus on math and code. I'm glad teams are working hard at new algorithmic unlocks.
(btw, this is pure speculation, would love to know where I'm wrong!)
Programming is changing so fast... I'm trying VS Code Cursor + Sonnet 3.5 instead of GitHub Copilot again and I think it's now a net win. Just empirically, over the last few days most of my "programming" is now writing English (prompting and then reviewing and editing the generated diffs), and doing a bit of "half-coding" where you write the first chunk of the code you'd like, maybe comment it a bit so the LLM knows what the plan is, and then tab tab tab through completions. Sometimes you get a 100-line diff to your code that nails it, which could have taken 10+ minutes before.
I still don't think I got sufficiently used to all the features. It's a bit like learning to code all over again but I basically can't imagine going back to "unassisted" coding at this point, which was the only possibility just ~3 years ago.
"What is the ultimate quantification of success? For me, it’s not how much time you spend doing what you love. It’s how little time you spend doing what you hate." - @Casey.
Weekly wisdom from @Schwarzenegger's daily newsletter.