Excited to share our work on disentangled/abstract representations, to appear at #ICLR2025 (@iclr_conf)!
We mathematically prove and experimentally demonstrate that multi-task learning leads to disentangled representations, and propose a unifying mechanism for generalization in brains and machines: parallel processing (🧵+paper below)
Our work connects to the Platonic representation hypothesis, suggests why alignment across models/organisms can occur, and shows why transformers excel at constructing world models 🤖🚀
@Im_IrushiK LLMs can’t see what letters comprise a token, so they cannot possibly answer this question. Nothing new here, it’s been said a million times
Today we launched @CallosumAI.
We are building the infrastructure where heterogeneous chips & intelligence co-evolve to solve the world's hardest problems.
Today we present our first results.
Across four large problem spaces, we break SOTA and deliver orders-of-magnitude improvements in capabilities, cost and speed: 12× cheaper deep context. New web SOTA with open-source, 3× cheaper and faster. 2.4× cache speedups. 1,767× faster tool calling. This is the worst our infrastructure will ever be.
We do it by co-evolving heterogeneous chips and multi-agent intelligence: workflows aware of their hardware, models aware of their task graph, kernels aware of their output constraints. An Intelligent System.
https://t.co/t0KiP6q3eJ
Two of the most cracked ppl I’ve ever known building something that can truly make LLMs personalized. Congrats @ABhargava2000 and @witkowski_cam, excited to see where this goes!
Announcing Bread Technologies. We’re building machines that learn like humans.
We raised a $5 million seed round led by Menlo Ventures and have been building in stealth for 10 months.
Today, we rise 🍞
@YinChaoqun @KordingLab Agreed. But this blogpost shows that the criterion used to detect “line attractors” is extremely lax: even random dynamical systems pass it 50% of the time, and get classified as “approximate line attractors”, which is clearly wrong.
@rudzinskimaciej Yes, I could certainly see inhibition helping with decorrelation. Thanks for pointing it out. Not sure about the frequencies part though. Any references?
Yes, great point. I forgot to reference the 1000 brains theory here, but we do in the paper. One main difference is that they require dense signals to map the world, while we show that it can be done with sparse signals.
With regard to differentiation of cortical columns, it may simply come about due to the random initial projections (similar to heads in transformers or filters in CNNs)
If you're at #ICLR2025 and interested in how we can guarantee true out-of-distribution generalization in neural networks (extrapolation), Aman Bhargava (@ABhargava2000) and I will be presenting our work tomorrow, Saturday the 26th, 3:00-5:30pm, in Hall 3 (poster number #69)
We will be happy to see you there!
short presentation + slides: https://t.co/1ClMrZSTTC
Finally, huge thanks to my amazing collaborator Aman Bhargava (@ABhargava2000) for recognizing the mathematical potential of this project and doing the theory part, and to my advisor Antonio Rangel! This project is a prime example of the amplifying effect of great collaborations. Looking forward to more!
Link to top:
Excited to share our work on disentangled/abstract representations, to appear at #ICLR2025 (@iclr_conf)!
We mathematically prove and experimentally demonstrate that multi-task learning leads to disentangled representations, and propose a unifying mechanism for generalization in brains and machines: parallel processing (🧵+paper below)
Our work connects to the Platonic representation hypothesis, suggests why alignment across models/organisms can occur, and shows why transformers excel at constructing world models 🤖🚀
Thanks for reading this far! For an in-depth view of the above, I include the paper below (it’s 40 pages long!). TL;DR: it worked no matter what we threw at it!
And if you happen to be in Singapore for #ICLR2025, we will be presenting at poster session 6 on Saturday the 26th, 3:30-5 pm (Hall 3 + Hall 2B #69). We will be happy to see you there!
https://t.co/JndBB4zlyM