@ylecun @eladgil Born in Zurich 🇨🇭:
Vision Transformer (ViT), Google
SigLIP, BiT, MLP-Mixer (Google Brain)
Google Lens
Google Maps
pLSA (foundation of modern NLP)
Microsoft HoloLens vision/SLAM
Multimodal core of Gemini (DeepMind ZH)
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea, let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol
LLMs may offer an opinion when asked, but they are extremely competent at arguing almost any direction. This is actually super useful as a tool for forming your own opinions; just make sure to ask for different directions and be careful with the sycophancy.
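A minimal sketch of the "ask for different directions" habit, assuming each prompt is sent to the model in a separate, fresh conversation so one answer cannot anchor the other. The `both_sides` helper and the prompt wording are hypothetical, not part of any library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebatePrompts:
    """A pair of prompts asking the model to argue opposite sides of one claim."""
    pro: str
    con: str


def both_sides(claim: str) -> DebatePrompts:
    # Request the strongest case in each direction, so neither answer is just
    # the model agreeing with whatever the user already seems to believe.
    template = "Make the strongest possible case that the following claim is {side}:\n\n{claim}"
    return DebatePrompts(
        pro=template.format(side="true", claim=claim),
        con=template.format(side="false", claim=claim),
    )
```

Reading the two answers side by side makes it easier to notice when a "convincing" argument is mostly the model mirroring the framing it was given.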
ARC-AGI-3 is out now! We've designed the benchmark to evaluate agentic intelligence via interactive reasoning environments. Beating ARC-AGI-3 will be achieved when an AI system matches or exceeds human-level action efficiency on all environments, upon seeing them for the first time.
We've done extensive human testing that shows 100% of these environments are solvable by humans, upon first contact, with no prior training and no instructions.
Meanwhile, all frontier AI reasoning models do under 1% at this time.
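The stated win condition can be read as code, as a sketch only: it assumes "action efficiency" can be compared by raw action counts against a per-environment human baseline, which may not match the official ARC-AGI-3 scoring. `EnvResult` and `beats_benchmark` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvResult:
    env_id: str
    solved: bool          # did the agent complete the environment on first contact?
    actions: int          # actions the agent used
    human_actions: int    # assumed human baseline for the same environment


def beats_benchmark(results: list[EnvResult]) -> bool:
    """True only if every environment is solved within the human action budget."""
    # "All environments": one miss or one over-budget run means not beaten.
    return bool(results) and all(
        r.solved and r.actions <= r.human_actions for r in results
    )
```

Under this reading, an agent that solves every environment but needs 20 actions where humans need 12 on just one of them would not count.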
@yoavgo Nobody wants this, especially when you can have Claude generate lean Typst code (or LaTeX Beamer, if you must) for presentations that you can check into git.
Thankfully the LiteLLM package has now been marked as "quarantined" on PyPI, so attempting to install the compromised update via pip et al. shouldn't work.
The shift is about WHO is in control.
Most teams are still in prompt engineering mode, trying to get the LLM to "behave." The real shift is architectural: build systems that constrain what the LLM can do.
5/6
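A minimal sketch of "constrain what the LLM can do" at the architecture level: the model may only propose an action as JSON, and the surrounding system decides whether it runs. The action names, `execute`, and `RejectedAction` are hypothetical; a real system would also validate each action's argument schema.

```python
import json
from collections.abc import Callable

# Hypothetical action names: the system, not the prompt, defines what is possible.
ALLOWED_ACTIONS = frozenset({"search_docs", "summarize", "open_ticket"})


class RejectedAction(ValueError):
    """Raised when the model proposes something outside the allowed set."""


def execute(llm_output: str, handlers: dict[str, Callable[..., str]]) -> str:
    """Run the model's proposed action only if the system permits it."""
    try:
        proposal = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        raise RejectedAction(f"not valid JSON: {exc}") from exc
    if not isinstance(proposal, dict):
        raise RejectedAction("expected a JSON object")

    action = proposal.get("action")
    args = proposal.get("args", {})
    # Check the type before set membership: unhashable values would raise TypeError.
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS or action not in handlers:
        raise RejectedAction(f"action {action!r} is not permitted")
    if not isinstance(args, dict):
        raise RejectedAction("args must be a JSON object")
    return handlers[action](**args)
```

The design choice is that an out-of-policy proposal fails closed: it raises instead of being handed back to the model to "behave" better next time.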
How do you handle context loss when your AI agent works with large specifications?
Tulla is my experimental open-source implementation of Semantic SDD.
It's research, so beware. But I'm open to feedback.
No warranties.
https://t.co/Gi0VONXW36
6/6
SDD: hand the construction crew a bunch of text files and hope they remember page 42.
Semantic SDD: feed them verified instructions for the specific brick they're holding.
5/6
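The "specific brick" idea can be sketched as context selection: rather than passing the whole specification, the agent gets only the fragments that are verified and apply to the component it is working on. This is a hypothetical illustration, not how Tulla is implemented; `SpecFragment`, `applies_to`, `verified`, and `context_for` are all assumed names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecFragment:
    frag_id: str
    text: str
    applies_to: frozenset[str]   # components this requirement constrains
    verified: bool = False       # assumed flag: has this fragment been checked?


def context_for(component: str, spec: list[SpecFragment]) -> str:
    """Build the agent's context from verified fragments that apply to one component."""
    relevant = [f for f in spec if f.verified and component in f.applies_to]
    # Keep fragment IDs so the agent's output can be traced back to the spec.
    return "\n\n".join(f"[{f.frag_id}] {f.text}" for f in relevant)
```

For example, calling `context_for("auth", spec)` yields only the verified requirements tagged for `auth`, so the agent never has to "remember page 42" of an unrelated section.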