If you still think AI agents can't do real research, this paper will end that argument.
Researchers from Google and Meta built a framework where Claude Code proposes its own algorithms for making LLMs reason better, then tests them, then refines them based on what failed. No human in the loop after the environment is set up.
In 5 rounds the agent discovered a controller with 4 coordinated mechanisms working together. EMA momentum stopping. Coupled width-depth control. Alignment-aware depth allocation. Conservative branch abandonment.
The paper says directly: "a level of coordinated complexity that would be difficult to arrive at through manual intuition alone."
That's a polite way of saying the agent built something a human probably wouldn't have.
The cost of the entire discovery was $39.90.
The cost of one researcher's coffee budget just outperformed years of hand-tuned work.
Paper is from Google and Meta.
Read it here: https://t.co/UtbbwO7ITR
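The post names the four mechanisms without describing how they work. As a rough illustration only of what an EMA-based stopping rule can look like (this is not the paper's controller; the function name, `alpha`, `threshold`, and the score values are all hypothetical), here is a minimal Python sketch that stops once the smoothed step-to-step improvement drops below a threshold:

```python
def ema_stopping_step(scores, alpha=0.5, threshold=0.01):
    """Return the index at which to stop, or None if the rule never triggers.

    Hypothetical illustration: keeps an exponential moving average (EMA)
    of the gain between consecutive scores and stops when that
    "momentum" falls below `threshold`.
    """
    momentum = None
    for i in range(1, len(scores)):
        gain = scores[i] - scores[i - 1]
        # The first gain seeds the EMA; later gains are blended in.
        if momentum is None:
            momentum = gain
        else:
            momentum = alpha * gain + (1 - alpha) * momentum
        if momentum < threshold:
            return i
    return None


# Made-up scores that improve quickly and then plateau.
stop_at = ema_stopping_step([0.0, 0.4, 0.6, 0.7, 0.7, 0.7], threshold=0.08)
```

Smoothing the gain means a single flat step does not end the search immediately; the rule only fires after improvement has stayed low for a few steps.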
@Figure_robot That's it for this week's AI and Robotics breakdown.
I share the latest research every week, so follow me @adcock_brett for more.
If you found this valuable, consider a like/retweet to spread the word. https://t.co/ce3979Ms5e
@karpathy To avoid losing your valuable open tabs, you can use a plugin like OneTab. It saves all your links in a handy list, and you can export and import them in any browser.
@ThrillaRilla369 He is a free man. He had his trial and was sentenced. He has every right to return to society and do whatever he wishes without having to explain himself to people like you.
He made a mistake and paid for it. He was 19, she was 12.
We’re excited to release the weights of our Time Series Foundation Model (TimesFM) on Hugging Face!
To access, visit our Hugging Face (https://t.co/tBHXp0mB4C) & GitHub (https://t.co/gC6s1hQAbA) repositories.
Learn more ↓
#TimesFM #TimeSeries #Forecasting #FoundationModels
Delighted to release ✨Llama-3-8B-Web✨, the most capable agent built for web navigation by following instructions and replying💬. It surpasses GPT-4V* by 18% on WebLINX, a benchmark for web navigation with dialogue.
Model: https://t.co/NHBYLyH1eF
Code: https://t.co/cF3nxkDwez
We just released Mixtral-8x22B-v0.1 and Mixtral-8x22B-Instruct-v0.1:
- Free to use under Apache 2.0 license
- Outperforms all open models
- Native function calling
- Masters English, French, Italian, German and Spanish.
- Seq_len = 64K
https://t.co/SCG8s06Dbl
@ice_blockchain
Why haven't I received my coins?
- I have passed the quiz
- retweeted your post 2x
- was not slashed
- added a Binance Smart Chain address
- completed all the requirements
No ice-coins received.
Has anybody received the ice coins?
I mined 1767 coins.
Huge day indeed for AI and LLMs, congrats to Meta 👏
This is now the most capable LLM available directly as weights to anyone from researchers to companies.
The models look quite strong, e.g. Table 4 in the paper: MMLU is good to look at, the 70B model is just below GPT-3.5. But HumanEval (bad misnomer) shows coding capability is quite a bit lower (29.9 for the 70B model vs 48.1 for GPT-3.5).
Code Interpreter Beta (rolling out to ChatGPT Plus) is quite powerful. It's your personal data analyst: can read uploaded files, execute code, generate diagrams, statistical analysis, much more. I expect it will take the community some time to fully chart its potential.
To turn on:
In ChatGPT on bottom left click on name > Settings > Beta features > turn on Code Interpreter.
Very nice & inspiring, "no-gradient architecture" for high-level skills/learning. LLM here is the "prefrontal cortex" orchestrating the lower-level mineflayer API via code generation++.
Meta-comment is that I remember how hopeless it felt to work on agents in environments like Minecraft around ~2016, feeling stuck on how RL at the time would ever randomly explore its way into performing long-horizon tasks from super sparse rewards. This block has now to a very large extent been lifted - the correct thing was to forget all that, first train LLMs that learn (1) world knowledge, (2) reasoning and (3) tool-use (esp writing code) all from internet text, then point them back at the problem in this kind of a way. TLDR: If I had read about this "no-gradient" approach to agents in 2016 my mind would certainly be blown.
Also haha @ source code in the voyager/prompts/*.txt directory :D
I'm excited to introduce Camoscio: an Italian instruction-tuned LLaMA, following Stanford Alpaca.
The model should provide output of similar quality to GPT text-davinci-003 and has been fine-tuned on the Alpaca dataset translated into Italian.
https://t.co/2YxggqmW6w
1/3
AI advancement continues.
The EITCA/AI Academy is an online programme attesting AI skills under the European IT Certification framework.
It consists of 12 EITC Certificates including Python, TensorFlow, Cloud AI and Deep Neural Networks.
Learn more at: https://t.co/p21yHEBQLa
@johnjnay @TheEconomist @stateofai I once interviewed with OpenAI and explicitly asked them if I would be allowed to publish. I was told that they publish very selectively and most knowledge is proprietary. I didn't like the sound of it... they have an Open in their name... :)
📣 #KickStart Round C is in less than 24 hours!
You still have time to:
✅ register → https://t.co/Ri3Z2nsJ5x
✅ practice with past problems
✅ charge your laptop, get some snacks, and take a nap
We'll see you on the scoreboard → https://t.co/QKJ1sQJNCL
This is a 393-year-old Greenland shark that was located in the Arctic Ocean. It's been wandering the ocean since 1627. It is the oldest living vertebrate known on the planet. Photo by Julius Nielsen
Obviously, we can process audio information at higher speeds 👇, but in normal spoken conversation, the paper shows that the amount of information transmitted per second falls into a narrow range across many languages with widely different origins. https://t.co/VMRifDwidh