@peterrhague Among other things, @Neuralink will enable quadriplegics to use their hands and walk again and the completely blind to see.
Jesus-level miracles.
We trained Brain2Qwerty v2 on ~22,000 sentences from 9 volunteers, each recorded for 10 hours wearing an MEG device while typing.
By using end-to-end deep learning on raw brain signals from MEG devices and fine-tuning LLMs, the system effectively bridges the gap between noisy neural data and coherent language.
The results are promising:
- Avg word accuracy of 61% across participants
- 78% word accuracy and 50%+ of sentences decoded with ≤ 1 word error for the top-performing participant
- Performance scales log-linearly with data volume
Sol is a noticeable step forward in coding, and a real step-function improvement for design-related tasks like slide decks.
Sol’s adept usage of subagents is also creating emergent approaches for tackling our toughest problems.
Excited to increasingly roll this out to everyone!
This is the beginning of serious autoresearch that can't be withdrawn with the push of a button by some responsible AI safety committee. No matter what happens next, we have early AI scientist assistants already. RSI can be local, if slow.
1/2 This is obviously not a comprehensive benchmark, but it’s clear that we finally have an open model that can be trusted and depended on for difficult research tasks. You can easily run autoresearch yourself with GLM 5.2 by changing ‘arxiv’ to ‘autoarxiv’ in any arXiv URL: https://t.co/UNOSGwsFn0
This is the fastest way to get your first 50 customers:
>go to https://t.co/2469bPJlMJ
>type your domain
>it scrapes your website + builds your ideal customer profile
>releases a team of AI agents that start reaching out to potential customers for you
>doesn't stop until you hit your customer goal
We're excited to support BitRobot in open-sourcing the largest humanoid whole-body teleoperation dataset collected in real homes. We hope it accelerates progress toward general-purpose humanoid robots.😉
1/ Introducing HIW-500 (Humanoids-in-the-Wild 500):
the largest open-source humanoid teleop dataset collected in real homes
Built w/ @UnitreeRobotics and @huggingface across 12 homes in Southeast Asia, it covers:
> 500+ hrs
> 23K+ episodes
> 10+ TB
> 10+ household tasks
I guess MCP won.
Jokes aside, this is super cool from OpenRouter.
Just making it easier for devs to run their long-running agents with the right level of intelligence. More of this, please.
Introducing InfiniteDiffusion, my independent paper accepted to #SIGGRAPH2026!
I have one RTX 3090 Ti. No funding, advisors, or team. By day I'm a new grad SWE at Walmart.
The paper has two main contributions:
- InfiniteDiffusion: a new approach to infinite generation with diffusion models.
- Terrain Diffusion: the world’s first learned procedural terrain generator.
Here’s why this matters, and how they are connected. 🧵
i'm obsessed with these MIT Maker Portfolios
this is the best one i've ever seen. this is real engineering.
how does he only have 3 followers?
Ethan built:
01. omni-directional wheels / swerve modules
02. a mechanical integrator that does calculus with moving parts
03. a polar plotter that can draw all sorts of wild patterns
04. a modded Ender 3 that he turned into a CoreXY claw machine
05. a cycloidal gearbox
06. a 5-axis 3D printer and slicer inspired by Joshua Bird's 4-axis design
now he's working on a compact, battery-powered, ultra-portable expanding 3D printer
a body of work like this usually takes a full team years
Ethan did it before college
this guy is an absolute machine
if anyone deserves more support, it’s him
A few thoughts on OpenAI's Jalapeño chip announcement today:
1. This chip is most likely the first one developed almost entirely by Codex/GPT. Codex, with whatever internal coding model (GPT 5.6/6.0, whatever), coded the entire software stack and most likely the hardware design
2. OpenAI will write all of their inference serving in pure Jalapeño ISA (instruction set architecture). Why? They only need to get, say, a few production models serving on Jalapeño. They can use Codex to handwrite the entire model in pure ISA to get very high performance
3. They are most likely running Codex/GPT in custom RL envs to teach the models direct Jalapeño chip programming at ISA level
4. This is a massive cost savings for OpenAI and only possible IMO due to the breakthroughs in agentic coding. An AI company with frontier coding models can now become a hardware vendor with only a small team of experienced SWEs and an infinite amount of tokens
This is the first chip program fully accelerated by frontier AI.
Finally finished building my AI datacenter! 🚀
32x3090s across 4 servers (8 GPUs each), all connected over InfiniBand.
The whole setup is solar-powered with a massive battery bank and generator backup.
More technical details and benchmarks coming soon.
Why Would GLM-5.2 Move Away From GRPO?
🌟Insights from Zhihu contributor 九老师
TL;DR: GLM-5.2 dropping GRPO does not mean GRPO is “bad.” It means the assumptions that made GRPO attractive for short LLM RL tasks may no longer hold for long-horizon agentic tasks. When rollouts get longer, environments get noisier, and credit assignment gets harder, PPO + value modeling starts looking useful again.
The key question is not simply “why did GLM-5.2 stop using GRPO?” A better question is: why did GRPO become useful for LLM RL in the first place?
If the reasons that made GRPO attractive no longer hold, then going back to PPO becomes natural.
GRPO can be understood as a sampled-baseline method. Instead of training a separate value model, it samples multiple responses for the same prompt and uses the group average as a baseline.
That is elegant. You get a relative reward signal without paying for a separate critic. In short tasks, this is very appealing.
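A minimal sketch of that sampled baseline, to make the idea concrete. The function name and the `normalize` flag are illustrative assumptions, not from the thread; dividing by the group standard deviation follows the commonly published GRPO recipe and is not something the thread itself specifies.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, normalize=True, eps=1e-8):
    """Advantages for one prompt's group of sampled responses.

    rewards: one scalar reward per sampled response to the same prompt.
    The group mean plays the role a critic's value estimate plays in PPO.
    """
    baseline = mean(rewards)                    # sampled baseline, no value model
    centered = [r - baseline for r in rewards]  # relative reward signal
    if not normalize:
        return centered
    # Assumption: scale by the group's population std, as in the usual GRPO
    # formulation; eps avoids division by zero when all rewards are equal.
    scale = pstdev(rewards) + eps
    return [c / scale for c in centered]

# Four responses to one prompt, two judged correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0], normalize=False)
```

Every token in a response shares that response's single advantage, which is why no per-step critic is needed.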
But there is a tradeoff.⚖️
PPO uses a learned value function, or critic. This critic is expensive and harder to tune. It also has its own problems: the policy keeps changing, so the value model is always trying to follow a moving target. That can introduce bias.
GRPO avoids that by using an up-to-date sampled baseline. That keeps bias low, but the estimate tends to have higher variance.
For early LLM RL tasks, that tradeoff made sense:
• Rollouts were short
• Final rewards were clear
• Memory savings mattered a lot
• Multiple samples per prompt were manageable
• Math/code tasks were relatively easy to verify
That is why GRPO worked so well for many short, verifiable reasoning tasks.
But long-horizon agentic tasks change the game. 🎮
A long agent task can look much more like a game environment:
• Many steps
• Tool calls
• Partial progress
• Delayed failure
• Noisy observations
• Intermediate rewards
• Wrong action penalties
• Context compression
• Different paths to the same final answer
This is where GRPO starts to struggle.
The biggest issue is credit assignment. In GRPO, the final reward is applied broadly across the whole trajectory. If a task succeeds, many tokens get rewarded. If it fails, many tokens get punished.
But in a long task, that is too coarse.
Maybe the first half was bad, but the final recovery was good. Maybe one tool call at step 30 caused failure at step 100. Maybe two successful trajectories are not really comparable because one used 4K tokens and another used 200K tokens with heavy tool use and context compression.
GRPO sees the final outcome. It does not naturally know which step actually mattered.
That creates high variance.
In short tasks, group comparison works well. In long tasks, group sampling can collapse into two bad cases:
1. All samples fail
The whole expensive rollout gives almost no useful training signal.
2. Only one sample succeeds
That single success may be luck, but GRPO may treat it as a strong positive signal and over-reward the trajectory.
Both are dangerous for long agentic training.
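Both cases can be worked out by hand. This is a sketch under assumptions the thread does not state: binary rewards (1 for success, 0 for failure), a group of G samples with k successes, and advantages normalized by the group's population standard deviation, as in the commonly published GRPO formulation.

```latex
\bar{r} = \frac{k}{G}, \qquad
\sigma = \frac{\sqrt{k\,(G-k)}}{G}

A_{\text{success}} = \frac{1 - \bar{r}}{\sigma} = \sqrt{\frac{G-k}{k}}, \qquad
A_{\text{fail}} = \frac{0 - \bar{r}}{\sigma} = -\sqrt{\frac{k}{G-k}}

% Case 1 (k = 0): every r_i - \bar{r} = 0, so all advantages vanish
%                 and the group contributes no gradient.
% Case 2 (k = 1, G = 8): A_success = \sqrt{7} \approx 2.65,
%                        A_fail = -1/\sqrt{7} \approx -0.38.
```

The rarer the success, the larger the weight it receives, and that weight lands on every token of the one successful trajectory whether or not the success was luck.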
This is where PPO’s critic becomes valuable again. A value model can learn expected value under noisy states. It can provide denser feedback before the full rollout ends. It is more expensive, but it helps with long-horizon credit assignment.
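A sketch of how a critic turns into per-step feedback, using generalized advantage estimation (GAE), the estimator most PPO implementations use. The thread does not say which estimator GLM-5.2 uses, so treat the function, its parameters, and the toy numbers as illustrative assumptions.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Per-step advantages from a critic's value estimates.

    rewards: reward at each of T steps.
    values:  critic estimates for the T visited states plus one bootstrap
             value for the state after the last step (0.0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD error: did this step leave us better or worse off than expected?
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# Hypothetical failed 3-step rollout: the critic's estimate collapses right
# after step 0 (say, a bad tool call), long before the final reward arrives.
step_credit = gae_advantages(
    rewards=[0.0, 0.0, 0.0],
    values=[0.75, 0.25, 0.25, 0.0],
    gamma=1.0,
    lam=0.0,  # lam=0 isolates each step's own TD error
)
```

With these toy numbers, most of the blame lands on step 0 rather than being spread evenly across the trajectory, which is the denser signal the paragraph above describes. It is only as good as the critic's estimates.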
So the author’s view is: GRPO is not being rejected because it was wrong. It is being outgrown by the task format.
For short, deterministic, verifiable tasks, GRPO remains strong.
For long, noisy, tool-heavy agentic tasks, PPO-style value modeling may simply be the better fit.
The “compaction problem” mentioned around long contexts is likely more of a symptom. The deeper issue is that GRPO’s weaknesses become costly when trajectories are long and states keep changing.
Could GRPO still work? Yes, if paired with a strong Process Reward Model. The author points out that DeepSeekMath-V2 takes this direction. Process-level signals can help fix GRPO’s sparse-reward weakness.
But without that, returning to PPO makes sense.
🎯The bigger takeaway:
GRPO saved the cost of the value model. PPO brings it back.
GRPO’s main advantage was efficiency. It removed the critic and saved resources. But for long-horizon agentic tasks, the critic’s ability to generalize and assign credit may be worth the cost again.
In the Agent era, RL for LLMs is becoming less like solving a short math problem and more like training an agent to play a long, noisy game.
And for that world, value models may still be the soul of RL.
🔗Full Reading (CN):
https://t.co/hf1GsDBc3e
While Mythos has popularized the idea of finding vulnerabilities with LLMs, Aisle was doing it earlier. From an engineering point of view, it's interesting to see that a small, open-weight model with a structured search system is competitive at this task.
Introducing Claude Tag, a new way for teams to work with Claude.
In Slack, Claude joins as a team member with access to the channels and tools you choose. Tag Claude in and delegate tasks to it while you focus on other work.