😎

@Som3thingNNN

ai is cool i guess follow me , i want ai friends :)

Joined January 2018

2.4K Following

104 Followers

2.5K Posts

Som3thingNNN retweeted

Elon Musk

@elonmusk

2 days ago

@peterrhague Among other things, @Neuralink will enable quadriplegics to use their hands and walk again and the completely blind to see. Jesus-level miracles.

26K

918

801K

Som3thingNNN retweeted

AI at Meta

@AIatMeta

3 days ago

We trained Brain2Qwerty v2 on ~22,000 sentences from 9 volunteers, each recorded for 10 hours wearing an MEG device while typing.

By using end-to-end deep learning on raw brain signals from MEG devices and fine-tuning LLMs, the system effectively bridges the gap between noisy neural data and coherent language.

The results are promising:
- Avg word accuracy of 61% across participants
- 78% word accuracy and 50%+ of sentences decoded with ≤ 1 word error for the top-performing participant
- Performance scales log-linearly with data volume

190

226K

Som3thingNNN retweeted

Derek Feriancek

@DerekFeriancek

6 days ago

Sol is a noticeable step forward in coding, and is a real step function improvement for design related tasks like slide decks. Sol’s adept usage of subagents is also creating emergent approaches for tackling our toughest problems. Excited to increasingly roll this out to everyone!

535

123

130K

😎

@Som3thingNNN

6 days ago

@daniel_mac8 @sama china will catch up in a few months regardless. no biggie :)

Som3thingNNN retweeted

Sam Altman

@sama

6 days ago

oh and also...750 token/sec coming to 5.6 sol in july!

203

133

201

Som3thingNNN retweeted

kache

@yacineMTB

7 days ago

@teortaxesTex first autoresearch task.. make yourself faster at researching : )

871

Som3thingNNN retweeted

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex

7 days ago

This is the beginning of serious autoresearch that can't be withdrawn with the push of a button by some responsible AI safety committee. No matter what happens next, we have early AI scientist assistants already. RSI can be local, if slow.

402

189

30K

Som3thingNNN retweeted

alphaXiv

@askalphaxiv

7 days ago

1/2 This is obviously not a comprehensive benchmark, but it’s clear that we finally have an open model that can be trusted and depended upon on difficult research tasks. You can easily run autoresearch yourself with GLM 5.2 by changing ‘arxiv’ to ‘autoarxiv’ for any arXiv URL: https://t.co/UNOSGwsFn0

126

11K

Som3thingNNN retweeted

Origami

@origamichat

8 days ago

This is the fastest way to get your first 50 customers:
>go to https://t.co/2469bPJlMJ
>type your domain
>it scrapes your website + builds your ideal customer profile
>releases a team of AI agents that start reaching out to potential customers for you
>doesn't stop until you hit your customer goal

862K

Som3thingNNN retweeted

Joon Sung Park

@joon_s_pk

8 days ago

Back to the roots :) Come build with us!

168

27K

Som3thingNNN retweeted

Unitree

@UnitreeRobotics

8 days ago

We're excited to support BitRobot in open-sourcing the largest humanoid whole-body teleoperation dataset collected in real homes. We hope it accelerates progress toward general-purpose humanoid robots.😉

500

156

64K

Som3thingNNN retweeted

BitRobot 🦾

@BitRobotNetwork

8 days ago

1/ Introducing HIW-500 (Humanoids-in-the-Wild 500): the largest open-source humanoid teleop dataset collected in real homes

Built w/ @UnitreeRobotics @huggingface across 12 homes in Southeast Asia, it covers:
> 500+ hrs
> 23K+ episodes
> 10+ TB
> 10+ household tasks

375

187

386K

Som3thingNNN retweeted

elvis

@omarsar0

7 days ago

I guess MCP won. Jokes aside, this is super cool from OpenRouter. Just making it easier for devs to run their long-running agents with the right level of intelligence. More of this, please.

275

270

59K

Som3thingNNN retweeted

Alexander Goslin

@xandurglar

7 days ago

Introducing InfiniteDiffusion, my independent paper accepted to #SIGGRAPH2026!

I have one RTX 3090 Ti. No funding, advisors, or team. By day I'm a new grad SWE at Walmart.

The paper has two main contributions:
- InfiniteDiffusion: a new approach to infinite generation with diffusion models.
- Terrain Diffusion: the world’s first learned procedural terrain generator.

Here’s why this matters, and how they are connected. 🧵

153

625

914K

Som3thingNNN retweeted

ＤＡＭＩＡＮ 🤖💡🏠

@HyperLogistix

7 days ago

i'm obsessed with these MIT Maker Portfolios

this is the best one i've ever seen. this is real engineering. how does he only have 3 followers?

Ethan built:
01. omni-directional wheels / swerve modules
02. a mechanical integrator that does calculus with moving parts
03. a polar plotter that can draw all sorts of wild patterns
04. a modded Ender 3 that he turned into a CoreXY claw machine
05. a cycloidal gearbox
06. a 5-axis 3D printer and slicer inspired by Joshua Bird's 4-axis design

now he's working on a compact, battery-powered, ultra-portable expanding 3D printer

a body of work like this usually takes a full team years

Ethan did it before college

this guy is an absolute machine

if anyone deserves more support, it’s him

164

12K

Som3thingNNN retweeted

Patrick C Toulme

@PatrickToulme

8 days ago

A few thoughts on OpenAI's Jalapeño chip announcement today:

1. This chip is most likely the first one virtually entirely developed by Codex/GPT. Codex with whatever internal coding model (GPT 5.6/6.0 whatever) coded the entire software stack and most likely the hardware design

2. OpenAI will write all of their inference serving in pure Jalapeño ISA (instruction set architecture). Why? They only need to get say a few production models serving on Jalapeño. They can handwrite with Codex the entire model in pure ISA to get very high performance

3. They are most likely running Codex/GPT in custom RL envs to teach the models direct Jalapeño chip programming at ISA level

4. This is a massive cost savings for OpenAI and only possible IMO due to the breakthroughs in agentic coding. An AI company with frontier coding models can now become a hardware vendor with only a small team of experienced SWEs and an infinite amount of tokens

This is the first chip program fully accelerated by frontier AI.

133

199

845

403K

Som3thingNNN retweeted

Max Zanoga

@zanoga

9 days ago

Finally finished building my AI datacenter! 🚀

32x3090s across 4 servers (8 GPUs each), all connected over InfiniBand.

The whole setup is solar-powered with a massive battery bank and generator backup.

More technical details and benchmarks coming soon. https://t.co/8GfedrSzNp

589

409

799K

Som3thingNNN retweeted

Zhihu Frontier

@ZhihuFrontier

9 days ago

Why Would GLM-5.2 Move Away From GRPO?
🌟Insights from Zhihu contributor 九老师

TL;DR: GLM-5.2 dropping GRPO does not mean GRPO is “bad.” It means the assumptions that made GRPO attractive for short LLM RL tasks may no longer hold for long-horizon agentic tasks. When rollouts get longer, environments get noisier, and credit assignment gets harder, PPO + value modeling starts looking useful again.

The key question is not simply “why did GLM-5.2 stop using GRPO?” A better question is: why did GRPO become useful for LLM RL in the first place?
If the reasons that made GRPO attractive no longer hold, then going back to PPO becomes natural.
GRPO can be understood as a sampled-baseline method. Instead of training a separate value model, it samples multiple responses for the same prompt and uses the group average as a baseline.
That is elegant. You get a relative reward signal without paying for a separate critic. In short tasks, this is very appealing.

But there is a tradeoff.⚖️
PPO uses a learned value function, or critic. This critic is expensive and harder to tune. It also has its own problems: the policy keeps changing, so the value model is always trying to follow a moving target. That can introduce bias.
GRPO avoids that by using an up-to-date sampled baseline. Its baseline is closer to unbiased, but it tends to have higher variance.
For early LLM RL tasks, that tradeoff made sense:
• Rollouts were short
• Final rewards were clear
• Memory savings mattered a lot
• Multiple samples per prompt were manageable
• Math/code tasks were relatively easy to verify
That is why GRPO worked so well for many short, verifiable reasoning tasks.

But long-horizon agentic tasks change the game. 🎮
A long agent task can look much more like a game environment:
• Many steps
• Tool calls
• Partial progress
• Delayed failure
• Noisy observations
• Intermediate rewards
• Wrong action penalties
• Context compression
• Different paths to the same final answer
This is where GRPO starts to struggle.

The biggest issue is credit assignment. In GRPO, the final reward is applied broadly across the whole trajectory. If a task succeeds, many tokens get rewarded. If it fails, many tokens get punished.
But in a long task, that is too coarse.

Maybe the first half was bad, but the final recovery was good. Maybe one tool call at step 30 caused failure at step 100. Maybe two successful trajectories are not really comparable because one used 4K tokens and another used 200K tokens with heavy tool use and context compression.

GRPO sees the final outcome. It does not naturally know which step actually mattered.
That creates high variance.
In short tasks, group comparison works well. In long tasks, group sampling can collapse into two bad cases:
1. All samples fail
The whole expensive rollout gives almost no useful training signal.
2. Only one sample succeeds
That single success may be luck, but GRPO may treat it as a strong positive signal and over-reward the trajectory.
Both are dangerous for long agentic training.
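
To make the two collapse cases concrete, here is a minimal sketch of a group-relative advantage. It assumes binary 0/1 outcome rewards and the commonly used mean/std normalization; the name `group_advantages` and the `eps` term are illustrative choices, not taken from GLM-5.2 or any particular library.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantage: (reward - group mean) / (group std + eps).

    Assumption: one scalar outcome reward per sampled rollout of the same prompt.
    """
    baseline = mean(rewards)   # the sampled baseline, no critic involved
    spread = pstdev(rewards)   # population std of the group
    return [(r - baseline) / (spread + eps) for r in rewards]


# Case 1: all 8 rollouts fail, so every advantage is zero (no training signal).
all_fail = group_advantages([0.0] * 8)

# Case 2: exactly one rollout succeeds, so it gets a large positive advantage
# while the other seven get a small negative one.
one_success = group_advantages([1.0] + [0.0] * 7)

print(all_fail)
print(one_success)
```

With outcome-only rewards, each rollout's single advantage value is what every token in that rollout is credited with, which is why a lucky success in a 200K-token trajectory can be over-rewarded.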
This is where PPO’s critic becomes valuable again. A value model can learn expected value under noisy states. It can provide denser feedback before the full rollout ends. It is more expensive, but it helps with long-horizon credit assignment.
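
As a rough illustration of "denser feedback", the sketch below computes per-step advantages from a critic using Generalized Advantage Estimation (GAE), a common choice with PPO. The source does not say which estimator GLM-5.2 uses, and the rollout numbers, `gae_advantages`, and its parameters are hypothetical.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Per-step advantages for one trajectory via GAE.

    `values` must have len(rewards) + 1 entries; the last entry is the
    bootstrap value of the final state (0.0 if the episode terminated).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD error: how much better or worse step t went than the critic expected.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Hypothetical 4-step rollout: reward arrives only at the end, but the critic's
# value estimate drops after the action at index 1 (say, a bad tool call).
rewards = [0.0, 0.0, 0.0, 1.0]
values = [0.5, 0.6, 0.2, 0.7, 0.0]

# gamma=1, lam=0 reduces GAE to the one-step TD error at each step.
adv = gae_advantages(rewards, values, gamma=1.0, lam=0.0)
print(adv)
```

Here the step at index 1 receives a negative advantage even though the rollout ultimately succeeds, which is the kind of per-step signal a single outcome reward cannot provide.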
So the author’s view is: GRPO is not being rejected because it was wrong. It is being outgrown by the task format.

For short, deterministic, verifiable tasks, GRPO remains strong.
For long, noisy, tool-heavy agentic tasks, PPO-style value modeling may simply be the better fit.

The “compaction problem” mentioned around long contexts is likely more of a symptom. The deeper issue is that GRPO’s weaknesses become costly when trajectories are long and states keep changing.
Could GRPO still work? Yes, if paired with a strong Process Reward Model. The author points out that DeepSeek MathV2 uses this direction. Process-level signals can help fix GRPO’s sparse-reward weakness.

But without that, returning to PPO makes sense.
🎯The bigger takeaway:
GRPO dropped the value model. PPO brings it back.
GRPO’s main advantage was efficiency. It removed the critic and saved resources. But for long-horizon agentic tasks, the critic’s ability to generalize and assign credit may be worth the cost again.
In the Agent era, RL for LLMs is becoming less like solving a short math problem and more like training an agent to play a long, noisy game.
And for that world, value models may still be the soul of RL.

🔗Full Reading (CN):
https://t.co/hf1GsDBc3e

800

105

265K

Som3thingNNN retweeted

Ian Goodfellow

@goodfellow_ian

9 days ago

While Mythos has popularized the idea of finding vulnerabilities with LLMs, Aisle was doing it earlier. From an engineering point of view, it's interesting to see that a small, open weight model with a structured search system is competitive at this task

301

170

56K

Som3thingNNN retweeted

Claude

@claudeai

9 days ago

Introducing Claude Tag, a new way for teams to work with Claude. In Slack, Claude joins as a team member with access to the channels and tools you choose. Tag Claude in and delegate tasks to it while you focus on other work.

28K

13K

20M
