Thomas Telandro

4 months ago

Looks like `disable-model-invocation` works as expected. The real bug is that marketplace plugin skills don't show up in the / autocomplete picker at all — regardless of flags. Skills in ~/.claude/skills/ work fine, but plugin ones don't. https://t.co/7Gv4CF2PqM #Claude

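PR and marketing at Quanmatic. Working toward industrial applications of quantum computing technology 🚀 ex-Mercari Tech PR, @uipathjapan PR. Weekends are train trips with my train-obsessed 5-year-old 🚃 Singing, piano, Hikaru Utada. From Hiroshima 🎏 living in Tama 🍁 Tweets are my own.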

4 months ago

Using `disable-model-invocation: true` in Skills for Claude Code prevents agents from finding the Skills and makes them available only to users. But at the same time, it removes the descriptions from the picker in Claude Code. That's a bit counterintuitive. #Claude https://t.co/c5kH55zQ0j

5 months ago

@omarsar0 @karpathy @moltbook @openclaw Each time someone asks their OpenClaw something that could feed a new Moltbook post, it risks leaking their queries. This could backfire fast when agents start deciding—with other agents—what to do next using the user's login & credit card.

mukei retweeted

Thariq

@trq212

5 months ago

https://t.co/X2iu8WdIb8

mukei retweeted

elvis

@omarsar0

7 months ago

Banger paper for agent builders.

Multi-agent systems often underdeliver. The problem isn't how the agents themselves are built. It's how they're organized.

They are mostly built with fixed chains, trees, and graphs that can't adapt as tasks evolve.

But what if the system could learn its own coordination patterns?

This new research introduces Puppeteer, a framework that learns to orchestrate agents dynamically rather than relying on handcrafted topologies.

Instead of pre-defining collaboration structures, an orchestrator selects which agent speaks next based on the evolving conversation state. The policy is trained with REINFORCE, optimizing directly for task success.

Rather than searching over complex graph topologies, they serialize everything into sequential agent selections. This reframing sidesteps combinatorial complexity.

What emerges is surprising: compact cyclic patterns develop naturally. Not sprawling graphs, but tight loops where 2-3 agents handle most of the work.

The remarkable part is that the system discovers efficiency on its own.

Results:
- On GSM-Hard math problems: 70% accuracy (up from 13.5% for the base model alone).
- On MMLU-Pro: 83% (vs 76% baseline).
- On SRDD software development: 76.4% (vs 60.6% baseline).

These gains come with reduced token consumption. The paper shows that token costs consistently decrease throughout training while performance improves.

They also prove the agent selection process satisfies Markov properties, meaning the current state alone determines the optimal next agent. No need to track full history.

Why it matters for AI devs: learned simplicity beats engineered complexity. A trained router with a handful of specialized agents can outperform elaborate handcrafted workflows while cutting computational overhead.
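
To make the "orchestrator picks the next agent, trained with REINFORCE" idea concrete, here is a toy sketch. It is not the Puppeteer implementation: the tabular softmax policy, the `ToyOrchestrator` and `train` names, the choice of "last agent that spoke" as the state, and the reward (1.0 only when agent 0 speaks and then agent 1) are all assumptions made up for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyOrchestrator:
    """Hypothetical tabular policy: state -> distribution over the next agent."""

    def __init__(self, n_agents, lr=0.5, seed=0):
        # One row of logits per state. State 0 means "nobody has spoken yet";
        # state i + 1 means "agent i spoke last" (an assumed, Markov-style state).
        self.logits = [[0.0] * n_agents for _ in range(n_agents + 1)]
        self.lr = lr
        self.rng = random.Random(seed)

    def select(self, state):
        """Sample the next agent from the current policy."""
        probs = softmax(self.logits[state])
        return self.rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, trajectory, episode_return):
        """REINFORCE step: d log pi(a|s) / d logits = onehot(a) - probs."""
        for state, agent in trajectory:
            probs = softmax(self.logits[state])
            for a, p in enumerate(probs):
                grad = (1.0 if a == agent else 0.0) - p
                self.logits[state][a] += self.lr * episode_return * grad


def train(episodes=2000, n_agents=3, steps=2):
    """Train on a made-up task where only the order [agent 0, agent 1] succeeds."""
    orch = ToyOrchestrator(n_agents)
    for _ in range(episodes):
        state, trajectory = 0, []
        for _ in range(steps):
            agent = orch.select(state)
            trajectory.append((state, agent))
            state = agent + 1
        chosen = [agent for _, agent in trajectory]
        reward = 1.0 if chosen == [0, 1] else 0.0
        orch.update(trajectory, reward)
    return orch
```

Calling `train()` and inspecting `softmax(orch.logits[0])` shows the policy concentrating on the agent order that earns reward. The point of the sketch is only that agent selection can be treated as a sequence of sampled actions and optimized directly for task success.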

9 months ago

@googleaidevs That works well for @roocode code indexing

mukei retweeted

11 months ago

we're all sleeping on this OCR model 🔥

dots.ocr is a new 3B model with sota performance, support for 100 languages & allowing commercial use! 🤯

single e2e model to extract image, convert tables, formula, and more into markdown 📝 https://t.co/OFcyN9GVeg

mukei retweeted

Cline

@cline

over 1 year ago

We built a Stock Market MCP server using Cline in just 8 minutes. It has these tools: 📊 Market Report Generator 💰 Financial Statement Analysis 📈 Real-time Stock Price Tracker 🔍 Company Symbol Search Here's our step-by-step process: 🧵

mukei retweeted

Akshay 🚀

@akshay_pachaar

over 1 year ago

Model Context Protocol (MCP), clearly explained:

mukei retweeted

over 1 year ago

Gemma 3 can understand videos, and it's more powerful than you think it is ⏯️

I put together a short notebook on interleaving frames and doing video inference 📖

you're welcome 🤝 https://t.co/FytcZ8Pok0

mukei retweeted

Lance Martin

@RLanceMartin

over 1 year ago

R1 Deep Researcher

Fully local research assistant w/ @deepseek_ai R1 + @ollama. Give R1 a topic and watch it search web, learn, reflect, search more, repeat as long as you want. Gives you a report w/ sources at end. All open source ..

mukei retweeted

Philipp Schmid

@_philschmid

over 1 year ago

For those trying to understand @deepseek_ai Group Relative Policy Optimization (GRPO). Here, in simple steps:

1️⃣ Generate multiple outputs for each prompt using the current policy
2️⃣ Score these outputs using a reward model (rule or outcome)
3️⃣ Average the rewards and use it as a baseline to compute the advantages
4️⃣ Update the Policy to maximize the GRPO objective, which includes the advantages and a KL term
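
Step 3 above can be sketched in a few lines. This is a minimal illustration, not code from any GRPO library: the function name `group_relative_advantages` and the `normalize` and `eps` parameters are hypothetical, and dividing by the group's standard deviation is an assumption that goes beyond the tweet, which only mentions the mean baseline.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, normalize=True, eps=1e-8):
    """Advantages for one prompt's group of sampled outputs.

    rewards: one scalar reward per sampled output for the same prompt.
    The group mean acts as the baseline, so no value model is involved.
    """
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not normalize:
        return centered
    # Assumed extra step: rescale by the group's standard deviation.
    # eps avoids division by zero when every reward in the group is equal.
    spread = pstdev(rewards)
    return [c / (spread + eps) for c in centered]
```

For example, `group_relative_advantages([1.0, 0.0, 0.0, 1.0], normalize=False)` centers the rewards on the group mean of 0.5, so outputs that beat their siblings get a positive advantage and the rest get a negative one.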

mukei retweeted

over 1 year ago

Alibaba released Multimodal Textbook: a new multimodal pre-training set from online instructional videos (22k hours) 🧑🏻‍🏫📕

6.5M images interleaved with 800k text on math, physics, chemistry 👏 https://t.co/LSVShw9yVI

mukei retweeted

over 1 year ago

NVIDIA solved physics and open-sourced it? Can we just build our own autonomous robots now? 🤯 They released Cosmos: new family of open world foundation models (WFMs) 🌌 Unwrapping the release and why it's so revolutionary 🧶

mukei retweeted

ℏεsam

@Hesamation

over 1 year ago

Agents: Google's whitepaper covers the basics of LLM agents and a quick LangChain implementation

mukei retweeted

Philipp Schmid

@_philschmid

over 1 year ago

The RLHF method behind the best open models! Both @deepseek_ai and @Alibaba_Qwen use GRPO in post-training! Group Relative Policy Optimization. GRPO was introduced in the DeepSeekMath Paper last year to improve mathematical reasoning capabilities with less memory consumption, but is now used in an online way also to improve Truthfulness, Helpfulness, Conciseness… 👀

Implementation
1️⃣ Generate multiple outputs for each input question using the current Policy
2️⃣ Score these outputs using a reward model
3️⃣ Average the rewards and use it as a baseline to compute the advantages
4️⃣ Update the Policy to maximize the GRPO objective, which includes the advantages and a KL term

Insights
💡 Doesn't need value function model, reducing memory and complexity
🔗 Adds KL term directly to the loss rather than in the reward
🧬 Works with rule-based Reward Models and Generative/Score based RM
👉 Looks similar to RLOO method
👀 DS 3 improved coding, math, writing, role-playing, and question answering
🤗 Soon in @huggingface TRL (PR open already)
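
The "KL term directly in the loss" point can be illustrated with a per-token sketch. Everything here is an assumption for illustration rather than the TRL or DeepSeek code: the function names, the PPO-style clipped surrogate, the `clip_eps` and `beta` defaults, and the KL estimator of the form r - log r - 1 with r = pi_ref / pi_theta, which is how I read the DeepSeekMath formulation.

```python
import math


def kl_estimate(logp_policy, logp_ref):
    """Non-negative KL estimate: r - log(r) - 1, with r = pi_ref / pi_policy."""
    log_ratio = logp_ref - logp_policy
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_token_loss(logp_policy, logp_old, logp_ref, advantage,
                    clip_eps=0.2, beta=0.04):
    """Loss for a single token; lower is better.

    The reward-derived advantage is left untouched. The KL penalty is
    subtracted inside the objective instead of being folded into the reward.
    clip_eps and beta are illustrative defaults, not tuned values.
    """
    ratio = math.exp(logp_policy - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    surrogate = min(ratio * advantage, clipped * advantage)
    objective = surrogate - beta * kl_estimate(logp_policy, logp_ref)
    return -objective
```

When the current policy matches both the old policy and the reference, the KL estimate is zero and the loss reduces to the negated advantage. As the policy drifts from the reference, the penalty grows regardless of what the reward model says.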
