Grey Joo

@giljae

I love tomorrow!

Seoul

Joined November 2008

57 Following

1.3K Followers

2.8K Posts

Grey Joo @giljae

6 days ago

Highly recommended!

Joruno

@wsl8297

7 days ago

When you run several AI agents at once in the terminal, the hardest part is never starting them. It is having to start over whenever state is lost, windows that get heavier the more of them you open, and the fatigue of constantly switching context.

https://t.co/PrE1QUopsA

Herdr is a lightweight terminal agent orchestration tool, built for people who are used to the command line and do not want to be tied to heavy GUIs like Superset / Conductor.

It takes a direct approach:
- Runs natively in your existing terminal, so your current configuration carries over
- tmux-style persistent sessions that you can resume after a disconnect
- Aware of agent state, so multiple agents can be monitored at a glance
- Supports running multiple agents in parallel
- A Rust binary: light, fast, low resource usage
- Responsive TUI that is readable even on a phone

Once your workflow becomes "one person + a group of coding/research agents", this kind of lightweight orchestration layer goes from nice-to-have to essential.

giljae retweeted

Muhammad Ayan

@socialwithaayan

8 days ago

The smartest people on the internet just open-sourced their brain.

11 GitHub repos worth bookmarking:

- PilotDeck — OpenBMB's open-source AI agent framework. Build and deploy autonomous agents in minutes.
https://t.co/ozmSncagqb

- andrej-karpathy-skills — Karpathy's AI coding wisdom in a single markdown file. 109K+ stars.
https://t.co/tOr4XGZnDy

- MemPalace — Milla Jovovich co-built this AI memory system with Claude Code. Near-perfect LongMemEval score.
https://t.co/zjSwfv3PeV

- OpenClaw — Peter Steinberger's personal AI assistant. 300K+ stars. Fastest growing repo in GitHub history.
https://t.co/vgWKVDhXyZ

- autoresearch — Karpathy's research automation framework. 23K stars in three days.
https://t.co/fVnXmLjpcH

- awesome-claude-code — The canonical Claude Code playbook. Used inside FAANG, OpenAI, and Anthropic.
https://t.co/ylSdRRATgg

- agent-skills — Addy Osmani's production-grade engineering skills for AI coding agents. 30K+ stars.
https://t.co/ClswBl8zCO

- AI-Agents-for-Beginners — Microsoft's free 12-lesson course on building AI agents.
https://t.co/DhS6mUJuDk

- awesome-llm-apps — 106K+ stars. The largest collection of working AI apps on GitHub.
https://t.co/ilZKbFPxp7

- hermes-agent — Self-evolving AI agent. Gets smarter the more you use it.
https://t.co/06jfIpEy6W

- qlib — Microsoft's full quant investment platform. A hedge fund brain, free to clone.
https://t.co/sBbYjvXzkx

Save this post!

Follow me for more ♻️ Repost so others don't miss it.

979

194

150K

Grey Joo @giljae

7 days ago

That cracked me up ;;

Grey Joo @giljae

9 days ago

I'm in love with Pi. :)

LinearUncle

@LinearUncle

10 days ago

The pi version of dynamic workflows is here, a counterpart to the dynamic workflows newly released with Opus 4.8, and you can use any LLM with it (for example deepseek, codex-5.5).

The author is the same expert who earlier cracked Claude Code's generative UI (dynamic interactive UI). I still use his plugin today.

And based on my experience, rest assured: a Codex version of dynamic workflows will probably ship at some point too. I am still all in on the Codex app.

@thsottiaux

306

442

100K

giljae retweeted

Kshitij Mishra | AI & Tech

@DAIEvolutionHub

10 days ago

Trains billion-parameter LLMs from scratch on a single GPU

Most people think training an LLM needs a datacenter and millions of dollars.

This repo proves otherwise.

It shows how to build and train GPT-style models from scratch with techniques that make large-scale training possible on consumer hardware.

From tokenization to distributed tricks — everything is open-source.

https://t.co/QgyPOYUxge

782

118

865

23K

giljae retweeted

고영혁 (Dylan Ko)

@Gonnector

10 days ago

When Anthropic announced Claude Opus 4.8, it also announced a new feature, Dynamic Workflow, for Claude Code, the harness itself. True orchestration effectively requires workflow functionality, and it has only now arrived. With today's announcement, AI agents take another big evolutionary step.

Claude Code was the first harness to implement multi-agent, and it later launched agent team. After a gap of roughly half a year, you can think of this release as a well-judged combination of the two, or rather as something that means more than that.

The official documentation is at https://t.co/o2r6bGm0pO.

Several existing open-source projects will be affected right away. Or rather, it may be more accurate to say that it is thanks to those open-source projects that the leading companies ship things like this before it is too late.

Posts on social media focus on the new ultracode command running hundreds of subagents at once to handle long, complex tasks on its own. That is not wrong, but I think the real essence lies elsewhere.

First, look at the second image. It is a terminal-output capture of the English table in the official documentation, translated into Korean, and it lays out the three core concepts, subagent, skill, and workflow (newly added this time), so that they are not confused. Understanding this is the key. In other words, the table compares their core mechanisms so that you can judge properly when to use a skill, a subagent, or a (dynamic) workflow.

Running hundreds of subagents (in my experience the previous limit was around 20) is a necessary condition for running large, complex workflows in which implementation, verification, and correction are assigned separately for each element to prevent bias and context contamination and to raise specialization. It cannot be a sufficient condition, though. Workflow alone is not sufficient either, but the two together are: you then have both the people to do the work and the rules for doing it.

Put simply, a workflow is the process by which work proceeds. It is a bundle of rules: what must run in which case, when to stop, when to hand off, what the inputs and outputs must be, and so on. Execution must follow these rules. Such rules are usually written as a simple program, commonly called a script, and then run.

AI agents by nature do not follow rules perfectly. That is both a strength and a weakness: where independent judgment is needed you should use an agent, but if you hand an agent something that must run routinely by fixed rules, unexpected situations arise, so that kind of work should be handled by a workflow.

So until now, when building something with AI, you had to split development into the parts to entrust to an AI agent and the parts that must not be entrusted and must strictly follow an independent process, and you also had to define the connection between the two as the top level of the workflow. Granted, that workflow itself was also developed together with an AI agent.

Now the AI agent itself designs and implements the workflow, to fit the given goal, at the moment a workflow is needed. That is why it is called dynamic workflow. If the AI could also modify the workflow script dynamically during execution, the workflow would not run properly, because the AI would make arbitrary decisions on its own. According to the official technical documentation, unsurprisingly, the AI cannot intervene at all during execution, and the workflow is controlled strictly by the script as written.

Meanwhile, the first image is an intuitive comparison diagram posted by Cat Woo, a PM at Anthropic. Previously subagents could not communicate with each other, and only agents invoked through Agent Team could. The diagram makes it look as if, with dynamic workflow, subagents can now communicate with each other, at least in this mode.

I have not tested it myself yet, but having checked the official technical documents, subagents still cannot communicate with each other, and direct mesh messaging between agents is possible only among agents invoked through agent teams. So what is that diagram? It is a conceptual visualization of an abstraction, and it can cause misunderstanding if you dig into it.

Looking closer, technically it is the script that invokes subagents. That is, there is no nested hierarchy among subagents. The script spawns multiple subagents and controls not only each one's role but also the protocol for how they communicate, and the content they exchange is strictly controlled through script variables. Naturally, that is how a workflow should be.

For those who do not need these internals and just want to know what it is and how to use it, the first video is probably the most helpful. It was posted by @ajith_io among the replies to the official account on X. Nicely made.

In an interview, the person in charge said that in the second half of the year Anthropic will focus its R&D on large-scale memory and, in particular, on agents collaborating as a team.

Personally I am focused on "expert agent collaboration" (as a one-person company, my business depends on this being solid), and I am implementing, updating, and systematizing the pieces it needs. Since I can roughly predict what Anthropic will release, I am narrowing my focus to the edge points so as to avoid needless overlap as far as possible.

How I use my team of AI experts is, at this point, best captured in the one-hour video at https://t.co/22UH4BPzYS and in my answers to the questions in its comments. Take a look if you are curious. By the way, it has only been 13 days and it has already passed 24,000 views...

#ai #agent #workflow #claudecode
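The point about script-controlled subagents can be illustrated with a minimal sketch. Everything here is hypothetical: the `Subagent` type, the function name, and the stubs are assumptions made for illustration and are not the Claude Code API. It only shows the shape described above, where a fixed script decides what runs and when to stop, and every hand-off between subagents is a script variable.

```typescript
// Hypothetical sketch: a deterministic script that drives subagents.
// `Subagent` stands in for a model call; none of these names come from Claude Code.
type Subagent = (input: string) => string;

interface WorkflowResult {
  output: string;
  rounds: number;
  approved: boolean;
}

// The script owns the control flow: what runs, in what order, and when to stop.
function runImplementVerifyFix(
  task: string,
  implement: Subagent,
  verify: Subagent,
  fix: Subagent,
  maxRounds: number,
): WorkflowResult {
  // Subagents never message each other; every hand-off is a script variable.
  let draft = implement(task);
  for (let round = 1; round <= maxRounds; round++) {
    const verdict = verify(draft);
    if (verdict === "ok") {
      return { output: draft, rounds: round, approved: true };
    }
    // The verifier's feedback reaches the fixer only because the script passes it on.
    draft = fix(`${draft}\nFEEDBACK: ${verdict}`);
  }
  // The stop rule fired before the verifier approved the latest draft.
  return { output: draft, rounds: maxRounds, approved: false };
}

// Stub subagents so the sketch runs without any model.
const implementStub: Subagent = (task) => `impl(${task})`;
const verifyStub: Subagent = (draft) => (draft.includes("fixed") ? "ok" : "missing tests");
const fixStub: Subagent = (input) => `fixed:${input.split("\n")[0]}`;

const loopResult = runImplementVerifyFix("add login", implementStub, verifyStub, fixStub, 3);
```

With these stubs the verifier rejects the first draft and approves the fixed one in the second round. The subagents supply judgment, while the loop bound and the routing of feedback stay in ordinary code.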

giljae retweeted

Pi Changelog

@PiChangelog

10 days ago

Pi v0.77.0 is out.

Highlights:
- Claude Opus 4.8 support added with updated adaptive-thinking coverage.
- New --exclude-tools / -xt flag to disable specific built-in, extension, or custom tools while leaving the rest available.
- Device-code login for OpenAI Codex subscriptions available as a headless alternative to browser login.
- InputEvent.streamingBehavior lets extensions distinguish idle prompts, mid-stream steers, and queued follow-ups.

Complete details in thread ↓

giljae retweeted

크롱

@Krongggggg

10 days ago

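Tencent open-sourced WeKnora, its internal-document RAG platform, and it deserves its 15K GitHub stars. It ties scattered material together with a reasoning agent and an automatically organized wiki structure, so it is a great way out of the hell of internal documents that rot because updates are always behind.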

130

151

13K

giljae retweeted

Sumanth

@Sumanth_077

10 days ago

Self Improving AI (SIA) beats Karpathy's autoresearcher agent by improving itself!

SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task.

Most agent frameworks are static. Fixed harness, fixed model weights, fixed memory layer. They plan, act, and use tools. SIA operates on a different layer entirely.

SIA focuses on one problem: how do you design structured feedback loops that allow an agent to evaluate its own performance, adapt its strategy, and get better over time?

After every run, SIA evaluates itself and improves three things. It updates its own harness. Updates the weights of its underlying model. Updates its own memory layer to handle new complexities. The agent rewrites itself based on what it learned.

On MLE-Bench, OpenAI's benchmark for evaluating an agent's ability to train ML models, SIA climbed to the top of the leaderboard. Beat every specialized ML research agent including MLEvolve and AIRA-dojo. Then kept improving and displaced its own previous versions on the leaderboard.

I've shared the link to the paper and the repo in the replies!

987

175

88K

giljae retweeted

阿泽 AZe

@Chenzeze777

10 days ago

Found an open-source project that will drive programmers wild.

It is called Understand-Anything, and it turns any codebase into an interactive knowledge graph with one click.

Just drop in a project and it automatically works out:

what each file does
which functions each function calls
what each class depends on
what the overall architecture looks like

The key points:

free, runs locally, and supports all mainstream AI coding tools

Taking over someone else's code used to mean a week of just reading. Now you open the graph and understand the whole project in 10 minutes.

🔗 https://t.co/eWARnX9rmL

227

294

12K

giljae retweeted

OrcaRouter 🐳

@OrcaRouter

10 days ago

🚀Just launched Claude Opus 4.8 API on OrcaRouter! Experience one of the strongest coding models yet: https://t.co/jLhh1Zk0EM

137

giljae retweeted

Kshitij Mishra | AI & Tech

@DAIEvolutionHub

10 days ago

🚨 MICROSOFT JUST OPEN-SOURCED A WAY TO “TRAIN” AI AGENTS WITHOUT TOUCHING MODEL WEIGHTS

SkillOpt treats a simple markdown skill file like neural network parameters and optimizes it with learning rates, validation checks, minibatches, and epochs.

The result? Agents get smarter over time while the base LLM stays frozen.

Instead of retraining models, it continuously improves the agent’s reasoning rules inside a single readable .md file.

Paper: https://t.co/6KeabqThlf
GitHub: https://t.co/geCmfimwnr

128

166

giljae retweeted

Michael Livs

@micLivs

10 days ago

introducing pi-dynamic-workflows

This is probably going to be a bigger token burner than pi-goal, BUT, dynamic workflows is the first implementation of subagents that i don't hate, mainly because it's "code mode" for subagents.

agent writes a js-based workflow DSL into a dedicated tool, engine parses the workflow code and runs it. the dsl implements some primitives for the agent (agent(), parallel(), pipeline(), phase() and log()) to keep it as simple as possible.

now available in @badlogicgames pi!

pi install npm:pi-dynamic-workflows
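To make the five primitives concrete, here is a rough sketch of how such a DSL could be built from plain combinators. It is not the pi-dynamic-workflows API: the signatures, `createDsl`, and the `ModelCall` stub are all assumptions, and the real engine parses workflow code rather than exposing functions like this.

```typescript
// Hypothetical sketch of the five primitives named in the post. The signatures and the
// `ModelCall` stub are assumptions, not the actual pi-dynamic-workflows API.
type ModelCall = (prompt: string) => Promise<string>;
type Step = (input: string) => Promise<string>;

function createDsl(model: ModelCall) {
  const logs: string[] = [];

  // log(): record a line in the run transcript.
  const log = (message: string): void => {
    logs.push(message);
  };

  // agent(): a step that sends its role plus the incoming text to the model.
  const agent = (role: string): Step => (input) => model(`[${role}] ${input}`);

  // parallel(): fan the same input out to several steps and join their outputs.
  const parallel = (...steps: Step[]): Step => async (input) => {
    const outputs = await Promise.all(steps.map((step) => step(input)));
    return outputs.join("\n");
  };

  // pipeline(): feed each step's output into the next step.
  const pipeline = (...steps: Step[]): Step => async (input) => {
    let current = input;
    for (const step of steps) {
      current = await step(current);
    }
    return current;
  };

  // phase(): name a step so the transcript shows where the run is.
  const phase = (label: string, step: Step): Step => async (input) => {
    log(`start ${label}`);
    const output = await step(input);
    log(`end ${label}`);
    return output;
  };

  return { agent, parallel, pipeline, phase, log, logs };
}

// Stub model that echoes its prompt in upper case, so the sketch runs offline.
const echoModel: ModelCall = async (prompt) => prompt.toUpperCase();
const dsl = createDsl(echoModel);

const reviewWorkflow = dsl.pipeline(
  dsl.phase("draft", dsl.agent("writer")),
  dsl.phase("review", dsl.parallel(dsl.agent("security"), dsl.agent("style"))),
);

const workflowRun: Promise<string> = reviewWorkflow("fix bug");
```

Because every primitive returns a `Step`, workflows nest freely: a `parallel` group can sit inside a `phase`, which sits inside a `pipeline`. That composability is one plausible reading of the post's "code mode" for subagents.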

324K

giljae retweeted

一只小橘呀

@Ellieorange8

10 days ago

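I browse Hacker News, Reddit, and Twitter every day looking for information, and 80% of it is noise. Then I found this open-source project: Horizon, an AI information radar 🌐

- Automatically aggregates Hacker News / Twitter / Reddit / GitHub
- AI scores items automatically and filters out low-quality ones
- Extracts high-quality opinions from comment sections
- Automatically adds background on unfamiliar companies and technologies
- Deduplicates the same news across the web
- Supports bilingual Chinese-English briefings
- Can push to Feishu / email / WeChat

In one sentence: it turns "browsing for information" into "receiving information".

GitHub: https://t.co/MMfKvCbW1V

If you are also fed up with information noise, bookmark it and give it a try~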

478

105

687

28K

Grey Joo @giljae

10 days ago

That makes sense.

Max For AI

@MaxForAI

12 days ago

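Yesterday I met a seriously impressive agent team, one I would dare call T0-tier in China (DPSK previously got some agent data from them). We happened to get onto the topic that has been hotly argued on X these past couple of days: should AI products (agents) use Python?

Their founder was blunt: only an idiot uses Python in an agent project 🤣

TS suits 100% of agent projects, for a few main reasons:

First, agents mostly end up inside a product. Whether you are building a chat interface, a workflow panel, a browser extension, a copilot, an IDE extension, or a Slack/Discord/web tool, TS is naturally closer to all of these. The frontend is TS, the backend is TS too, and the tool schemas, event streams, and UI state in between can share one set of types. With Python it becomes: model service in Py, backend in Node, frontend in TS, and one schema copied three times. If the casing of a single field name is wrong, your agent dies on you immediately.

Second, agents rely heavily on async and event streams. An agent is not as simple as one request, one answer. It has to output while thinking, call tools, wait for user confirmation, update the UI, and handle cancellation, retries, timeouts, and recovery, all at once. TS/Node handles event-driven code, streams, WebSocket, and server-sent events smoothly. Python can of course do it too, but you are more likely to feel that "this thing did not grow up for this kind of web product pipeline".

Third, the type system matters a lot for agents. Where agents really blow up is not "the model can't talk" but wrong tool parameters, wrong return structures, wrong state fields, and context objects that drift out of shape. TS can catch much of this early: tool input/output, agent state, message format, UI events, workflow nodes, permission objects, external API responses. This is critical for agents, because an agent system has huge numbers of JSON objects flying around.

Fourth, TS is better suited to building an "agent runtime". If you are building an agent framework, SDK, runtime, or plugin system, the TS advantage is even clearer, because users usually need to plug it into web pages, backend services, Electron, browser extensions, VS Code extensions, API routes, serverless, and edge runtimes. The TS ecosystem is more unified across these. So many agent infra projects choose TS not because Python cannot do the job, but because the use cases they serve are closer to web developers and product teams.

Fifth, AI applications are now really about assembling systems. Early on everyone used Python because AI = model. Many AI products have since evolved to include LLM APIs, tool calling, databases, vector stores, browser automation, workflows, UI, billing, auth, and analytics. That is no longer research engineering, it is product engineering, and the main language of internet product engineering has long been JS/TS. Boring, but that is just how tasteless the world is 😮‍💨

He also said Python will not disappear, though. A more sensible division of labor is:
Python for the model layer, data layer, evals, embedding pipelines, offline jobs, and experiment scripts.
TS for the product layer, agent orchestration layer, frontend interaction layer, plugin layer, and user-visible runtime.

So if you are building an agent product, you had best write the MVP frontend + agent orchestrator in TS, and bring in Python once model training, data processing, complex retrieval, or eval systems are involved.

We talked all afternoon and I learned so much. Only now do I realize how shallow my understanding of agents was 🧎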
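The "one set of types" and "field-name casing" arguments can be sketched as follows. The tool names (`searchDocs`, `sendEmail`) and their fields are made up for illustration. The idea is a single discriminated union that frontend, backend, and agent runtime could all import, plus a runtime check, since model output is untrusted JSON that the compiler never sees.

```typescript
// Hypothetical tool contract shared by frontend, backend, and agent runtime.
// Tool names and fields are made up for illustration.
type ToolCall =
  | { tool: "searchDocs"; input: { query: string; limit: number } }
  | { tool: "sendEmail"; input: { to: string; body: string } };

type ParseResult =
  | { ok: true; call: ToolCall }
  | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Model output is untrusted JSON, so it is checked at runtime before it is typed.
function parseToolCall(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (!isRecord(data)) {
    return { ok: false, error: "expected a JSON object" };
  }
  const tool = data.tool;
  const input = data.input;
  if (!isRecord(input)) {
    return { ok: false, error: "expected an input object" };
  }
  const { query, limit, to, body } = input;
  if (tool === "searchDocs" && typeof query === "string" && typeof limit === "number") {
    return { ok: true, call: { tool: "searchDocs", input: { query, limit } } };
  }
  if (tool === "sendEmail" && typeof to === "string" && typeof body === "string") {
    return { ok: true, call: { tool: "sendEmail", input: { to, body } } };
  }
  return { ok: false, error: "unknown tool or wrong field names" };
}

// The compiler checks that every tool is handled and that field names match.
function describeCall(call: ToolCall): string {
  switch (call.tool) {
    case "searchDocs":
      return `search "${call.input.query}" (top ${call.input.limit})`;
    case "sendEmail":
      return `email ${call.input.to}`;
  }
}

const goodCall = parseToolCall('{"tool":"searchDocs","input":{"query":"refunds","limit":3}}');
// A casing slip ("Query" instead of "query") is rejected at the boundary.
const badCall = parseToolCall('{"tool":"searchDocs","input":{"Query":"refunds","limit":3}}');
```

The types protect code the team writes, and the parser protects the boundary where JSON arrives from the model. In practice a schema library would likely replace the hand-written checks.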

157

155

291K

giljae retweeted

Tibo

@thsottiaux

11 days ago

To simplify our Codex compute fleet management, we will be sunsetting GPT-5.2 and GPT-5.3-Codex in Codex on June 2nd when logged in with your ChatGPT account. For free plans, GPT-5.5 will be the default frontier model to build and work with going forward. These models will remain available on our API.

635

152

346

778K

giljae retweeted

vast.ai @vast_ai

about 2 months ago

Three lines of Python. Eight H100s. $7.84/hr. That's the whole script. pip install vastai-sdk.

854

20K

117M

giljae retweeted

DAIR.AI

@dair_ai

19 days ago

Great new paper to read: Code as Agent Harness (bookmark it)

101

13K

giljae retweeted

Sarah Drasner

@sarah_edo

19 days ago

Instead, it's a single agent skill that finds and retrieves the best guidance for your use case! Check out the site here: https://t.co/2gclxj9rHw, and can be installed with npx modern-web-guidance@latest install. Our evals run daily with state-of-the-art models and coding agents!

giljae retweeted

elvis

@omarsar0

15 days ago

// Adapt the Interface, Not the Model //

I am fascinated by the results across my cheap-model-plus-good-harness builds.

This new paper also shows good signs of the code-as-agent-harness thesis.

The idea is really simple. Do not touch the model. Instead, modify the runtime interface that wraps the frozen LLM. Then convert recurring interaction failures into reusable interventions on the harness side.

The paper reports an average relative improvement of 88.5% across 7 deterministic environments, 126 model-environment settings, and 18 backbones.

A harness learned from one model trajectory generalizes to 17 other backbones. That tells you the harness is capturing environment structure, not model-specific patterns.

If you ship agents in production, your harness work is more portable than you might assume.

Paper: https://t.co/Petka4g3F2

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
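As a rough illustration of "do not touch the model, adapt the interface", the sketch below keeps the model as a fixed function and puts all adaptation in a list of harness-side interventions. It is not the paper's method: the types, the failure counter, the threshold rule, and the stub model are all assumptions chosen to keep the example small.

```typescript
// Hypothetical sketch, not the paper's method: the model is a fixed function, and all
// adaptation happens in a list of harness-side interventions.
type FrozenModel = (observation: string) => string;

interface Intervention {
  name: string;
  // Rewrites the observation before the model sees it.
  apply: (observation: string) => string;
}

interface Harness {
  interventions: Intervention[];
  failureCounts: Partial<Record<string, number>>;
}

// Run the observation through every installed intervention, then call the model.
function act(harness: Harness, model: FrozenModel, observation: string): string {
  const shaped = harness.interventions.reduce((text, item) => item.apply(text), observation);
  return model(shaped);
}

// Count a failure kind; once it recurs often enough, install a reusable fix.
function recordFailure(harness: Harness, kind: string, fix: Intervention, threshold: number): void {
  const count = (harness.failureCounts[kind] ?? 0) + 1;
  harness.failureCounts[kind] = count;
  const installed = harness.interventions.some((item) => item.name === fix.name);
  if (count >= threshold && !installed) {
    harness.interventions.push(fix);
  }
}

// Stub model: it only produces a valid move when the legal moves are spelled out.
const frozenModel: FrozenModel = (observation) =>
  observation.includes("Legal moves:") ? "move north" : "fly";

const listLegalMoves: Intervention = {
  name: "list-legal-moves",
  apply: (observation) => `${observation}\nLegal moves: north, south`,
};

const harness: Harness = { interventions: [], failureCounts: {} };
const actionBefore = act(harness, frozenModel, "You are in a hallway.");
recordFailure(harness, "illegal-action", listLegalMoves, 2);
recordFailure(harness, "illegal-action", listLegalMoves, 2);
const actionAfter = act(harness, frozenModel, "You are in a hallway.");
```

The stub model never changes, yet its action goes from `fly` to `move north` once the second recorded failure installs the intervention. Since the fix describes the environment rather than the model, it is at least plausible that it would transfer to other backbones, which is the portability claim in the post.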

276

341

25K
