까마귀

@o2allergic

80% curiosity, 20% fear

Joined April 2010

428 Following

153 Followers

3.2K Posts

o2allergic retweeted

Akshay 🚀

@akshay_pachaar

about 19 hours ago

Hermes Mixture of Agents (MoA) explained.

Every agent commits to a single model, and every model has blind spots the others would have caught.

The usual workaround is to run the same prompt through a few models by hand and reconcile the answers. It works, but it lives outside the agent, so the tools, the memory, and the session are gone the moment that detour starts.

Hermes Agent by Nous Research just shipped Mixture of Agents, which folds that whole process back inside the agent.

The unit you work with is a preset. Think of it as a recipe that names a few models to consult and one model to write the final answer, saved under a label you can reuse.

So a preset might list GPT-5.5 and DeepSeek as the models to consult, with Opus as the one that replies. You set it up once, give it a name, and pick it later like any other model.

The models you consult run first and quietly hand their analysis to the one writing the answer. That final model is the one that actually replies and makes the tool calls, now informed by several perspectives instead of one.

Here is the part that makes it click. The preset shows up as a model, not as a framework to wire together.

So everything that already works in Hermes keeps working. Tool calls, follow-up iterations, memory, and the same session context behave exactly as they do with a single model, because to the agent loop it is a single model.

The models can come from anywhere. One preset can mix OpenAI, Anthropic, DeepSeek, and Google, and it is not capped at two.

A few things follow from that design.

→ It composes a model instead of choosing one. Several models covering each other's blind spots can beat the strongest one on its own.

→ It stays cheap to run. The models you consult see a stripped-down view of the conversation, so the extra calls stay light and the main context keeps its cache.

→ It reaches past any single frontier model. Combining the providers already on hand assembles a composite that can outscore the best one available alone.

→ It is a dial, not a default. It turns on for the hard ten percent of tasks where a second opinion matters, and stays off for routine work where speed wins.

Nous reports the effect on its own benchmark. A preset running Opus-4.8 over a GPT-5.5 reference scored higher than either model alone, by roughly six points and eight to eleven percent.

The lesson is not that one model has to win. It is that the best answer rarely comes from a single model, and the agent should make blending them as easy as picking one.

That said, if you're looking to set up Hermes, I wrote a full deep dive covering the Hermes agent's architecture, memory system, self-evolving skills, GEPA optimization, and how to set up multiple specialized agents.

The article is quoted below.

You can also watch my YouTube crash course on the Hermes agent: https://t.co/9AEZ7DRn68
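The preset flow described in this post can be sketched roughly as follows. This is an illustration only, not Hermes Agent's actual API: `Preset`, `run_preset`, and the stub model functions are hypothetical names, and real provider calls are replaced by plain Python functions so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a provider call: prompt text in, reply text out.
Model = Callable[[str], str]


@dataclass
class Preset:
    """A named recipe: models to consult plus one model that writes the answer."""
    name: str
    references: List[str]  # models consulted first
    aggregator: str        # model that produces the final reply


def run_preset(preset: Preset, models: Dict[str, Model], prompt: str) -> str:
    # 1. Consult each reference model with the prompt.
    analyses = [f"[{ref}] {models[ref](prompt)}" for ref in preset.references]
    # 2. Hand the collected analyses to the aggregator, which writes the final answer.
    briefing = prompt + "\n\nReference analyses:\n" + "\n".join(analyses)
    return models[preset.aggregator](briefing)


# Stub models (assumptions, not real providers).
models: Dict[str, Model] = {
    "gpt": lambda p: "check the edge cases",
    "deepseek": lambda p: "watch the time complexity",
    "opus": lambda p: f"final answer informed by {p.count('[')} analyses",
}

preset = Preset(name="second-opinion", references=["gpt", "deepseek"], aggregator="opus")
answer = run_preset(preset, models, "Review this function")
print(answer)
```

Because `run_preset` has the same shape as a single model call (prompt in, reply out), a caller could treat the preset like any other model, which is the property the post emphasizes.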

568

739

50K

o2allergic retweeted

크롱

@Krongggggg

1 day ago

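To add to Thunder's point: I don't think of Claude Tag as a simple agent. I expect it works by running a lifecycle that collects the conversations and tacit knowledge flowing through Slack, groups similar items together, remembers them, and discards them. I also think the main reason Karpathy joined Anthropic was to build this by developing the LLM wiki, which until now had remained little more than a concept.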

o2allergic retweeted

GUTE_RachelHan

@GuteslaX

1 day ago

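Homework done, with plenty of reference material. There are many people more expert than I am, so I expect they will add a range of perspectives from here. I had set this account up with the concept of blowing experts' minds, but lately I have been far too busy with my day job. Enjoy the rest of your weekend ❤️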

o2allergic retweeted

홍명수

@Myeongsu_bean

2 days ago

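Rejuvenation injection under trial.

1. Life Biosciences, a US biotech company, is testing a 'cellular reprogramming' injection that applies the method discovered by Japan's Shinya Yamanaka.

2. Because there is a risk of cancer, it is being tested first in the eye, the most self-contained organ in the human body.

3. Retinal cells are a type of nerve cell and normally cannot regenerate, which makes the results more meaningful.

4. If it works, it would be an entirely different kind of therapy, one that can revive even nerve cells,

5. and it would open a path to actually conquering human aging altogether, so interest is high.

6. Isn't this also a webtoon plot, though? The one where you have to keep taking pills to stay rejuvenated. What was that webtoon called?

If this technology succeeds, the industry map of pharma and biotech will be completely redrawn. It will take time to pass all the way through Phase 3 trials and reach commercialization, but once human trials succeed, wealthy people will be lining up to invest.

Is eternal life really coming? 🫠

Article source: 한국경제 (The Korea Economic Daily)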

o2allergic retweeted

Teknium 🪽

@Teknium

2 days ago

Introducing Mixture of Agents 2.0 in Hermes Agent. Combine any provider's models into a mixture of your own. Access your presets as if it were a normal model in Hermes. Big improvement in our soon-to-release HermesBench against opus and gpt-5.5 with MoA using Opus & GPT together.

153

219

581K

o2allergic retweeted

GUTE_RachelHan

@GuteslaX

about 2 months ago

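In the end, competition over AI infrastructure is likely to play out on three layers. The first is competition for physical computing and energy resources such as GPUs/NPUs, memory, and networking. The second is competition in the inference optimization layer, which, like a Token Factory, produces more tokens from the same resources. The third is competition in integrated solutions that, like China's AI Box, bundle models, data, workflows, and industrial applications and place them on the customer's site. This is also where Korea should be paying attention. Korea has capabilities in memory and storage, and in some NPUs and system software, but if these are not connected to customers' actual AI work pipelines, it could remain stuck competing on components. What is needed from here is not just an AI server, but the ability to understand Agentic AI workloads and design memory, storage, NPU, runtime, model optimization, and business system integration as a single package. Nebius's acquisition of Eigen AI is a signal of competition on inference efficiency, and the spread of China's AI Box is a signal that AI infrastructure is turning into an integrated, on-site product for customers. The value of AI infrastructure is no longer determined by owning compute alone. From now on, the core competitive edge will be how well you understand AI workloads and how quickly and cheaply you can turn them into working systems on real industrial sites.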

o2allergic retweeted

Alex Lieberman

@businessbarista

4 days ago

I stole this idea and now use it with every single employee.

It’s the best illustration I’ve seen of teaching someone to be high agency.

It says there are 5 levels of work:

Level 1: “There is a problem.”

Level 2: “There is a problem, and I’ve found some causes.”

Level 3: “Here’s the problem, here are some possible causes, and here are some possible solutions.”

Level 4: “Here’s the problem, here’s what I think caused it, here are some possible solutions, and here’s the one I think we should pick.”

Level 5: “I identified a problem, figured out what caused it, researched how to fix it, and I fixed it. Just wanted to keep you in the loop.”

Using this framework, here’s what I say to every new employee…

You will live at Level 4 from Day 1 and as we build trust you will rise to Level 5.

Being high agency doesn’t just mean tackling problems in this way. It means your entire way of working should be oriented to being a Level 4+ employee.

Plz feel free to steal it as well.

And ty @stephsmithio for the framework!

190

658

11K

o2allergic retweeted

AI Edge

@aiedge_

3 days ago

It's absolutely insane that this is free. This guy should've charged $1000+ for access to this. I just found the ultimate Hermes agent website. Free Hermes guides, agent skills, memory & context tools, plugins, extensions, and so much more. → https://t.co/vFFoDlDn6n

288

687

66K

o2allergic retweeted

Luke The Dev

@iamlukethedev

9 days ago

Hermes Bible is live. A community-built place to search Hermes Agent docs and real workflows in seconds. 169 docs pages. 24 community flows. Instant ⌘K search. Unofficial. Community-built. Not affiliated with @NousResearch I built this because the Hermes community is moving fast, and the best workflows should not get buried in X Articles, Discord, or random posts. Read the docs. Study the flows. Submit your own. If your flow gets added, I’ll link back to your X profile. https://t.co/jPWF2DKjbi

204

138K

o2allergic retweeted

GUTE_RachelHan

@GuteslaX

4 days ago

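UNECE WP.29 redefines ADS from a ‘technology’ to a ‘verifiable safety system’

UNECE WP.29's 2026 draft of a new UN GTR on ADS
(a draft global technical regulation on Automated Driving Systems, submitted for review and vote at WP.29/AC.3 in June 2026), with my own reflections included

⭕️ For a long time, autonomous driving was discussed in the language of tech demos.

Does the car change lanes by itself, get through a complex intersection, avoid pedestrians, drive more smoothly than a human on the highway? UNECE WP.29, however, is focused on safety, and its draft global technical regulation on Automated Driving Systems can be read as saying: “The regulatory basis now exists, so apply it according to each country's circumstances.”
* The regulation in one line: “Under what conditions, on what grounds, and how verifiably is it safe?”

⭕️ (Long explanation starts here) This is the first framework of global safety rules needed to deploy ADS vehicles on public roads.

Here, an ADS is defined as a system in which the vehicle's hardware and software together perform the entire dynamic driving task (DDT) on a sustained basis. In other words, this is not the level where the driver keeps monitoring and receives assistance; it is the domain where the system handles perception, decision-making, planning, and control.

So this regulation is not about a slightly more advanced L2 ADAS feature.
In essence it covers what safety conditions L3 and above, and more broadly L4 and L5 autonomous driving systems, must meet to reach the market.
* This is where the word “autonomous driving” takes on a different weight. It is no longer autonomous driving as a marketing phrase, but autonomous driving that has to be proven in front of a regulator.

⭕️ An ADS must demonstrate a level of safety at least equal to that of a competent and careful human driver. This is not the unrealistic standard of “there must never be an accident.” The point is that there must be no unreasonable risk.
For a system to claim that it drives itself, it must know its own operating conditions and limits, respond appropriately to hazardous situations, and be able to transition to a minimal risk condition even when it fails.

The industry has a concept called the ODD (Operational Design Domain).
Autonomous driving does not mean “it works anywhere.” It is designed to operate on specific roads, at specific speeds, and in specific weather, lighting, and traffic environments. That operating envelope is the ODD.

That means the manufacturer has to explain which road types the ADS operates on, what geographic scope it covers, what speed range and environmental conditions it assumes, and how it perceives and responds to dynamic objects such as pedestrians, bicycles, and priority vehicles.

In the end, the ODD is both the commercial scope and the liability scope of autonomous driving. Define the ODD broadly and the technical responsibility grows. Define it narrowly and the service scales less, but verification gets easier.
(Scenario validation matters, so simulation companies are worth a look, right? Something other than heavyweight engines like Unity or Unreal :) - continued below)

⭕️ The way verification is done also changes completely.
Vehicle regulation used to center on performance tests of specific parts or devices.

An ADS, however, cannot be verified with a single test. The UNECE draft proposes a multi-pillar approach for this.

Simulation, proving-ground tests, real-world road tests, documentation audits, and post-deployment monitoring are all combined. Passing a handful of scenarios on a test track is not the end of it; the entire safety management system, from development through production to in-service operation, is subject to assessment.

The key deliverable here is the Safety Case. For each requirement, a Claim, an Argument, and Evidence must be presented: the claim that a given safety requirement is met, the reasoning that makes the claim hold, and the evidence that backs it up, all linked together. Evidence can include simulation results, proving-ground data, real-world driving results, system analyses, accident-avoidance data, logs, and so on.

This is a very big shift for the industry. An autonomous driving company's competitiveness can no longer be explained by model performance or miles driven alone.

It also needs the ability to translate that performance into a safety argument a regulator can understand. It has to explain which scenarios were chosen, why those scenarios are representative of the ODD, how closely simulation results match the real physical environment, and whether test results are reproducible.

Simulation in particular becomes more important. An ADS cannot be tested directly against every hazardous situation in the real world. Scenarios such as rare accidents, extreme weather, sudden cut-ins, and erratic pedestrian behavior are hard to test repeatedly on real roads. In particular, the simulation toolchain's assumptions, limitations, data quality, uncertainty, correlation with physical testing, and reproducibility all have to be demonstrated.
(1/2)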
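The Claim / Argument / Evidence structure of a Safety Case can be pictured as a small data model. The sketch below is only an illustration under assumptions: the class and field names (`Evidence`, `SafetyClaim`, `unsupported`) and the sample entries are hypothetical and are not taken from the UNECE draft.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    kind: str       # e.g. "simulation", "proving ground", "road test", "log"
    reference: str  # hypothetical pointer to the supporting artifact


@dataclass
class SafetyClaim:
    requirement: str  # the safety requirement being addressed
    claim: str        # the assertion that the requirement is met
    argument: str     # the reasoning that makes the claim hold
    evidence: List[Evidence] = field(default_factory=list)

    def is_supported(self) -> bool:
        # Assumption for this sketch: a claim needs an argument and at least one evidence item.
        return bool(self.argument.strip()) and len(self.evidence) > 0


def unsupported(claims: List[SafetyClaim]) -> List[str]:
    """Return the requirements whose claims still lack an argument or evidence."""
    return [c.requirement for c in claims if not c.is_supported()]


claims = [
    SafetyClaim(
        requirement="minimal risk condition",
        claim="The system reaches a minimal risk condition after a failure.",
        argument="Fallback planner is independent of the primary planner.",
        evidence=[Evidence("simulation", "sim-run-001")],
    ),
    SafetyClaim(
        requirement="ODD exit detection",
        claim="The system detects when it leaves its ODD.",
        argument="Weather and map checks run every cycle.",
    ),
]
gaps = unsupported(claims)
print(gaps)
```

Keeping claim, argument, and evidence in one record makes it straightforward to list the requirements that still have no supporting evidence before a submission.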

11K

o2allergic retweeted

darkzodchi

@zodchiii

5 days ago

A senior Anthropic engineer just published the clearest blueprint on "How to give your AI agent a real memory" and it's a 15-page PDF.

Write → Consolidate → Recall → Apply

• Write: after every attempt, the agent records what it tried and what happened.
• Consolidate: it distills those raw attempts into a few reusable lessons, not a transcript dump.
• Recall: before the next task, it reads those lessons first.
• Apply: it skips the dead ends it already learned, even on a brand new problem.

This is exactly how engineers now build agent loops in Claude Code.

Read the paper, then grab the setup below 👇
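The four steps named in this post (Write, Consolidate, Recall, Apply) can be sketched as a tiny in-memory class. This is a minimal illustration under assumptions, not the method from the PDF: `AgentMemory` and its rule that any approach which failed at least once counts as a dead end are hypothetical simplifications.

```python
from typing import List, Tuple


class AgentMemory:
    def __init__(self) -> None:
        self.attempts: List[Tuple[str, bool]] = []  # raw log: (approach, succeeded)
        self.lessons: List[str] = []                # distilled dead ends

    def write(self, approach: str, succeeded: bool) -> None:
        """Write: record what was tried and what happened."""
        self.attempts.append((approach, succeeded))

    def consolidate(self) -> None:
        """Consolidate: keep one entry per failed approach instead of the full log."""
        self.lessons = sorted({a for a, ok in self.attempts if not ok})

    def recall(self) -> List[str]:
        """Recall: read the lessons before starting the next task."""
        return list(self.lessons)

    def apply(self, candidates: List[str]) -> List[str]:
        """Apply: drop candidate approaches already known to be dead ends."""
        dead_ends = set(self.recall())
        return [c for c in candidates if c not in dead_ends]


memory = AgentMemory()
memory.write("regex parse", False)
memory.write("regex parse", False)
memory.write("json parser", True)
memory.consolidate()
plan = memory.apply(["regex parse", "json parser", "streaming parser"])
print(plan)
```

Two failed attempts with the same approach collapse into one lesson, which is the difference between consolidating and dumping a transcript.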

993

162

119K

o2allergic retweeted

東大Codex研究所

@UT_Codex

5 days ago

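[Shock] Someone built a Skill that lets Codex call ChatGPT Pro as its "planner" 😳 It passes along the context of the entire codebase, and threads with R/W access are tied to each repository https://t.co/ahorK3EKkE
・ChatGPT Pro plans the task
・Codex executes that plan
・Threads are managed per repository
"Planning goes to GPT, execution goes to Codex." You can now write the division of labor between AIs yourself, as code, in the form of a Skill. Deciding which AI does what has become code.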

485

781

549K

o2allergic retweeted

PA13L0

@Fluyeporlaweb

5 days ago

The 10 fastest-growing repos on GitHub this June:

1. pewdiepie-archdaemon/odysseus
PewDiePie (111M YouTube subscribers) built a self-hosted AI workspace and released it for free.
75.8k stars in three weeks.
https://t.co/pGcXf1W1wP

2. mattpocock/skills
The Claude Code skills of Matt Pocock (the go-to TypeScript authority).
The ones he uses himself. In his .claude directory. Now public.
https://t.co/z5w6vdTqOE

3. chopratejas/headroom
Built by a Netflix engineer.
Compresses everything your AI agent reads before it reaches the model.
60-95% fewer tokens. Same answers.
https://t.co/cJIsgoEBQU

4. DietrichGebert/ponytail
Makes your AI agent think like the laziest senior dev in the room.
The best code is the code that never gets written.
https://t.co/NfiOX8elT1

5. calesthio/OpenMontage
The world's first agentic video production system.
12 pipelines, 52 tools, 500+ agent skills.
https://t.co/9DkTYHCzGD

6. jamiepine/voicebox
Open-source, self-hosted AI voice studio.
Clones voices, takes dictation, generates audio. No subscription.
https://t.co/4q8agBJCP0

7. ZhuLinsen/daily_stock_analysis
Smart stock analysis for the US, China, and Hong Kong markets.
LLM + real-time news + dashboard. Runs for free with cron.
https://t.co/zuJqwTlnZQ

8. mvanhorn/last30days-skill
Agent skill that researches Reddit, X, YouTube, HN, and Polymarket.
Synthesizes everything into a summary with real context.
https://t.co/gA4Y8FG7fb

9. bytedance/deer-flow
ByteDance's open-source SuperAgent.
Researches, writes code, and creates. Tasks from minutes to hours without supervision.
https://t.co/DAw5f8nW7T

10. DeusData/codebase-memory-mcp
Code knowledge graph for Claude Code, Cursor, and Codex.
Syncs itself on every change. 100% local. Zero extra tokens.
https://t.co/Pog64Fu7vX

June 2026 on GitHub is wild.

Save this.

809

128

53K

o2allergic retweeted

요즘IT

@yozm_it

5 days ago

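📌 'Loop engineering' is trending: a free prompt repository you can try right away

A repository that collects loops (task templates) for handing repetitive work to AI.
No need to write your own; just pick one and use it!!

What you can do:
✔ Pick a loop from the catalog, copy the prompt, and run it right away (no install needed)
✔ Install with one line in Claude Code, Cursor, or Codex and invoke it from chat
✔ It can also find repetitive work in your code and tasks and turn it into a loop
✔ Adapt an existing loop to your own environment

If you've been wanting to try loop engineering, be sure to save this!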

149

o2allergic retweeted

차분

@steady_note

5 days ago

Goseong, the quietest and most beautiful spot on the East Sea. A roundup of beautiful, atmospheric places to stay in Goseong.

315

89K

o2allergic retweeted

감자

@nowlovepan

5 days ago

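This really feels like a scary way to use AI (prompt). Instead of just saying "fix the code," you give it a goal and keep it cycling through feature review → user story writing → testing → error documentation → fixes → re-verification. That way the AI repeats the work on its own until the goal is met, and tasks that would keep a person busy all day get handled by the AI. These days, the people who use AI well are the ones who design loops like this well. The goal prompt is below ⬇️ "/goal Review every feature of this app one by one, write user stories with the expected behavior based on the code, and maintain a single canonical spreadsheet that tracks feature status - when done, switch the loop to testing every user story and documenting every error - when done, fix all logistical or UX errors - after fixing, retest every user action"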
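The test → document → fix → retest cycle in this prompt has the shape of a simple control loop. The sketch below is a hypothetical illustration of that shape, not how any agent implements `/goal`: `run_goal_loop`, the `test` and `fix` callbacks, and the sample feature names are all assumptions, and the status dictionary stands in for the single tracking spreadsheet.

```python
from typing import Callable, Dict, List, Tuple


def run_goal_loop(
    features: List[str],
    test: Callable[[str], bool],
    fix: Callable[[str], None],
    max_rounds: int = 5,
) -> Tuple[Dict[str, str], int]:
    """Test every feature, fix the failures, and retest until all pass or rounds run out."""
    status: Dict[str, str] = {f: "untested" for f in features}  # the tracking sheet
    rounds = 0
    for _ in range(max_rounds):
        rounds += 1
        failing = [f for f in features if not test(f)]  # test and record errors
        for f in features:
            status[f] = "failing" if f in failing else "passing"
        if not failing:  # goal reached: everything passes
            break
        for f in failing:  # fix, then loop back to retest
            fix(f)
    return status, rounds


# Stub app state: "checkout" starts broken and the stub fix repairs it.
broken = {"checkout"}
status, rounds = run_goal_loop(
    features=["login", "checkout"],
    test=lambda f: f not in broken,
    fix=broken.discard,
)
print(status, rounds)
```

The `max_rounds` cap is a design choice added for the sketch so that a fix that never works cannot keep the loop running forever.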

215

298

20K