been asking others at Anthropic how they stay in the loop with Claude and fully understand the work being done
this is one of my favorites from Suzanne:
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
We are hiring research fellows to help us improve FrontierSWE!
If you want to help build the hardest real-world coding benchmark, reach out! Fellows can work with us for a few weeks up to months and will be supported with compute and a generous stipend
https://t.co/KL5va5ydAe
This GPT Image 2 prompt is going insanely viral right now.
“Redraw the attached image in the most clumsy, scribbly, and utterly pathetic way possible. Use a white background, and make it look like it was drawn in MS Paint with a mouse. It should be vaguely similar but also not really, kind of matching but also off in a confusing, awkward way, with that low-quality pixel-by-pixel feel that really emphasizes how ridiculously bad it is. Actually, you know what, whatever, just draw it however you want.”
https://t.co/61ZjRvuJAs
This post from @genmon feels like it could’ve been written yesterday, but it’s from 2022 lol. Wow.
If someone had been recording and connecting all of this context since then, the depth of that graph by now would be insane.
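🧩 Kakao adds OpenCLO integration to ‘PlayMCP’! “Build your own AI assistant yourself”
Kakao announced today that it officially supports integration with the open-source AI agent OpenCLO on **PlayMCP**, its open platform based on the Model Context Protocol (MCP).
Now anyone can install OpenCLO on a local PC and build and run their own AI assistant.
Key features
•Freely connect Kakao's main services (KakaoTalk Talk Calendar, KakaoMap, Gifts, Melon, and more) plus around 200 external MCP servers
•Use a messenger (such as KakaoTalk) as the channel to give instructions in natural language → receive results automatically
•Example: “Find job postings around Pangyo for server developers with five years of experience or less once a day and let me know”
•Security: one-time tokens valid for only 10 minutes after issuance, and the integration can be disconnected immediately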
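The security bullet above (a one-time token that is valid for 10 minutes and can be cut off immediately) can be sketched roughly as follows. This is a minimal illustration, not Kakao's implementation: `OneTimeTokenStore`, `issue`, `redeem`, and `revoke` are hypothetical names, and the in-memory dictionary stands in for whatever storage a real service would use.

```python
import secrets
import time

TOKEN_TTL_SECONDS = 10 * 60  # 10-minute validity window, as stated in the post


class OneTimeTokenStore:
    """Hypothetical in-memory store for single-use, short-lived tokens."""

    def __init__(self, ttl_seconds=TOKEN_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._expiry_by_token = {}

    def issue(self):
        """Create a random token and record when it expires."""
        token = secrets.token_urlsafe(32)
        self._expiry_by_token[token] = self._clock() + self._ttl
        return token

    def redeem(self, token):
        """Return True only on the first use of an unexpired token."""
        # pop() removes the token, so a second redeem always fails.
        expires_at = self._expiry_by_token.pop(token, None)
        return expires_at is not None and self._clock() < expires_at

    def revoke(self, token):
        """Invalidate a token immediately (the 'disconnect' case)."""
        self._expiry_by_token.pop(token, None)
```

Removing the token on redemption is what makes it one-time; the expiry check only bounds how long an unused token stays dangerous.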
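Why pay attention?
A truly open platform where MCP developers can connect the servers they build to a variety of AI services to experiment and expand has become a reality. It supports both local LLMs and external API models, bringing the era of users customizing their own AI agents one step closer.
Yoo Yong-ha, performance leader of Kakao AI Connect, said, “The openness that lets MCP developers connect the servers they build to a variety of AI services to experiment and expand is what PlayMCP aims for.”
If you have ever wanted to build and use your own AI assistant, now is the time to start!
#Kakao #PlayMCP #OpenCLO #MCP #AIAgent #AIAssistant #OpenSource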
This Indian developer shows how ChatGPT Images 2.0 designs the complete UI for your app or video game from a single prompt.
Production-ready interfaces that are coherent and genuinely usable, without touching Figma or hiring a designer.
One prompt. Complete UI. Ready for your app.
We did a prototyping workshop internally
then suddenly one agent starts building backend stuff
👉 PM has no idea what’s happening
just sits there waiting
we were seriously thinking
“do we need to build our own tool for this?”
tried Claude Design and… wow
this problem just disappears
this isn’t just for designers
it’s actually amazing for PMs too
Introducing Claude Design by Anthropic Labs: make prototypes, slides, and one-pagers by talking to Claude.
Powered by Claude Opus 4.7, our most capable vision model. Available in research preview on the Pro, Max, Team, and Enterprise plans, rolling out throughout the day.
Agent evals are drifting away from production reality.
Most benchmarks use clean tasks, well-specified requirements, deterministic metrics, and retrospective curation. Production work is messier, with implicit constraints, fragmented multimodal inputs, undeclared domain knowledge, long-horizon deliverables, and expert judgment that evolves over time.
This paper introduces AlphaEval, a production-grounded benchmark for evaluating agents as complete products.
AlphaEval contains 94 tasks sourced from seven companies deploying AI agents in core business workflows, spanning six O*NET domains. It evaluates systems like Claude Code and Codex as commercial agent products, not just model APIs.
The benchmark combines multiple evaluation paradigms: LLM-as-a-Judge, reference-driven metrics, formal verification, rubric-based assessment, automated UI testing, and domain-specific checks.
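To make the idea of combining evaluation paradigms concrete, here is a minimal sketch of how several checks could be folded into one task score. It is not AlphaEval's actual harness: `Check`, `exact_match`, `rubric`, `evaluate`, and the weighted-average aggregation are all assumptions made for illustration, and only two simple paradigms (reference-driven and rubric-based) are shown.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """One evaluation check; all fields are hypothetical, not from the paper."""
    name: str
    paradigm: str                      # e.g. "reference", "rubric"
    weight: float
    score_fn: Callable[[str], float]   # maps agent output to a score in [0, 1]


def exact_match(reference: str) -> Callable[[str], float]:
    """Reference-driven check: 1.0 if the output equals the reference."""
    def score(output: str) -> float:
        return 1.0 if output.strip() == reference.strip() else 0.0
    return score


def rubric(required_phrases: List[str]) -> Callable[[str], float]:
    """Rubric-style check: fraction of required phrases present in the output."""
    def score(output: str) -> float:
        if not required_phrases:
            return 1.0
        text = output.lower()
        hits = sum(1 for phrase in required_phrases if phrase.lower() in text)
        return hits / len(required_phrases)
    return score


def evaluate(output: str, checks: List[Check]) -> float:
    """Weighted average of all check scores (an assumed aggregation rule)."""
    total_weight = sum(check.weight for check in checks)
    if total_weight <= 0:
        raise ValueError("checks must have positive total weight")
    weighted = sum(check.weight * check.score_fn(output) for check in checks)
    return weighted / total_weight
```

An LLM-as-a-Judge or UI-testing check would plug in the same way, as another `score_fn` that returns a value between 0 and 1.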
Why it matters: organizations need benchmarks that start from real production requirements, then become executable evals with minimal friction.
Paper: https://t.co/cbTGgTWoNl
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
Codex for (almost) everything.
It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks.
I took an Uber and the driver was sending a voice message to a buddy on WhatsApp saying you need a GitHub with good projects for consultancies to call you
Not sure if that's a top signal or a bottom signal for the market