Yaser Martinez @elyase - Twitter Profile

4 days ago

@jherr the loop was always there: read issue, write code, review PR, ... better models just move more of it from human tab-switching to AI autonomy

elyase retweeted

Marie Haynes

@Marie_Haynes

14 days ago

This is really big news. Google introduced the Open Knowledge Format (OKF) - a standardized way to store information in a directory of markdown files. Makes it really easy to make a digital brain that agents can use.

These files can serve as a living wiki. You can give agents the ability to query them or edit them. They can interlink.

Seems to me this could replace Notion or Obsidian. I can think of so many uses for this.

Google's blog post: https://t.co/DqSjg4UpvH

An easier-to-understand explanation is in the SPEC.md file:
https://t.co/A3qSz3Tfas

I gave those two links to Antigravity and asked how we could use it for any of the projects we're working on. It came up with so many ideas. I would imagine Claude Fable 5 would whip up some pretty amazing things based on this system.

Currently creating an OKF library of our pepper garden. It's going to be a fun weekend.
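
A minimal sketch of the "directory of markdown files that interlink" idea, in Python. It assumes notes are `.md` files that link to each other with ordinary relative markdown links such as `[Peppers](plants/peppers.md)`. The actual OKF conventions (front matter, link syntax, required files) are defined in the SPEC.md linked above and are not reproduced here; `build_link_graph` and `backlinks` are hypothetical helpers, not part of any OKF tooling.

```python
import re
from pathlib import Path

# ASSUMPTION: notes interlink with standard relative markdown links ending in .md,
# optionally with an #anchor. Check the format's SPEC.md for its real conventions.
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)]*)?\)")


def build_link_graph(root: Path) -> dict[str, set[str]]:
    """Map each note (path relative to root) to the set of notes it links to."""
    base = root.resolve()
    graph: dict[str, set[str]] = {}
    for note in sorted(root.rglob("*.md")):
        rel = note.relative_to(root).as_posix()
        targets: set[str] = set()
        for target in MD_LINK.findall(note.read_text(encoding="utf-8")):
            if "://" in target:
                continue  # external URL, not a note in this directory
            # Resolve the link relative to the directory of the linking note.
            resolved = (note.parent / target).resolve()
            try:
                targets.add(resolved.relative_to(base).as_posix())
            except ValueError:
                continue  # link escapes the knowledge base; ignore it
        graph[rel] = targets
    return graph


def backlinks(graph: dict[str, set[str]], note: str) -> list[str]:
    """Notes that link to `note`, the kind of query an agent might run."""
    return sorted(src for src, dsts in graph.items() if note in dsts)
```

A backlink lookup like this is one way an agent could find every page that references a note before editing it, so those pages can be kept consistent.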

elyase retweeted

wevm @wevm_dev

26 days ago

Introducing Vocs v2: a minimal docs framework designed for agents and humans.

Flexible docs that stay simple at the source, rich in the browser, and easy for agents to consume.

$ npm init vocs

https://t.co/ZrQXIUeus2

elyase retweeted

banteg

@banteg

29 days ago

first impression of claude 4.8 is it's extremely convincing but still a slopus. tried it to criticize a new project and it concluded the project had fallen into a local minimum and invented a new parser where we could've used ast. almost convinced me, glad i checked myself that ast is not emitted in older versions of the compiler we are targeting. codex chose a gnarly but ultimately justified approach. claude didn't bother to verify any of its claims and used absolutist language like "delete https://t.co/zuys0EhoHP", which is basically 80% of the codebase. when presented with evidence:

> That contradicts my earlier byte-count check, and it matters enormously
> My earlier "v0.2.9" was a double false-positive (a git log -S hit on an internal symbol, plus a verification grep that mis-read a VersionException as success). Corrected in the review with a note owning the error

the biggest bullshitter model in the world! if you rely on claude for anything, god help you.

elyase retweeted

Georgios Konstantopoulos

@gakonst

about 1 month ago

Open Sourcing Centaur: Multiplayer, self-hosted, secure agents for Slack. Centaur has been transforming how @paradigm and @tempo invest, build and research. Now you can run it yourself on infrastructure you control. Instructions below.

elyase retweeted

Hayden Bleasel

@haydenbleasel

about 2 months ago

Introducing Files SDK

A unified storage SDK for object and blob backends. One small, honest API. Web-standards I/O. An escape hatch when you need the native client.

→ 18 providers - S3, R2, Vercel Blob, Google Drive, etc.
→ upload, download, head, delete, copy, list, url
→ Works everywhere - Node, Bun, Deno, edge runtimes, browsers
→ Tools for OpenAI, Vercel AI and Claude Agents SDKs

elyase retweeted

Michael Livs

@micLivs

about 2 months ago

https://t.co/UzashBiKMY

elyase retweeted

Zecheng Zhang

@zechengzh

about 2 months ago

Introducing Mirage, a unified virtual filesystem for AI agents!

6 weeks. 1.1M+ lines of code. We rewrote bash from the ground up so cat, grep, head, and pipes work across heterogeneous services. S3, Google Drive, Slack, Gmail, GitHub, Linear, Notion, Postgres, MongoDB, SSH, and more, all mounted side-by-side as one filesystem.

Bash that AI agents already know works on every format! cat, grep, head, and wc parse .parquet, .csv, .json, .h5, even .wav! One pipe can stitch S3, Drive, GitHub, Slack, and Linear together, same Unix semantics throughout.

Workspaces are versioned too. Snapshot, clone, and roll back the whole thing with one API call. A two-layer cache turns repeated reads into local lookups, so agent loops stay fast and cheap.

Drop a Workspace into FastAPI, Express, or a browser app. Wire it into OpenAI Agents SDK, Vercel AI SDK, LangChain, Mastra, or Pi. Run it alongside Claude Code and Codex.

Site: https://t.co/zo1orc2wA9
GitHub: https://t.co/zeRAKri7I9

#AIAgents #OpenSource #AgenticAI #Strukto #Filesystem #VFS
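
The "everything mounted side-by-side as one filesystem" idea can be illustrated with a small longest-prefix mount table. This is a generic sketch, not Mirage's implementation or API: backends are modeled as in-memory dicts standing in for S3, GitHub and the rest, `MountTable` and its methods are hypothetical names, and the two-layer cache described above is reduced to a single in-memory layer.

```python
from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Route virtual paths to backends by longest mount prefix (illustrative sketch)."""

    # ASSUMPTION: each backend is a plain dict {relative_path: bytes} standing in
    # for a real service adapter (S3, Drive, GitHub, ...).
    mounts: dict[str, dict[str, bytes]] = field(default_factory=dict)
    # Single in-memory read cache; the tweet describes two layers, this shows one.
    cache: dict[str, bytes] = field(default_factory=dict)
    backend_reads: int = 0  # how many reads actually reached a backend

    def mount(self, prefix: str, backend: dict[str, bytes]) -> None:
        self.mounts["/" + prefix.strip("/")] = backend

    def _route(self, path: str) -> tuple[dict[str, bytes], str]:
        # Try longer prefixes first so /github/repo wins over /github.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
                return self.mounts[prefix], path[len(prefix):].lstrip("/")
        raise FileNotFoundError(path)

    def read(self, path: str) -> bytes:
        if path in self.cache:
            return self.cache[path]  # repeated read: served locally
        backend, rel = self._route(path)
        if rel not in backend:
            raise FileNotFoundError(path)
        self.backend_reads += 1
        self.cache[path] = backend[rel]
        return self.cache[path]

    def grep(self, pattern: str, path: str) -> list[str]:
        """Toy `grep`: the lines of one file that contain `pattern`."""
        text = self.read(path).decode("utf-8")
        return [line for line in text.splitlines() if pattern in line]
```

Matching the longest prefix first lets a specific mount such as `/github/repo` shadow a broader one such as `/github`, which mirrors how nested mount points behave on a Unix filesystem.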

elyase retweeted

Zyphra

@ZyphraAI

about 2 months ago

Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density.

With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵 https://t.co/URTj1br9tw

elyase retweeted

kache

@yacineMTB

5 months ago

you can outsource your thinking but you cannot outsource your understanding

elyase retweeted

Teknium 🪽

@Teknium

about 2 months ago

Introducing Hermes Curator!

The new system built into Hermes Agent now helps you keep the skills that the self-improvement loop creates in check by consolidating and pruning them automatically.

The curator does multiple things:

- keeps track of how often you use each skill, when it was last updated/created, etc

- Runs automatically once a week (configurable)

- Uses the analytics plus its own scanning of your skills and consolidates or prunes them if necessary

- Skips externally installed skills, built-in skills, and skills you "pin" that you don't want touched. It will only attempt curation over agent-created/updated skills or user-written skills.

- It will then determine whether skills can be consolidated, pruned, or otherwise made more manageable. It will convert some skills that are too specific into references, templates or scripts for larger/broader skills, or integrate them directly into a consolidation of an existing skill.

You can also disable it entirely in the config.yaml and/or run it manually with `hermes curator run `

Learn more on the docs here:

https://t.co/6woLLRtDLP
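
The eligibility rules in the list above (skip external, built-in and pinned skills; only consider agent-created/updated or user-written ones; run on a configurable weekly schedule) can be sketched as a pre-filter. This is not Hermes Agent's code: the `Skill` fields, the origin labels, and the staleness thresholds `stale_after` and `min_uses` are hypothetical, and the real curator's own scanning, which decides whether to consolidate, prune, or convert a skill, is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# HYPOTHETICAL origin labels; only these two kinds are eligible for curation.
CURATABLE_ORIGINS = {"agent", "user"}


@dataclass
class Skill:
    name: str
    origin: str  # "agent", "user", "external", or "builtin" (illustrative labels)
    pinned: bool  # pinned skills are never touched
    use_count: int  # usage analytics the curator tracks
    last_updated: datetime


def is_due(last_run: Optional[datetime], now: datetime,
           interval: timedelta = timedelta(weeks=1)) -> bool:
    """True if the scheduled pass should run (weekly by default, configurable)."""
    return last_run is None or now - last_run >= interval


def curation_candidates(skills: list[Skill], now: datetime,
                        stale_after: timedelta = timedelta(days=30),
                        min_uses: int = 1) -> tuple[list[Skill], list[Skill]]:
    """Return (eligible, flagged).

    eligible: skills the curator may consider at all.
    flagged: eligible skills that look stale under an ASSUMED usage heuristic;
    what to do with them is left to a later review step not modeled here.
    """
    eligible = [s for s in skills if s.origin in CURATABLE_ORIGINS and not s.pinned]
    flagged = [s for s in eligible
               if s.use_count < min_uses and now - s.last_updated > stale_after]
    return eligible, flagged
```

Filtering on origin and pin status before any analysis keeps the protected skills out of scope entirely, rather than relying on a later step to leave them alone.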

elyase retweeted

Thomas Ricouard

@Dimillian

about 2 months ago

A new feature sneaked into the Codex app’s latest update. You can now do /side (or use the ... menu) to spawn a side chat! Useful when you're deep in a thread and want to ask a side question in the current context! https://t.co/9tywlE5GAp

elyase retweeted

chiefofautism

@chiefofautism

2 months ago

openai built a model that HIDES personal data in text so nothing leaks

i flipped it INSIDE OUT

same 1.5B weights, same label taxonomy, but instead of masks you get structured spans, name, email, phone, bank account, address, secrets, char offsets and all

point it at logs, dumps, stolen inboxes and it just... returns every private thing in the pile

elyase retweeted

Cua @trycua

2 months ago

We're open-sourcing Cua Driver - our new macOS driver that lets any agent (Claude Code, Codex, your own loop) drive any app in the background, with true multi-player and multi-cursor built-in.

1/8 https://t.co/EQT1QwPOBQ

elyase retweeted

Artificial Analysis

@ArtificialAnlys

2 months ago

GPT-5.5 takes OpenAI back to the clear number one in AI. OpenAI’s new model tops the Artificial Analysis Intelligence Index by 3 points, breaking a three-way tie with Anthropic and Google

OpenAI gave us pre-release access to test all five reasoning effort levels: xhigh, high, medium, low and non-reasoning.

➤ OpenAI topping five headline evaluations: GPT-5.5 (xhigh) leads Terminal-Bench Hard, GDPval-AA and our newly hosted APEX-Agents-AA. The model trails only other OpenAI models in CritPt and AA-LCR, and comes second to Gemini 3.1 Pro Preview on three additional evaluations. The largest gains are on AA-Omniscience (+14 pts), our knowledge and hallucination benchmark, and τ²-Bench Telecom (+7 pts), a customer service agent benchmark.

➤ 20% more expensive to run our Intelligence Index: Per-token pricing has doubled from GPT-5.4 to $5/$30 per 1M input/output tokens. However, a ~40% token use reduction largely absorbs the hike - resulting in a net ~+20% cost to run our Intelligence Index.

➤ Effort provides a clear ladder for balancing intelligence and cost: GPT-5.5 (medium) scores the same as Claude Opus 4.7 (max) on our Intelligence Index at one quarter of the cost (~$1,200 vs $4,800) - although Gemini 3.1 Pro Preview scores the same at a cost of ~$900. GPT-5.5 (low) approximates Claude Opus 4.7 (Non-reasoning, high) on our Intelligence Index at half the cost to run (~$500 vs ~$1,000).

➤ Number one in GDPval-AA with an Elo of 1785: GPT-5.5 (xhigh) leads Claude Opus 4.7 (max) by ~30 pts and Gemini 3.1 Pro Preview by ~470 pts. GDPval-AA is Artificial Analysis’ benchmark that leverages OpenAI’s GDPval dataset to evaluate models on real-world economically valuable tasks.

➤ Top AA-Omniscience accuracy, but trailing the frontier on hallucination: Our private AA-Omniscience benchmark rewards factual knowledge across diverse topics, but punishes hallucination. GPT-5.5 (xhigh) has the highest accuracy at 57% - meaning the model can recall facts in the Omniscience corpus more effectively than any other model. However, it has a hallucination rate of 86% - vs Opus 4.7 (max) at 36%, and Gemini 3.1 Pro Preview at 50%. This makes it more likely to answer a question when it does not ‘know’ the answer. The 14 pt gain in AA-Omniscience from GPT-5.4 (xhigh) was largely driven by knowledge, with a modest improvement in hallucination.

Congratulations to the team at @OpenAI and @sama on the launch
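
The net cost figure in the pricing bullet follows from multiplying the two changes it names. A quick check, assuming the ~40% token reduction applies uniformly across the input/output mix so that one multiplier covers both (the tweet does not break this down):

$$
\frac{C_{\text{GPT-5.5}}}{C_{\text{GPT-5.4}}} \approx \underbrace{2}_{\text{price per token}} \times \underbrace{(1 - 0.40)}_{\text{tokens used}} = 1.2 \quad\Rightarrow\quad \approx +20\%
$$

The cost ratios quoted in the effort bullet check out the same way:

$$
\frac{\$1{,}200}{\$4{,}800} = 0.25 \ (\text{one quarter}), \qquad \frac{\$500}{\$1{,}000} = 0.5 \ (\text{half})
$$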

elyase retweeted

OpenAI

@OpenAI

2 months ago

Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done. Now available in ChatGPT and Codex.

elyase retweeted

elie

@eliebakouch

2 months ago

kimi K2.6 vs K2.5, mythos, opus 4.7, and cursor composer 2 (based on K2.5) on every benchmark i could find

tl;dr: it's a really really good model https://t.co/7rtKCSfaNO

elyase retweeted

Yoonho Lee

@yoonholeee

2 months ago

We just released code for Meta-Harness! https://t.co/OdU7zocdPl

Aside from replicating paper experiments, the repo is designed to help users implement good Meta-Harnesses in completely new domains! Just point your agent at ONBOARDING.md and have a conversation https://t.co/0H6Zrvg8FQ

elyase retweeted

OpenAI

@OpenAI

3 months ago

Our existing $200 Pro tier still remains our highest usage option. And as a thank you to our existing Pro users on the $200 tier, we’re extending our 2x Codex usage promo (until May 31st) and we’ve reset your Codex rate limits (yes, again).

elyase retweeted

Rivet

@rivet_dev

3 months ago

Say hello to agentOS (beta)

A portable open-source OS built just for agents. Powered by WASM & V8 isolates.

🔗 Embedded in your backend
⚡ ~6ms coldstarts, 32x cheaper than sbxs
📁 Mount anything as a file system (S3, SQLite, …)
🥧 Use Pi, Claude Code/Codex/Amp/OpenCode soon
