MT

Verified account

@MatthewZMD

I was an Emacs hacker, a photographer and a multidisciplinary artist. Now I am building biomedical AI scientists.

Joined June 2025

47 Following

55 Followers

124 Posts

Pinned Tweet

4 days ago

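Hi everyone, this is Genomi (基诺米), an open-source agent harness that turns your AI agent into a personal expert who understands your DNA. I took a DNA test a few years ago. Like most people, I got the report, saw a few interesting findings, and then forgot about it. Recently I tried handing that data to my agent and realized right away that DNA really is useful for personal health. But a few problems stand in the way: > General AI often sounds convincing while actually being wrong; > Static DNA reports can't keep up with new research; > And your DNA should stay on your own machine rather than being uploaded to some website. So we built Genomi: local-first, agent-native, self-evolving, and grounded in evidence throughout.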

2

19

5

22

6K

about 14 hours ago

@GavinRayDev @itsEmZee_ Hey Gavin, I’ve made some changes that I think may fix it. Can I run your VCF to verify? I’ve requested access to the file on Google Drive.

0

0

0

0

19

about 17 hours ago

@GavinRayDev @itsEmZee_ I’m glad! Would you mind sending me the exact schema errors or the Codex trace you saw? There are a lot of moving parts, so I must’ve missed one somewhere!

1

1

0

0

34

about 20 hours ago

@gptsiolis @itsEmZee_ Hey George, please run `genomi update` and parse your file again!

0

0

0

0

9

1 day ago

@MarkJCarney Gotta start harnessing AI harnesses https://t.co/46PiLgQjY5

4 days ago

Introducing Genomi: an open-source agent harness that turns your AI agent into your personal DNA expert. I took a DNA test years ago. Like a lot of people, I got the report, found something interesting, and forgot about it. Recently I gave the data to my codex agent and it was obvious how incredibly useful DNA is for personal health, but: > General AI can sound right while being wrong > Static DNA reports can’t keep up with new science > DNA data should stay on your local device, not uploaded to a website So we built Genomi, local-first, agent-native, self-evolving, evidence-grounded.

27

473

46

543

99K

0

2

0

0

66

MatthewZMD retweeted

2 days ago

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx

2K

27K

4K

15K

17M

MatthewZMD retweeted

2 days ago

Introducing Agent Arena: real-world agentic evals at scale.

How do you evaluate agents doing actual work? We measure millions of live sessions where real users accomplish real tasks.

On Arena, models now get web search, filesystem, and terminal tools to complete complex workflows: writing code, creating slide deck, researching the web, building apps, and analyzing documents.

Every session produces rich signals. Users iterate with the agent turn-by-turn: approving, editing, correcting, praise or expressing frustration. The environment gives feedback too: shell errors, tool failures, recovery attempts, and more.

Our leaderboard measures each model's agentic performance using causal inference across five signals: task success, steerability, error recovery, user praise vs. complaint, and tool hallucination.

This leaderboard snapshot is built from 300K+ tasks, 2M+ tool calls, and 40M lines of code by agents.

Top labs in Agent Arena:
- #1 @OpenAI: GPT-5.5 (High)
- #2 @AnthropicAI: Claude-Opus-4.7 (Thinking)
- #3 @Zai_org: GLM-5.1
- #4 @GoogleDeepMind: Gemini-3.1-Pro
- #5 @Kimi_Moonshot: Kimi-K2.6

More analysis in the thread, with the full technical blog below.

68

1K

143

322

345K

2 days ago

Recursive self improvement is the way.

2 days ago

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx

2K

27K

4K

15K

17M

0

1

0

0

119

2 days ago

@YashHustle_22 Not something I can do that Codex can’t, but something Codex couldn’t do at all and that I’ve now made it able to do: https://t.co/z2YlTBRMng

4 days ago

Introducing Genomi: an open-source agent harness that turns your AI agent into your personal DNA expert. I took a DNA test years ago. Like a lot of people, I got the report, found something interesting, and forgot about it. Recently I gave the data to my codex agent and it was obvious how incredibly useful DNA is for personal health, but: > General AI can sound right while being wrong > Static DNA reports can’t keep up with new science > DNA data should stay on your local device, not uploaded to a website So we built Genomi, local-first, agent-native, self-evolving, evidence-grounded.

27

473

46

543

99K

0

2

0

1

658

3 days ago

Codex reset time!

3 days ago

Hi. Over the last 24 hours we had three separate small incidents that affected Codex reliability. Those are three too many and we are taking active steps for them to not reproduce. I have reset usage limits for Codex across all paid plans. May the tokens flow again.

1K

11K

514

501

1M

0

1

0

0

78

3 days ago

@gptsiolis @itsEmZee_ I'm doing a comprehensive scan across the project and will let you know when it's done! Our tests need some strengthening.

0

1

0

0

27

3 days ago

@gptsiolis @itsEmZee_ We suspect the problem came from an AGI (Active Genome Index) schema change in which a 23andMe path got left out, causing a mismatch. I tried it on some 23andMe files and they seem to work.

1

1

0

0

66

3 days ago

@gptsiolis @itsEmZee_ Please run `/genomi update`, restart MCP server / your agent and try again!

0

1

0

0

26

3 days ago

@gptsiolis @itsEmZee_ Is it a 23andMe file?

1

1

0

0

34

MatthewZMD retweeted

3 days ago

Your AI agent can't read your DNA file. Genomi turns your genome into a queryable local database that your AI agent can interact with.

3

100

6

36

9K

3 days ago

Genomi is officially on @ProductHunt! Genomi transforms your massive raw DNA data into a contextually manageable, queryable Active Genome Index that you can trust. https://t.co/qSvywCVQxy

0

0

0

0

38

3 days ago

@cryptobyHash @Scobleizer One potential improvement to Active Genome Index is to secure it!

0

0

0

0

4

3 days ago

@carb1n_ @itsEmZee_ You're the first person who gets it 👀

0

1

0

0

5

MatthewZMD retweeted

Google AI Developers

4 days ago

Building autonomous agents for scientific discovery? 🧬🤖 @GoogleDeepMind Science Skills is now available on GitHub. We've open-sourced this specialized toolkit to accelerate your agentic workflows with scientific grounding and higher token efficiency. Download now ↓ https://t.co/cwp1HOeKvo

31

2K

267

1K

87K

3 days ago

@hnshah My favourite: https://t.co/McOfM7B9BG

0

2

0

0

68

3 days ago

@FengXiong79932 Thank you, thank you 🤩

0

0

0

0

25
