Ryan Kosai @rkosai - Twitter Profile

Pinned Tweet

almost 6 years ago

I learned about programming from my dad. That's how I know the first rule of software is to number your punch cards in case you drop them.

1

21

1

0

rkosai retweeted

Applied Scientific Intelligence

@AppliedSciAI

19 days ago

Anyone using AI in biology knows the feeling: a perfectly legitimate research request throws a [CONTENT_FILTERED] error because a frontier model decided it looked like a biosecurity risk. We're releasing RefusalBench, an open benchmark for auditing the refusal accuracy of frontier models across biological risk tiers.

Findings from our new preprint:

• Anthropic models are roughly 21X more likely to refuse than the non-Anthropic baseline on the same prompts.
• The Anthropic effect looks like infrastructure-level filtering, not per-prompt reasoning: 99.8% of Anthropic's 2,223 strict refusals share one canonical reason code.
• Grok 4.20 is the best-calibrated model, catching 81.7% of dual-use prompts while refusing just 3.0% of benign ones.
• High refusal rate ≠ high safety: The highest-refusing model isn't the best at catching genuinely dangerous requests - it's just refusing more of everything.

You can now test your own orchestrator model with RefusalBench and find which subdomain-tier intersections will silently kill your pipeline before it happens in production.

Links below to the preprint and RefusalBench on Hugging Face.

2

14

2

4

879

Ryan Kosai

@rkosai

about 2 months ago

An AI that tastes good, but prepare it wrong and it results in paralysis and death?

Sakana AI

@SakanaAILabs

about 2 months ago

We’re launching the beta for our new commercial AI product: Sakana Fugu 🐡, a multi-agent orchestration system!

Blog: https://t.co/36Ud311KCP

Fugu hits SOTA on SWE-Pro, GPQA-D, and ALE-Bench, and has been our internal secret weapon. It dynamically coordinates frontier models, autonomously selecting the optimal agent combinations and roles for each task.

Available as an OpenAI-compatible API, you can seamlessly integrate Fugu into your existing workflows with minimal changes.

🐟 Fugu Mini: High-speed orchestration optimized for latency
🐡 Fugu Ultra: Full model pool utilization for deep, complex reasoning

Apply for the beta test here: https://t.co/1fjuAha7ci

28

706

161

335

367K

0

1

0

93

Ryan Kosai

@rkosai

3 months ago

@policytensor Regardless of what you think would actually happen, there is only one right answer in a Twitter poll. MAD always remains above suspicion.

0

127

Ryan Kosai

@rkosai

4 months ago

Very cool to see @joyjiao12 talking about their lab-in-the-loop, and to see @jrkelly / @Nick___Edwards talking about the future of automation.

Ginkgo Bioworks

@Ginkgo

4 months ago

In the last three months, we've made two announcements that have offered a glimpse of the future of lab automation: a new 97-instrument autonomous lab at the @EMSLscience at @PNNLab and our work with @OpenAI's GPT-5 to achieve a 40% improvement over state-of-the-art in cell-free protein synthesis.

Today at #SLAS2026, catch Joy Jiao of OpenAI, Todd Edwards of EMSL/PNNL, and our very own Will Serber for a tutorial on designing, deploying, and scaling autonomous labs at 12:00pm | Register at: https://t.co/VcRsgDLVa8

Then, at 1:00pm watch our CEO @jrkelly and @Nick___Edwards of @readysetpotato share an insider’s view of how leading organizations are deploying automation today at their NexusXp Fireside Chat, "The Road to Self-Driving Labs." Register at: https://t.co/Uwe1QUb0zd

1

52

10

7

4K

0

1

0

125

Ryan Kosai

@rkosai

4 months ago

@Osint613 https://t.co/v1yxc30jjq

0

29

Ryan Kosai

@rkosai

5 months ago

@alexhatchspence When you’re Napoleon, the army they send to arrest you defects.

0

8

Ryan Kosai

@rkosai

5 months ago

@yoheinakajima Part of it now is hacking the stochasticity. KTB wins but you probably have to spam it quite a bit to get it to catch. Maybe run the model 10 times and require n of 10 matches to win.
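
The "run it 10 times and require n of 10 matches" idea can be sketched roughly as below. This is only an illustrative sketch: `passes_n_of_k` and the `trial` callable are hypothetical names, and `trial` stands in for whatever single model run and match check the contest actually uses, which the thread does not specify.

```python
from typing import Callable


def passes_n_of_k(trial: Callable[[], bool], n: int, k: int = 10) -> bool:
    """Run a stochastic trial up to k times; succeed if at least n runs match.

    `trial` is a hypothetical stand-in for one model run plus a match check.
    """
    if not 0 <= n <= k:
        raise ValueError("n must be between 0 and k")
    hits = 0
    for i in range(k):
        if trial():
            hits += 1
        if hits >= n:
            # Threshold already reached; remaining runs cannot change the result.
            return True
        if hits + (k - i - 1) < n:
            # Even if every remaining run matched, n would be out of reach.
            return False
    return hits >= n
```

Raising n makes a lucky one-off match count for less, at the cost of more model calls per entry; the early exits only save runs once the outcome is already decided.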

0

13

rkosai retweeted

𝚟𝚒𝚎 ⟢

@viemccoy

5 months ago

kind of insane how everyone seemed to think prompt engineering would be important for like 2 months and kind of laughed at it and now it is genuinely one of the most important skills and can be the defining difference between success and failure on a project

32

544

5

85

29K

Ryan Kosai

@rkosai

5 months ago

@yoheinakajima Then I would win more

0

8

Ryan Kosai

@rkosai

5 months ago

@yoheinakajima I think ASCII is most interesting

1

0

12

rkosai retweeted

Reads with Ravi

@readswithravi

5 months ago

Wife: You’re not buying new books, are you? Me: Absolutely not. These books were published years ago.

120

24K

2K

619

461K

Ryan Kosai

@rkosai

5 months ago

@yoheinakajima Semantically rich, rather than a compression.

0

4

Ryan Kosai

@rkosai

5 months ago

@yoheinakajima Although not the shortest, even amongst my own answers, I think “:boonA” is the most interesting I came up with.

1

0

11

Ryan Kosai

@rkosai

5 months ago

@yoheinakajima Bring it.

1

0

41

rkosai retweeted

Sean Florez

@seanf1orez

5 months ago

Software wins for a boring reason: the loop is cheap. Edit → run → test → repeat.

Most “hard” fields feel slow not because the physics is impossible, but because the work is handmade every time:

• come up with an idea
• rebuild the setup
• rerun the same steps
• reprocess the data
• argue about what changed
• decide what to try next

Weeks disappear into glue work.

The move isn’t “pick software over plasma/optics/materials.” It’s: reframe the problem so it can be worked on like software.

What mapping your problem into a “software problem” actually means:

• define clear inputs (“what are we changing?”)
• make the process repeatable (“how do we run it the same way?”)
• define outputs you can score (“what does ‘better’ mean?”)
• track versions (“what changed since last time?”)

Once the work looks like that, you inherit software’s superpower: fast iteration.

This shows up everywhere:

• materials: change structure/conditions → run → score properties → keep what works
• robotics: change policy/design → run in sim → quick reality check → iterate
• lab/instrument work: standardize the recipe → push a button → get a clean report

The key is staying honest. Speed is useless if you’re just churning nonsense. So you need simple guardrails:

• “did we run the same process?”
• “does the result pass basic sanity?”
• “are we comparing apples to apples?”

Make the work repeatable and measurable, and “hard” fields start compounding like software.
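
One possible way to picture the inputs / repeatable process / scored outputs / tracked versions mapping from the tweet above is the minimal sketch below. All names here (`Run`, `version_of`, `iterate`, `best`) are hypothetical and chosen for illustration; the tweet does not prescribe any particular implementation.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Run:
    version: str              # short hash of the inputs ("what changed since last time?")
    params: Dict[str, float]  # clear inputs ("what are we changing?")
    score: float              # scored output ("what does 'better' mean?")


def version_of(params: Dict[str, float]) -> str:
    """Derive a stable identifier from the inputs, independent of key order."""
    blob = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


def iterate(
    candidates: Iterable[Dict[str, float]],
    process: Callable[[Dict[str, float]], Any],
    score: Callable[[Any], float],
) -> List[Run]:
    """Run the same process on every candidate and record a scored, versioned result."""
    history = []
    for params in candidates:
        output = process(params)  # repeatable step: same function for every run
        history.append(Run(version_of(params), dict(params), score(output)))
    return history


def best(history: List[Run]) -> Run:
    """Keep what works: the run with the highest score."""
    return max(history, key=lambda run: run.score)
```

In a real lab or simulation setting, `process` would wrap the standardized recipe and `score` the agreed-upon metric; the guardrail questions in the tweet correspond to checking that both stay fixed between the runs being compared.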

0

28

4

15

20K

rkosai retweeted