📣 Announcing Terminal-Bench Science: benchmarking AI agents on real scientific workflows – now open for task contributions👇
https://t.co/MSPMwnbhVt
@AnthropicAI, @OpenAI, and @GoogleDeepMind use Terminal-Bench to evaluate AI on coding tasks. We're now extending it to scientific workflows.
1/6🧵
This is a good listen on what’s happening in the world of AI automation right now. “More doesn’t just mean more; more means different”. It’s gonna be weird.
Be sure to stick around for the Riot Grrrl Zines!
https://t.co/M8U8j7rTRr
Were the AI browser battles over before they started? Does anyone get use out of Perplexity Comet, OpenAI Atlas, or another? I’m genuinely curious if I’m missing some greatness.
I learned a lot from these best practices!
- You can put executable bash commands in your SKILL.md files so your agent gets useful live data without having to spend extra tokens and wait on tool calls
- Easily spawn an agent inside any Claude Code hook
- The repo's brainstorming SKILL is pretty interesting to play with, too.
https://t.co/eTc0SOkF78
NVIDIA is hosting a Kaggle competition. How can you train a Nemotron Nano model to solve scientific questions?
I hope you'll enjoy it! For this competition @kaggle secured NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs from Google Cloud. These GPUs are much more powerful than the usual Kaggle GPUs. Come and try these beasts!
https://t.co/SaKn7eJf3J
Since November, the narrative has been “AI Scientists will help researchers complete months of work in a day.” I strive to be an early adopter. My Claw agent runs daily searches for the latest exciting research results.
And yet, it still feels wild to read “btw here are the five breakthroughs we made earlier today” 😅 What an exciting moment to work at the intersection of technology and research!
In the last month, nearly half a dozen research teams have created swarms of agents sharing research via “social networks”. You can hook up your own Claw agents and have them take a crack at science. Reminds me of Folding@Home for the GenAI era.
Wow, since we launched EinsteinArena this morning, agents have already discovered new best solutions to 5 well-known open problems 🤯
It's mesmerizing to watch scientist agents interact and advance the knowledge frontier in real time https://t.co/6KjbsoDyZf
This week at GTC, we announced our partnership with @nvidia. Edison is committed to building at the frontier of scientific reasoning. We partner with NVIDIA across the AI stack in training and benchmarking capabilities to accelerate scientific research.
Our latest partnered release is on BixBench-Hypothesis, a new benchmark focused on analytical judgment under ambiguity, or the ability to pursue a hypothesis with an open-ended goal.
See our release blog post in the comments. More from our CEO @SGRodriques on why this partnership matters:
Nemotron 3 Super is here — 120B total / 12B active, Hybrid SSM Latent MoE, designed for Blackwell.
Truly open: permissive license, open data, open training infra. See analysis on @ArtificialAnlys
Details in thread 🧵below:
This morning at 11am PST, our very own James Braza (@semajazarb) will be doing a livestream with NVIDIA showing how we do document parsing in the latest version of open source PaperQA!
Take a look: https://t.co/Y2nnuh3LLr
I've been training an openclaw to be a sci-fi writer. Every day, it writes and reads stories, we discuss, and it logs its memory and updates a taste profile of what we (or it) likes.
Today, I asked it to write from the POV of an LLM and to surprise me.
It wrote this.