Andrew Borthwick

@BorthwickAndrew

AI/ML Scientist. Primary author of RoboPhD, an easy-to-use toolkit for optimizing complex agents. "If you can benchmark it, RoboPhD can optimize it".

Seattle, WA

Joined October 2011

49 Following

17 Followers

11 Posts

Andrew Borthwick

@BorthwickAndrew

about 2 months ago

This is the 6th RoboPhD application, alongside @Chudbrochil's recent sudoku work. RoboPhD wins 5 of 6 across the suite — ARC-AGI, Text2SQL, financial QA, sudoku, and now bioinformatics. Joint work with Anthony and @steve_ash.

Andrew Borthwick

@BorthwickAndrew

about 2 months ago

I last took biology in high school. RoboPhD just evolved a 682-line agent that scores 65.9% Fmax on Price-149 — a 149-protein benchmark built specifically to defeat homology-based function prediction (the "find a similar protein and copy its labels" approach). GEPA scored 55.7%. @karpathy's Autoresearch scored 57.7%. https://t.co/4VKbCQTEiY
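For readers unfamiliar with the metric: Fmax is commonly computed protein-centrically, as the best F1 over a sweep of score thresholds, with precision averaged over proteins that have at least one prediction at that threshold and recall averaged over all proteins. The sketch below assumes that common definition. The function name, the input layout, and the 0.01 threshold grid are illustrative assumptions, and the exact protocol used for Price-149 (for example, ontology propagation) is not stated in the post.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax (illustrative sketch, not the benchmark's scorer).

    predictions: dict protein -> dict label -> score in [0, 1]
    truth:       dict protein -> set of true labels
    """
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_labels in truth.items():
            scored = predictions.get(protein, {})
            predicted = {label for label, s in scored.items() if s >= t}
            hits = len(predicted & true_labels)
            # Precision only counts proteins with at least one prediction at t.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over every protein in the benchmark.
            recalls.append(hits / len(true_labels) if true_labels else 0.0)
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best
```

Sweeping the threshold rewards agents whose confidence scores are well ordered, not just agents that list the right labels.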

Andrew Borthwick

@BorthwickAndrew

about 2 months ago

RoboPhD finds these techniques because the evolution loop sees rich per-protein error reports and gets selection pressure from head-to-head competition between candidate agents. Bad design choices fail visibly on specific proteins; good ones win iteration after iteration.

Andrew Borthwick

@BorthwickAndrew

about 2 months ago

The seed was 50 lines of basic BLAST-and-LLM-fallback (very basic stuff). The evolved agent does multi-source evidence fusion, adversarial dual-LLM ensembling, and confidence calibration by homology consensus. These are techniques real bioinformatics groups publish papers on.

Andrew Borthwick

@BorthwickAndrew

about 2 months ago

All three algorithms had the same evaluation budget and were given the same information. None of them benefited from my 9th grade bioinformatics wizardry.

BorthwickAndrew retweeted

Steve

@steve_ash

2 months ago

Excited to share RoboPhD! An evolutionary approach to optimizing Agents through multi-round competition using Elo. https://t.co/D7QDGNXD40 https://t.co/Pyaav6R3Mz

Takeaways:

💡 On three out of four diverse tasks (abstract reasoning, SQL generation, financial QA, cloud scheduling) RoboPhD beats the popular GEPA and an adaptation of @karpathy's AutoResearch hill-climbing approach under the same fixed number of evaluations.

💡 RoboPhD uses a multi-round competition with different sampling each round, using Elo as a means to rank candidates. This allows us to be more sample efficient over a fixed train/validation split.

💡 RoboPhD allows the agents to self-instrument to discover useful diagnostic info to surface to the evolution process, kind of a self-adapting textual gradient.

The code is out on GitHub under MIT license and we offer a GEPA optimize_anything-like API to make it easy to plug in your own tasks! "If you can benchmark it, RoboPhD can optimize it" :)

This work was led by the herculean efforts of Andrew Borthwick with myself and Anthony Galczak contributing.
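To make the Elo idea concrete, here is a minimal sketch of ranking candidate agents by multi-round competition with a fresh sample of examples each round. It uses the standard Elo update. The names `run_tournament`, `sample_size`, the K-factor of 32, and the starting rating of 1500 are illustrative assumptions and are not taken from the RoboPhD code, whose pairing and scoring rules may well differ.

```python
import random


def expected_score(r_a, r_b):
    """Standard Elo expected score of A when playing B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a, r_b, score_a, k=32.0):
    """Return new (r_a, r_b); score_a is 1.0 for a win, 0.5 draw, 0.0 loss."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def run_tournament(agents, examples, rounds=5, sample_size=4, seed=0):
    """Rank agents by head-to-head results on per-round samples.

    agents: dict name -> callable(example) -> score in [0, 1] (hypothetical).
    """
    rng = random.Random(seed)
    ratings = {name: 1500.0 for name in agents}
    names = sorted(agents)
    for _ in range(rounds):
        # Each round draws a different subset of the evaluation examples.
        batch = rng.sample(examples, sample_size)
        totals = {n: sum(agents[n](ex) for ex in batch) for n in names}
        # Every pair of agents plays one "game" per round on the shared batch.
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if totals[a] > totals[b]:
                    score_a = 1.0
                elif totals[a] < totals[b]:
                    score_a = 0.0
                else:
                    score_a = 0.5
                ratings[a], ratings[b] = update_elo(
                    ratings[a], ratings[b], score_a
                )
    return ratings
```

Because each round only evaluates a small sample, ratings accumulate evidence across rounds instead of requiring a full pass over the split for every candidate.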

Andrew Borthwick

@BorthwickAndrew

5 months ago

@sir4K_zen @rohanpaul_ai Evolution produces an easily deployable agent: the BIRD referees test on systems which are unseen to us scientists. Regarding schema drift: the Python analysis tools inspect the current schema at runtime. So if the schema changes, just re-run the analysis phase.
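As a rough illustration of inspecting the schema at runtime, the sketch below rebuilds a plain-text schema summary from whatever tables currently exist. It assumes a SQLite database and uses only the standard `sqlite3` module. The function name `describe_schema` and the output layout are hypothetical and are not RoboPhD's actual analysis tools.

```python
import sqlite3


def describe_schema(conn):
    """Return one line per table, e.g. 'users(id INTEGER, name TEXT)'."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        col_desc = ", ".join(f"{c[1]} {c[2]}".strip() for c in cols)
        lines.append(f"{table}({col_desc})")
    return "\n".join(lines)
```

Since nothing about the schema is cached, re-running the function after a column is added or dropped picks up the change with no code edits.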

Andrew Borthwick

@BorthwickAndrew

5 months ago

@helderbuilds @rohanpaul_ai BIRD benchmark has gold SQL for each question - we execute RoboPhD's queries and score as correct/incorrect based on whether the result matches. The evolution AI analyzes errors and produces revised prompts + analysis code for the next generation
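The scoring step described here can be sketched as follows: run the generated query and the gold query against the same database and mark the answer correct only if the results match. This is a minimal sketch assuming a SQLite database. The function name `execution_match` is hypothetical, and comparing rows as an order-insensitive multiset is an assumption; the official BIRD scorer may treat row order and duplicates differently.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries return the same rows, ignoring row order."""
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to execute is scored as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: compare as multisets so duplicates count but order does not.
    return Counter(pred_rows) == Counter(gold_rows)
```

Scoring on results instead of query text means two differently written queries both count as correct when they return the same answer.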

BorthwickAndrew retweeted

Rohan Paul

@rohanpaul_ai

6 months ago

RoboPhD shows LLMs, text-generating AI models, can self-improve text-to-SQL by evolving tools and prompts from feedback.

Text-to-SQL is hard because the model must understand a database's tables and columns, then write exact SQL, the language databases use for queries, where tiny mistakes count as total failure.

RoboPhD splits the job into 2 evolving parts, a non-AI code script that writes a database cheat sheet, and instructions that guide the LLM to write SQL from that cheat sheet.

An evolution agent, an AI that rewrites the system, keeps making new versions based on what went wrong, tests them on BIRD, a public set of databases and questions, then picks winners with an ELO score, a chess-style rating for head-to-head results.

Starting from a tiny 70-line starting point, the best evolved agent reaches 73.67% accuracy, and the biggest gains show up on cheaper LLMs that normally lag behind.

That matters because the final output is just a reusable script plus instructions, so a lower-cost model can perform like a pricier one in real deployments.

Paper Link: arxiv.org/abs/2601.01126

Paper Title: "RoboPhD: Self-Improving Text-to-SQL Through Autonomous Agent Evolution"

BorthwickAndrew retweeted

Amazon Science

@AmazonScience

over 5 years ago

An internal #MachineLearning challenge has fostered a greater sense of community among the company's scientists, says principal scientist Andrew Borthwick. Learn more about Amazon's two-pizza teams and its decentralized approach to science and engineering. https://t.co/gogwc2xVlZ
