Steve

@steve_ash

I like people, places, things, & ideas. Dog Parent. Senior Principal Engineer @ AWS. Opinions are my own. 🎓🚴🎹🏳️‍🌈🐩🍷

Seattle, WA

Joined October 2010

1.1K Following

336 Followers

2K Posts

Pinned Tweet

Steve

@steve_ash

about 2 months ago

Our team launched this today!! You're going to love it

Amazon Quick

@amazonquick

about 2 months ago

Quick is built differently. It lives on your laptop and is connected to everything you do—your local files, calendar, email, and the apps you already use. And where most AI tools only work within their own vendor-specific ecosystem and can only help with a fraction of your work, Quick is built to break you free from those walled gardens.

steve_ash retweeted

Swami Sivasubramanian

@SwamiSivasubram

12 days ago

Does AI actually help engineering teams ship faster? Only if you change how you work. We've been experimenting across hundreds of engineering teams at @Amazon and seeing clear patterns in where AI delivers the most value. The teams seeing 4.5x–10x+ productivity gains all figured out the same thing: the workflow matters more than the tool and patience is non-negotiable. Frontier teams that invest time in building agent context see compounding acceleration after a couple of weeks. The teams that quit too early never get there. Here's what we learned → https://t.co/E5UnlHFw34


Steve

@steve_ash

12 days ago

@QuinnyPig @bee__computer What are some of the useful things that Quick Desktop did for you?

steve_ash retweeted

Amazon Quick

@amazonquick

about 1 month ago

Five new capabilities available to accelerate your AI-powered analytics:
1. Dataset Q&A
2. Chat Explanations
3. Dataset Enrichment
4. Generate Analysis
5. Direct Query on S3 Tables


Pedro Nolberto Barbera

about 1 month ago

@zackkanter @SwamiSivasubram @amazonquick Hi Zack! What IdP are you using?

steve_ash retweeted

Swami Sivasubramanian

@SwamiSivasubram

about 1 month ago

AI can generate SQL, build charts, and return answers in seconds. But unlike a generated UI, where you can see whether a layout looks right, you can't look at a number and know whether it's correct, or how to fix it if it isn't. Five new capabilities in @amazonquick solve that, from connecting directly to your data lake to showing you exactly how every answer was derived. ➡️ https://t.co/rqflHlZXky


steve_ash retweeted

Swami Sivasubramanian

@SwamiSivasubram

about 2 months ago

Amazon Quick changes how you work. Today we're releasing it in desktop mode: a proactive AI assistant that connects to your apps, builds a personal knowledge graph from your work, and gets smarter every session. No AWS account needed. Quick finds the smarter way to get it done ➡️ https://t.co/IKyc5ZWoTi #WhatsNextwithAWS


steve_ash retweeted

Andrew Borthwick

@BorthwickAndrew

about 2 months ago

This is the 6th RoboPhD application, alongside @Chudbrochil's recent sudoku work. RoboPhD wins 5 of 6 across the suite — ARC-AGI, Text2SQL, financial QA, sudoku, and now bioinformatics. Joint work with Anthony and @steve_ash.

Steve

@steve_ash

2 months ago

We just published some results for RoboPhD, an agent optimization method that beats GEPA and AutoResearch on 3 out of the 4 tasks we studied! RoboPhD is an evolutionary approach to optimizing agents through multi-round competition using Elo. https://t.co/ba14TcHGwJ https://t.co/EbSGd7ZEmJ…

Takeaways:

💡 On three out of four diverse tasks (abstract reasoning, SQL generation, financial QA, cloud scheduling), RoboPhD beats the popular GEPA and an adaptation of @karpathy's AutoResearch hill-climbing approach under the same fixed number of evaluations.

💡 RoboPhD uses a multi-round competition with different sampling each round, using Elo to rank candidates. This makes it more sample-efficient over a fixed train/validation split.

💡 RoboPhD lets the agents self-instrument to discover useful diagnostic info to surface to the evolution process, kind of a self-adapting textual gradient.

The code is out on GitHub under the MIT license, and we offer a GEPA optimize_anything-like API to make it easy to plug in your own tasks! "If you can benchmark it, RoboPhD can optimize it" :)
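The multi-round Elo competition described in this tweet can be illustrated with a small sketch. This is not RoboPhD's code (that lives in the authors' GitHub repo); it is a minimal illustration assuming that each round samples a fresh validation batch, scores every candidate by accuracy on that batch, and treats each pair of candidates as one Elo "game" won by the higher-accuracy candidate. The function names (`run_round`, `tournament`, `evaluate`), the K-factor of 32, and the 1500 starting rating are hypothetical choices, not values from the paper.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical defaults, not taken from the RoboPhD paper.
K_FACTOR = 32.0
START_RATING = 1500.0


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a: float, r_b: float, score_a: float,
               k: float = K_FACTOR) -> Tuple[float, float]:
    """Return new (r_a, r_b) after one game; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))


def run_round(ratings: Dict[str, float],
              evaluate: Callable[[str, object], int],
              examples: Sequence[object],
              sample_size: int,
              rng: random.Random) -> Dict[str, float]:
    """One round: sample a fresh batch, score every candidate, play all pairs."""
    batch = rng.sample(list(examples), sample_size)
    acc = {name: sum(evaluate(name, ex) for ex in batch) / len(batch)
           for name in ratings}
    for a, b in itertools.combinations(sorted(ratings), 2):
        # Higher accuracy on this round's batch wins; equal accuracy is a draw.
        score_a = 1.0 if acc[a] > acc[b] else 0.0 if acc[a] < acc[b] else 0.5
        ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a)
    return acc


def tournament(candidates: List[str],
               evaluate: Callable[[str, object], int],
               examples: Sequence[object],
               rounds: int = 5,
               sample_size: int = 10,
               seed: int = 0) -> List[Tuple[str, float]]:
    """Run several rounds and return candidates sorted by final Elo rating."""
    rng = random.Random(seed)
    ratings = {name: START_RATING for name in candidates}
    for _ in range(rounds):
        run_round(ratings, evaluate, examples, sample_size, rng)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy evaluator: "evolved" answers every example correctly, "baseline" none.
    toy = lambda name, ex: 1 if name == "evolved" else 0
    print(tournament(["baseline", "evolved"], toy, range(50)))
```

In this sketch, resampling the batch every round means a candidate has to keep winning on different slices of the same validation split to climb the ratings, rather than being judged once on a single fixed batch.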

Steve

@steve_ash

2 months ago

Another iteration on RoboPhD, which you posted about before, @rohanpaul_ai

Steve

@steve_ash

2 months ago

Excited to share RoboPhD! An evolutionary approach to optimizing agents through multi-round competition using Elo. https://t.co/D7QDGNXD40 https://t.co/Pyaav6R3Mz

Takeaways:

💡 On three out of four diverse tasks (abstract reasoning, SQL generation, financial QA, cloud scheduling), RoboPhD beats the popular GEPA and an adaptation of @karpathy's AutoResearch hill-climbing approach under the same fixed number of evaluations.

💡 RoboPhD uses a multi-round competition with different sampling each round, using Elo to rank candidates. This makes it more sample-efficient over a fixed train/validation split.

💡 RoboPhD lets the agents self-instrument to discover useful diagnostic info to surface to the evolution process, kind of a self-adapting textual gradient.

The code is out on GitHub under the MIT license, and we offer a GEPA optimize_anything-like API to make it easy to plug in your own tasks! "If you can benchmark it, RoboPhD can optimize it" :)

This work was led by the herculean efforts of Andrew Borthwick, with myself and Anthony Galczak contributing.

Steve

@steve_ash

3 months ago

@karpathy This has been my process too! I also have an automated schedule that finds new knowledge (via my Substack subscriptions and arXiv feed), checks deltas against the existing .md files, and tracks potentially new, emerging topics with credibility scores, all rolled into a lightweight daily report for me.

steve_ash retweeted

Rohan Paul

@rohanpaul_ai

6 months ago

RoboPhD shows LLMs, text-generating AI models, can self-improve text-to-SQL by evolving tools and prompts from feedback.

Text-to-SQL is hard because the model must understand a database's tables and columns, then write exact SQL, the language databases use for queries, where tiny mistakes count as total failure.

RoboPhD splits the job into 2 evolving parts: a non-AI code script that writes a database cheat sheet, and instructions that guide the LLM to write SQL from that cheat sheet.

An evolution agent, an AI that rewrites the system, keeps making new versions based on what went wrong, tests them on BIRD, a public set of databases and questions, then picks winners with an Elo score, a chess-style rating for head-to-head results.

Starting from a tiny 70-line baseline, the best evolved agent reaches 73.67% accuracy, and the biggest gains show up on cheaper LLMs that normally lag behind.

That matters because the final output is just a reusable script plus instructions, so a lower-cost model can perform like a pricier one in real deployments.

----

Paper Link: arxiv.org/abs/2601.01126

Paper Title: "RoboPhD: Self-Improving Text-to-SQL Through Autonomous Agent Evolution"
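To make the "non-AI code script that writes a database cheat sheet" concrete, here is a hypothetical hand-written version for SQLite databases. It is not the paper's 70-line starting point or any evolved script; it is only a sketch of the kind of deterministic schema summary (columns, primary keys, foreign keys, a few sample rows) such a script might hand to the LLM. The names `schema_cheat_sheet` and `quote_ident` are illustrative, and SQLite as the database engine is an assumption.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQLite identifier so table names with spaces or quotes are safe."""
    return '"' + name.replace('"', '""') + '"'


def schema_cheat_sheet(conn: sqlite3.Connection, sample_rows: int = 2) -> str:
    """Hypothetical helper: list each table's columns, primary keys, foreign keys
    and a few example rows as plain text an LLM can read before writing SQL."""
    lines = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        q = quote_ident(table)
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info({q})").fetchall()
        col_desc = ", ".join(f"{c[1]} {c[2]}" + (" PK" if c[5] else "") for c in cols)
        lines.append(f"TABLE {table}: {col_desc}")
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f"PRAGMA foreign_key_list({q})").fetchall():
            lines.append(f"  FK {table}.{fk[3]} -> {fk[2]}.{fk[4]}")
        # A couple of sample rows often disambiguate column meaning or value formats.
        for row in conn.execute(f"SELECT * FROM {q} LIMIT ?", (sample_rows,)):
            lines.append(f"  e.g. {row}")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO users VALUES (1, 'Ada');
    """)
    print(schema_cheat_sheet(conn))
```

Under the setup the summary describes, a script like this would be one of the two artifacts the evolution agent rewrites, alongside the SQL-writing instructions.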


Steve

@steve_ash

11 months ago

@LoFiAllstars A phrase that I continue to use frequently in my life, learned from Lo-fi

Steve

@steve_ash

over 1 year ago

Claude has an opinion on √GOAT too: √GOAT = (Half a goat on hind legs balancing a protractor on its head reciting first 3 digits of pi) × (Half a goat on hind legs while balancing a protractor on its head reciting first 3 digits of pi) = One complete goat doing goat things

Steve

@steve_ash

almost 2 years ago

@soldni @Muennighoff Congratulations to you and the team! Great stuff 🎉

Steve

@steve_ash

about 2 years ago

In a world of hype and contrarians, it's hard to find the right balance between (A) being naively inspired and trying to imagine the future, and (B) being skeptical and needing answers before making a bet. I do believe that it takes little skill or knowledge to default to either extreme.

Steve

@steve_ash

over 2 years ago

@wellheyitsjulia I hated that movie. I love Ari Aster, but when the penis monster came on screen I threw my hands in the air and gave up. The film seems like the product of a "mad genius" director with no one pushing back and asking, "Are you sure about that??"

Steve

@steve_ash

over 2 years ago

@bradmalin @IEEEorg @sama @OpenAI Congratulations!

Steve

@steve_ash

almost 3 years ago

It should be a UX law that if you show me something to click and give me enough time to react (>600 ms?), you can't move it right as I'm about to click on it. Microsoft Outlook Web client, I'm looking at you.
