Quick is built differently. It lives on your laptop and is connected to everything you do—your local files, calendar, email, and the apps you already use. And where most AI tools only work within their own vendor-specific ecosystem and can only help with a fraction of your work, Quick is built to break you free from those walled gardens.
Does AI actually help engineering teams ship faster? Only if you change how you work. We've been experimenting across hundreds of engineering teams at @Amazon and seeing clear patterns in where AI delivers the most value. The teams seeing 4.5x–10x+ productivity gains all figured out the same thing: the workflow matters more than the tool and patience is non-negotiable. Frontier teams that invest time in building agent context see compounding acceleration after a couple of weeks. The teams that quit too early never get there. Here's what we learned → https://t.co/E5UnlHFw34
Five new capabilities available to accelerate your AI-powered analytics.
1. Dataset Q&A
2. Chat Explanations
3. Dataset Enrichment
4. Generate Analysis
5. Direct Query on S3 Tables
AI can generate SQL, build charts, and return answers in seconds. But unlike a generated UI–where you can see whether a layout looks right–you can't look at a number and know whether it's correct, or how to fix it if it isn't. Five new capabilities in @amazonquick solve that, from connecting directly to your data lake, to showing you exactly how every answer was derived. ➡️ https://t.co/rqflHlZXky
Amazon Quick changes how you work. Today we're releasing it in desktop mode—a proactive AI assistant that connects to your apps, builds a personal knowledge graph from your work, and gets smarter every session. No AWS account needed. Quick finds the smarter way to get it done ➡️https://t.co/IKyc5ZWoTi
#WhatsNextwithAWS
This is the 6th RoboPhD application, alongside @Chudbrochil's recent sudoku work. RoboPhD wins 5 of 6 across the suite — ARC-AGI, Text2SQL, financial QA, sudoku, and now bioinformatics.
Joint work with Anthony and @steve_ash.
We just published some results for RoboPhD, an agent optimization method that beats GEPA and Autoresearch on 3 out of the 4 tasks we studied!
RoboPhD is an evolutionary approach to optimizing Agents through multi-round competition using Elo.
https://t.co/ba14TcHGwJ
https://t.co/EbSGd7ZEmJ…
Takeaways:
💡 On three out of four diverse tasks (abstract reasoning, SQL generation, financial QA, cloud scheduling), RoboPhD beats the popular GEPA and an adaptation of @karpathy's AutoResearch hill-climbing approach under the same fixed number of evaluations.
💡 RoboPhD uses a multi-round competition with different sampling each round, using Elo to rank candidates. This makes it more sample-efficient over a fixed train/validation split.
💡 RoboPhD lets the agents self-instrument to discover useful diagnostic info and surface it to the evolution process, a kind of self-adapting textual gradient.
The code is out on GitHub under the MIT license, and we offer a GEPA optimize_anything-like API to make it easy to plug in your own tasks! "If you can benchmark it, RoboPhD can optimize it" :)
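To make the Elo-ranked, multi-round competition concrete, here is a minimal sketch using the standard Elo update. It is not RoboPhD's actual code: the function names, the K-factor of 32, the 1500 starting rating, the round-robin pairing, and the per-round sample size are all assumptions chosen for illustration.

```python
import random


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (A, B) ratings; score_a is 1.0 win, 0.5 draw, 0.0 loss."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


def run_competition(candidates, examples, rounds=5, sample_size=4, seed=0):
    """Rank candidates over several rounds, each on a fresh sample of examples.

    `candidates` maps a name to a callable returning a score in [0, 1]
    for one example (a hypothetical interface, assumed for this sketch).
    """
    rng = random.Random(seed)
    ratings = {name: 1500.0 for name in candidates}
    names = list(candidates)
    for _ in range(rounds):
        # Different sampling each round, drawn from the same fixed split.
        batch = rng.sample(examples, min(sample_size, len(examples)))
        scores = {n: sum(candidates[n](x) for x in batch) / len(batch) for n in names}
        # Round-robin: every pair plays one "game" decided by batch score.
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if scores[a] > scores[b]:
                    outcome = 1.0
                elif scores[a] < scores[b]:
                    outcome = 0.0
                else:
                    outcome = 0.5
                ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], outcome)
    return ratings
```

Because each round scores candidates on only a small sample, the evaluation budget per round stays small, while the ratings accumulate evidence across rounds.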
Excited to share RoboPhD! An evolutionary approach to optimizing Agents through multi-round competition using Elo.
https://t.co/D7QDGNXD40
https://t.co/Pyaav6R3Mz
This work was led by the herculean efforts of Andrew Borthwick, with Anthony Galczak and me contributing.
@karpathy This has been my process too! I also run an automated schedule that finds new knowledge (via my Substack subscriptions and arXiv feed), checks deltas against the existing md files, and tracks potentially emerging topics with credibility scores, all feeding a lightweight daily report for me.
RoboPhD shows LLMs, text-generating AI models, can self-improve text-to-SQL by evolving tools and prompts from feedback.
Text-to-SQL is hard because the model must understand a database's tables and columns, then write exact SQL, the language databases use for queries, where tiny mistakes count as total failure.
RoboPhD splits the job into 2 evolving parts, a non-AI code script that writes a database cheat sheet, and instructions that guide the LLM to write SQL from that cheat sheet.
An evolution agent, an AI that rewrites the system, keeps making new versions based on what went wrong, tests them on BIRD, a public set of databases and questions, then picks winners with an Elo score, a chess-style rating for head-to-head results.
Starting from a tiny 70-line baseline, the best evolved agent reaches 73.67% accuracy, and the biggest gains show up on cheaper LLMs that normally lag behind.
That matters because the final output is just a reusable script plus instructions, so a lower-cost model can perform like a pricier one in real deployments.
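As a rough illustration of the two-part split described above, the sketch below pairs a plain, non-AI script that writes a database cheat sheet with a function that wraps it in instructions for the LLM. It assumes a SQLite database; `build_cheat_sheet`, `build_prompt`, and the prompt wording are hypothetical names and text, not taken from the paper.

```python
import sqlite3


def build_cheat_sheet(conn: sqlite3.Connection, sample_rows: int = 2) -> str:
    """Summarize every user table: columns, types, keys, and a few sample rows."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier so unusual table names do not break the queries.
        quoted = '"' + table.replace('"', '""') + '"'
        lines.append(f"Table {table}:")
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, _notnull, _default, pk in conn.execute(
            f"PRAGMA table_info({quoted})"
        ):
            flags = " PRIMARY KEY" if pk else ""
            lines.append(f"  - {name} ({col_type or 'UNKNOWN'}){flags}")
        for row in conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (sample_rows,)):
            lines.append(f"  sample: {row}")
    return "\n".join(lines)


def build_prompt(cheat_sheet: str, question: str) -> str:
    """Combine instructions (illustrative wording), the cheat sheet, and the question."""
    return (
        "Write one SQLite query that answers the question. "
        "Use only the tables and columns listed below.\n\n"
        f"{cheat_sheet}\n\nQuestion: {question}\nSQL:"
    )
```

In this framing, an evolution loop would rewrite both the script and the instruction text between rounds; the sketch shows only a fixed starting point.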
----
Paper Link – arxiv.org/abs/2601.01126
Paper Title: "RoboPhD: Self-Improving Text-to-SQL Through Autonomous Agent Evolution"
Claude has an opinion on √GOAT too:
√GOAT =
(Half a goat on hind legs balancing a protractor on its head reciting first 3 digits of pi) × (Half a goat on hind legs while balancing a protractor on its head reciting first 3 digits of pi) = One complete goat doing goat things
In a world of hype and contrarians, it's hard to find the right balance between (A) being naively inspired and trying to imagine the future and
(B) being skeptical and needing answers before making a bet.
I do believe that it takes little skill or knowledge to default to the extremes.
@wellheyitsjulia I hated that movie. I love Ari Aster, but when the penis monster came on screen I threw my hands in the air and gave up. The film seems like the product of a "mad genius" director with no one pushing back asking "are you sure about that??"
It should be a UX law that if you show me something to click and give me enough time to react (>600ms?), you can't move it right as I'm about to click it.
Microsoft Outlook Web client, I'm looking at you.
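One common mitigation for this, sketched below under assumptions, is to ignore clicks that arrive within a short grace period after the layout shifts, so a click aimed at the old position never lands on whatever moved underneath it. The `ClickGuard` class and its method names are hypothetical, and the 0.6 second default simply mirrors the 600ms guess above.

```python
class ClickGuard:
    """Reject clicks that land too soon after a layout shift."""

    def __init__(self, grace_seconds: float = 0.6):
        self.grace_seconds = grace_seconds
        self._last_shift = None  # timestamp of the most recent layout shift

    def record_layout_shift(self, now: float) -> None:
        """Call whenever clickable elements move on screen."""
        self._last_shift = now

    def should_accept_click(self, now: float) -> bool:
        """Accept only if the layout has been stable for the grace period."""
        if self._last_shift is None:
            return True
        return (now - self._last_shift) >= self.grace_seconds
```

A UI event loop would call `record_layout_shift` with a monotonic timestamp on each reflow and consult `should_accept_click` before dispatching a click.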