Amir Bashti

Verified account

@amirbashti

Frontier AI Data @SnorkelAI @Stanford

San Francisco, CA

Joined January 2016

63 Following

114 Followers

14 Posts

amirbashti retweeted

vincent sunn chen

@vincentsunnchen

4 months ago

https://t.co/aNuEf6Yu9j

17

329

83

287

149K

amirbashti retweeted

5 months ago

Excited to share a preview of @SnorkelAI's new Agentic Coding benchmark - testing models on realistic, multi-step software engineering tasks in fully sandboxed execution environments across a calibrated range of task domains and difficulties, inspired by our work with the @terminalbench team! With a top pass@5 score of 58% (Opus 4.5), this new benchmark challenges the notion running wild on X right now that LLMs have "solved" software engineering. And, with unit tests plus both final-output and trajectory-level rubrics, it's already giving us & partners insights into where coding agents fail. Excited to share more here shortly! Link to benchmark & release post in 🧵👇

1

36

12

5

3K

amirbashti retweeted

10 months ago

Lots of chatter about agentic/RL simulation environments recently! Some key misconceptions (slightly caricatured):

>> Building RL envs is easy, because you just code up a verifier quickly, and let the model do the tough data generation on its own!
- Usually, this boils down to over-indexing on environments where verification is easy.
- For example: you might need a chess expert to generate realistic expert gameplay traces, but anyone with a basic chess rulebook could verify a win easily.
- However: there are many, many settings where verification is not at all trivial. The simplest examples are settings with nuanced, domain-specific evaluation rubrics (e.g. most real world enterprise settings). An extreme example being: verify whether a program will halt :)

>> Building RL envs will get commoditized as the "standard" environments get rapidly solved.
- RL environments effectively encode a complete product spec - including unique tools, data resources, constraints, rubrics/verifiers, and human/agent simulators - and as such, are as diverse as the space of all possible AI products.
- Yes, certain generic RL envs will rapidly commoditize ('web browsing', 'computer OS') - but these are not the useful ones anyway!
- The useful RL envs will be deeply domain- and product-specific – and will require corresponding human expertise and customization to build and evolve over time.

>> RL (and RL envs) will be all that you need!
- Current evidence suggests that RL / RL envs will be one part of the overall AI development loop - which will continue to require golden human annotations/traces for initial SFT; ongoing human evals; and more.
- Just like trial-and-error based learning is only one part of human learning, RL will likely be one tool/phase of many.

In summary:
- (1) Building the components of an RL environment is usually highly non-trivial.
- (2) RL envs effectively describe a product spec - there will be a wide range of unique ones, requiring deep product/domain expertise.
- (3) RL (and RL envs) will be one component of a rich ecosystem of tools for model learning, including human data, rubrics, evals, and more.

If interested in some of the work the @SnorkelAI team is doing in partnership with leading LLM developers here - shoot us a note! It's an exciting time to build in this space :)

7

183

16

236

26K

amirbashti retweeted

about 1 year ago

Scale alone is not enough for AI data. Quality and complexity are equally critical. Excited to support all of these for LLM developers with @SnorkelAI Data-as-a-Service, and to share our new leaderboard!

Our decade-plus of research and work in AI data has a simple point: scale alone is not enough. AI success is all about the quality, complexity, and distribution of data, in addition to volume.

We’re excited to be powering leading LLM developers with @SnorkelAI Expert Data-as-a-Service, our white glove service for custom, expert-level AI datasets, and to now preview some of what we’re building via our new Expert Data Leaderboard (🔗 in 🧵) + upcoming OSS dataset releases!

Snorkel Expert Data-as-a-Service is built to meet the rapidly evolving data needs of the agentic AI world, where success is built on the quality, complexity, and distribution of datasets, in addition to size and scale.

This kind of high-quality, frontier AI data can only come from a union of technology and human expertise. With Snorkel Expert Data-as-a-Service, we’re powering frontier LLM developers across agentic, expert knowledge, reasoning, coding, multi-modal, and other task types via the combination of these two key components:
- (1) The Snorkel Expert Network: A global team of subject matter experts focused wholly on specialized knowledge–spanning thousands of topics in STEM/academic, vertical/professional, and consumer/lifestyle domains.
- (2) @SnorkelAI Data Development Platform: Our unique programmatic data curation and quality control platform, accelerating and improving expert authoring and review through principled techniques developed over the last decade of R&D.

Now: we’re incredibly excited to showcase some of the power of Snorkel Expert Data-as-a-Service via the new Snorkel Leaderboard, putting frontier models to the test in complex, agentic, and reasoning settings inspired by real industry scenarios (not esoteric puzzles)!

We’ll be releasing new leaderboards and accompanying expert-verified open source datasets (coming soon!) regularly. To start, we’re sharing three initial ones in preview:
- SnorkelFinance: Q&A over financial documents requiring agentic tool-calling and reasoning
- SnorkelUnderwrite: Agentic insurance tasks requiring industry-specific reasoning and tool use
- SnorkelSequences: Mathematical tasks requiring compositional multi-step reasoning

14

142

31

40

496K

amirbashti retweeted

about 1 year ago

Agentic AI will transform every enterprise–but only if agents are trusted experts. The key: Evaluation & tuning on specialized, expert data. I’m excited to announce two new products to support this–@SnorkelAI Evaluate & Expert Data-as-a-Service–along w/ our $100M Series D!

---

Snorkel Evaluate is our new data-centric agentic AI evaluation platform for specialized, mission-critical enterprise settings where vibe checks and out-of-the-box metrics driven by simple LLM prompts are not enough.

Snorkel Expert Data-as-a-Service is our white glove service for expert-level AI datasets, powering frontier LLM developers in areas like expert knowledge, reasoning, agentic action and tool use, and more!

Both built on top of @SnorkelAI’s Data Development Platform, using our programmatic technology to drive higher-quality expert data, faster – for getting specialized AI to real production value.

If you’re building enterprise AI and want to partner around the key ingredient in AI today–the data–book a demo and let's talk! https://t.co/w0J8izpn8p

Finally, see thread for details on 🧵👇
- 📽️ A walkthrough of Snorkel Evaluate and Expert Data-as-a-Service on an agentic AI enterprise task
- 📅 An upcoming event on Enterprise Agentic AI with innovators from @Accenture @BNY @Comcast @Stanford @QBE & others
- 📊 An upcoming series of benchmark datasets and model artifact releases

👀 Want early access to the full agentic AI dataset? Retweet this post and we'll send you the link!

15

270

76

42

50K

amirbashti retweeted

over 5 years ago

Congrats, @amirbashti 👏 Team of the Week honors for our No. 1️⃣0️⃣

4

43

5

0

0

amirbashti retweeted

over 5 years ago

ANOTHER ONE‼️ @amirbashti with a beauty!

0

18

2

0

0

amirbashti retweeted

over 6 years ago

GOLAZO IN THE TOP CORNER 🔥 Amir Bashti nets his FIRST #ATLUTD2 goal!

3

140

20

0

0

amirbashti retweeted

over 6 years ago

Can't touch this 😉 @amirbashti | #TekkersTuesday

2

98

3

0

0

amirbashti retweeted

Stanford Men’s Soccer @StanfordMSoccer

over 7 years ago

Another honor for @amirbashti. The senior forward lands on the @UnitedCoaches Scholar All-America first team. #GoStanford 🤓⚽️🎓 https://t.co/OWtoBDXF0I

0

24

1

0

0

amirbashti retweeted

Stanford Men’s Soccer @StanfordMSoccer

over 7 years ago

Tanner Beason and Amir Bashti collected the first All-America accolades of their careers on Thursday night. It's the fifth consecutive season Stanford has had multiple players honored. #GoStanford https://t.co/NVHnVXBqPa

1

46

4

0

0

amirbashti retweeted

Stanford Men’s Soccer @StanfordMSoccer

about 8 years ago · Yorkshire and The Humber

3-1 win vs Bradford City. Hat trick from Bashti 🎩

StanfordMSoccer's tweet photo. 3-1 win vs Bradford City. Hat trick from Bashti 🎩 https://t.co/ZU8VuLQdga

1

52

2

0

0

amirbashti retweeted

Stanford Men’s Soccer @StanfordMSoccer

about 8 years ago · Hillingdon

Cardinal win 5-2 over QPR. Goals from Bashti, Bulut, Joshua, Panchot, and Beason. Quick shower then off to Wembley to watch England vs Italy. #EnglandTour2018

3

52

6

0

0

amirbashti retweeted

Stanford Men’s Soccer @StanfordMSoccer

about 8 years ago

3-0 Win over Fulham FC U20’s. Goals by Panchot and Bashti with a brace. #EnglandTour2018

2

69

9

0

0
