Ashish Sabharwal @Ashish_S_AI - Twitter Profile

11 months ago

Excited to share MoNaCo -- a new benchmark with @TomerWolfson, @harsh3vedi, and others for pushing the frontier of LLMs on realistic information-seeking questions that require combining info from dozens of Wikipedia pages!

Ai2 @allen_ai

11 months ago

LLMs power research, decision‑making, and exploration—but most benchmarks don’t test how well they stitch together evidence across dozens (or hundreds) of sources. Meet MoNaCo, our new eval for question-answering cross‑source reasoning. 👇

Ashish Sabharwal @Ashish_S_AI

almost 2 years ago

Excited to share this new benchmark for LLM-based coding agents, for a "super"-useful problem! Kudos to @ben_bogin @tusharkhot and colleagues!

Ben Bogin @ben_bogin

almost 2 years ago

📢 New Benchmark: SUPER for Setting UP and Executing tasks from Research repositories. Reproducibility is crucial in science. We introduce SUPER to evaluate LLMs' capabilities in autonomously running experiments from research repositories. ⬇️ https://t.co/U47r3F3UO5

Ashish_S_AI retweeted

Ai2 @allen_ai

almost 2 years ago

🥳The BIGGEST congratulations for our teams' recognition at #ACL2024! OLMo received the Best Theme Paper, Dolma + AppWorld received the Best Resource Paper, and "Political Compass or Spinning Arrow?" was honored with an Outstanding Paper Award.

Ashish Sabharwal @Ashish_S_AI

almost 2 years ago

Thrilled for AppWorld, and especially @harsh3vedi, to receive this recognition at ACL-2024!

LUNR @stonybrooknlp

almost 2 years ago

AppWorld won the (one of the) best resource paper award(s) at #ACL2024. Outstanding resource and great work by @harsh3vedi @b_niranjan at Stony Brook @tusharkhot @Ashish_S_AI @ai2_aristo and collaborators 🧵👇

Ashish Sabharwal @Ashish_S_AI

almost 2 years ago

Excited to share AppWorld, our challenging new interactive coding environment and benchmark to push AI agents further! Super easy to use (`pip install...`), reliable, reproducible, realistic. Congratulations to @harsh3vedi for the huge effort!!

Harsh Trivedi @harsh3vedi

almost 2 years ago

🔥 Autonomous AI Assistants (e.g., #googleio2024, #WWDC24) and coding agents (e.g., #Devin, #SWEAgent) have garnered a lot of attention recently. We can envision coding agents autonomously completing complex day-to-day tasks across apps using APIs on our behalf. But how can we develop & benchmark them in a rigorous & reproducible manner?

🚀 Introducing AppWorld:
🌎 a simulated world environment where agents can write code to interact with many apps via APIs on behalf of people
📊 a benchmark of complex tasks defined on it, and
🧪 a robust evaluation framework for assessing agent’s goal completion.

📢 To appear as an #ACL2024 paper
🌎💻🧑‍🤝‍🧑 “AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents” #NLProc #ai #AIagents

📜 https://t.co/cdQ07847Kp (paper)
🌐 https://t.co/dIawTLcI7a for code, blog, data (tasks, APIs, trajectories) explorer, interactive playground, leaderboard & more!

Ashish Sabharwal @Ashish_S_AI

about 2 years ago

Happy to share that this work will appear at NAACL-2024! Check out our recently updated version on arXiv at https://t.co/C9YVKqVPuB

Aristo Team at Ai2 @ai2_aristo

over 2 years ago

New paper by @ben_bogin @shivanshug11 Peter Clark & @Ashish_S_AI shows that generating code with domain description prompts is hugely effective for Semantic Parsing in the modern era of LLMs! And there is more to it than just the target language's popularity!

Ashish Sabharwal @Ashish_S_AI

about 2 years ago

Turns out SSMs like S4 and S6 don't quite get the best of both worlds -- sequential and parallel -- and struggle to track state just like Transformers. Excited to share the "Illusion of State" paper w/ @lambdaviking, @jowenpetty! https://t.co/O8eD2Xzy7f

William Merrill @lambdaviking

about 2 years ago

✨Excited to finally drop our new paper: SSMs “look like” RNNs, but we show their statefulness is an illusion🪄🐇 Current SSMs cannot express basic state tracking, but a minimal change fixes this! 👀 w/ @jowenpetty, @Ashish_S_AI https://t.co/rkHp2BvYm1

Ashish Sabharwal @Ashish_S_AI

about 2 years ago

ICYMI @benbenbrubaker wrote an eloquent Quanta article ✍️ covering our ICLR-2024 paper (w/ @lambdaviking) on how the expressive power of transformers changes with the length of CoT! Recently updated paper 📜 at https://t.co/6BbOON04DA

William Merrill @lambdaviking

over 2 years ago

Today in Quanta Magazine, @benbenbrubaker gives a new overview of our work (w/ @Ashish_S_AI) on the expressive power of transformers with/without CoT https://t.co/FgZSBzUTpn

Ashish_S_AI retweeted

Sarah Wiegreffe ✈️ ICML @sarahwiegreffe

about 3 years ago

New paper: "Attentiveness to Answer Choices Doesn’t Always Entail High QA Accuracy" 📊💬 https://t.co/5V47V0kfjn Something I've been thinking a lot about recently is the relationship between distributions over vocabularies produced by language models and the various ways... 1/5

Ashish_S_AI retweeted

Wenhao Yu @wyu_nd

about 3 years ago

📢 Introducing ReFeed: a novel plug-and-play approach to enhance the factuality of large language models via retrieval feedback! Together with @Meng_CS @zhihz0535 @LiangZhenwen @ai2_aristo. Read more: https://t.co/V1FowqUG8v

Ashish_S_AI retweeted

Wenhao Yu @wyu_nd

about 3 years ago

📢 Introducing IfQA - the first large-scale open-domain question answering (ODQA) dataset centered around counterfactual reasoning. Together with @Meng_CS @ai2_aristo! Paper link: https://t.co/dwKCzrv3Nn

Ashish Sabharwal @Ashish_S_AI

about 3 years ago

Introducing 𝗥𝗘𝗙𝗟𝗘𝗫: What does my LLM believe? 🧐 We show that we can add a 𝗿𝗮𝘁𝗶𝗼𝗻𝗮𝗹 𝗹𝗮𝘆𝗲𝗿 to an LLM to materialize its "belief graph", repair inconsistencies & produce reasoning chains drawn from a now-consistent system of beliefs! https://t.co/t1w65Sf73d #NLProc
