Avijit Ghosh @evijit - Twitter Profile

Avijit Ghosh

@evijit

about 3 hours ago

@cgeorgiaw @AIatMeta Love a good viz

Avijit Ghosh

@evijit

about 3 hours ago

@YichuanM Maybe this:

Adithya S K

@adithya_s_k

7 days ago

Introducing Repo2RLEnv: turn any repository into runnable, verifiable coding environments built from real PRs and commits, for coding-agent evaluation or RL training. > uv pip install repo2rlenv

Avijit Ghosh

@evijit

about 3 hours ago

@JoshPurtell @carrynointerest FWIW, VCs have historically been among the biggest champions of OSS as well, for many reasons including cost (see work posted by a16z people, for example). VC strategy is not a monolith.

evijit retweeted

Google AI Developers

@googleaidevs

1 day ago

Building autonomous agents for scientific discovery? 🧬🤖 @GoogleDeepMind Science Skills is now available on GitHub. We've open-sourced this specialized toolkit to accelerate your agentic workflows with scientific grounding and higher token efficiency. Download now ↓ https://t.co/cwp1HOeKvo

Avijit Ghosh

@evijit

2 days ago

Did you know that whether your benchmark dataset is private or public has little bearing on how fast it saturates? In our ICML 2026 paper, we look into that hypothesis (and more), and provide a comprehensive analysis of why benchmarks saturate. Read the paper! 👇

EvalEval Coalition @evaluatingevals

2 days ago

🚨 As AI models improve, many benchmarks are becoming saturated and losing their ability to distinguish between models. 🚨 Check out our new @icmlconf paper: “When AI Benchmarks Plateau: A Systematic Study of Benchmark Saturation” https://t.co/gMW1B8k1gg

Avijit Ghosh

@evijit

3 days ago

If you’re an NIH grantee who has to abide by NIH release policies, you can do that on Hugging Face 🤗

Daniel van Strien @vanstriendaniel

3 days ago

Hugging Face is the home for AI & ML across every domain, including biomedical! The @NIH just added the @huggingface Hub to its official list of Generalist Repositories for data sharing. NIH-funded? You can point to the Hub in your data sharing plan 🤗 https://t.co/OhJIMDaGcW

Avijit Ghosh

@evijit

3 days ago

Weekend mini project! Since commentary on AI is inherently interdisciplinary, we connected the observations in the @Pontifex's encyclical with decades of scholarship in Responsible AI and Ethics research and created an interactive space with these annotations! Work with Ian Reynolds, @YJernite, and @mmitchell_ai. Lots to unpack. We started with 105 annotations. Please submit pull requests for more that we may have missed! https://t.co/1aq3rCfdGQ

Avijit Ghosh

@evijit

3 days ago

@markfielding99 @mmitchell_ai That’s what the original encyclical says :)

Avijit Ghosh

@evijit

4 days ago

@trydotworks @natolambert This makes no sense? There are several companies that have never open sourced a model, even when they were first starting out. It’s fine to optimize for reach and profits; it’s just that open science doesn’t have those same motivations for the people pushing it 🤷‍♂️

Avijit Ghosh

@evijit

5 days ago

@AndrewCurran_ Sometimes it's funny to find edge cases -- Claude 4.7 Opus struggled hard with a duckdb js typecasting issue that GPT 5.4 oneshotted, for example. The spiky intelligence theory continues to hold

Avijit Ghosh

@evijit

5 days ago

I feel like the current state of LLM evals overindexes on prompts/datasets and underindexes on metrics. You used to be able to measure something abstract like “fairness” in a great many ways using the same overused COMPAS dataset, often with highly contextualized, beautifully designed metrics. Now it’s all exact matches and pass@k (basically different flavors of accuracy) and that’s mostly it. I do see cost and time measurements, but those I would argue are properties of the system itself rather than of the interaction being measured. Of course these evals will get saturated, you’re never measuring anything novel! Bring back measurement science to actually measure behavior, instead of engineering new prompts and then immediately discarding good data at the first sign of saturation :)

Avijit Ghosh

@evijit

5 days ago

@iamtrask TIL

Avijit Ghosh

@evijit

6 days ago

This tweet and the responses below it will one day be studied as eagerly as chronicles of what people in the 1900s thought the future would look like https://t.co/LHexGknfYI

Joe Weisenthal

@TheStalwart

6 days ago

I said this to @citrini last night, but in the future, will we really need storage? I take a ton of photos of my kids, and they are on my phone and in a cloud. But in the future, won't I just tell a model "generate a photo from my son's 7th birthday" and it'll be just as good?

Avijit Ghosh

@evijit

6 days ago

@iamtrask Some of these predictions have come true, so why not! Still no flying cars :(

Avijit Ghosh

@evijit

6 days ago

@rlacombe @SylvainGariel @AnthropicAI Hello from Boston! Also love that username CoolScience :)

Avijit Ghosh

@evijit

6 days ago

Some of these new job descriptions I see are frankly insane? “hard science”, “AI-pilled”, etc. What happened to humility? Leadership? Willingness to learn? In the modern workplace, AI agents are putting humans into their own “productivity” islands, disconnected from each other 🫠

Avijit Ghosh

@evijit

6 days ago

@LChoshen @_lewtun @andrewwhite01 Thank you Legend

Avijit Ghosh

@evijit

6 days ago

@LChoshen @_lewtun @andrewwhite01 We had this! But yes, could be cool? I just really don’t know why people keep doing chart crimes despite all the tweets about them. Feels like a social experiment at this point https://t.co/6gJbbAZEdU

Avijit Ghosh

@evijit

7 days ago

@eliebakouch Reminds me of: https://t.co/kHb4lNqoPA
