Phani Srikanth @phanisrikanth33 - Twitter Profile

16 days ago

I wouldn't be surprised if we see a new set of delicacies or an explosion in the ensemble of cuisines in the next few years.

Josef Chen

@josefchen

16 days ago

Launching our new paper on arXiv: we trained the largest multilingual food model ever built. 4.1M recipes. 7 languages. 1,790 ingredients. 300 dimensions. All of human cooking compressed into 2 megabytes. https://t.co/b4GiZ62UMt
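A rough sanity check on the "2 megabytes" figure, as a minimal sketch. It assumes the model stores one 300-dimensional vector per ingredient as 32-bit floats; the tweet does not state the storage format, and the function name below is hypothetical.

```python
def embedding_table_bytes(num_rows: int, dims: int, bytes_per_value: int = 4) -> int:
    """Size of a dense embedding table in bytes.

    bytes_per_value=4 assumes float32 storage (an assumption, not from the tweet).
    """
    return num_rows * dims * bytes_per_value


# Figures quoted in the tweet: 1,790 ingredients, 300 dimensions.
size_bytes = embedding_table_bytes(1790, 300)
size_mb = size_bytes / 1_000_000  # decimal megabytes

print(f"{size_bytes} bytes, about {size_mb:.2f} MB")
```

Under that assumption the table comes to 2,148,000 bytes, which is consistent with the roughly 2 MB quoted.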


Phani Srikanth @phanisrikanth33

27 days ago

@alokbishoyi97 Fair points. Conclusions aside, it’s genuinely fascinating to see how these systems are ready to solve problems that matter to us. Been following your work on evo! Thanks for the tip. Will dig deeper!


Phani Srikanth @phanisrikanth33

28 days ago

On a sufficiently well-scoped task, one can conclude which model seems to do better. Another angle is witnessing the progress made by a machine alone. A third angle is letting your imagination run loose on what the limit is for a human + machine collab.

Prime Intellect @PrimeIntellect

28 days ago

Automating AI research is the next major step in AI. We let Claude Code (Opus 4.7) and Codex (GPT 5.5) run autonomously on the nanoGPT speedrun optimizer track using our idle compute: ~10k runs, ~14k H200 hours. Opus now holds the record at 2930 steps vs the 2990 human baseline. https://t.co/B1aYxlbKMP


Phani Srikanth @phanisrikanth33

about 2 months ago

This low latency for computer use is unreal. Beyond just Cerebras inference, a smaller model maybe? Layer pruning?

Ari Weinstein

@AriX

about 2 months ago

This is the first time I've ever seen an LLM operate a GUI as fast as a person, and it's surreal.


Phani Srikanth @phanisrikanth33

about 2 months ago

All resources are open source :) Challenge: https://t.co/rlzodjD0T7 Data: https://t.co/zBKLn1c5Y6 Source Code: https://t.co/AExiXteRMk Paper inside the repo. (3/3)


Phani Srikanth @phanisrikanth33

about 2 months ago

Harness Engineering for Document VQA! Long-context document VQA is interesting due to messy and diverse docs & immediate real-world impact. Built a coding agent with RLMs. Reasoning helps perf. Harness & tool design >> prompt engineering for task performance. Fun project (1/3) https://t.co/A0Es9VTAsS


Phani Srikanth @phanisrikanth33

about 2 months ago

Lots of low-hanging fruit left: verification step, efficient context use, domain guides. Thanks to @modal for amazing devex, @replicate for high uptime, @GoogleDeepMind, @Alibaba_Qwen & @Zai_org for models & @a1zhang/@lateinteraction for their work on democratizing RLMs! (2/3)


Phani Srikanth @phanisrikanth33

about 2 months ago

@natolambert Congrats on the launch! Been following the open model movement and hopefully these resources will spur more activity in the open model landscape :)


Phani Srikanth @phanisrikanth33

about 2 months ago

@raphaelsrty Great release! Ran a quick test on an ML codebase and I see improved recall with an insignificant latency cost. https://t.co/nTDaM6cFUR


Phani Srikanth @phanisrikanth33

2 months ago

The tweet and the responses show a classic divide between Frequentist thinking and Bayesian thinking.

Joy Bhattacharjya

@joybhattacharj

2 months ago

If you take a single, you eliminate the possibility of a regular time loss. So then it's a 50% chance to go to a super over, and a 50% chance that you win. And if it goes to a super over, then there is again a 50% chance you could win. So if you take a single there is 25% chance of losing after a super over and a 75% chance of winning either outright or after the super over. Now if you do not take the single, there is a 33.33% chance you win, 33.33% chance you go to a super over, and a 33% chance you lose. If you add the odds of the super over, it is a pure 50-50 chance. Granted, this does not bring the skill of the batter into the picture and that of the non-striker and their current form. But if you played the odds, the single was the most logical choice.
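The arithmetic in the quoted tweet can be checked with a short sketch. It uses only the probabilities the tweet states (50/50 outcomes after a single, equal thirds otherwise, and a 50% super-over win rate); the function and variable names are hypothetical, and, as the tweet itself notes, batter skill and form are ignored.

```python
from fractions import Fraction


def win_probability(p_win_outright, p_super_over, p_win_super_over=Fraction(1, 2)):
    """Total chance of winning: outright win, or reaching a super over and winning it."""
    return p_win_outright + p_super_over * p_win_super_over


# Taking the single: 50% win outright, 50% super over, no regular-time loss.
take_single = win_probability(Fraction(1, 2), Fraction(1, 2))

# Declining the single: win, super over, and loss are each one third.
decline_single = win_probability(Fraction(1, 3), Fraction(1, 3))

print(f"take single:    {float(take_single):.0%}")
print(f"decline single: {float(decline_single):.0%}")
```

With these inputs, taking the single gives a 3/4 chance of winning and declining it gives 1/2, matching the 75% and 50-50 figures in the tweet.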


Phani Srikanth @phanisrikanth33

2 months ago

So, are the Blackwells up and running? If yes, we should generally expect to see Intelligence/$ go up.

Anthropic

@AnthropicAI

2 months ago

The Claude Mythos Preview system card is available here: https://t.co/TMtIy8xHiP


Phani Srikanth @phanisrikanth33

2 months ago

And a lot more on the internet...

Anjney Midha

@AnjneyMidha

2 months ago

Day 1 @CS153Systems - nothing like 500 stanford engineering students to keep you optimistic about the future


Phani Srikanth @phanisrikanth33

2 months ago

@Yuchenj_UW The latter is a no-brainer, as 2026 frontier LLM tokens make the SDE role capital-efficient.


Phani Srikanth @phanisrikanth33

2 months ago

@AnjneyMidha Fwiw, Claude models on the Cursor agent interface are good at uninterrupted long running sessions for ML research and experimentation tasks. The cursor harness seems pretty smooth imo.


Phani Srikanth @phanisrikanth33

3 months ago

A new hill climbing quest begins today.

elie

@eliebakouch

3 months ago

this gets even better when you look at the y axis


Phani Srikanth @phanisrikanth33

3 months ago

Jevons paradox fully in motion.

Amir Salihefendić

@amix3k

3 months ago

What most people think happened: engineering got easier because AI agents can handle much of the coding. What actually happened: we are shipping a lot more, a lot faster, and engineers need to build robust systems to ensure nothing breaks. From what I'm seeing, engineering has become much more intense with AI, not less.


Phani Srikanth @phanisrikanth33

4 months ago

The exponential takeoff in action. And we need new Evals soon!

METR @METR_Evals

4 months ago

We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated. https://t.co/EuRBIrEcVx


phanisrikanth33 retweeted

François Chollet

@fchollet

4 months ago

Sufficiently advanced agentic coding is essentially machine learning: the engineer sets up the optimization goal as well as some constraints on the search space (the spec and its tests), then an optimization process (coding agents) iterates until the goal is reached. The result is a blackbox model (the generated codebase): an artifact that performs the task, that you deploy without ever inspecting its internal logic, just as we ignore individual weights in a neural network. This implies that all classic issues encountered in ML will soon become problems for agentic coding: overfitting to the spec, Clever Hans shortcuts that don't generalize outside the tests, data leakage, concept drift, etc. I would also ask: what will be the Keras of agentic coding? What will be the optimal set of high-level abstractions that allow humans to steer codebase 'training' with minimal cognitive overhead?


Phani Srikanth @phanisrikanth33

6 months ago

@ravithejads @MistralAI @aviTwit3 @sophiamyang @NirantK Massive congratulations to you, Ravi! Mistral team is lucky to have you! Onwards and upwards, mate 👊


Phani Srikanth @phanisrikanth33

7 months ago

@natolambert Fabulous work. Thank you for this amazing contribution to open science!

