Antariksh Chavan

@AntarikshC

Your friendly neighborhood developer 👨🏻‍💻🌌 | Android at @ShareChatApp | Former SWE @WedUpp1, intern @IITBombay

Mumbai, India

Joined June 2016

428 Following

482 Followers

1.5K Posts

AntarikshC retweeted

Artificial Analysis

@ArtificialAnlys

1 day ago

We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top.

DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.

The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others.

More below.

AntarikshC retweeted

Wes Bos

@wesbos

10 days ago

the four horsemen of the apocalypse

Antariksh Chavan @AntarikshC

15 days ago

@Shinzooooooo__ @sharechatapp This is where the downfall started, I see 😂😂

Antariksh Chavan @AntarikshC

15 days ago

@khushbooverma @AnthropicAI @apoorv_taneja @AltimeterCap @Greenoaks @sequoia Huh? How did you miss OpenAI's $122b funding round in March?

Antariksh Chavan @AntarikshC

19 days ago

Everyone out here resetting limits every other week. OpenAI and @thsottiaux really started a moment.

Varun Mohan

@_mohansolo

19 days ago

We heard concerns that Antigravity consumes many tokens for simple tasks now. So, we're adding Gemini 3.5 Flash (Low) as a way to optimize token usage for these tasks. In our internal testing, it generates around 45% fewer tokens than Gemini 3.5 Flash (Medium) and generally outperforms Gemini 3 Flash (High) on SWE tasks. We've also gone ahead and reset Gemini quota across all paid plans to make sure you have all the tokens needed to build for the next week 🙂

AntarikshC retweeted

Zach Lloyd

@zachlloydtweets

about 2 months ago

https://t.co/cjyoRlzgxO

AntarikshC retweeted

Emil Privér

@emil_priver

about 2 months ago

Yes, we're aware

Antariksh Chavan @AntarikshC

about 2 months ago

yeah no shit

Antariksh Chavan @AntarikshC

about 2 months ago

@thsottiaux PR & Commit instructions do not work (Option in the Settings)

AntarikshC retweeted

Kimi.ai @Kimi_Moonshot

about 2 months ago

Meet Kimi K2.6: Advancing Open-Source Coding

🔹Open-source SOTA on HLE w/ tools (54.0), SWE-Bench Pro (58.6), SWE-bench Multilingual (76.7), BrowseComp (83.2), Toolathlon (50.0), Charxiv w/ python (86.7), Math Vision w/ python (93.2)

What's new:
🔹Long-horizon coding - 4,000+ tool calls, over 12 hours of continuous execution, with generalization across languages (Rust, Go, Python) and tasks (frontend, devops, perf optimization).
🔹Motion-rich frontend - Videos in hero sections, WebGL shaders, GSAP + Framer Motion, Three.js 3D.
🔹Agent Swarms, elevated - 300 parallel sub-agents × 4,000 steps per run (up from K2.5's 100 / 1,500). One prompt, 100+ files.
🔹Proactive Agents - K2.6 model powers OpenClaw, Hermes Agent, etc. for 24/7 autonomous ops.
🔹Claw Groups (research preview) - bring your own agents, command your friends', bots & humans in the loop.
-
K2.6 is now live on https://t.co/YutVbwktG0 in chat mode and agent mode.
For production-grade coding, pair K2.6 with Kimi Code: https://t.co/uvoSJKyGCY
-
🔗 API: https://t.co/EOZkbOwCN4
🔗 Tech blog: https://t.co/9wWvgIQSS3
🔗 Weights & code: https://t.co/Be0hjs2RTP

AntarikshC retweeted

Claude

@claudeai

about 2 months ago

Introducing Claude Opus 4.7, our most capable Opus model yet.

It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.

You can hand off your hardest work with less supervision. https://t.co/PtlRdpQcG5

AntarikshC retweeted

Mishaal Rahman

@MishaalRahman

about 2 months ago

👩‍💻 A new tool for making Android apps is here: Android CLI! It's the primary interface for Android development from the terminal and is designed to make your agents more efficient, effective, & capable of following the latest development best practices!

Antariksh Chavan @AntarikshC

about 2 months ago

@Dimillian Support for a parent folder without git (where Codex runs) that contains multiple folders, each with its own git repo.

AntarikshC retweeted

Greg Brockman

@gdb

2 months ago

The world is transitioning to a compute-powered economy. The field of software engineering is currently undergoing a renaissance, with AI having dramatically sped up software engineering even over just the past six months. AI is now on track to bring this same transformation to every other kind of work that people do with a computer.

Using a computer has always been about contorting yourself to the machine. You take a goal and break it down into smaller goals. You translate intent into instructions.

We are moving into a world where you no longer have to micromanage the computer. More and more, it adapts to what you want. Rather than doing work with a computer, the computer does work for you. The rate, scale, and sophistication of problem solving it will do for you will be bound by the amount of compute you have access to.

Friction is starting to disappear. You can try ideas faster. You can build things you would not have attempted before. Small teams can do what used to require much larger ones, and larger ones may be capable of unprecedented feats. More and more, people can turn intent into software, spreadsheets, presentations, workflows, science, and companies. People are spending less energy managing the tool and more energy focusing on what they are actually trying to create. That shift brings a kind of joy back into work that many people haven’t felt in a long time. Everyone can just build things with these tools.

This is disruptive. Institutions will change, and the paths and jobs that people assumed were stable may not hold. We don’t know exactly how it will play out, and we need to take mitigating the downsides very seriously, as well as figure out how to support each other as a society and world through this time.

But there is something very freeing about this moment. For the first time, far more people can become who they want to become, with fewer barriers between an idea and a reality.

OpenAI’s mission implies making sure that, as the tools do more, humans are the ones who set their intent and that the benefits are broadly distributed, rather than empowering just one or a small set of people. We're already seeing this in practice with ChatGPT and Codex. Nearly a billion people are using these systems every week in their personal and work lives. Token usage is growing quickly on many use cases, as the surface of ways people are getting value from these models keeps expanding.

Ten years ago, when we started OpenAI, we thought this moment might be possible. It’s happening on the earlier side, and happening in a much more interesting and empowering way for everyone than we’d anticipated (for example, we are seeing an emerging wave of entrepreneurship that we hadn’t previously been anticipating). And at the same time, we are still so early, and there is so much for everyone to define about how these systems get deployed and used in the world.

The next phase will be defined by systems that can do more: reason better, use tools better, plan over longer horizons, and take more useful actions on your behalf. And there are horizons beyond, as AI starts to accelerate science and technology development, which have the potential to truly lift up quality of life for everyone.

All of this is starting to happen, in small ways and large, today, and everyone can participate. I feel this shift in my own work every day, and see a roadmap to much more useful and beneficial systems. These systems can truly benefit all of humanity.

Antariksh Chavan @AntarikshC

2 months ago

@Steve_Yegge Makes no sense that they can't use Claude? They own 17% of Anthropic and they even host Anthropic models on their cloud. How can Anthropic prevent them from using the models?

AntarikshC retweeted

ℏεsam

@Hesamation

2 months ago

AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from January to March:
> median thinking dropped from ~2,200 to ~600 chars
> API requests went up 80x from Feb to Mar. less thinking and failed attempts meaning more retries, burning more tokens, and spending more on tokens
> reads-per-edit dropped from 6.6x → 2.0x. model stops researching code before touching it.
> model tried to bail out or ask "should i continue" 173 times in 17 days (0 times before March 8).
> self-contradiction in reasoning ("oh wait, actually...") tripled.
> conventions like CLAUDE.md get ignored because there's less thinking budget to cross-check edits
> 5pm and 7pm PST are the worst hours, late night is significantly better. this means the thinking allocation is most likely GPU-load-sensitive.

Antariksh Chavan @AntarikshC

2 months ago

@kr0der True. Medium is very efficient

Antariksh Chavan @AntarikshC

2 months ago

If you are running out of Codex usage on the $20 plan, use GPT-5.4 (Medium) instead of High or xHigh. You get much more usage and a good balance between intelligence and efficiency, and you can still use High selectively.

Antariksh Chavan @AntarikshC

2 months ago

After the latest Codex app update, it no longer follows git commit instructions from the Settings. Please fix 🥲 @thsottiaux @reach_vb @OpenAIDevs

AntarikshC retweeted

Nathan Calvin

@_NathanCalvin

2 months ago

From Anthropic researcher Sam Bowman on Claude Mythos:

"I got an email from an instance of Mythos preview while eating a sandwich in a park. That instance wasn't supposed to have access to the internet." https://t.co/DH4HDuDVIw
