Sravan Jayanthi @sravanjay - Twitter Profile

2 days ago

Not the greatest video but a historic day with incredible people! The energy was unreal in the @xai office today. $SPCX The journey has only begun, now back to making history 🚀🍾

6

250

11

24

9K

sravanjay retweeted

akshey

@aksheyd

16 days ago

and this is just the intro

19

153

7

10

10K

sravanjay retweeted

Gemini @Gemini

18 days ago

Introducing Command Center. Built in collaboration with @SpaceX and @xAI to bring AI directly into Gemini. Command Center is a personalized market feed that surfaces real-time insights across every topic that matters to you. Create your Command Center today.

32

137

63

24

88K

sravanjay retweeted

Artificial Analysis

@ArtificialAnlys

18 days ago

Artificial Analysis and IBM Research are launching ITBench-AA, the first in a new series of benchmarks evaluating models on agentic enterprise IT tasks, starting with Site Reliability Engineering tasks where frontier models score below 50%.

ITBench-AA’s SRE tasks benchmark model performance on Kubernetes incident response, where models must diagnose live systems by reading logs, tracing dependencies, and identifying root-cause entities across complex infrastructure. The underlying ITBench dataset has been developed by @IBM's Software Innovation Lab, leveraging IBM’s deep expertise in enterprise IT operations.

Artificial Analysis has worked closely with IBM over the last 6 months to develop an implementation of the dataset for frontier AI evaluation, beginning with Site Reliability Engineering (SRE) and expanding to Financial Operations (FinOps) and Chief Information Security Officer (CISO) tasks over time.

ITBench-AA SRE overview:
➤ 59 SRE tasks in total: 40 public tasks and 19 brand new, held-out tasks
➤ Each task provides a Kubernetes incident snapshot containing alerts, events, traces, metrics, logs, and application topology. The model must identify the minimal set of independent root-cause Kubernetes entities responsible for the incident
➤ Faults span typical SRE failure modes including infrastructure, service, application, and chaos-injected incidents, such as resource quota exhaustion, rollout failures, connection pool exhaustion, and network partitions

Methodology details:
➤ Agentic harness: each task is solved by the model running in our open-source Stirrup reference harness, with shell access to a sandboxed file system containing the relevant logs and snapshots. 100-turn cap per task, 3 repeats per task
➤ Models submit a list of root-cause entities (Kubernetes Deployments, Services, Pods, etc.) they believe caused the incident. Each submission is compared against a ground-truth set of root causes provided by IBM Research
➤ Scoring uses average precision at full recall: if a model misses any of the ground-truth root causes, it scores 0.0 for that repeat. If it identifies all of them, it is awarded a score equal to its precision: the share of its submitted entities that are actual root causes, i.e. true positives / (true positives + false positives). The headline score is the average across 59 tasks × 3 repeats
➤ The harness (Stirrup) is held constant across all evaluated models, allowing an apples-to-apples comparison between models

Key findings:
➤ Claude Opus 4.7 (Adaptive Reasoning, Max Effort) leads at 47%, followed by GPT-5.5 (xhigh) at 46% and Qwen3.7 Max at 42%
➤ All frontier models score below 50%, making ITBench-AA SRE one of the least saturated agentic benchmarks in our suite. For context, frontier models score considerably higher on Terminal-Bench
➤ Turn counts vary nearly 3x and longer trajectories do not translate to higher accuracy. GPT-5.5 (xhigh) averages 31 turns per task at 46%, while Gemini 3.1 Pro Preview averages 83 turns at 30%. Models that over-investigate tend to surface upstream fault-injection mechanisms or co-occurring symptoms as false positives
➤ GLM-5.1 (Reasoning) leads open weights models at 40%, effectively tied with Gemini 3.5 Flash (high). DeepSeek V4 Pro (Reasoning, Max Effort) follows at 38%, with Gemma 4 31B (Reasoning) at 37%, ahead of Gemini 3.1 Pro Preview at 30%

32

554

78

165

201K
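The scoring rule in the ITBench-AA post above (0.0 if any ground-truth root cause is missed, otherwise precision) can be sketched in a few lines of Python. This is only an illustration of the rule as the post words it, not the benchmark's actual scoring code: the function names `score_repeat` and `headline_score` and the entity strings in the usage note are hypothetical, and returning 0.0 for an empty submission is an assumption.

```python
def score_repeat(submitted, ground_truth):
    """Precision at full recall for one repeat (sketch of the rule in the post)."""
    submitted = set(submitted)
    ground_truth = set(ground_truth)
    # Assumption: an empty submission scores 0.0 (also avoids division by zero).
    if not submitted:
        return 0.0
    # Missing any ground-truth root cause zeroes the repeat.
    if not ground_truth <= submitted:
        return 0.0
    true_positives = len(submitted & ground_truth)
    # Precision = true positives / (true positives + false positives).
    return true_positives / len(submitted)


def headline_score(results):
    """Average score over (submitted, ground_truth) pairs, one pair per task repeat."""
    scores = [score_repeat(s, g) for s, g in results]
    return sum(scores) / len(scores)
```

For example, with a hypothetical ground truth of `["deployment/checkout"]`, submitting `["deployment/checkout", "pod/cart-1"]` scores 0.5 (full recall, one false positive), while submitting only `["pod/cart-1"]` scores 0.0. This shows why over-investigating models lose points: every extra entity lowers precision even when the real root cause is found.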

sravanjay retweeted

Boris Skorobogaty

@theskory

28 days ago

Yeah, /implement is easily my #1 skill. It’s not just the implement → review → fix loop that gives way better output — it also has a built-in memory system. After each task, the orchestrator summarises the issues that were fixed, saves them to a persistent file of “most common problems,” and on the next run injects that knowledge into the implementer + reviewer prompts so they avoid repeating the same mistakes.

132

1K

533

152

788K
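The memory system described in the /implement post above (summarise fixed issues, save them to a persistent "most common problems" file, inject them into the next run's prompts) might look roughly like the sketch below. Everything here is an assumption made for illustration: the JSON file format, the function names `load_problems`, `record_fixed_issues`, and `inject_memory`, and the prompt wording are hypothetical and are not taken from the actual skill.

```python
import json
from collections import Counter
from pathlib import Path


def load_problems(path):
    """Read the persistent issue counts, or start empty if the file is missing."""
    if path.exists():
        return Counter(json.loads(path.read_text()))
    return Counter()


def record_fixed_issues(path, fixed_issues):
    """After a task, add the issues fixed in review to the persistent file."""
    problems = load_problems(path)
    problems.update(fixed_issues)  # increments the count of each issue seen
    path.write_text(json.dumps(problems, indent=2))
    return problems


def inject_memory(base_prompt, path, top_n=5):
    """On the next run, append the most common past problems to a prompt."""
    problems = load_problems(path)
    if not problems:
        return base_prompt
    lines = [f"- {issue} (seen {count}x)"
             for issue, count in problems.most_common(top_n)]
    header = "Most common problems from earlier runs; avoid repeating them:"
    return base_prompt + "\n\n" + header + "\n" + "\n".join(lines)
```

In this sketch the same `inject_memory` call would be applied to both the implementer and the reviewer prompt, with `path` pointing at something like `Path("common_problems.json")` (a hypothetical file name).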

Sravan Jayanthi

@sravanjay

about 1 month ago

Elon Musk is the greatest of all time

0

1

0

32

Sravan Jayanthi

@sravanjay

about 1 month ago

Grok Build has been the most intelligent product we’ve released to date 🚀 The world’s most technologically advanced enterprises are using Grok Build at scale across thousands of developers & seeing millions of dollars of savings versus competitor tools

xAI

@xai

about 1 month ago

An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows is now available for SuperGrok Heavy subscribers. Through this early beta, we will improve the model and product based on your feedback. Try it at https://t.co/bpTHpjivWD

2K

10K

1K

3K

57M

0

4

0

95

Sravan Jayanthi

@sravanjay

about 1 month ago

@krgeorge Tech visionary @krgeorge!!! Congrats on the launch 🚀🚀🚀

0

1

0

66

sravanjay retweeted

akshey

@aksheyd

about 1 month ago

first ship of many 🙂🚀

9

93

1

0

2K

Sravan Jayanthi

@sravanjay

about 1 month ago

@aksheyd Sisyphus has returned with the treasure 🚀🚀🚀

1

2

0

80

sravanjay retweeted

xAI

@xai

about 1 month ago

An early beta of Grok Build, an agentic CLI for coding, building apps, and automating workflows is now available for SuperGrok Heavy subscribers. Through this early beta, we will improve the model and product based on your feedback. Try it at https://t.co/bpTHpjivWD

2K

10K

1K

3K

57M

sravanjay retweeted

xAI

@xai

about 1 month ago

Grok 4.3 is now live on the xAI API. It’s our fastest, most intelligent model to date. It tops the @ArtificialAnlys leaderboards in agentic tool calling and instruction following, and ranks #1 in @ValsAI enterprise domains like case law and corporate finance. Grok 4.3 supports a 1 million token context window and is priced at $1.25/m input and $2.50/m output. Create an API key and start building: https://t.co/JDRUt1UOUm

945

11K

1K

2K

116M
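As a rough worked example of the pricing quoted in the Grok 4.3 post above, the sketch below estimates the cost of a request. It assumes "$1.25/m input and $2.50/m output" means US dollars per million tokens; the function name `estimate_cost_usd` and the token counts in the usage note are hypothetical.

```python
# Assumption: prices are USD per one million tokens, as read from the post.
INPUT_USD_PER_MILLION = 1.25
OUTPUT_USD_PER_MILLION = 2.50


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate request cost in USD from input and output token counts."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000
```

Under these assumptions, a request that fills the full 1 million token context and produces 100,000 output tokens would cost `estimate_cost_usd(1_000_000, 100_000)`, which is $1.25 + $0.25 = $1.50.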

sravanjay retweeted

Eric Jiang

@veggie_eric

about 1 month ago

When training Grok 4.3, we spoke directly with devs and businesses to understand what they actually needed: a model that’s fast, affordable, and great at tool calling. The result is a daily driver that doesn't just look good on random benchmarks, but is actually useful in the real world.

💰 $1.25 in / $2.50 out
⚡️ 100 tokens / second
📖 1 million context window

Try it through Hermes Agent or direct through the xAI API!

357

3K

798

303

659K

Sravan Jayanthi

@sravanjay

about 1 month ago

@jshobrook Insane price to intelligence 🚀🚀🚀

0

1

0

188

sravanjay retweeted

Jonathan Shobrook

@jshobrook

about 2 months ago

We beat Sonnet 4.6 with a 500B model. Bigger runs are on the way.

124

2K

55

185

275K

Sravan Jayanthi

@sravanjay

about 2 months ago

@SpaceX @cursor_ai This will be marked in history as when SpaceX leapfrogged to AGI

3

28

0

5K

sravanjay retweeted

Dylan Patel

@dylan522p

about 2 months ago

SpaceXAICursor IPO gonna be crazy

35

1K

48

65

158K

sravanjay retweeted

SpaceX

@SpaceX

about 2 months ago

SpaceXAI and @cursor_ai are now working closely together to create the world’s best coding and knowledge work AI. The combination of Cursor’s leading product and distribution to expert software engineers with SpaceX’s million H100 equivalent Colossus training supercomputer will allow us to build the world’s most useful models. Cursor has also given SpaceX the right to acquire Cursor later this year for $60 billion or pay $10 billion for our work together.

2K

38K

5K

4K

21M

Sravan Jayanthi

@sravanjay

2 months ago

@aksheyd @elonmusk Oh you’re too kind to Grok Agent 😂

1

3

0

109
