Sumit Kalra | AI & ML Shift @AIMLShift - Twitter Profile

Sumit Kalra | AI & ML Shift

@AIMLShift

3 days ago

@ChainZenit It missed everything

Sumit Kalra | AI & ML Shift

@AIMLShift

3 days ago

Interesting benchmark: Claude Opus 4.8 failed a basic reasoning task.

Q: "How many days of the week contain the letter 'd'?"

Opus 4.8 answered: 2 (Sunday and Monday) ❌
Even after being challenged, it insisted none of the other days contained "d" ❌

Meanwhile, Claude Sonnet 4.6 eventually recognized that all 7 days contain the letter 'd' because they all end in "day" ✅

This wasn't a knowledge question. It was a simple string-inspection task.

A reminder that larger, more expensive models don't always outperform smaller ones on basic reasoning and attention checks.

AI can write code, analyze contracts, and solve complex problems, yet sometimes stumbles on questions a child could answer.

Trust, but verify.
#Claude #Anthropic #LLM #AI #GenAI #Reasoning #AIEvaluation

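Because the post frames this as a string-inspection task, the answer can be checked deterministically instead of being taken on trust from any model. A minimal Python sketch; the helper name `days_containing` is illustrative and not from the post:

```python
from collections.abc import Sequence

# English day names; every one ends in "day", so every one contains "d".
DAYS: tuple[str, ...] = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)


def days_containing(letter: str, days: Sequence[str] = DAYS) -> list[str]:
    """Return the day names that contain `letter`, ignoring case."""
    target = letter.lower()
    return [day for day in days if target in day.lower()]


matches = days_containing("d")
# Counting occurrences is a different question: "Wednesday" has two d's.
occurrences = sum(day.lower().count("d") for day in DAYS)

print(len(matches))  # 7
print(occurrences)   # 8
```

Note the distinction the sketch makes explicit: 7 days contain a "d", but there are 8 "d" characters in total.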

Bhavy☄️

@Bhavani_00007

4 days ago

Claude Opus 4.8 $200 a month for this ????😭😭😭

Sumit Kalra | AI & ML Shift

@AIMLShift

15 days ago

@karpathy Gods don't retire. they switch lanes. karpathy → anthropic. r&d. frontier. education. the builders of this era aren't done — they're just getting started on what's next. early days? nah. these are the defining days. #AI #FutureOfWork #TechLeadership

Sumit Kalra | AI & ML Shift

@AIMLShift

19 days ago

Claude Code just shipped /goal.

Not a prompt.
Not a task.

But a completion condition: Claude loops until a fresh model confirms it's done.

You set the bar. AI clears it.

This is the agentic shift, not just assisted coding.

#AI #Engineering #AIMLShift #FutureOfWork #claudecode

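The post doesn't describe how /goal is implemented, so the following is not Claude Code's code. It is a minimal Python sketch of the general pattern the post describes: keep working until a separate checker, given only the goal and the work log, agrees the completion condition is met. `run_until_goal`, `work_step`, `verify`, and `max_iterations` are hypothetical names, and the toy worker and verifier stand in for model calls.

```python
from typing import Callable

# Hypothetical callable types standing in for model calls.
WorkStep = Callable[[str, list[str]], str]
Verifier = Callable[[str, list[str]], bool]


def run_until_goal(
    goal: str,
    work_step: WorkStep,
    verify: Verifier,
    max_iterations: int = 10,
) -> tuple[bool, list[str]]:
    """Repeat work_step until verify() says the goal is met.

    verify only sees the goal and the work log, not the worker's
    internal state, to mimic a "fresh" independent checker.
    Returns (goal_met, work_log).
    """
    history: list[str] = []
    for _ in range(max_iterations):
        history.append(work_step(goal, history))
        if verify(goal, history):
            return True, history
    # Budget exhausted without the checker agreeing the goal is met.
    return False, history


# Toy stand-ins: the "worker" logs a step, the "checker" wants 3 steps.
def toy_worker(goal: str, history: list[str]) -> str:
    return f"step {len(history) + 1} done"


def toy_verifier(goal: str, history: list[str]) -> bool:
    return len(history) >= 3


done, log = run_until_goal("finish three steps", toy_worker, toy_verifier)
print(done, log)  # True ['step 1 done', 'step 2 done', 'step 3 done']
```

The iteration cap is an added assumption: a loop that stops only when a checker says yes needs some budget so it cannot run forever.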

Sumit Kalra | AI & ML Shift

@AIMLShift

21 days ago

@sama 2 months is enough time to get addicted. Not enough time to build dependency safely. Smart engineering leaders will run it with guardrails — not just hype.

Sumit Kalra | AI & ML Shift

@AIMLShift

21 days ago

Anthropic in Talks to Acquire Stainless: $300M+ Deal

Anthropic is in advanced talks to acquire Stainless, a four-year-old developer tools startup, for at least $300 million. Stainless sells software that helps developers and non-technical people build with AI models; its current customers include Anthropic, OpenAI, and Google.

Via The Information https://t.co/50cRKPDNeZ

Sumit Kalra | AI & ML Shift

@AIMLShift

about 1 month ago

[KnowledgeBytes] MCP vs Skills:
MCP connects AI to the world.
Skills tell AI what to do in that world.
Access ≠ Execution.
You need both.
#AI #GenAI #AIAgents #MCP #AIEngineering #FutureOfWork #AITesting #AIMLShift

Sumit Kalra | AI & ML Shift

@AIMLShift

about 1 month ago

@AndrewYNg The 1:1 ratio is fine. What breaks is when that one person still thinks like a specialist. AI didn't change the tools. It changed who gets to own the outcome.

Sumit Kalra | AI & ML Shift

@AIMLShift

about 1 month ago

@boristane fair point tbh. but when the system learns from every fix and ships smarter next time... that's not just a service center anymore

Sumit Kalra | AI & ML Shift

@AIMLShift

about 1 month ago

Your org chart isn’t evolving — it’s being rewritten. Humans set intent. AI agents execute. Managers become orchestrators. Teams shift from headcount → outcomes. This is the new engineering model. 🎥 Watch: https://t.co/h5sKsNJsuI — Sumit Kalra | @AIMLShift

Sumit Kalra | AI & ML Shift

@AIMLShift

about 1 month ago

@McKinsey The new battleground in tech services isn't talent. Not price. Not methodology. But who can build, orchestrate, and govern agents at enterprise scale. The services layer isn't disappearing. It's just unrecognizable from what it was. #AgenticAI #FutureOfWork #Engineering

Sumit Kalra | AI & ML Shift

@AIMLShift

about 1 month ago

Leadership skillset is changing in the AI world.

From making decisions
→ to designing decision systems

From knowing answers
→ to asking better questions

From managing people
→ to orchestrating humans + AI

The leaders who adapt won’t just keep up; they’ll define the future.

#AILeadership #FutureOfWork #Leadership #AIMindset #HumanAI #Innovation #TechLeadership

Sumit Kalra | AI & ML Shift

@AIMLShift

about 2 months ago

The new manager isn't a supervisor. They're a conductor. Of people. Of agents. Of robots. Different skills. Different metrics. Different feedback loops. The job description has already changed. Hiring has moved on. Have you? h/t McKinsey: https://t.co/MLqJK60tmg #AI #FutureOfWork #Leadership

Sumit Kalra | AI & ML Shift

@AIMLShift

about 2 months ago

The chart everyone will quote: 57% automatable.

The number from the actual report that matters more: ~90% of companies invested in AI, yet fewer than 40% report measurable gains.

The gap isn't the tech. It's that we keep bolting AI onto workflows designed for a pre-AI world.

Task-level automation ≠ transformation. The unlock is workflow redesign, and most leaders haven't started.

Sumit Kalra | AI & ML Shift

@AIMLShift

about 2 months ago

@mattpocockuk skills are the real unlock tbh. a good /to-prd or /domain-model beats whatever model upgrade dropped this week. this is where claude code starts feeling like a studio not an assistant.

Sumit Kalra | AI & ML Shift

@AIMLShift

about 2 months ago

@thdxr respect for saying it out loud. most people dress this up as "v2" and pretend the debt was a plan all along. sometimes the foundations just fight you harder than the product does.

Sumit Kalra | AI & ML Shift

@AIMLShift

about 2 months ago

@theo lol the sota shelf life is like 36 hours now. at this point i just pick whatever fits the workflow and stop reading the leaderboards.

Sumit Kalra | AI & ML Shift

@AIMLShift

about 2 months ago

@GergelyOrosz felt this too. mid agent loop the last thing i need is the model lecturing me on my choice. just need a toggle to shut off the opinions when i already know what i want.

Sumit Kalra | AI & ML Shift

@AIMLShift

about 2 months ago

@emollick honestly we maintain this internally already. every model drop quietly breaks 3-4 prompts in our agents. would save so much time if labs just shipped a diff.