Rohitash Panda

@RohitashPanda

Technical Lead/Architect, Software “Systems” generalist. Prev: @Arcserve @EMC @Oracle @HPE. Databases. Storage. Systems. Infra. Travel Tech. AI

Bengaluru, India

Joined August 2011

7.3K Following

451 Followers

6.5K Posts

RohitashPanda retweeted

Andrej Karpathy

@karpathy

1 day ago

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

24K

RohitashPanda retweeted

Vivek Galatage

@vivekgalatage

5 days ago

the db primer https://t.co/wnBHCp0j9n

284

305

20K

RohitashPanda retweeted

Abhishek🌱

@Abhishekcur

5 days ago

This article is literally wow.

i read it 2 years ago, and coming back to it today, it still feels new.

few tutorials teach computers in a way that permanently changes how you think. this is one of them.

If you've never built a VM before, you're missing one of the biggest "aha" moments in computer science.

314

95K

RohitashPanda retweeted

Eric Glyman

@eglyman

6 days ago

As I wrote this, I saw X go into meltdown over tokens.

You've seen the headlines: “Uber blows yearly AI budget in just one quarter.” “Meta employee burns 281 billion tokens in April.”

But, the problem isn't spending. Spending works. Since 2023, the top quartile of our AI spenders doubled their revenue. The bottom quartile? Flat.

It's blind spending. We don’t know which spend worked.

A sales team has qualified leads. A support team has resolved conversations. These are units you can measure against. All a token tells you is the meter ran, not whether the work was worth it or not.

Finance says, “half the budget,” engineering says, “double it” and you don’t know who’s right because there is no shared language of value. There’s no attribution, and no attribution means no allocation.

For example, right now, all work, no matter the size or shape, defaults to frontier models. But meeting summaries and calendar updates don’t require GPT-5.5 Pro.

In isolation this seems trivial, but re-route just 10% of a $10M AI bill from frontier to GPT-4 level intelligence you’ve saved nearly one million dollars. This sounds like a made-up stat — it’s not. It truly is that much cheaper.

This is the future of finance: not blindly rubber-stamping or rejecting AI spend, but allocating it with the same rigor companies apply to headcount.
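
The re-routing arithmetic in the post can be checked with a few lines. This is a rough sketch, not the author's model: the function name and the `cheap_cost_ratio` values are assumptions chosen for illustration, since the post does not state how much cheaper the lower tier is. It shows that "nearly one million dollars" only holds when the cheaper tier costs a few percent of the frontier price for the same work.

```python
def rerouting_savings(total_bill: float, rerouted_fraction: float, cheap_cost_ratio: float) -> float:
    """Dollars saved by moving a share of spend to a cheaper model tier.

    cheap_cost_ratio is the cheaper tier's cost relative to the frontier tier
    for the same work (hypothetical; the post does not give a figure).
    """
    rerouted_spend = total_bill * rerouted_fraction
    return rerouted_spend * (1.0 - cheap_cost_ratio)


if __name__ == "__main__":
    bill = 10_000_000      # the $10M bill from the post
    fraction = 0.10        # the 10% re-routed in the post
    for ratio in (0.02, 0.05, 0.10):  # assumed price ratios, for illustration only
        saved = rerouting_savings(bill, fraction, ratio)
        print(f"cheap tier at {ratio:.0%} of frontier cost -> saves ${saved:,.0f}")
```

The saving can never exceed the re-routed spend itself ($1M here), so the claim is really a statement about the price ratio between tiers.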

835

581

361K


RohitashPanda retweeted

Abhishek🌱

@Abhishekcur

5 days ago

this is the guide I wish someone had handed me on day one of CS. it builds up, from a single transistor, why your computer spends most of its life just waiting for memory instead of doing math. barely a calculator at all. understanding this one idea rewired how I read every program I write, and I tried hard to make it click for you too. if you've never really understood the memory wall, you're missing one of the biggest "aha" moments in computer science. it took me 20+ days of deep research and a lot of work to make it as simple as possible, and honestly there's still a long way to go. every piece of feedback would mean the world to me 🙏 here's my article: https://t.co/vKr1JCRRe7

507

649

22K

RohitashPanda retweeted

Rahul

@sairahul1

4 days ago

This is the best site on the internet to learn harness engineering.

Free. Completely.

Most AI engineers have never heard the term.

https://t.co/bwDbTTYsjM

Bookmark this site.

Then read this setup ↓ https://t.co/ddEP0XowXM

436

436K

RohitashPanda retweeted

Viv

@Vtrivedy10

4 days ago

imo there’s a pretty solid default recipe that everyone should use to optimize a system of Agent = Model + Harness

you should “train” both

1. Build v1 agent using a sensible base harness and some task specific prompting + tools
2. Harness Engineering using eval tasks that roughly match prod

this is often enough - most companies can get acceptable perf doing this. then they collect traces, mine them for patterns, and make slight tweaks from there

3. SFT using data collected from traces or synthetic data. This is often a good candidate for “distillation tasks” to train a cheaper model while maintaining existing performance
4. RL if you have the bandwidth, ability, and desire to create environments and design rewards that represent the tasks you want your agent to be good at. Move beyond the SFT behavior of “copying” data from an existing model and push past it in some dimension
5. Light harness engineering again to squeeze any more juice (ex: slight prompting) using the trained model that’s better at your task distribution

this loop will largely be productized as a general purpose recipe for building and improving agents

we’re still in the earliest innings of the world’s companies getting comfortable with steps 1-2 of this loop. Harness engineering will probably be the dominant way ppl will optimize agents but i expect a large number of companies to onboard through this entire loop on some trial project of interest in the next year
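
Steps 1-2 of the recipe (build a v1 agent, then tune the harness against eval tasks and keep the traces) can be sketched roughly as below. Everything here is hypothetical and only illustrates the shape of the loop: `toy_model` stands in for a real model call, and `Harness`, `EvalTask`, and `evaluate` are made-up names, not part of any library or of the author's setup.

```python
from dataclasses import dataclass


@dataclass
class EvalTask:
    prompt: str
    expected: str


@dataclass
class Harness:
    name: str
    system_prompt: str


def toy_model(system_prompt: str, user_prompt: str) -> str:
    """Hypothetical stand-in for a model call: does arithmetic only when the harness asks for it."""
    if "calculator" in system_prompt:
        left, op, right = user_prompt.split()
        return str(int(left) + int(right)) if op == "+" else str(int(left) * int(right))
    return "I am not sure."


def evaluate(harness: Harness, tasks: list[EvalTask], model=toy_model):
    """Run every eval task through model + harness; return the pass rate and the traces."""
    traces = []
    passed = 0
    for task in tasks:
        output = model(harness.system_prompt, task.prompt)
        ok = output == task.expected
        passed += ok
        traces.append({"harness": harness.name, "prompt": task.prompt, "output": output, "ok": ok})
    return passed / len(tasks), traces


if __name__ == "__main__":
    tasks = [EvalTask("2 + 3", "5"), EvalTask("4 * 5", "20")]
    candidates = [
        Harness("baseline", "You are a helpful assistant."),
        Harness("task-specific", "You are a calculator. Reply with the number only."),
    ]
    for harness in candidates:
        score, traces = evaluate(harness, tasks)
        print(harness.name, score)
```

The collected traces are the raw material the post mentions for later steps: mining for failure patterns, or as SFT data.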

389

669

55K

RohitashPanda retweeted

Alex Kuleshov

@0xAX

5 days ago

Great post by @0xkato - How LLMs Actually Work - https://t.co/urZOkGhUwP

402

479

21K

RohitashPanda retweeted

Ricardo

@Ric_RTP

4 days ago

Big Tech just ran out of money building AI and what they're doing to cover it up should be illegal. Google, Amazon, Microsoft, and Meta are spending a combined $700 BILLION this year on AI infrastructure. This eats up 94% of their total operating cash flow. The richest companies in human history are almost broke. And instead of slowing down, they're covering it up with the biggest financial engineering operation since 2008: Google just sold $80 billion in stock to fund AI infrastructure. That was their first equity raise in 20 YEARS. The last time Google needed to sell stock, YouTube didn't even exist. Sundar Pichai admitted the thing keeping him up at night is "compute capacity." The company that prints $100 billion a year in ad revenue just told Wall Street it isn't enough anymore. Amazon's free cash flow is projected to go NEGATIVE this year for the first time ever. Morgan Stanley estimates a $17 billion deficit and Bank of America says $28 billion. The most profitable logistics machine on Earth is about to burn more cash than it generates, and they quietly filed with the SEC saying they may need to raise even more debt and equity to keep building. All four hyperscalers are now borrowing hundreds of billions in bonds to keep the AI buildout alive. These were the most cash-rich companies in human history, and they're leveraging themselves to the teeth to build infrastructure that nobody has proven will generate enough revenue to pay for itself. And the cracks are already starting to show: Broadcom makes the custom AI chips that power Google, Meta, OpenAI, and Anthropic. This week their AI revenue TRIPLED year over year, sales grew 48%, and profits smashed every Wall Street estimate. The reward for all of that was $320 billion in value erased in a single trading session. 
Their CEO Hock Tan went on the earnings call and exposed three things about the AI industry: Google is already shopping for cheaper AI chip alternatives, Broadcom abandoned its strategy of selling complete AI systems and is now retreating to selling bare chips at lower margins, and despite supposedly "unprecedented demand," Tan refused to raise his full-year forecast, which tells you everything about what he's actually seeing behind the curtain. Wall Street heard all three and hit the sell button so hard it dragged AMD, Intel, and the entire chip sector down with it. When a company triples its AI revenue and gets punished because tripling isn't fast enough, the expectations have left the atmosphere entirely. And here's the really scary part... These companies ARE your retirement account. Apple, Microsoft, Amazon, Google, Meta, and Nvidia make up roughly 30% of the S&P 500. If you have a 401k or an index fund, you are already exposed to this bet whether you chose to be or not. Every single one of these companies is telling you AI will generate trillions in revenue. But right now the math says they're spending trillions FIRST and hoping the revenue shows up later. If the revenue catches up, this becomes the greatest infrastructure buildout in human history. Bigger than railroads and bigger than the internet. If it doesn't, the companies that make up a third of the American stock market just leveraged their balance sheets into the largest write-down cycle since 2000. And unlike the dot-com crash, this time the bubble companies aren't random startups with no revenue. They're the backbone of the entire global economy.

194

676

221K

RohitashPanda retweeted

Shubh

@TheSuperEng

5 days ago

These No BS engineering blogs are a goldmine for serious backend and infra engineers:

1. Netflix TechBlog
2. Uber Engineering
3. Engineering at Meta
4. Cloudflare Blog
5. Stripe Engineering
6. Slack Engineering
7. GitHub Engineering
8. Airbnb Tech Blog
9. Dropbox Tech Blog
10. DoorDash Engineering
11. Pinterest Engineering
12. LinkedIn Engineering
13. Spotify Engineering
15. Shopify Engineering
16. Datadog Engineering

135

35K

RohitashPanda retweeted

Milk Road AI

@MilkRoadAI

5 days ago

Bill Ackman was asked how he would underwrite SpaceX at $750 billion and his answer was the most honest thing anyone has said about the biggest IPO in history (Save this). "You underwrite SpaceX the way you underwrite a venture capital investment." His business school professor taught him a framework that has guided his entire career: people, opportunity, context, deal. On the first three criteria (people, opportunity, and context), Ackman's verdict was the same: SpaceX is one of one, and nothing else in the market comes close. He even acknowledged feeling bad for Blue Origin before noting that their being so far behind is not harmful to SpaceX but rather a structural tailwind that leaves SpaceX with a near monopoly on low cost orbital access for years to come. And at $1.75 trillion, the number SpaceX is actually targeting on June 12, the question is no longer whether this is the best business on earth, but what the present value math looks like when you extend it five years forward and stress test every assumption about Starlink, launch economics, and AI compute revenue. He said that even Amazon is going to have to become a bigger SpaceX customer, because Blue Origin is so far behind that Amazon has no real alternative for low-cost orbital access. He also said something that almost no one is giving enough weight heading into Thursday's listing: "Time has become increasingly valuable in the AI era. You lose a month, you lose a couple months today, and it means a lot." The Colossus and Macro Hard facilities are compounding infrastructure assets where every month of operational delay means less contracted revenue, less negotiating leverage with customers like Google and Anthropic, and a progressively weaker moat against the hyperscalers who are now racing to build competing compute capacity.
Come join Milk Road Pro for our full SpaceX IPO breakdown, how we're stress-testing the Deal leg of Ackman's framework at $1.75 trillion, what our five-year revenue model actually looks like, and our full AI thesis. Link below.

724

718

431K

RohitashPanda retweeted

How To AI

@HowToAI_

5 days ago

Google has published a paper that might end the transformer era.

For the last 7 years, every major AI, ChatGPT, Claude, Gemini, has been built on the exact same architecture: The Transformer.

But Transformers have a fatal flaw.

To remember context, they have to process every single word against every other word. It’s called quadratic complexity. As your prompt gets longer, the compute cost explodes.

The alternative is the old-school RNN (Recurrent Neural Network). RNNs are incredibly cheap and fast, but they have a fixed memory size. If you give them a long document, they get amnesia.

Until today.

Google researchers published Memory Caching: RNNs with Growing Memory.

And it fixes the biggest bottleneck in AI.

Instead of an RNN having a fixed, rigid memory that constantly overwrites itself, Google gave it a "save" button.

The technique allows the RNN to cache checkpoints of its hidden states as it reads.

The memory capacity of the RNN can now dynamically grow as the sequence gets longer.

They built four different variants, including sparse selective mechanisms where the AI actively chooses exactly which checkpoints matter most.

The results rewrite the rules of efficiency.

On long-context understanding and recall-intensive tasks, these new Memory-Cached RNNs closed the gap with Transformers.

They achieved competitive accuracy without the explosive, quadratic compute cost. It perfectly bridges the gap between the cheap efficiency of an RNN and the massive capability of a Transformer.

We have spent billions scaling Transformers because we thought they were the only way an AI could remember a long conversation.

But Google just proved we don't need to process the whole history every single time.

We just needed a smarter cache.
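
The "save button" idea described above can be illustrated with a toy recurrent model that checkpoints its hidden state at fixed intervals and later reads from those checkpoints. This is a minimal sketch under stated assumptions, not the paper's formulation: the linear update rule, the fixed `segment_len`, the softmax readout, and all function names are invented for illustration.

```python
import math


def run_cached_rnn(inputs, segment_len=4, decay=0.5):
    """Toy linear RNN that saves a copy of its hidden state every segment_len steps."""
    hidden = [0.0] * len(inputs[0])
    checkpoints = []
    for step, x in enumerate(inputs, start=1):
        # Fixed-size recurrent update: old information is gradually overwritten.
        hidden = [decay * h + (1.0 - decay) * xi for h, xi in zip(hidden, x)]
        if step % segment_len == 0:
            checkpoints.append(list(hidden))  # the cache grows with sequence length
    return hidden, checkpoints


def read_memory(query, hidden, checkpoints):
    """Softmax-weighted mix of the current state and every cached checkpoint."""
    states = checkpoints + [hidden]
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    peak = max(scores)
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    return [sum(w * state[i] for w, state in zip(weights, states)) / total
            for i in range(len(query))]


def full_attention_cost(n):
    """Token-pair comparisons for causal attention over n tokens."""
    return n * (n + 1) // 2


def cached_read_cost(n, segment_len):
    """One recurrent update per token plus one lookup per checkpoint saved so far."""
    return sum(1 + t // segment_len for t in range(1, n + 1))


if __name__ == "__main__":
    sequence = [[1.0, 0.0]] * 8 + [[0.0, 1.0]] * 8
    hidden, checkpoints = run_cached_rnn(sequence, segment_len=4)
    print(len(checkpoints), "checkpoints cached")
    print(read_memory([1.0, 0.0], hidden, checkpoints))
    print(full_attention_cost(16), cached_read_cost(16, 4))
```

In this toy, reading the cache at every step still costs more as the sequence grows, only with a smaller factor set by the segment length, so the cost sits between a plain RNN and full attention rather than being free.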

246

923

544K

RohitashPanda retweeted

h100envy

@h100envy

5 days ago

Tri Dao wrote the code running inside ChatGPT, Claude, and Gemini. Nobody alive understands the GPU bottleneck better. Now he's calling the top. Nvidia holds 90% of AI compute today. He says that ends in three years. His reason: as inference splits into specialized chips for agents, batch jobs, and chat, the one-size-fits-all GPU stops winning. The man whose code is Nvidia's moat just told you the moat is draining.

12K

RohitashPanda retweeted

Reid Hoffman

@reidhoffman

6 days ago

Excited to sit down with my friend @satyanadella to talk about AI, the future of humanity, and going back into founder mode to cure cancer.

402

143

69K

RohitashPanda retweeted

trish

@TrisH0x2A

6 days ago

in 1988 a physicist named Jack Crenshaw got tired of compiler books being impossible to read

so he wrote his own series on a BBS called Let's Build a Compiler

it starts with a parser that understands exactly one digit

then each installment adds one new idea until you end up with a real compiler

one small step at a time instead of 500 pages of theory first
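
The incremental approach described above can be sketched in a few lines. This is not Crenshaw's code (his series used a different language and target); the `PUSH`/`ADD` instructions belong to a hypothetical stack machine, and the function names are made up. The first function handles exactly one digit, and the second adds one new idea, addition, on top of it.

```python
def compile_digit(source: str) -> str:
    """Installment 1: accept exactly one decimal digit and emit one instruction."""
    if len(source) != 1 or source not in "0123456789":
        raise SyntaxError(f"expected a single digit, got {source!r}")
    return f"PUSH {source}"


def compile_sum(source: str) -> list[str]:
    """Installment 2: digits joined by '+', reusing the one-digit parser unchanged."""
    terms = source.split("+")
    code = [compile_digit(terms[0])]
    for term in terms[1:]:
        code.append(compile_digit(term))
        code.append("ADD")  # pops two values, pushes their sum (hypothetical machine)
    return code


if __name__ == "__main__":
    print(compile_digit("7"))
    print(compile_sum("1+2+3"))
```

Each step stays small enough to understand fully before the next feature is added, which is the point of the series as described.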

794

110

827

23K

RohitashPanda retweeted

Ahmad

@TheAhmadOsman

6 days ago

Everything You Need To Know About Inference Engines and Running LLMs Locally at Home

Explains why Inference Engines exist in the first place
- Prefill is not Decode
- VRAM is not bandwidth
- Fit is not speed
- KV Cache is the real memory problem
- Quantization only matters if the engine has good kernels for it
- Batching is not scheduling
- MoE and the routing problem
- How long context changes the serving problem
- Multi-GPU changes the interconnect problem
- Production: latency, p99s, backpressure, routing, metrics, and failure behavior

Then maps the Engines including:
- llama.cpp → portability king
- MLX / MLX-LM → Apple Silicon weapon
- ExLlamaV3 → multi-GPU consumer CUDA / local MoE
- vLLM → default open-source production server
- SGLang → long-context, MoE, routing, ugly workloads
- TensorRT-LLM → max NVIDIA performance
- NVIDIA Dynamo → fleet orchestration

The point of this article is not “use vLLM” or “use TensorRT-LLM” or “use llama.cpp”

But rather fully grasp how the Inference Engines are the traffic cop, memory manager, kernel dispatcher, scheduler, cache accountant, parallelism planner, API surface, and sometimes the deployment framework

Do not pick the engine first
- Pick the hardware
- Pick the workload
- Pick the serving model

Then the engine becomes obvious

Opensource / Local AI FTW
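
Two of the points above ("KV Cache is the real memory problem" and "VRAM is not bandwidth") come down to simple arithmetic. The sketch below is a back-of-the-envelope estimate, not taken from the article: the model shape, weight size, and bandwidth figure are hypothetical, and the throughput ceiling ignores KV-cache reads, compute time, and batching.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Memory for cached keys and values in a standard attention stack.

    The leading 2 accounts for one key tensor plus one value tensor per layer.
    bytes_per_value=2 assumes fp16/bf16 cache entries.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


def decode_tokens_per_second_ceiling(weight_bytes, bandwidth_bytes_per_s):
    """Rough upper bound for single-sequence decode on a dense model.

    Assumes every generated token has to stream all weights from memory once.
    """
    return bandwidth_bytes_per_s / weight_bytes


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, 8192-token context.
    cache = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)
    print(f"KV cache per sequence: {cache / 2**30:.2f} GiB")

    # Hypothetical hardware: 16 GB of weights, 1 TB/s of memory bandwidth.
    ceiling = decode_tokens_per_second_ceiling(16e9, 1e12)
    print(f"decode ceiling: {ceiling:.1f} tokens/s")
```

The first number scales linearly with context length and batch size, which is why a model that fits in VRAM can still run out of memory under load; the second depends on bandwidth rather than capacity, which is why fit is not speed.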

405

463

31K

RohitashPanda retweeted

Slater Stich

@slaterstich

6 days ago

Very excited to share our interview with @polynoamial on AI for math — the Erdős unit distance problem, saturating the IMO, the future of math research, and more!

621

656

183K

RohitashPanda retweeted

Dwarkesh Patel

@dwarkesh_sp

8 days ago

Mathematicians and scientists often peak in their 20s. Why? Maybe older scientists become stuck in their ways. Or maybe younger researchers feel free to be more creative. But @jacobkimmel's hypothesis is that this isn't because of social factors at all - it's evolution:

637

449

181K

RohitashPanda retweeted

Akhilesh Mishra

@livingdevops

9 days ago

"We use Prometheus for monitoring."

I hear this in almost every interview. Then I ask one question and the whole thing falls apart.

"Why do logs and metrics need different pipelines?"

Silence.

Most people jump into Prometheus and Grafana without understanding what they're actually solving. They know the tools. They can't explain the problem.

With observability, you're solving two completely different problems.

Logs tell you what happened. An error occurred. A request came in. A database query failed. These are events. Stories your application tells.

Metrics tell you how things are performing right now. Latency is 200ms. CPU is at 75%. You processed 500 requests per minute. These are measurements.

Different data types. Different collection methods. Different storage. That's where people get confused.

Last month in my DevOps bootcamp, we built a complete observability system for microservices on Kubernetes.

For logs, we used Fluentd sidecars that share a volume with the application container.

The app writes logs to the volume.
Fluentd reads and forwards them.
Clean separation of concerns.
At a small scale, you send logs straight to CloudWatch.

But when you're generating thousands of log lines per second, you add layers.

Lambda for formatting.
Kinesis for buffering.
OpenSearch for fast queries across petabytes of data.
S3 for long-term backup.

We kept 7 days in OpenSearch for active investigation. 30 days in CloudWatch. Years in S3 for compliance. Each layer has different cost and performance characteristics.

For metrics, Prometheus scrapes application endpoints every 30 seconds.

Developers instrument their code with Prometheus client libraries.
They expose a /metrics endpoint.
Prometheus pulls the data automatically.

We created ServiceMonitors that tell Prometheus which pods to scrape based on labels.

As soon as new pods come up, Prometheus discovers and scrapes them.
Then Grafana visualizes everything.
We imported pre-built dashboards from https://t.co/5wE21Lb4Q8 for Kubernetes monitoring.
And built custom panels for application-specific metrics.

Logs and metrics run in parallel.

When something breaks, metrics show you the spike. The error rate jumped. Latency went from 100ms to 2 seconds.

Then you check the logs. Filter for that time window. Find the stack traces. See exactly what failed.

You can't troubleshoot with just one. You need both perspectives.

We implemented it, troubleshot everything in a live call, generated real metrics and logs, and built dashboards in Grafana.

That's the difference between watching tutorials and actually understanding how systems work in production.
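
The pull model described above (the app exposes a /metrics endpoint, Prometheus scrapes it) can be sketched with the Python standard library alone. This is an illustration under assumptions, not the bootcamp's code: a real service would normally use an official Prometheus client library, and the metric name `app_requests_total`, the in-process counter, and the port are made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter, keyed by (HTTP method, status code).
REQUEST_COUNTS: dict[tuple[str, str], int] = {}


def record_request(method: str, status: str) -> None:
    """Increment the counter for one handled request."""
    key = (method, status)
    REQUEST_COUNTS[key] = REQUEST_COUNTS.get(key, 0) + 1


def render_metrics(counts: dict[tuple[str, str], int]) -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Total HTTP requests handled.",
        "# TYPE app_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(f'app_requests_total{{method="{method}",status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics for a scraper to pull."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` and fetching `/metrics` returns the current counter values as plain text. The endpoint only reports measurements at scrape time; it carries no event detail, which is why logs travel through a separate pipeline.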

323

334

13K

RohitashPanda retweeted

Robin Hanson

@robinhanson

8 days ago

"Dijkstra said … Programming is not a craft. It is closer to mathematics than to carpentry, and the moment you treat it as a craft, you guarantee that the software you produce will be full of the kind of bugs that craftsmanship cannot catch. The fix, in his view, was to teach programming the way mathematics is taught. You should be able to prove your program correct before you run it." Don't we have a half century of experience showing he was just wrong?

892

176K
