StemOverflow

Verified account

@StemOverfloww

Math • Physics • CS • AI • Robotics. Formerly Mathology Overflow. Bridging the gap between the blackboard and the motherboard. 🧠 ⚙️ Founder: @mythkernel

Joined September 2022

31 Following

361 Followers

51 Posts

Pinned Tweet

12 days ago

What's wrong with this post, @elonmusk @nikitabier @X @XCorpIndia, that you have suspended an account with a premium subscription? Can anyone please tell me what my fault is here? Creating a full-blown thread of 25 posts with 24 painstakingly created AI images? Wanting people to genuinely learn about LLMs from scratch, for FREE, in 30 weeks? Or commenting about this upcoming work of mine on related "LLM education" posts to raise awareness? Is it possible for you to overturn this suspension with a warning?


1

0

0

0

101

3 days ago

And that's how @OpenAI leaves every other frontier model in the dust. Folks, buckle up your seat belts, we're in for a treat. 😎

3 days ago

I’m excited to share that I’ll be joining OpenAI and look forward to working with the exceptional team there. It was a difficult decision to move on. I’m incredibly proud of the amazing team at Google and everything we’ve built together. It has been an honor and a pleasure to work with all of you.

971

16K

862

2K

9M

0

0

0

0

30

8 days ago

Back to @OpenAI pro 20x sub 😅 P.S. Wish I had the hardware to run 8 bit @deepseek_ai v4 pro, @Kimi_Moonshot kimi2.7, @Zai_org GLM5.1/5.2, @MiniMax_AI M3 ALL in parallel locally.

Photo: https://t.co/FWMiDvXhiK

8 days ago

Thanks for the refund @AnthropicAI @claudeai @DarioAmodei 🫡

Photo: https://t.co/bo19f1mHoP

0

0

0

0

248

0

0

0

0

205

8 days ago

Thanks for the refund @AnthropicAI @claudeai @DarioAmodei 🫡

Photo: https://t.co/bo19f1mHoP

8 days ago

Please look into this and issue a refund: @DarioAmodei @AnthropicAI @claudeai.

0

0

0

0

59

0

0

0

0

248


8 days ago

@AnthropicAI @claudeai @DarioAmodei Thank you.

Photo: https://t.co/RHhG7OemmH

0

0

0

0

4

8 days ago

@AnthropicAI I'm outside the US, in India. I bought the @claudeai Max 20x sub on 11 June because of Fable 5. I paid $236 ($200 + 18%) just because of Fable 5. And now you are rug-pulling your foreign customers, @DarioAmodei. What's the procedure to get a refund?

Photo: https://t.co/C8Sem78Qjq

1

0

0

0

87

8 days ago

@narendramodi @PMOIndia CC : @AshwiniVaishnaw @nsitharaman @PiyushGoyal @FinMinIndia @RBI @AmitShah

0

0

0

0

13

8 days ago

My two cents on how India should proceed to build sovereign AIs. @narendramodi @PMOIndia P.S. Been working & building custom language models since 2016, pre-transformers era. Feel free to correct me. Always happy to learn something new.

8 days ago

The US export controls blocking non-US access to @AnthropicAI's latest frontier models (Fable 5 / Mythos 5) mark a structural shift: advanced AI is now explicitly strategic infrastructure. This accelerates the need for sovereign capability.

Building on calls for an ambitious India AI Mission, here is a rigorous, from-scratch analysis of what it realistically costs to develop production-ready foundational models.

I treat @deepseek_ai's public figures (e.g., V3’s ~2.788M H800 GPU-hours / ~$5.6M reference) with healthy skepticism. These almost certainly reflect only the final successful training run, not total R&D (experiments, ablations, failed runs, data pipelines, talent, or infrastructure CapEx). Independent scaling estimates and industry benchmarks support significantly higher full-project costs, even with genuine architectural efficiencies.

Two Target Classes:

DeepSeek-V4-Pro class (efficient MoE path): 1.6T total parameters / ~49B active per token, native 1M context, hybrid attention (CSA + HCA), mHC stability, Muon optimizer, >32T tokens. Strong reasoning/agentic performance at lower compute intensity.

GPT-5.5-Pro class (higher-end / denser or larger-scale path): Significantly higher effective compute (dense-like or very large MoE), targeting maximum capability through greater scale.

@deepseek_ai @OpenAI

The following are the probable technical details and capital allocation at every stage, with conservative-to-realistic ranges based on FLOPs scaling, hardware specs (H100/H800-class, ~400–700 TFLOPS sustained effective), realistic MFU (35–55%), and MoE communication overhead.

Rough FLOPs estimate (6 × active params × tokens for core training compute):
V3 reference (~37B active, 14.8T tokens): ~3.29 × 10²⁴ FLOPs.
V4 scaling (~49B active, ~32–33T tokens): ~2.9× multiplier → ~9.7 × 10²⁴ FLOPs.
Theoretical GPU-hours (at ~500–600 TFLOPS effective sustained) for V4 final pre-training: ~4–9 million GPU-hours equivalent.
At $2–6/GPU-hour effective (rental/amortized + power): $10–60M for the final pre-training run only.

Key adjustments:
MFU of 35–55% is typical (higher end achievable with custom kernels, FP8, good parallelism).
MoE adds routing/communication overhead vs pure dense.
Full project multiplies final-run compute by 2–5×+ for R&D/experiments.
Architecture wins (hybrid sparse attention cutting effective FLOPs/KV cache ~70%+ at 1M context, mHC for stability with low overhead) are real and reduce waste.

Stage-by-Stage Breakdown

1. Data Curation, Acquisition & Synthetic Generation

Curate/filter 32–50T+ high-quality tokens (web, code, science, long documents, agentic traces). Heavy synthetic flywheel for reasoning chains, trajectories, and preference data. Domain balancing + versioning. Petabyte-scale storage with lineage. For GPT-class: even larger/more diverse corpus.

Costs:
Acquisition/licensing + pipelines: $15–50M.
Synthetic generation (inference on intermediates): $20–80M (major driver).
Human/expert annotation (targeted): $5–20M.
Storage/versioning platform: $10–25M.
DeepSeek-class subtotal: $50–175M.
GPT-class subtotal: $80–300M (larger scale).

2. Infrastructure & Hardware Setup

Sovereign cluster targeting 50k–150k+ B200/H200-class GPUs (or mixed optimized silicon) with high-bandwidth fabrics. Sustained MFU >50%. Liquid cooling, redundant power (50–200+ MW peak). Custom kernels for hybrid attention, expert parallelism, Muon, and mHC.

Costs:
GPUs/accelerators (procurement or long-term lease): $150–800M+.
Servers, networking, high-speed storage: $50–200M.
Data center/power/cooling build-out: $80–300M (power infrastructure often 30–50% of infra).
Early electricity & setup: $5–20M.
DeepSeek-class subtotal: $285–1,320M.
GPT-class subtotal: $500–2,500M+ (larger/more dense clusters).
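The rough FLOPs estimate in the post can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted above (the 6 × active params × tokens rule, the V3 and V4 parameter and token counts, 500–600 TFLOPS sustained, $2–6 per GPU-hour); the function names are illustrative, and the 33T token count is the upper end of the post's ~32–33T range.

```python
# Back-of-the-envelope check of the figures quoted in the post.
# All inputs come from the post; function names are illustrative only.

def training_flops(active_params: float, tokens: float) -> float:
    """Approximate training compute: 6 FLOPs per active parameter per token."""
    return 6.0 * active_params * tokens


def gpu_hours(flops: float, sustained_tflops: float) -> float:
    """GPU-hours needed at a given sustained per-GPU throughput in TFLOPS."""
    return flops / (sustained_tflops * 1e12 * 3600.0)


v3 = training_flops(37e9, 14.8e12)  # V3 reference: ~37B active, 14.8T tokens
v4 = training_flops(49e9, 33e12)    # V4-class: ~49B active, ~33T tokens (upper end)

print(f"V3: {v3:.2e} FLOPs")
print(f"V4: {v4:.2e} FLOPs ({v4 / v3:.2f}x V3)")

for tflops in (500, 600):
    hours = gpu_hours(v4, tflops)
    low_cost = hours * 2 / 1e6   # $M at $2 per GPU-hour
    high_cost = hours * 6 / 1e6  # $M at $6 per GPU-hour
    print(f"{tflops} TFLOPS sustained: {hours / 1e6:.1f}M GPU-hours, "
          f"${low_cost:.0f}M to ${high_cost:.0f}M")
```

At 500–600 TFLOPS this gives roughly 4.5–5.4 million GPU-hours, which sits in the lower half of the post's ~4–9 million range; the upper end of that range corresponds to lower sustained throughput than the 500–600 TFLOPS figure.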

1

1

0

1

223

1

1

0

0

50

8 days ago

3. Pre-Training

DeepSeek-class: 1.6T MoE with 49B active/token. Hybrid attention (CSA + HCA interleaved with sparse attention) for ~27% of the FLOPs and ~10% of the KV cache vs the prior generation at 1M context. mHC (residual matrices projected onto the Birkhoff polytope via Sinkhorn-Knopp) for stability at trillion scale (~6–7% overhead). Muon optimizer, mixed FP4/FP8. High MFU target.

GPT-class: Denser or much larger effective scale; higher raw FLOPs; less reliance on sparsity tricks.

Costs (final run + R&D/experiments multiplier):
GPU-hours/compute: $20–150M (DeepSeek-class final run lower due to efficiency; GPT-class much higher).
Electricity during training: $10–50M.
Experiments/ablations (2–5× final run): $40–400M+.
DeepSeek-class subtotal: $80–400M.
GPT-class subtotal: $300–1,500M+.

4. Post-Training, Alignment & Reasoning

Two-stage (domain-expert SFT + GRPO cultivation → on-policy distillation). Synthetic preference data dominant. GRPO/DPO-style + distillation for reasoning/agentic gains without monolithic RL blowup. GPT-class may need heavier RL or more iterations.

Costs:
Compute (inference + RL loops): $15–80M.
Synthetic data & modeling: $10–40M.
Iteration & human oversight: $5–25M.
DeepSeek-class subtotal: $30–145M.
GPT-class subtotal: $60–300M.

5. Evaluation, Safety, Red-Teaming & Iteration

Full benchmark suite (SWE-Bench, GPQA, agentic, long-context, safety). Adversarial testing + constitutional frameworks. Multiple feedback loops.

Costs: $20–100M (both classes; GPT-class potentially higher iteration volume).

6. Inference, Deployment & Serving

Optimized engines (vLLM/SGLang-style) with continuous batching, speculative decoding, quantization (FP8/INT4), and MoE routing. Efficient 1M-context KV management. Production clusters sized for target QPS/latency.

Costs (initial capex):
Serving clusters + optimization: $40–250M.
Electricity/ops (recurring): Scales with usage ($10–50M+/year initial).
DeepSeek-class subtotal (initial): $40–200M.
GPT-class subtotal (initial): $80–400M.

7. Talent, Operations & Ecosystem

150–500+ team (researchers focused on MoE scaling, hybrid attention, mHC extensions, distillation; infra engineers for high-MFU kernels; safety/evals specialists).

Costs: $100–500M+ over project duration (salaries $400k–$1.5M+ total comp for top talent + equity).

Overall Totals (End-to-End, First System + Initial Scaling)

DeepSeek-V4-Pro class (efficient MoE path): $500 million to $1.5–2 billion (midpoint ~$800M–$1.2B realistic with strong execution). Leverages sparsity, hybrid attention, and mHC for lower intensity. Final runs can be remarkably efficient; the full project is still substantial due to R&D and infra.

GPT-5.5-Pro class (higher-end path): $2 billion – $5 billion+. Driven by significantly higher raw compute, denser scaling, and potentially more iteration.

Annual ongoing opex (power, talent, maintenance, inference at scale): $100–400M+ after launch (scales with usage).

Strategic Recommendations for India AI Mission

Prioritize the efficient path first (DeepSeek-class architecture) for faster ROI and capability at manageable cost, then scale toward the higher end.
Sovereign cluster: 50k–150k+ GPU-class with mixed sourcing and high-MFU focus.
R&D focus: Hybrid attention, mHC-style residuals, Muon/MoE optimizations, synthetic data pipelines.
Phased funding: Tied to milestones (e.g., stable 1M-context pre-train, agentic benchmark leadership).
Ecosystem: National data trust + talent incentives.

This is executable with disciplined engineering. The architectural efficiencies are real and create a genuine cost advantage, even after conservative adjustments for full project scope. The US controls make sovereign leadership not just desirable but urgent.

This is the complete, from-scratch picture across both model classes.

@narendramodi @AmitShah @TVMohandasPai @svembu @SarvamAI

P.S. My two cents after working with language models since 2016, the pre-transformers era.
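As a cross-check on the overall totals, the DeepSeek-class stage subtotals from the two parts of the thread can simply be added up. This is a minimal sketch: the ranges are copied from the thread, stages 5 and 7 quote a single range for both classes, and the dictionary name and labels are illustrative.

```python
# DeepSeek-class stage subtotals in $M, copied from the thread as (low, high).
# Stages 5 and 7 give one range for both model classes.
STAGES = {
    "1. Data": (50, 175),
    "2. Infrastructure": (285, 1320),
    "3. Pre-training": (80, 400),
    "4. Post-training": (30, 145),
    "5. Evaluation and safety": (20, 100),
    "6. Serving (initial)": (40, 200),
    "7. Talent and operations": (100, 500),
}

low = sum(lo for lo, _ in STAGES.values())
high = sum(hi for _, hi in STAGES.values())

for name, (lo, hi) in STAGES.items():
    print(f"{name:28s} ${lo:>5,}M to ${hi:>5,}M")
print(f"{'Sum of stage subtotals':28s} ${low:>5,}M to ${high:>5,}M")
```

The straight sum comes to $605M to $2,840M, somewhat above the headline "$500 million to $1.5–2 billion" at both ends. The thread does not say how the headline range was derived from the stage figures, so the gap is noted here only as an observation.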

0

1

0

0

137


8 days ago

PM @narendramodi Sir we need an India AI Mission under you with @NandanNilekani as vice chair and others from the private sector and govt. to Help India tackle the AI Revolution. We are way behind and need a national mission to get going quickly. Existing govt programs are too slow, way too small to make any large impact. We need an annual 50000 cr fund for deep tech and AI, a 200,000 cr ELGS Guarantee Fund to build Hyper cloud, hardware and chips. @AshwiniVaishnaw @nsitharaman @PiyushGoyal @FinMinIndia @RBI We need a Very Large National Mission. @AmitShah @amitmalviya

1K

3K

757

264

699K

1

1

0

1

223


11 days ago

@elonmusk @nikitabier @X @XCorpIndia It has been 2 days with no reply, and the account is still suspended. @TheAhmadOsman Any help would be appreciated.

0

0

0

0

12


12 days ago

I appealed as well, but this was the instant reply. What's going on here? Any help would be highly appreciated. @elonmusk @nikitabier @X @XCorpIndia By the way, this one is a "Premium+" account. Please don't suspend this one as well.

Photo: https://t.co/9XE41ikxek

1

0

0

0

42

4 months ago

We truly want to see India reach this stage someday in the next decade. We work hard to make that day come soon 🙂

0

2

0

0

105

4 months ago

Sure. Lesss go 💪

0

0

0

0

40

4 months ago

YT : https://t.co/p1iLt6eGuQ

4 months ago

“Two sequences walk into a proof…” One goes down, one goes up, both meet at √(ab). Then a twist: swap + reciprocal symmetry still preserves the story. Animated end-to-end in Manim. Math has choreography. ✨ Find the YT video here: https://t.co/417CnIxYFS #math #manim #visualization #invariance #arthurengel
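The post does not spell out the two sequences. One classic pair that matches the description (one decreasing, one increasing, both converging to √(ab), with an invariant doing the work) is the arithmetic-harmonic mean iteration, and the sketch below assumes that is the pair meant. The function name and the starting values 1 and 9 are illustrative.

```python
import math

# ASSUMPTION: the two sequences are the arithmetic-mean / harmonic-mean pair.
# The post itself does not define them.

def mean_iteration(a: float, b: float, steps: int = 6):
    """Iterate x <- arithmetic mean, y <- harmonic mean, starting from (a, b)."""
    x, y = max(a, b), min(a, b)
    history = [(x, y)]
    for _ in range(steps):
        # Both updates use the old x and y; the product x * y is unchanged.
        x, y = (x + y) / 2.0, 2.0 * x * y / (x + y)
        history.append((x, y))
    return history


history = mean_iteration(1.0, 9.0)
for x, y in history:
    print(f"x={x:.10f}  y={y:.10f}  x*y={x * y:.10f}")
print(f"sqrt(ab) = {math.sqrt(1.0 * 9.0):.10f}")
```

The invariant is the product: (x + y)/2 times 2xy/(x + y) equals xy, so x·y stays at ab at every step while x falls and y rises toward their common limit √(ab).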

0

3

0

0

387

0

1

0

0

54
