Shivam

@shivamg

Currently looking for my next challenge. Open to data-oriented roles or research. developer++

Mauritius

Joined August 2008

1.2K Following

426 Followers

28.4K Posts

Pinned Tweet

Shivam @shivamg

over 7 years ago

Happy New Year and best wishes for 2019

shivamg retweeted

Santiago

@svpino

9 days ago

A bunch of companies are banning developers from pushing vibe-coded software to production. Who didn't see this one coming? Vibe-coding is amazing, but we are now realizing what happens when we let anyone put autogenerated slop in front of users. We need something better.

shivamg retweeted

Massimo

@Rainmaker1973

about 1 month ago

Scientists at Stanford Medicine have proposed an explanation for the rare cases of myocarditis that can occur after mRNA COVID-19 vaccination. The research points to a two-step inflammatory process involving the cytokines CXCL10 and IFN-gamma.

According to the study, vaccine-activated immune cells, particularly macrophages, produce high levels of CXCL10. This molecule then stimulates T cells to release IFN-gamma, creating a potent inflammatory signal that can damage heart muscle cells. The team confirmed this pathway through experiments using human heart tissue models, immune cells, and mice.

Importantly, blocking CXCL10 and IFN-gamma reduced signs of heart injury in laboratory models while largely preserving the protective immune response generated by the vaccine. The researchers also tested genistein, a natural compound found in soybeans, which showed anti-inflammatory effects in the models, though further clinical studies are required.

Myocarditis following mRNA vaccination is very rare, occurring in approximately 1 in 140,000 people after the first dose and 1 in 32,000 after the second, with higher rates observed in young males. Most cases are mild and resolve with full recovery. The study notes that COVID-19 infection itself carries a significantly higher risk of myocarditis, roughly 10 times greater than vaccination.

This work may inform the development of safer future mRNA vaccines while underscoring the established benefits of current ones in preventing severe COVID-19 outcomes.

[Cao, X., et al. (2025). Inhibition of CXCL10 and IFN-γ ameliorates myocarditis in preclinical models of SARS-CoV-2 mRNA vaccination. Science Translational Medicine, 17(828)]

shivamg retweeted

LlamaIndex 🦙

@llama_index

about 2 months ago

Need document parsing that stays fully local and private? 👀

Meet liteparse-server, a self-hostable, open-source HTTP server for parsing documents and generating screenshots from PDFs, Office files, and images.

✅ 100% self-hosted
✅ Private by default
✅ Open source
✅ Built for production deployments

Deploy it as:

🐳 a @Docker container
⚡ or a serverless Express.js API

It also integrates easily with:

- @Redisinc for caching and rate limiting
- @opentelemetry-compatible collectors for traces and metrics
- observability tools like @JaegerTracing, @PrometheusIO and @grafana

Read the full breakdown here: https://t.co/E3y2ZHvURm
GitHub repo: https://t.co/K0d8XVEFGK

shivamg retweeted

Sebastian Raschka

@rasbt

6 months ago

I really didn't expect another major open-weight LLM release this December, but here we go: NVIDIA released their new Nemotron 3 series this week.

It comes in 3 sizes:

1. Nano (30B-A3B)
2. Super (100B)
3. Ultra (500B)

Architecture-wise, the models are a Mixture-of-Experts (MoE) Mamba-Transformer hybrid architecture. As of this morning (Dec 19), only the Nano model has been released as an open-weight model, so this post will focus on that one (shown in my drawing below).

Nemotron 3 Nano (30B-A3B) is a 52-layer hybrid Mamba-Transformer model that interleaves Mamba-2 sequence-modeling blocks with sparse Mixture-of-Experts (MoE) feed-forward layers, and uses self-attention only in a small subset of layers.

There’s a lot going on in the figure above, but in short, the architecture is organized into 13 macro blocks with repeated Mamba-2 → MoE sub-blocks, plus a few Grouped-Query Attention layers. In total, if we multiply the macro- and sub-blocks, there are 52 layers in this architecture.

Regarding the MoE modules, each MoE layer contains 128 experts but activates only 1 shared and 6 routed experts per token.
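To make the "1 shared + 6 routed" routing concrete, here is a minimal Python sketch of top-k MoE routing. It assumes the 128 experts form the routed pool with the shared expert kept separate, uses a plain softmax over the selected router logits, and models each expert as a toy scalar function. The names (`route_token`, `moe_forward`) are hypothetical; this is not NVIDIA's implementation.

```python
import math
import random
from typing import Callable, List, Tuple

NUM_ROUTED_EXPERTS = 128  # assumption: the 128 experts are the routed pool
TOP_K = 6                 # routed experts activated per token (from the post)

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits: List[float], top_k: int = TOP_K) -> List[Tuple[int, float]]:
    """Pick the top-k experts by router logit and renormalize their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(
    x: float,
    shared_expert: Callable[[float], float],
    routed_experts: List[Callable[[float], float]],
    router_logits: List[float],
) -> float:
    """The shared expert always runs; only the top-k routed experts are evaluated."""
    out = shared_expert(x)
    for idx, weight in route_token(router_logits):
        out += weight * routed_experts[idx](x)
    return out

random.seed(0)
experts = [lambda x, k=k: (k + 1) * x for k in range(NUM_ROUTED_EXPERTS)]  # toy experts
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
picks = route_token(logits)
print(len(picks), round(sum(w for _, w in picks), 6))  # 6 1.0
print(moe_forward(1.0, lambda x: x, experts, logits))
```

Only the shared expert plus six routed experts run per token; that sparsity is what the "A3B" (roughly 3B active parameters) in the model name refers to.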

The Mamba-2 layers would take a whole article to explain (perhaps a topic for another time). But for now, conceptually, you can think of them as similar to the Gated DeltaNet approach that Qwen3-Next and Kimi-Linear use, which I covered in my Beyond Standard LLMs article.

The similarity between Gated DeltaNet and Mamba-2 layers is that both replace standard attention with a gated state-space update. The idea behind this state-space-style module is that it maintains a running hidden state and mixes in new inputs via learned gates. In contrast to attention, it scales linearly rather than quadratically with the input sequence length.
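As a rough illustration of the "running hidden state mixed via learned gates" idea, the sketch below runs a scalar gated linear recurrence in a single pass. It is a deliberately simplified stand-in (hypothetical gate parameters `w_gate`/`b_gate`, a single-float state), not the actual Mamba-2 or Gated DeltaNet update. The point is only that each new token costs a constant amount of work, so total cost grows linearly, whereas causal attention scores every earlier token.

```python
import math
from typing import Dict, List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def gated_scan(xs: List[float], w_gate: float = 1.0, b_gate: float = 0.0) -> List[float]:
    """Toy gated recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    The gate a_t = sigmoid(w_gate * x_t + b_gate) depends on the input, so each
    token decides how much old state to keep. The state is a single float, so
    memory stays constant and the loop is O(len(xs))."""
    h = 0.0
    outputs = []
    for x in xs:
        a = sigmoid(w_gate * x + b_gate)  # input-dependent "keep" gate
        h = a * h + (1.0 - a) * x         # blend old state with the new input
        outputs.append(h)
    return outputs

def interaction_counts(n: int) -> Dict[str, int]:
    """Causal attention scores n*(n+1)/2 query-key pairs; the scan does n updates."""
    return {"attention_pairs": n * (n + 1) // 2, "scan_updates": n}

print(gated_scan([1.0, 1.0], w_gate=0.0))  # [0.5, 0.75]
print(interaction_counts(1000))            # {'attention_pairs': 500500, 'scan_updates': 1000}
```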

What’s actually quite exciting about this architecture is its really good performance compared to pure transformer architectures of similar size (like Qwen3-30B-A3B-Thinking-2507 and GPT-OSS-20B-A4B), while achieving much higher tokens-per-second throughput.

Overall, this is an interesting direction, even more extreme than Qwen3-Next and Kimi-Linear in its use of only a few attention layers. However, one of the strengths of the transformer architecture is its performance at a (really) large scale. I am curious to see how the larger Nemotron 3 Super and especially Ultra will compare to the likes of DeepSeek V3.2.

shivamg retweeted

PHD Comics @PHDcomics

6 months ago

H0 H0 H0

shivamg retweeted

Work Chronicles

@_workchronicles

4 months ago

(comic) Layoff-Proof Fallacy

shivamg retweeted

Sebastian Raschka

@rasbt

7 months ago

Olmo models are always a highlight because they are fully transparent and come with nice, detailed technical reports.
I am sure I'll talk more about the interesting training-related aspects from that 100-pager in the upcoming days and weeks.
In the meantime, here's the side-by-side architecture comparison with Qwen3.

1) As we can see, the Olmo 3 architecture is relatively similar to Qwen3. However, it's worth noting that this design was likely inspired by its Olmo 2 predecessor rather than by Qwen3.

2) Similar to Olmo 2, Olmo 3 still uses a post-norm flavor instead of pre-norm, as they found in the Olmo 2 paper that it stabilizes the training.

3) Interestingly, the 7B model still uses multi-head attention, like Olmo 2. However, to make things more efficient and shrink the KV cache, they now use sliding window attention (similar to Gemma 3).
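A minimal Python sketch of a sliding-window causal mask, and why it caps the KV cache: each query attends only to itself and the previous `window - 1` tokens, so a layer using it never needs to cache more than `window` keys and values. The window sizes below are arbitrary toy values, not Olmo 3's actual settings, and the helper names are hypothetical.

```python
from typing import List, Optional

def sliding_window_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j (causal and within the window)."""
    return [[j <= i and i - j < window for j in range(seq_len)] for i in range(seq_len)]

def cached_tokens_per_layer(seq_len: int, window: Optional[int] = None) -> int:
    """Full attention caches every token; sliding-window attention caches at most `window`."""
    return seq_len if window is None else min(seq_len, window)

for row in sliding_window_mask(seq_len=5, window=3):
    print("".join("x" if allowed else "." for allowed in row))
# x....
# xx...
# xxx..
# .xxx.
# ..xxx
```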

Next, let's look at the 32B model.

4) Overall, it's the same architecture, just scaled up. Also, the proportions (e.g., the expansion from the input size to the intermediate size in the feed-forward layer) roughly match those in Qwen3.

5) My guess is that the architecture was initially somewhat smaller than Qwen3 due to the smaller vocabulary, so they scaled up the intermediate-size expansion from 5x in Qwen3 to 5.4x in Olmo 3 to get a 32B model for a direct comparison.

6) Also, note that the 32B model (finally!) uses grouped query attention.
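Grouped-query attention shrinks the KV cache by letting several query heads share one key/value head. The sketch below shows the head mapping and a back-of-the-envelope cache size; the configuration (64 layers, 64 query heads, 8 KV heads, head dimension 128, 16-bit values, 4096 tokens) is purely hypothetical and not taken from the Olmo 3 report.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Map a query head to the KV head its group shares (contiguous groups)."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Keys + values (factor 2) for every layer, KV head, position, and channel."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical config for illustration only.
layers, q_heads, head_dim, seq_len = 64, 64, 128, 4096
mha = kv_cache_bytes(layers, q_heads, head_dim, seq_len)  # one KV head per query head
gqa = kv_cache_bytes(layers, 8, head_dim, seq_len)        # 8 shared KV heads
print(mha / 2**30, gqa / 2**30)                # 8.0 1.0  (GiB)
print(kv_head_for_query_head(15, q_heads, 8))  # 1
```

The cache shrinks by exactly the ratio of query heads to KV heads (8x here), while the number of query heads, and thus the attention pattern capacity per token, stays the same.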

shivamg retweeted

Snoopy

@snoopyb047

8 months ago

shivamg retweeted

Snoopy

@SnoopyFansUS

8 months ago

shivamg retweeted

Mathieu

@miniapeur

8 months ago

They should also do “From PhD to homeless: how I found my way into poverty and an insane job market.”

shivamg retweeted

William Shatner

@WilliamShatner

8 months ago

I saw a very rare, almost extinct bird on my trip. 😮 The Millennial Falcon. 😑

Shivam @shivamg

8 months ago · Republic of Mauritius

@ScholarshipfPhd Sadly very true

shivamg retweeted

Scholarship for PhD

@ScholarshipfPhd

8 months ago

shivamg retweeted

elvis

@omarsar0

8 months ago

The most effective AI Agents are built on these core ideas.

It's what powers Claude Code.

It's referred to as the Claude Agent SDK Loop, which is an agent framework to build all kinds of AI agents.

(bookmark it)

The loop involves three steps:

Gathering Context: Use subagents (parallelize them for task efficiency when possible), compact/maintain context, and leverage agentic/semantic search for retrieving relevant context for the AI agent. Hybrid search approaches work really well for domains like agentic coding.

Taking Action: Leverage tools, prebuilt MCP servers, bash/scripts (Skills have made it a lot easier), and generate code to take action and retrieve important feedback/context for the AI agent. Turns out you can also enhance MCP and token usage through code execution and routing, similar to how LLM routing increases efficiency in AI Agents.

Verifying Output: You can define rules to verify outputs, enable visual feedback (this becomes increasingly important in multimodal problems), and consider LLM-as-a-Judge to verify quality based on fuzzy rules. Some problems will require visual cues and other forms of input to perform well. Don't overcomplicate the workflow (e.g., by using computer-using agents when a simple Skill with clever scripts will do).

This is a clean, flexible, and solid framework for how to build and work with AI agents in all kinds of domains.
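The gather → act → verify cycle described above can be sketched as a small control loop. This is a hypothetical, framework-agnostic Python illustration: the `AgentLoop` class and its callables are placeholders invented here, not the Claude Agent SDK's API, and a real agent would plug in search, tool calls, and checks (rules, tests, or an LLM judge) in their place.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class AgentLoop:
    gather_context: Callable[[str, List[Tuple[str, bool]]], str]  # search, subagents, compaction
    take_action: Callable[[str], str]                             # tools, MCP servers, scripts
    verify: Callable[[str], bool]                                 # rules, tests, LLM-as-a-Judge
    max_iterations: int = 3
    history: List[Tuple[str, bool]] = field(default_factory=list)

    def run(self, task: str) -> Optional[str]:
        for _ in range(self.max_iterations):
            context = self.gather_context(task, self.history)
            result = self.take_action(context)
            ok = self.verify(result)
            self.history.append((result, ok))  # failed attempts feed the next round
            if ok:
                return result
        return None  # give up after max_iterations unverified attempts

# Toy stand-ins: the "action" only passes verification on the second attempt.
loop = AgentLoop(
    gather_context=lambda task, hist: f"{task} (attempt {len(hist) + 1})",
    take_action=lambda ctx: ctx.upper(),
    verify=lambda out: "ATTEMPT 2" in out,
)
print(loop.run("fix the failing test"))  # FIX THE FAILING TEST (ATTEMPT 2)
print(len(loop.history))                 # 2
```

Keeping verification as a separate, pluggable step is what lets the same loop use cheap rule checks for some tasks and heavier judges only where they are needed.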

shivamg retweeted

Snoopy

@SnoopyFansUS

8 months ago

shivamg retweeted

Aadit Sheth

@aaditsh

8 months ago

McKinsey just dropped its 2025 AI report.

1. Everyone’s testing, few are scaling.
88% of companies now use AI somewhere.
Only 33% have scaled it beyond pilots.

2. The profit gap is huge.
Just 6% see real EBIT impact.
Most are still stuck in “experiments,” not execution.

3. The winners think bigger.
Top performers aren’t cutting costs. They’re redesigning workflows and creating new products.

4. AI agents are emerging.
23% are testing agents.
Only 10% have scaled them (mostly in IT and R&D).

5. The jobs shift is starting.
30% of companies expect workforce reductions next year, mostly in junior or support roles.

TL;DR:
AI adoption is nearly universal. Impact isn’t.

The gap between pilots and profit is where the next unicorns will be built.

shivamg retweeted

Snoopy

@snoopyhero53

8 months ago

shivamg retweeted

Kirk Borne

@KirkDBorne

8 months ago

Python for Data Analysis: https://t.co/LGw4ZkJLR3 by @wesmckinn
↕️
Definitive handbook for manipulating, processing, cleaning, & crunching datasets in #Python. Updated 3rd edition is packed with practical case studies that show how to solve a broad set of data analysis problems.
↕️
Read it online: https://t.co/5dnRG51ikc

shivamg retweeted

PHD Comics @PHDcomics

8 months ago

Animal researchers think the answer is YES Find out in the latest episode of #ScienceStuff: https://t.co/z13X97BXfj

shivamg retweeted

Sarah Wiegreffe @sarahwiegreffe

8 months ago

I am recruiting 2 PhD students to work on LM interpretability at UMD @umdcs starting in fall 2026!

We are #3 in AI and #4 in NLP research on @CSrankings.
Come join us in our lovely building just a few miles from Washington, D.C. Details in 🧵 https://t.co/RxoJmt26GU
