Shashank Mishra @Shashankm108 - Twitter Profile

1 day ago

today's read is vasily volkov's phd thesis on gpu latency hiding. it's quite long, so i think it'll keep me occupied for the next few days. https://t.co/TVZKVeMh9Q https://t.co/J8nu84jQBN

4

428

40

435

16K

Shashankm108 retweeted

Kai Yang

@ChihYang04

3 days ago

We got GitHub for electronics before GTA6

72

9K

638

5K

420K

Shashankm108 retweeted

Pramod Goyal

@goyal__pramod

4 days ago

Software is evolving, so should you! These are the best blogs I read to understand GPUs and CUDA!

2

612

66

832

20K

Shashankm108 retweeted

Jia-Bin Huang

@jbhuang0604

8 days ago

LoRA, low-rank adaptation, is arguably the most popular parameter-efficient fine-tuning method for LLMs. But how does it actually work? Check out the video to learn LoRA and friends (LoRA+, QLoRA, VeRA, and DoRA)! https://t.co/v9qFegbK7g


6

396

50

282

24K


Shashankm108 retweeted

Nathan Lambert

@natolambert

9 days ago

"Knowledge wants to be free" is an unofficial Interconnects mission statement, courtesy of @xeophon

13

854

81

738

30K

Shashankm108 retweeted

Param

@ParamSiddh

13 days ago

Best YouTube Channels To Learn AI in 2026 (No BS). Save it.

1. Fundamentals – 3Blue1Brown
2. Deep Learning – Andrej Karpathy
3. AI Research – Yannic Kilcher
4. Practical AI – AssemblyAI
5. LLMs – AI Explained
6. ML Theory – StatQuest
7. Papers Simplified – Two Minute Papers
8. GenAI – Matthew Berman
9. AI Agents – Nicholas Renotte
10. Applied ML – Krish Naik
11. PyTorch – Aladdin Persson
12. Math for ML – Serrano Academy
13. Industry Insights – Lex Fridman
14. Real-world AI – DeepLearningAI

40

4K

726

7K

220K

Shashankm108 retweeted

Mustafa

@oprydai

13 days ago

tensor algebra is not abstract math.

it is the grammar of modern intelligence.

a scalar is one number.
a vector is a line of numbers.
a matrix is a grid of numbers.
a tensor is the general form: numbers arranged across multiple dimensions.

images are tensors.
videos are tensors.
robot sensor streams are tensors.
neural network weights are tensors.
physics simulations are tensors.

deep learning is basically tensor algebra + optimization + compute.

once you understand tensors, AI stops looking like magic.

it becomes structure.

reality → numbers → geometry → transformations → intelligence.

41

2K

393

1K

62K

Shashankm108 retweeted

DailyPapers

@HuggingPapers

14 days ago

Ling-2.6 and Ring-2.6 are here

Alibaba's Ant Group open-sources trillion-parameter agentic models with hybrid linear attention and the KPop RL framework, delivering instant responses and deep reasoning. https://t.co/ZPl5f4gSI5

3

84

18

39

4K

Shashankm108 retweeted

Ryohei Sasaki@engineer

@rsasaki0109

14 days ago

ViTTT
[CVPR 2026] [Best Paper Finalist] [Oral] Official repository of Vision Test-Time Training
https://t.co/DiVWBWXhzv
Test-Time Training (TTT) has recently emerged as a promising direction for efficient sequence modeling. TTT reformulates attention operation as an online learning problem, constructing a compact inner model from key-value pairs at test time. This reformulation opens a rich and flexible design space while achieving linear computational complexity. However, crafting a powerful visual TTT design remains challenging: fundamental choices for the inner module and inner training lack comprehensive understanding and practical guidelines. To bridge this critical gap, in this paper, we present a systematic empirical study of TTT designs for visual sequence modeling. From a series of experiments and analyses, we distill six practical insights that establish design principles for effective visual TTT and illuminate paths for future improvement. These findings culminate in the Vision Test-Time Training (ViT3) model, a pure TTT architecture that achieves linear complexity and parallelizable computation. We evaluate ViT3 across diverse visual tasks, including image classification, image generation, object detection, and semantic segmentation. Results show that ViT3 consistently matches or outperforms advanced linear-complexity models (e.g., Mamba and linear attention variants) and effectively narrows the gap to highly optimized vision Transformers. We hope this study and the ViT3 baseline can facilitate future work on visual TTT models.

2

473

70

339

26K

Shashankm108 retweeted

Vaishnavi

@_vmlops

17 days ago

FastAPI for AI Engineers https://t.co/AmyaVEFjma

10

877

127

1K

36K

Shashankm108 retweeted

0xSero

@0xSero

16 days ago

Best models for your hardware

- 4gb to 12gb vram -
VibeThinker-3B - smokes everything remotely close to its weight class. Challenging 30b models! Last version was also topping math benchmarks
https://t.co/RTchJFFTnV

- 12gb to 24gb vram -
Gemma-12B-coder
Built on top of an already strong model, reduced refusals and 262k context window trained on fable traces
https://t.co/DVAhlQ7Y4n

- 24gb to 64gb vram -
Gemma-4-26b-diffusion
This model was already by far one of the most functional and capable models, now it’s hitting 500+ tok/s on consumer hardware! Smart AF made by Google deepmind
https://t.co/mSaWPFpgXQ

Cohere North-Mini-Code 30B
A new coding model made by an already impressive lab, its priming worth a shot if you’re looking to test the limits of local coding
https://t.co/gDPEj6lPAW

———

For those with 4x 6000s or 3x DGX Spark I think my GLM-5.2-REAP is worth a shot. Lmk how it goes!

64

1K

98

1K

76K

Shashankm108 retweeted

himanshu @retr0jirachi

19 days ago

a really concise and cool blog on the current landscape of robotics i found recently and finished reading just now : https://t.co/WHS44mKf1B https://t.co/4Dkil85F15

4

1K

184

2K

75K

Shashankm108 retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

18 days ago

This video by @jbhuang0604 is a compact but very informative dive into the progress of self-supervised learning over the past few decades.

from IMAX in 1992

covering methods like MoCo, SimCLR, DINO, BYOL, MAE

all the way up to LeJEPA in 2025

Highly recommend watching! https://t.co/4EkdfHHxWV

4

230

32

206

12K

Shashankm108 retweeted

Berryxia.AI

@berryxia

18 days ago

A 12B local model has had Fable 5's reasoning chains distilled straight into it, so you can now run top-tier coding capability offline on a consumer GPU.

This Gemma 4 12B Coder GGUF is fine-tuned from Google's gemma-4-12B-it and targets code generation and complex reasoning specifically.

The training data uses real passing cases from Composer 2.5, with Fable 5 filling in the hard cases, so every reasoning step leads toward code that actually runs.

The best part is that it ships in GGUF format: it runs smoothly on a 12GB GPU and can even run on CPU.

Debugging, code completion, generating complex algorithms, chain-of-thought prompting: all of it is handled locally, with no API fees and no worries about export controls.

People used to assume frontier models either ran in the cloud or could not be run at all; now the open-source community has packaged Fable 5's way of thinking into a version that fits on your laptop.

The model is still iterating quickly, downloads have already passed 6,000, and community feedback says it performs especially well in local coding scenarios.

This move closes the gap between "powerful but restricted" and "available locally".

Real AI productivity has never come from waiting for big companies to grant access; it comes from the community unlocking the capability itself.

49

752

143

911

85K

Shashankm108 retweeted

Licheng Liu

@liulicheng10

18 days ago

probably the best blog i have read for some time

viewing SFT, RL, and OPD as different ways of reshaping a model's distribution makes their tradeoffs super intuitive.

- SFT pulls toward a fixed external target
- RL moves along the reward gradient on on-policy samples
- OPD sits in between, using a teacher signal but on student-generated data, which is why it inherits RL's anti-forgetting properties even when the teacher itself was an overtrained SFT model.

the post is heavily grounded in recent literature and uses the distributional perspective as a unifying bridge across all three paradigms, i really like the point it argues the load-bearing ingredient is on-policy data and OPD's convergence to RL-like outcomes is the strongest evidence

12

2K

214

3K

100K

Shashankm108 retweeted

Turing Post

@TheTuringPost

19 days ago

Best open-source vector databases for LLMs in 2026

▪︎ Milvus
▪︎ Chroma
▪︎ Weaviate
▪︎ Qdrant
▪︎ Vespa
▪︎ LanceDB
▪︎ Deep Lake

Knowledge engines and agentic retrieval tools:

▪︎ Chroma Context-1
▪︎ Weaviate Engram
▪︎ Pinecone Nexus (not open, but worth mentioning)

Vector search libraries and engines:

▪︎ Faiss
▪︎ Vald
▪︎ ScaNN (Scalable Nearest Neighbors)
▪︎ Hnswlib
▪︎ Pgvector
▪︎ VectorChord

General-purpose search and database platforms with vector capabilities:

▪︎ Elasticsearch
▪︎ ClickHouse
▪︎ Redis
▪︎ OpenSearch
▪︎ Apache Cassandra
▪︎ MongoDB Atlas Vector Search

We've gathered all the must-have links, key info, and tips on when to use each resource. Save this! https://t.co/M96e6Ip4Ea

3

95

19

93

9K

Shashankm108 retweeted

DAIR.AI

@dair_ai

19 days ago

https://t.co/8j1NNuQsFS

19

341

43

502

78K

Shashankm108 retweeted

DAIR.AI

@dair_ai

19 days ago

The Top AI Papers of the Week (June 7 - June 14)

- Agentopia
- Self-Harness
- Agents' Last Exam
- MiniMax Sparse Attention
- Lookahead Sparse Attention
- How AI Agents Reshape Knowledge Work
- The Geometry of On-Policy Distillation

Read on for more:

16

315

67

270

43K

Shashankm108 retweeted

0xSero

@0xSero

20 days ago

196-256gb memory bros rejoice. Minimax is ready

12

308

10

97

27K

Shashankm108 retweeted

Cameron R. Wolfe, Ph.D.

@cwolferesearch

22 days ago

I’m giving a talk on LLM judges at the Toronto Machine Learning Summit next week. The talk will cover practical techniques like:

- Collecting high-quality expert feedback on subjective tasks.
- Calibrating LLM judges with expert opinions.
- Properly eliciting reasoning within an LLM judge.
- Using multiple agents to decompose complex evaluation tasks.
- Continually improving LLM judges with production monitoring / metrics.

This talk will be full of practical details for building useful evaluation systems. Hope to see you there!

9

65

5

49

4K
