Neelalohith R Kashyap

@NeelalohithK

Joint CS PhD Student @ BITS Pilani, India and La Trobe University, Australia | ASCRIN Fellow | Doctoral Researcher @ Machine Intelligence Group

Hyderabad

Joined August 2020

668 Following

57 Followers

227 Posts

Neelalohith R Kashyap @NeelalohithK

2 days ago

@cneuralnetwork It's extremely hard to land a pre-doc because it's very competitive, but you can try applying. I wasn't able to get a good pre-doc after trying, so I had to think about doing a PhD.

Neelalohith R Kashyap @NeelalohithK

2 days ago

@YuvrajS9886 @cneuralnetwork No, pre-docs do exist in industry labs

NeelalohithK retweeted

Chelsea Finn

@chelseabfinn

5 days ago

LLM RL optimizes for sequential reasoning

We also optimize over the reasoning strategy, incl parallel trains of thought, aggregation of parallel traces, & sequential reasoning

This allows the model to better explore & allocate compute at test time

https://t.co/DkTSllkmvp

434

348

53K

NeelalohithK retweeted

Weijia Shi

@WeijiaShi2

4 days ago

Defended my PhD recently!

Grateful to my advisors @LukeZettlemoyer and @nlpnoah, committee members Colin and @lucyluwang, and the many mentors, collaborators, and the @uwnlp community for all the support and friendship along the way ❤️

Slides: Break the language model monolith (https://t.co/gKWcXEck2f)

115

879

48K

NeelalohithK retweeted

Yoshua Bengio

@Yoshua_Bengio

10 days ago

An interesting new paper by my recent PhD graduate on how AI agents' greed for visible incentives can lead them to abandon their safety alignment. You can read it here: https://t.co/y64uOBvSiC

544

317

47K

Neelalohith R Kashyap @NeelalohithK

9 days ago

@adxtyahq You are absolutely right, and it's completely worth the time!

Neelalohith R Kashyap @NeelalohithK

9 days ago

@ayaannmalik @amilabs @ylecun That's impressive! Congrats!

125

Neelalohith R Kashyap @NeelalohithK

9 days ago

@connoratlunon https://t.co/kk2jxg9nQS

NeelalohithK retweeted

Akshay 🚀

@akshay_pachaar

11 days ago

Turn any paper into running code. Just swap arxiv → autoarxiv in the paper url.

That hands the paper to an AI agent from alphaXiv. It reads the abstract, the claims, and the linked GitHub repo, then clones the codebase and works through the usual setup pain like dependencies, broken paths, environment config, and hardware assumptions.

From there it designs a minimal reproduction. That means a smaller model, fewer steps, and a single GPU instead of a cluster, scaled down just enough to test whether the headline claim holds.

The whole run is live and fully logged. Loss curves, metrics, and training progress are all observable as it happens.

What comes back is a clean signal on whether the minimal run matches the paper's reported result, plus an estimate of what a full replication would cost in compute and time.

A lot of research code dies in setup before anyone verifies a single number. This moves reproduction from a weekend of debugging to a url change.

Pick a paper and try it now.

video credits: @askalphaxiv

457

598

45K
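
The URL swap described in the post above can be sketched in a few lines of Python. This is only an illustration of the string substitution the post describes: the helper name `to_autoarxiv` is hypothetical, and the resulting address has not been checked against the live service.

```python
from urllib.parse import urlsplit, urlunsplit


def to_autoarxiv(url):
    """Replace the 'arxiv' label in the hostname with 'autoarxiv'.

    Hypothetical helper: it only performs the swap the post describes and
    leaves the path, query, and fragment untouched.
    """
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    labels = ["autoarxiv" if label == "arxiv" else label for label in labels]
    return urlunsplit(parts._replace(netloc=".".join(labels)))


# The paper ID here is just an example input.
example = to_autoarxiv("https://arxiv.org/abs/1706.03762")
print(example)
```

Only the hostname is rewritten, so a URL that merely contains "arxiv" somewhere in its path is returned unchanged.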

NeelalohithK retweeted

Alisa Liu @alisawuffles

11 days ago

I'm joining OpenAI next week!🥹 The job search turned out to be really challenging but also super rewarding, so I wrote a small blog to share what I learned along the way and hopefully make the process a little less mysterious for the next person. https://t.co/6FigSBdenD

507

14K

19K

NeelalohithK retweeted

wesley hsieh

@chengyenhsieh

28 days ago

Resources to Review Linear Algebra for Deep Learning (Interview)

I've been asked many times by new grads how to review linear algebra and deep learning fundamentals.

I've found reading textbooks particularly efficient and effective. Even after joining industry, I still read textbooks from time to time. I summarize resources, important topics, and my personal notes below.

410

571

21K

Neelalohith R Kashyap @NeelalohithK

11 days ago

@Sanyam0605 Let's gooooo, I love research too

215

NeelalohithK retweeted

Param

@ParamSiddh

13 days ago

If I had 6 months to become an AI Infrastructure Engineer. I'd do this.

Stage 1: Linux + Networking
Processes, memory, GPUs, sockets, HTTP, TCP/IP basics.

Stage 2: Python + Backend
Async Python, FastAPI, queues, concurrency fundamentals.

Stage 3: GPU Fundamentals
CUDA basics, VRAM, batching, quantization, throughput.

Stage 4: LLM Inference
vLLM, TensorRT-LLM, speculative decoding, KV caching.

Stage 5: Distributed Systems
Load balancing, queues, retries, autoscaling, distributed workers.

Stage 6: AI Serving
Model APIs, streaming responses, rate limiting, observability.

Stage 7: Data Pipelines
Kafka, Airflow, ETL pipelines, vector indexing.

Stage 8: Kubernetes + Cloud
Docker, Kubernetes, AWS/GCP basics, infra automation.

Stage 9: Monitoring + Reliability
Prometheus, Grafana, tracing, AI cost monitoring.

Stage 10: Real AI Systems
Deploy scalable chat apps, RAG pipelines, inference clusters.

Stage 11: Open Source
Contribute to inference tooling or AI infra projects.

Stage 12: Apply
AI Infra Engineer, Platform Engineer, ML Systems Engineer.

AI apps go viral. AI infrastructure prints money.

570

955

28K

Neelalohith R Kashyap @NeelalohithK

12 days ago

@cneuralnetwork Totally messed up because of my PhD; it's stressful

238

Neelalohith R Kashyap @NeelalohithK

13 days ago

@cneuralnetwork You're very lucky, be happy!

243

NeelalohithK retweeted

Praveen Kumar Verma

@Alacritic_Super

15 days ago

🚀 15+ Model Compression Techniques Every ML Engineer Should Know

1. Quantization
Reduces numerical precision from FP32 to FP16, INT8, INT4, or even 1-bit representations. This significantly reduces memory usage and often accelerates inference with minimal accuracy loss.

2. Pruning
Removes unimportant weights, neurons, attention heads, layers, or entire blocks. The goal is to eliminate redundancy while preserving model performance.

3. Knowledge Distillation
Transfers knowledge from a large teacher model into a smaller student model. The student learns both the task and the teacher's behavior.

4. Low-Rank Factorization
Decomposes large weight matrices into smaller matrices with lower rank. This reduces storage and matrix multiplication costs.

5. Weight Sharing
Multiple parameters share the same stored values. Instead of millions of unique weights, many connections reuse common representations.

6. Sparse Representations
Stores only important weights while ignoring near-zero values. Sparse models can dramatically reduce storage and computation requirements.

7. Structured Pruning
Instead of removing individual weights, entire neurons, channels, attention heads, or Transformer blocks are removed. Hardware often benefits more from structured sparsity.

8. Dynamic Sparsity
The sparse structure changes during training rather than remaining fixed. Important connections can emerge while unimportant ones disappear.

9. Weight Clustering
Groups similar weights together and replaces them with shared cluster centroids. This reduces memory requirements while preserving behavior.

10. Huffman Coding
Applies lossless compression after quantization or clustering. Frequently occurring values receive shorter binary representations.

11. Tensor Decomposition
Uses techniques such as CP, Tucker, and Tensor Train decompositions to compress large neural network tensors.

12. Neural Architecture Search (NAS)
Discovers compact architectures automatically. Instead of compressing a large model, it finds a smaller architecture from the beginning.

13. Lottery Ticket Pruning
Finds sparse subnetworks capable of achieving performance comparable to the original network. The idea is that efficient subnetworks already exist inside large models.

14. Early Exit Networks
Allows inference to terminate early when confidence is sufficiently high. Easy samples require less computation than difficult samples.

15. Mixture of Experts (MoE)
Only a small subset of model parameters are activated for each input. Massive models become computationally efficient because most parameters remain inactive.

16. Retrieval-Augmented Generation (RAG)
Instead of storing all knowledge inside model weights, external knowledge is retrieved when needed. This reduces pressure to continuously scale model size.

17. Adapter-Based Learning
Techniques such as LoRA, QLoRA, Adapters, IA3, and Prefix Tuning train tiny parameter subsets instead of full models.

18. Layer Dropping
Removes less important layers after training. Many Transformer layers contribute less than expected and can sometimes be eliminated.

19. Token Pruning
Removes less important tokens during inference. Especially useful for vision transformers and long-context language models.

20. KV Cache Compression
Compresses attention cache memory used during long-context inference. Critical for serving modern LLMs efficiently.

Modern Compression Stack

Pretrained Model
↓
Architecture Optimization
↓
Pruning
↓
Distillation
↓
Quantization
↓
Sparse Computation
↓
KV Cache Compression
↓
Production Deployment
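
Two of the techniques listed above, quantization (item 1) and pruning (item 2), are simple enough to sketch on a plain list of weights. The following is a minimal pure-Python illustration, assuming symmetric per-tensor INT8 quantization and unstructured magnitude pruning; the function names and the example weights are made up for this sketch, and real frameworks implement both far more carefully.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Illustrative weights only, not taken from any real model.
weights = [0.02, -1.27, 0.64, 0.004, -0.3, 0.9]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
pruned = magnitude_prune(weights, sparsity=0.5)

print(q, scale)
print(pruned)
```

Each restored weight differs from the original by at most half the scale, which is the accuracy cost paid for storing one byte per weight instead of four. The pruned list keeps the three largest-magnitude weights and zeroes the rest.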

Neelalohith R Kashyap @NeelalohithK

15 days ago

@_vmlops 100%

NeelalohithK retweeted

himanshu

@himanshustwts

17 days ago

since a good bunch of discourse is going on around "how to do research", these pieces are quite worth a read. https://t.co/pA0MkOMlKS https://t.co/rw9uMiwlCj https://t.co/H1AGvnb7LP https://t.co/FTyAabr9Rx