Prem Sankar G

@premsankar

Technology strategist | AI/ML, microservices evangelist | Speaker | Opensource contributor. Opinions are my own

Zionsville, IN

Joined October 2007

2.7K Following

1.6K Followers

5.2K Posts

premsankar retweeted

Aswath Damodaran

@AswathDamodaran

2 days ago

I valued SpaceX for its IPO a few weeks ago, with minimal information and a promise to revisit the valuation, when the prospectus was made public. The prospectus is public, the offering price has been set and my update is up and running. https://t.co/zRjpD1C0wv

419

962K

premsankar retweeted

Andrej Karpathy

@karpathy

2 months ago

Software horror: litellm PyPI supply chain attack. Simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords. LiteLLM itself has 97 million downloads per month which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwnd. Same for any other large project that depended on litellm. Afaict the poisoned version was up for only less than ~1 hour. The attack had a bug which led to its discovery - Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker didn't vibe code this attack it could have been undetected for many days or weeks. Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any depedency you could be pulling in a poisoned package anywhere deep inside its entire depedency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials that do get stolen in each attack can then be used to take over more accounts and compromise more packages. Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've been so growingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.

28K

14K

67M

premsankar retweeted

DAN KOE

@thedankoe

5 months ago

https://t.co/FGLYHaRhRq

47K

78K

14M

premsankar retweeted

Richard Seroter

@rseroter

5 months ago

The Next Two Years of Software Engineering https://t.co/uIJSBJdzVa < these are five crazy-relevant questions to ask of yourself and your team. @addyosmani then offers strong recommendations to consider.

915

142

91K

Who to follow

William Toll

@utollwi

B2B Marketing Executive- Love: #Cloud | #Cybersecurity | #SaaS | #B2B | #Boston @utollwi.bsky.social Curation & Posts by me, not my employer @ElisityInc

Dana Gardner

@Dana_Gardner

Prolific content producer of thought-provoking analysis and interviews on today's hottest AI and data technology trends.

Rodrigo Flores

@RFFlores

IBM Senior Partner. ex-Mckinsey, ex Accenture cloud platform, ex-Cisco. Founder of newScale (acquired by Cisco). Talk cloud, platforms. Opinions? My own.

premsankar retweeted

Dwarkesh Patel

@dwarkesh_sp

7 months ago

.@satyanadella gave me and @dylan522p an exclusive tour of Fairwater 2, the most powerful AI datacenter in the world. We then chatted through Satya's vision for Microsoft in a world with AGI. 0:00:00 - Fairwater 2 0:04:15 - Business models for AGI 0:13:42 - Copilot 0:20:56 - Whose margins will expand most? 0:37:12 - MAI 0:48:42 - The hyperscale business 1:03:39 - In-house chip & OpenAI partnership 1:10:30 - The CAPEX explosion 1:16:01 - Will the world trust US companies to lead AI? Look up Dwarkesh Podcast on Youtube, Apple Podcasts or Spotify to tune in.

238

625

premsankar retweeted

elvis

@omarsar0

11 months ago

A Survey of Context Engineering 160+ pages covering the most important research around context engineering for LLMs. This is a must-read! Here are my notes:

omarsar0's tweet photo. A Survey of Context Engineering

160+ pages covering the most important research around context engineering for LLMs.

This is a must-read!

Here are my notes: https://t.co/e85As8o7a9

315

204K

premsankar retweeted

NVIDIA

@nvidia

over 1 year ago

Announcing NVIDIA Project DIGITS, a personal AI supercomputer that’s powered by the NVIDIA GB10 Superchip and based on #NVIDIAGraceBlackwell architecture. https://t.co/zY09rRrY0X Preconfigured with the NVIDIA AI software stack, developers, researchers, data scientists and students can prototype, fine-tune and inference large AI models on their desktop and deploy them to the data center or cloud. #CES2025

nvidia's tweet photo. Announcing NVIDIA Project DIGITS, a personal AI supercomputer that’s powered by the NVIDIA GB10 Superchip and based on #NVIDIAGraceBlackwell architecture. https://t.co/zY09rRrY0X

Preconfigured with the NVIDIA AI software stack, developers, researchers, data scientists and students can prototype, fine-tune and inference large AI models on their desktop and deploy them to the data center or cloud. #CES2025

158

793

798

Prem Sankar G @premsankar

over 1 year ago

Nvidia unveils Hymba - an approach to efficient NLP - https://t.co/bHdxX6InK1

premsankar retweeted

Ghibran Vaibodha

@GhibranVaibodha

over 1 year ago

🙏✨ A Year-Long Journey—Now Complete ✨🙏 After 1 year of devotion, I’m humbled to share 5 hours and 44 minutes of Sivavakkiyar’s divine verses. Hearing “ஓடி ஓடி / ஓம் நமசிவாய” in temples feels truly blessed. 🌸 This is more than music—it’s a prayer, a tribute to Lord Shiva. I hope it brings you peace and divine energy. 🔱✨ “நமசிவாய அஞ்செழுத்தும் நிற்குமே நிலைகளே” 🎧 Listen & feel the divine: 👉 https://t.co/2pEDhOgAcx @GhibranVaibodha @GoldDevaraj @ThinkDivineOff #OdiOdi #OmNamahShivayaOm

GhibranVaibodha's tweet photo. 🙏✨ A Year-Long Journey—Now Complete ✨🙏

After 1 year of devotion, I’m humbled to share 5 hours and 44 minutes of Sivavakkiyar’s divine verses. Hearing “ஓடி ஓடி / ஓம் நமசிவாய” in temples feels truly blessed. 🌸

This is more than music—it’s a prayer, a tribute to Lord Shiva. I hope it brings you peace and divine energy. 🔱✨

“நமசிவாய அஞ்செழுத்தும் நிற்குமே நிலைகளே”

🎧 Listen & feel the divine:
👉 https://t.co/2pEDhOgAcx

@GhibranVaibodha @GoldDevaraj @ThinkDivineOff #OdiOdi #OmNamahShivayaOm

703

185

19K

Prem Sankar G @premsankar

over 1 year ago

@GhibranVaibodha Amazing work! Thanks for your divine contributions🙏

premsankar retweeted

elvis

@omarsar0

over 1 year ago

DataLab: A Unified Platform for LLM-Powered Business Intelligence Introduces DataLab, a unified BI platform that integrates an LLM-based agent framework with an augmented computational notebook interface. DataLab achieves state-of-the-art performance on various BI tasks across popular research benchmarks. It also achieves up to a 58.58% increase in accuracy and a 61.65% reduction in token cost on enterprise-specific BI tasks.

omarsar0's tweet photo. DataLab: A Unified Platform for LLM-Powered Business Intelligence

Introduces DataLab, a unified BI platform that integrates an LLM-based agent framework with an augmented computational notebook interface.

DataLab achieves state-of-the-art performance on various BI tasks across popular research benchmarks.

It also achieves up to a 58.58% increase in accuracy and a 61.65% reduction in token cost on enterprise-specific BI tasks.

493

108

509

45K

premsankar retweeted

elvis

@omarsar0

over 1 year ago

Reverse Thinking Makes LLMs Stronger Reasoners Shows that training LLMs to learn "reverse thinking" helps to improve performance in commonsense, math, and logical reasoning tasks. It claims to outperform a standard fine-tuning method trained on 10x more forward reasoning.

omarsar0's tweet photo. Reverse Thinking Makes LLMs Stronger Reasoners

Shows that training LLMs to learn "reverse thinking" helps to improve performance in commonsense, math, and logical reasoning tasks.

It claims to outperform a standard fine-tuning method trained on 10x more forward reasoning. https://t.co/pxiRCJYLxG

593

113

487

89K

Prem Sankar G @premsankar

over 1 year ago

@spirosx @resolveai Thanks for the awesome goodies!

325

premsankar retweeted

Alex Albert

@alexalbert__

over 1 year ago

Introducing the Model Context Protocol (MCP) An open standard we've been working on at Anthropic that solves a core challenge with LLM apps - connecting them to your data. No more building custom integrations for every data source. MCP provides one protocol to connect them all:

alexalbert__'s tweet photo. Introducing the Model Context Protocol (MCP)

An open standard we've been working on at Anthropic that solves a core challenge with LLM apps - connecting them to your data.

No more building custom integrations for every data source. MCP provides one protocol to connect them all: https://t.co/kYsivQyPDq

237

971

premsankar retweeted

Interesting STEM

@InterestingSTEM

almost 2 years ago

A Thread of photos and videos featuring humans next to enormous things to illustrate their size 1. Old salt mine in Romania

334

60K

19K

14M

premsankar retweeted

DogeDesigner

@cb_doge

over 1 year ago

Elon Musk's full speech at Trump's rally in Butler, Pennsylvania today.

72K

13K

10K

premsankar retweeted

Ben Lang

@benln

almost 2 years ago

The condensed @pmarca guide to career planning:

152

443K

premsankar retweeted

Andrej Karpathy

@karpathy

almost 2 years ago

SQL injection-like attack on LLMs with special tokens The decision by LLM tokenizers to parse special tokens in the input string (<s>, <|endoftext|>, etc.), while convenient looking, leads to footguns at best and LLM security vulnerabilities at worst, equivalent to SQL injection attacks. !!! User input strings are untrusted data !!! In SQL injection you can pwn bad code with e.g. the DROP TABLE attack. In LLMs we'll get the same issue, where bad code (very easy to mess up with current Tokenizer APIs and their defaults) will parse input string's special token descriptors as actual special tokens, mess up the input representations and drive the LLM out of distribution of chat templates. Example with the current huggingface Llama 3 tokenizer defaults: Two unintuitive things are happening at the same time: 1. The <|begin_of_text|> token (128000) was added to the front of the sequence. 2. The <|end_of_text|> token (128001) was parsed out of our string and the special token was inserted. Our text (which could have come from a user) is now possibly messing with the token protocol and taking the LLM out of distribution with undefined outcomes. I recommend always tokenizing with two additional flags, disabling (1) with add_special_tokens=False and (2) with split_special_tokens=True, and adding the special tokens yourself in code. Both of these options are I think a bit confusingly named. For the chat model, I think you can also use the Chat Templates apply_chat_template. With this we get something that looks more correct, and we see that <|end_of_text|> is now treated as any other string sequence, and is broken up by the underlying BPE tokenizer as any other string would be: TLDR imo calls to encode/decode should never handle special tokens by parsing strings, I would deprecate this functionality entirely and forever. These should only be added explicitly and programmatically by separate code paths. In tiktoken, e.g. always use encode_ordinary. In huggingface, be safer with the flags above. At the very least, be aware of the issue and always visualize your tokens and test your code. I feel like this stuff is so subtle and poorly documented that I'd expect somewhere around 50% of the code out there to have bugs related to this issue right now. Even ChatGPT does something weird here. At best it just deletes the tokens, at worst this is confusing the LLM in an undefined way, I don't really know happens under the hood, but ChatGPT can't repeat the string "<|endoftext|>" back to me: Be careful out there.

karpathy's tweet photo. SQL injection-like attack on LLMs with special tokens

The decision by LLM tokenizers to parse special tokens in the input string (<s>, <|endoftext|>, etc.), while convenient looking, leads to footguns at best and LLM security vulnerabilities at worst, equivalent to SQL injection attacks.

!!! User input strings are untrusted data !!!

In SQL injection you can pwn bad code with e.g. the DROP TABLE attack. In LLMs we'll get the same issue, where bad code (very easy to mess up with current Tokenizer APIs and their defaults) will parse input string's special token descriptors as actual special tokens, mess up the input representations and drive the LLM out of distribution of chat templates.

Example with the current huggingface Llama 3 tokenizer defaults:
Two unintuitive things are happening at the same time:
1. The <|begin_of_text|> token (128000) was added to the front of the sequence.
2. The <|end_of_text|> token (128001) was parsed out of our string and the special token was inserted. Our text (which could have come from a user) is now possibly messing with the token protocol and taking the LLM out of distribution with undefined outcomes.

I recommend always tokenizing with two additional flags, disabling (1) with add_special_tokens=False and (2) with split_special_tokens=True, and adding the special tokens yourself in code. Both of these options are I think a bit confusingly named. For the chat model, I think you can also use the Chat Templates apply_chat_template.

With this we get something that looks more correct, and we see that <|end_of_text|> is now treated as any other string sequence, and is broken up by the underlying BPE tokenizer as any other string would be:
TLDR imo calls to encode/decode should never handle special tokens by parsing strings, I would deprecate this functionality entirely and forever. These should only be added explicitly and programmatically by separate code paths. In tiktoken, e.g. always use encode_ordinary. In huggingface, be safer with the flags above. At the very least, be aware of the issue and always visualize your tokens and test your code. I feel like this stuff is so subtle and poorly documented that I'd expect somewhere around 50% of the code out there to have bugs related to this issue right now.

Even ChatGPT does something weird here. At best it just deletes the tokens, at worst this is confusing the LLM in an undefined way, I don't really know happens under the hood, but ChatGPT can't repeat the string "<|endoftext|>" back to me:

Be careful out there.

152

437

289K

premsankar retweeted

Dushyanth Sridhar

@dushyanthsridar

about 2 years ago

SUPPORT OF #Astikas SOLICITED! Kindly share this post(er) with your friends & relatives! #Astikas in the US of A, GC Vedic (Priya & Narayanan) is presenting my tour in 2024. I will be delivering 31 discourses (30 in English, 1 in தமிழ்) in 30 days spread across #NewJersey, #Delaware, #Boston, #Pittsburgh, #Maryland and #Indianapolis. All are free discourses (open to ALL)!

dushyanthsridar's tweet photo. SUPPORT OF #Astikas SOLICITED! Kindly share this post(er) with your friends & relatives! #Astikas in the US of A, GC Vedic (Priya & Narayanan) is presenting my tour in 2024. I will be delivering 31 discourses (30 in English, 1 in தமிழ்) in 30 days spread across #NewJersey, #Delaware, #Boston, #Pittsburgh, #Maryland and #Indianapolis. All are free discourses (open to ALL)!

premsankar retweeted

Andrej Karpathy

@karpathy

about 2 years ago

Day 24 of llm.c: we now do multi-GPU training, in bfloat16, with flash attention, directly in ~3000 lines of C/CUDA, and it is FAST! 🚀 We're running ~7% faster than PyTorch nightly, with no asterisks, i.e. this baseline includes all modern & standard bells-and-whistles: mixed precision training, torch compile and flash attention, and manually padding vocab. (Previous comparisons included asterisks like *only inference, or *only fp32 etc.) Compared to the current PyTorch stable release 2.3.0, llm.c is actually ~46% faster. My point in these comparisons is just to say "llm.c is fast", not to cast any shade on PyTorch. It's really amazing that PyTorch trains this fast in a fully generic way, with ability to cook up and run ~arbitrary neural networks and run them on a ton of platforms. I see the goals and pros and cons of these two projects as different, even complementary. Actually I started llm.c with my upcoming education videos in mind, to explain what PyTorch does for you under the hood. How we got here over the last ~1.5 weeks - added: ✅ mixed precision training (bfloat16) ✅ many kernel optimizations, including e.g. a FusedClassifier that (unlike current torch.compile) does not materialize the normalized logits. ✅ flash attention (right now from cudnn) ✅ Packed128 data structure that forces the A100 to utilize 128-bit load (LDG.128) and store (STS.128) instructions. It's now also possible to train multi-GPU - added: ✅ First version of multi-gpu training with MPI+NCCL ✅ Profiling the full training run for NVIDIA Nsight Compute ✅ PR for stage 1 of ZeRO (optimizer state sharding) merging imminently We're still at "only" 3,000 lines of code of C/CUDA. It's getting a bit less simple, but still bit better than ~3 million. We also split off the fp32 code base into its own file, which will be pure CUDA kernels only (no cublas or cudnn or etc), and which I think would make a really nice endpoint of a CUDA course. You start with the gpt2.c pure CPU implementation, and see how fast you can make it by the end of the course on GPU, with kernels only and no dependencies. Our goal now is to create a reliable, clean, tested, minimal, hardened and sufficiently optimized LLM stack that reproduces the GPT-2 miniseries of all model sizes, from 124M to 1.6B, directly in C/CUDA. A lot more detail on: "State of the Union [May 3, 2024]" https://t.co/eDgbngHrZ9

karpathy's tweet photo. Day 24 of llm.c: we now do multi-GPU training, in bfloat16, with flash attention, directly in ~3000 lines of C/CUDA, and it is FAST! 🚀

We're running ~7% faster than PyTorch nightly, with no asterisks, i.e. this baseline includes all modern & standard bells-and-whistles: mixed precision training, torch compile and flash attention, and manually padding vocab. (Previous comparisons included asterisks like *only inference, or *only fp32 etc.) Compared to the current PyTorch stable release 2.3.0, llm.c is actually ~46% faster. My point in these comparisons is just to say "llm.c is fast", not to cast any shade on PyTorch. It's really amazing that PyTorch trains this fast in a fully generic way, with ability to cook up and run ~arbitrary neural networks and run them on a ton of platforms. I see the goals and pros and cons of these two projects as different, even complementary. Actually I started llm.c with my upcoming education videos in mind, to explain what PyTorch does for you under the hood.

How we got here over the last ~1.5 weeks - added:

✅ mixed precision training (bfloat16)
✅ many kernel optimizations, including e.g. a FusedClassifier that (unlike current torch.compile) does not materialize the normalized logits.
✅ flash attention (right now from cudnn)
✅ Packed128 data structure that forces the A100 to utilize 128-bit load (LDG.128) and store (STS.128) instructions.

It's now also possible to train multi-GPU - added:
✅ First version of multi-gpu training with MPI+NCCL
✅ Profiling the full training run for NVIDIA Nsight Compute
✅ PR for stage 1 of ZeRO (optimizer state sharding) merging imminently

We're still at "only" 3,000 lines of code of C/CUDA. It's getting a bit less simple, but still bit better than ~3 million. We also split off the fp32 code base into its own file, which will be pure CUDA kernels only (no cublas or cudnn or etc), and which I think would make a really nice endpoint of a CUDA course. You start with the gpt2.c pure CPU implementation, and see how fast you can make it by the end of the course on GPU, with kernels only and no dependencies.

Our goal now is to create a reliable, clean, tested, minimal, hardened and sufficiently optimized LLM stack that reproduces the GPT-2 miniseries of all model sizes, from 124M to 1.6B, directly in C/CUDA.

A lot more detail on: "State of the Union [May 3, 2024]"
https://t.co/eDgbngHrZ9

207

616

828K

Prem Sankar G

@premsankar

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users