Craig Pfeifer

@aCraigPfeifer

currently AI integration @ TCG, ex-@lightningai, ex-@Mitrecorp, @purduecs, PhD dropout @umbccsee

Flyover Country, USA

Joined September 2011

3.4K Following

723 Followers

22.4K Posts

Pinned Tweet

Craig Pfeifer @aCraigPfeifer

almost 7 years ago

@vboykis
Producer: Pitch me.
Me: It's a psych horror about a software engineer who will be auctioned off to someone who will inhabit his body. His only clues are commit messages in a code repository. It's called "Git Out."
Producer: Get out.
Me: no, git out. Git is a
Producer: Get out.

225

aCraigPfeifer retweeted

Benjamin Van Durme @ben_vandurme

4 months ago

JHU mmBERT extended from 8k to 32k token length by vLLM Semantic Router Team. Cutting edge results on 1,800+ languages, now with longer context! https://t.co/maN3bT1X17

aCraigPfeifer retweeted

Akshay 🚀

@akshay_pachaar

5 months ago

A dead-simple trick to improve LLM performance:

Just repeat your prompt twice.

No fancy prompting techniques, no chain-of-thought, just plain repetition.

Google researchers tested this across Gemini, GPT, Claude, and Deepseek, and the results were surprisingly good.

Here's why it works:

LLMs are causal, meaning tokens can only see what came before them. When you ask a question after providing context, the question tokens never "saw" the full picture.

By repeating the prompt, every token gets to attend to every other token during prefill.

The best part:

- No increase in output length
- No increase in latency
- Works as a simple drop-in replacement

On one task, Gemini Flash-Lite jumped from 21% to 97% accuracy just by repeating the input.

Important note:

This helps most when reasoning is disabled. If you're already using "think step-by-step," the gains are mostly neutral since reasoning models tend to repeat the prompt internally anyway.

Paper: "Prompt Repetition Improves Non-Reasoning LLMs" from Google Research.

Sometimes the simplest ideas win.

Link to the paper in the next tweet.
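
A minimal sketch of the repetition trick described above, assuming the prompt is simply concatenated with itself. The tweet does not say which separator or wording the paper uses, so the `separator` default and the helper name `repeat_prompt` are illustrative assumptions, not the paper's method.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return `prompt` repeated `times` times, joined by `separator`.

    The separator is an assumption; the tweet only says to repeat the prompt.
    """
    if times < 1:
        raise ValueError("times must be at least 1")
    return separator.join([prompt] * times)


# Hypothetical context-then-question prompt, the case the tweet describes.
context = "Alice has 3 apples. Bob has 5 apples."
question = "Who has more apples?"
prompt = f"{context}\n{question}"

# On the second copy, the context tokens come after the first copy's
# question, so a causal model can attend to the question while reading them.
repeated = repeat_prompt(prompt)
print(repeated)
```

The repeated string would be sent to the model in place of the original prompt. Only the input grows, which is consistent with the claim that output length does not increase.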

347

305

34K

aCraigPfeifer retweeted

Benjamin Van Durme @ben_vandurme

5 months ago

This deadline for post-doc applications is coming up. There are so many great people in AI at JHU, even more with the >20 tenure track hires that started this fall.

756

aCraigPfeifer retweeted

Akshay 🚀

@akshay_pachaar

5 months ago

Stanford researchers built a new prompting technique!

By adding ~20 words to a prompt, it:

- boosts LLM's creativity by 1.6-2x
- raises human-rated diversity by 25.7%
- beats fine-tuned model without any retraining
- restores 66.8% of LLM's lost creativity after alignment

Let's understand why and how it works:

Post-training alignment methods like RLHF make LLMs helpful and safe, but they unintentionally cause mode collapse. This is where the model favors a narrow set of predictable responses.

This happens because of typicality bias in human preference data:

When annotators rate LLM responses, they naturally prefer answers that are familiar, easy to read, and predictable. The reward model then learns to boost these "safe" responses, aggressively sharpening the probability distribution and killing creative output.

But here's the interesting part:

The diverse, creative model isn't gone. After alignment, the LLM still has two personalities. The original pre-trained model with rich possibilities, and the safety-focused aligned model.

Verbalized Sampling (VS) is a training-free prompting strategy that recovers the diverse distribution learned during pre-training.

The idea is simple:

Instead of prompting "Tell me a joke" (which triggers the aligned personality), you prompt: "Generate 5 responses with their corresponding probabilities. Tell me a joke."

By asking for a distribution instead of a single instance, you force the model to tap into its full pre-trained knowledge rather than defaulting to the most reinforced answer.

Results show verbalized sampling enhances diversity by 1.6-2.1x over direct prompting while maintaining or improving quality.

Variants like VS-based Chain-of-Thought and VS-based Multi push diversity even further.

You can find the paper link in the next tweet.

👉 Over to you: What other methods can be used to improve LLM diversity?
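
A minimal sketch of how the Verbalized Sampling prompt quoted above could be built and used. The instruction string comes from the tweet. The function names, the already-parsed `candidates` list, and the final sampling step are illustrative assumptions rather than anything the paper or tweet specifies.

```python
import random


def verbalized_sampling_prompt(task: str, k: int = 5) -> str:
    """Wrap a task in the Verbalized Sampling instruction quoted in the tweet."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return f"Generate {k} responses with their corresponding probabilities. {task}"


def pick_response(candidates: list[tuple[str, float]], rng: random.Random) -> str:
    """Sample one response in proportion to its verbalized probability.

    Assumes the model's reply was already parsed into (text, probability)
    pairs; the parsing itself is not shown here.
    """
    texts = [text for text, _ in candidates]
    weights = [prob for _, prob in candidates]
    if not texts or sum(weights) <= 0:
        raise ValueError("need at least one candidate with positive probability")
    return rng.choices(texts, weights=weights, k=1)[0]


prompt = verbalized_sampling_prompt("Tell me a joke.")
print(prompt)

# Hypothetical parsed model output, for illustration only.
candidates = [("joke A", 0.5), ("joke B", 0.3), ("joke C", 0.2)]
print(pick_response(candidates, random.Random(0)))
```

Sampling from the verbalized probabilities, instead of always taking the first response, is one way to turn the returned distribution into a single varied answer.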

314

136K

aCraigPfeifer retweeted

Aniket

@aniketmaurya

6 months ago

As simple as 4 lines of code using Agentor! https://t.co/qwkbTvWjGe

112

aCraigPfeifer retweeted

Sebastian Raschka

@rasbt

about 2 years ago

Just read Apple's "OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework". Similar to the OLMo, it's refreshing to see an LLM paper that shares details discussing the architecture, training methods, and training data.

Let's start with the most interesting tidbits:
- OpenELM comes in 4 relatively small and convenient sizes: 270M, 450M, 1.1B, and 3B
- OpenELM performs slightly better than OLMo even though it's trained on 2x fewer tokens
- The main architecture tweak is a layer-wise scaling strategy

Sharing details is not the same as explaining them, which is what research papers were aimed to do when I was a graduate student. For instance, they sampled a relatively small subset of 1.8T tokens from various publicly available datasets (RefinedWeb, RedPajama, The PILE, and Dolma). This subset was 2x smaller than Dolma, which was used for training OLMo. What was the rationale for this subsampling, and what were the criteria?

The layer-wise scaling strategy (adopted from the "DeLighT: Deep and Light-weight Transformer" paper) is very interesting. I wish there was an ablation study training an LLM with and without this strategy on the same dataset. But those experiments are expensive, and I can understand why they didn't do it.

An interesting bonus that I didn't expect was that the researchers compared LoRA and DoRA (which I discussed a few weeks ago) for parameter-efficient finetuning! It turns out that there wasn't a noticeable difference between the two methods, though.

Anyways, great work, and big kudos to the researchers (and Apple) for sharing!

861

180

598

82K

aCraigPfeifer retweeted

Linus

@thesephist

about 2 years ago

A while ago I complained here about persistent storage in Google Colab.

Have been using @LightningAI Studios for a while now for:
- Full VSCode (incl. GH Copilot)
- Persisted files shared across notebooks
- Multi-GPU/node (!!)

It's been great. Feels like a remote ML workstation https://t.co/5axNYbQgJi

259

173

56K

Craig Pfeifer @aCraigPfeifer

about 2 years ago

1: did you hear bob was fired?
2: I didn't, what did they even do?
1: no one knows, maybe that's why they got fired
(Two weeks later)
2: oh, yeah bob kind of did a lot.

aCraigPfeifer retweeted

Leonie

@helloiamleonie

about 2 years ago

Ready to build a “Chat with your GitHub repository” application with Mistral via @ollama, @weaviate_io, and @llama_index? I’ve just dropped a @LightningAI Studio template. No setup, just copy and dive right into action. Jump right in here: https://t.co/Bh7Wog5Kqi

141

16K

Craig Pfeifer @aCraigPfeifer

over 2 years ago

"How long have you been working in deep learning?" "Since import theano"

132

Craig Pfeifer @aCraigPfeifer

over 2 years ago

@Joseph_Fasano_ "you who are free / rescue the dead" Two lines, but still https://t.co/dMaqLCWyh3

Craig Pfeifer @aCraigPfeifer

over 2 years ago

Q: What song is the @IndianaUniv Computer Science Marching Band most famous for? A: String, String, String

Craig Pfeifer @aCraigPfeifer

over 2 years ago

@deliprao Also necessary vs sufficient. Models keep getting bigger, but what is necessary for different use cases? What is the 'right size' for different tasks? Different data sets? When does a small domain specific model beat a large, general model?

Craig Pfeifer @aCraigPfeifer

over 2 years ago

@deliprao Representation learning. Everyone looks at what you can do with LLMs, but few understand what they actually are. Open the hood and poke around.

Craig Pfeifer @aCraigPfeifer

over 2 years ago

My favorite part of big data? Big debugging. Said no one ever.

aCraigPfeifer retweeted

Jonathan K. Kummerfeld @jkkummerfeld

over 2 years ago

I'm making a list of NLP faculty who are recruiting PhD students: https://t.co/jnfwyXou0j Results are shared here (after I confirm the submission): https://t.co/qhweG2kFm0 This is an experiment intended to help students find advisers and help advisers find students

378

107

205

47K

aCraigPfeifer retweeted

Jason Wei

@_jasonwei

over 2 years ago

One pattern I noticed is that great AI researchers are willing to manually inspect lots of data. And more than that, they build infrastructure that allows them to manually inspect data quickly. Though not glamorous, manually examining data gives valuable intuitions about the problem. The canonical example here is Andrej Karpathy doing the ImageNet 2000-way classification task himself.

And in the era of large language models, manually examining data is probably even more insightful since completions are hard to evaluate via benchmarks.

In this spirit, I recently did a few days of pair programming with @hwchung27 where we were starting on a new problem. Instead of trying to replicate baselines and design new methods, we ran some evaluations and manually inspected them to gain insights.

We first paid about one day of overhead getting all the relevant information in a single UI so we could examine the data without having to click through multiple web pages.

The second day, we spent an afternoon reading examples together and taking notes on the patterns that we noticed in the examples. ChatGPT generates long text, and we actually read the whole thing carefully, even if one example took 20 minutes to understand. I think we both gained a deeper understanding of the problem that we could not have gotten from reading research papers.

(In 2018, for example, I helped pathologists label a lot of data to train a lung cancer classifier. After having manually labeled 200+ images (with pathologist correction), I’d probably gained a pathologist-level understanding at that one particular lung cancer classification task :))

198

653

382K

Craig Pfeifer @aCraigPfeifer

over 2 years ago

TFW your interviewer says "we've built our own ML tech stack from the ground up"

aCraigPfeifer retweeted

Alec Stapp

@AlecStapp

over 2 years ago

The backstory to how GPS became freely available for civilian use 🤯

980

836K

aCraigPfeifer retweeted

hardmaru

@hardmaru

over 2 years ago

TinyML and Efficient Deep Learning Computing MIT 6.5940 (https://t.co/9cZmEhXrrr) “This course will introduce efficient AI computing techniques that enable powerful deep learning applications on resource-constrained devices. Topics include model compression, pruning, quantization, neural architecture search, distributed training, data/model parallelism, gradient compression, and on-device fine-tuning. It also introduces application-specific acceleration techniques for large language models, diffusion models, video recognition, and point cloud. This course will also cover topics about quantum machine learning. Students will get hands-on experience deploying large language models (e.g., LLaMA 2) on a laptop.”

216

240K
