Christy Bergman @cbergman - Twitter Profile

Christy Bergman @cbergman

about 2 months ago

Thank you @michelleefang ♥️

Michelle Fang 🌁

@michelleefang

about 2 months ago

Thursday 4/16 ‣ Agent Builders Breakfast - Founders & Builders in SOMA, SF https://t.co/QRmSX8uREE @miradu ‣ AI Breakfast: Build your AI workspace in one morning https://t.co/9lPtAlEwLA @rajoshighosh ‣ Platform Engineering & AI https://t.co/N54XPaEgmO @cbergman ‣ Voice AI Meetup: Medical Mode https://t.co/2aKaB14EXc @ryanseams @theaievangelist ‣ Claws Out🦞 GMI ClawHub Demo Night https://t.co/5fm2cI2BJd @nicoleegong @gmi_cloud @yuqih ‣ Voice AI builders night https://t.co/2qn7jABD79 @modal @braintrust ‣ AI Meets HumanX Social w/ Rootly AI, MongoDB, Runpod, & More! https://t.co/52HTzP6ff5 ‣ Slides Down: AI Founders Party by DigitalOcean & Zendesk Apr 16 https://t.co/gr9F4vs7J3 @neffko @julianachyzhova @yfilipch

2

8

3

1

2K

0

39

cbergman retweeted

Michelle Fang 🌁

@michelleefang

about 2 months ago

Thursday 4/16 ‣ Agent Builders Breakfast - Founders & Builders in SOMA, SF https://t.co/QRmSX8uREE @miradu ‣ AI Breakfast: Build your AI workspace in one morning https://t.co/9lPtAlEwLA @rajoshighosh ‣ Platform Engineering & AI https://t.co/N54XPaEgmO @cbergman ‣ Voice AI Meetup: Medical Mode https://t.co/2aKaB14EXc @ryanseams @theaievangelist ‣ Claws Out🦞 GMI ClawHub Demo Night https://t.co/5fm2cI2BJd @nicoleegong @gmi_cloud @yuqih ‣ Voice AI builders night https://t.co/2qn7jABD79 @modal @braintrust ‣ AI Meets HumanX Social w/ Rootly AI, MongoDB, Runpod, & More! https://t.co/52HTzP6ff5 ‣ Slides Down: AI Founders Party by DigitalOcean & Zendesk Apr 16 https://t.co/gr9F4vs7J3 @neffko @julianachyzhova @yfilipch

2

8

3

1

2K

cbergman retweeted

Sebastian Raschka

@rasbt

2 months ago

https://t.co/HH23b6VZ6O

76

3K

419

5K

622K

Christy Bergman @cbergman

6 months ago

💓@AndrewYNg Note to self: look here before next CFP submission or helping others. Ask the model to summarize best advice per conference CFP rules and topic submitter wants to talk about...

Andrew Ng

@AndrewYNg

6 months ago

Releasing a new "Agentic Reviewer" for research papers. I started coding this as a weekend project, and @jyx_su made it much better. I was inspired by a student who had a paper rejected 6 times over 3 years. Their feedback loop -- waiting ~6 months for feedback each time -- was painfully slow. We wanted to see if an agentic workflow can help researchers iterate faster. When we trained the system on ICLR 2025 reviews and measured Spearman correlation (higher is better) on the test set: - Correlation between two human reviewers: 0.41 - Correlation between AI and a human reviewer: 0.42 This suggests agentic reviewing is approaching human-level performance. The agent grounds its feedback by searching arXiv, so it works best in fields like AI where research is freely published there. It’s an experimental tool, but I hope it helps you with your research. Check it out here: https://t.co/n7ctnDilJJ

AndrewYNg's tweet photo. Releasing a new "Agentic Reviewer" for research papers. I started coding this as a weekend project, and @jyx_su made it much better.

I was inspired by a student who had a paper rejected 6 times over 3 years. Their feedback loop -- waiting ~6 months for feedback each time -- was painfully slow. We wanted to see if an agentic workflow can help researchers iterate faster.

When we trained the system on ICLR 2025 reviews and measured Spearman correlation (higher is better) on the test set:
- Correlation between two human reviewers: 0.41
- Correlation between AI and a human reviewer: 0.42

This suggests agentic reviewing is approaching human-level performance.

The agent grounds its feedback by searching arXiv, so it works best in fields like AI where research is freely published there. It’s an experimental tool, but I hope it helps you with your research.

Check it out here: https://t.co/n7ctnDilJJ

247

6K

1K

5K

1M

0

60

Who to follow

Scott

@CreateCapital

X-high profile guy. Somewhat older, somewhat wiser.

kourosh hakhamaneshi

@CyrusHakha

LLMs + Ray @anyscalecompute 💻 prev PhD, EECS, @UCBerkeley 👨‍🎓

honghao

@honghao

Software Engineer & Vimer & Javascript, Ruby, Container Enthusiast

Christy Bergman @cbergman

12 months ago

Don't🍷about #OOM running out of memory! @huggingface is making it easier to run huge #TransformerandDiffuser models on consumer GPUs w quantization, tensor parallelism, offloading. Hear from @stevhliu how to fit these models on your setup. https://t.co/FJXgzA0FxD #HuggingFace

cbergman's tweet photo. Don't🍷about #OOM running out of memory!
@huggingface is making it easier to run huge #TransformerandDiffuser models on consumer GPUs w quantization, tensor parallelism, offloading. Hear from @stevhliu how to fit these models on your setup. https://t.co/FJXgzA0FxD #HuggingFace https://t.co/Zw1cabjnZL

0

1

0

89

cbergman retweeted

Towards Data Science

@TDataScience

about 1 year ago

Thankfully @cbergman's article can help you identify key convos with an AI hack to perform semantic clustering simply by prompting LLMs! https://t.co/J0z9FLuhq7

1

7

2

2K

cbergman retweeted

Sam Altman

@sama

over 1 year ago

GPT-4.5 is ready! good news: it is the first model that feels like talking to a thoughtful person to me. i have had several moments where i've sat back in my chair and been astonished at getting actually good advice from an AI. bad news: it is a giant, expensive model. we really wanted to launch it to plus and pro at the same time, but we've been growing a lot and are out of GPUs. we will add tens of thousands of GPUs next week and roll it out to the plus tier then. (hundreds of thousands coming soon, and i'm pretty sure y'all will use every one we can rack up.) this isn't how we want to operate, but it's hard to perfectly predict growth surges that lead to GPU shortages. a heads up: this isn’t a reasoning model and won’t crush benchmarks. it’s a different kind of intelligence and there’s a magic to it i haven’t felt before. really excited for people to try it!

3K

40K

4K

6M

cbergman retweeted

DeepSeek

@deepseek_ai

over 1 year ago

🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks. ⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster ⚡ 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster ⚡ 40+ GiB/s peak throughput per client node for KVCache lookup 🧬 Disaggregated architecture with strong consistency semantics ✅ Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1 📥 3FS → https://t.co/JRZ2eYk1aK ⛲ Smallpond - data processing framework on 3FS → https://t.co/BoUA6YNjZK

523

10K

1K

3K

3M

Christy Bergman @cbergman

over 1 year ago

@TDataScience TL;DR my blog is about how to go from (data science + code) → (AI prompts + LLMs) for the same results—just faster and with less effort! Here is the @TDataScience archive link: https://t.co/ElOXBy2kym

0

6

1

3

2K

Christy Bergman @cbergman

over 1 year ago

I just published a blog in #DataScienceCollective, the new free open version of @TDataScience. Here, I look at 9 different discords and prompt #LLMs to do #Clustering on user messages. https://t.co/qovzCo8grF

1

0

86

Christy Bergman @cbergman

over 1 year ago

Thanks @pacoid ! I'd better get started preparing my talk for that! #SonomaAI #FoodWineAI

0

1

0

95

Christy Bergman @cbergman

over 1 year ago

Seems devil is in the details for accuracy/latency tradeoff decisions. #w8a8fp: 1. Weights quantized using usual symmetric fp8 method. 2. Activations quantized without pre-calibration i.e. symmetric quantization parameters calculated on-the-fly during model inference.

0

119

Christy Bergman @cbergman

over 1 year ago

🤔hmm, but this paper shows w8a8-fp (symmetric weight and dynamic per-token activation quantization in fp8) is "essentially lossless" in accuracy. https://t.co/Tfh2MZ9bQc

Christy Bergman @cbergman

over 1 year ago

Interesting! The most common inference quantization int8/fp8 is not necessarily the best. bf16 #quantization is a way better accuracy/latency tradeoff.

0

1

0

233

1

0

167

Christy Bergman @cbergman

over 1 year ago

Interesting! The most common inference quantization int8/fp8 is not necessarily the best. bf16 #quantization is a way better accuracy/latency tradeoff.

Aidan McLaughlin

@aidan_mclau

almost 2 years ago

aidan bench update: i ran llama 3.1 405b at bf16 (shoutout to @hyperbolic_labs) and we got a *way* better score. 405b fp8 is around gpt-4o-mini-level 405b bf16 beats claude-3.5-sonnet give me bf16 or give me death

aidan_mclau's tweet photo. aidan bench update:

i ran llama 3.1 405b at bf16 (shoutout to @hyperbolic_labs) and we got a *way* better score.

405b fp8 is around gpt-4o-mini-level
405b bf16 beats claude-3.5-sonnet

give me bf16 or give me death https://t.co/cWRrbLlS6p

44

518

31

148

227K

0

1

0

233

Christy Bergman @cbergman

over 1 year ago

Nice to meet and chat w/you too! @adamse @felipehoffa It was fun to get some hands-on time and see what's new with @awscloud Bedrock.

Adam Seligman

@adamse

over 1 year ago

@felipehoffa @cbergman so great to see you at the @awscloud GenAI Loft today!

0

5

1

0

1K

0

3

1

530

Christy Bergman @cbergman

over 1 year ago

I just tried this hack. Thanks, I really needed that! 😂

Thomas Wolf

@Thom_Wolf

over 1 year ago

Self-care life hack: if you feel a bit down/tired, paste the url of your website/linkedin/bio in Google's NotebookLM to get 8 min of realistically sounding deep congratulations for your life and achievements from a duo of podcast experts 😂

111

4K

336

3K

611K

0

2

0

1

108

cbergman retweeted

swyx

@swyx

over 1 year ago

CUDA MODE hackathon today! Here's @karpathy on the 🏖️ origin story of llm.c, and what it hints at for the fast, simple, llm-compiled future of custom software.

13

616

53

439

97K

Christy Bergman @cbergman

almost 2 years ago

Interesting take-down how to do LoRA properly, quickly, with less memory, on all layers @danielhanchen's tweet and blog https://t.co/p5WVg9isXF ! > For continued pretraining, I advise people to train on all layers (inc gate) + lm_head, embed_tokens, use RS LoRA, use rank>=256

Daniel Han

@danielhanchen

about 2 years ago

My take on "LoRA Learns Less and Forgets Less" 1) "MLP/All" did not include gate_proj. QKVO, up & down trained but not gate (pg 3 footnote) 2) Why does LoRA perform well on math and not code? lm_head & embed_tokens wasn't trained, so domain shifts not modelled. Also reason why "LoRA Forgets Less". Use "modules_to_save" in HF PEFT or "lm_head", "embed_tokens" in @UnslothAI 3) Code rank=256 used α=32 (too small!) (pg 18), but Maths α=2*r=512. RS LoRA paper showed α/sqrt(r) needed for larger ranks. & common practice is 2*r. So also why Code did worse than Maths 4) Extrapolating Maths vs fft looks good. Small datasets LoRA>fft, but I theorize that's because of reason 2 5) LoftQ & PiSSA paper init LoRA from SVD(W) => papers show comparable perf of LoRA 6) LoRA+ paper shows B matrix needs larger lr. DoRA (mentioned in paper) learns these scalars. TLDR: Code worse since α=32 is too small. No embed_tokens, lm_head (or layernorms), not even gate_proj? Better init & lr scaling can help For continued pretraining, I advise people to train on all layers (inc gate) + lm_head, embed_tokens, use RS LoRA, use rank>=256 LoRA paper: https://t.co/CzRpid0CUp RS LoRA paper: https://t.co/fg21UYqH58 LoRA+ paper: https://t.co/tgJjFdgSiG PiSSA paper: https://t.co/VBr22c1grv DoRA paper: https://t.co/YQApPi2MJk

danielhanchen's tweet photo. My take on "LoRA Learns Less and Forgets Less"

1) "MLP/All" did not include gate_proj. QKVO, up & down trained but not gate (pg 3 footnote)

2) Why does LoRA perform well on math and not code? lm_head & embed_tokens wasn't trained, so domain shifts not modelled. Also reason why "LoRA Forgets Less". Use "modules_to_save" in HF PEFT or "lm_head", "embed_tokens" in @UnslothAI

3) Code rank=256 used α=32 (too small!) (pg 18), but Maths α=2*r=512. RS LoRA paper showed α/sqrt(r) needed for larger ranks. & common practice is 2*r. So also why Code did worse than Maths

4) Extrapolating Maths vs fft looks good. Small datasets LoRA>fft, but I theorize that's because of reason 2

5) LoftQ & PiSSA paper init LoRA from SVD(W) => papers show comparable perf of LoRA

6) LoRA+ paper shows B matrix needs larger lr. DoRA (mentioned in paper) learns these scalars.

TLDR: Code worse since α=32 is too small. No embed_tokens, lm_head (or layernorms), not even gate_proj? Better init & lr scaling can help

For continued pretraining, I advise people to train on all layers (inc gate) + lm_head, embed_tokens, use RS LoRA, use rank>=256

LoRA paper: https://t.co/CzRpid0CUp
RS LoRA paper: https://t.co/fg21UYqH58
LoRA+ paper: https://t.co/tgJjFdgSiG
PiSSA paper: https://t.co/VBr22c1grv
DoRA paper: https://t.co/YQApPi2MJk

7

573

118

513

63K

0

1

109

cbergman retweeted

The AI Conference

@AIconference

almost 2 years ago

🌟Join our expert panel at The AI Conference 2024 to explore advanced RAG (Retrieval-Augmented Generation) techniques. Learn how integrating information retrieval with generative models is revolutionizing AI, making it more contextually rich and useful in real-world applications. Don’t miss out—register now to be part of the future of AI! https://t.co/JrDLmuSTcV #developers #TAIC2024 #data #programmers #software #innovators #techindustry #engineer #scientists #theaiconference

AIconference's tweet photo. 🌟Join our expert panel at The AI Conference 2024 to explore advanced RAG (Retrieval-Augmented Generation) techniques.

Learn how integrating information retrieval with generative models is revolutionizing AI, making it more contextually rich and useful in real-world applications.

Don’t miss out—register now to be part of the future of AI!
https://t.co/JrDLmuSTcV

#developers #TAIC2024 #data #programmers #software #innovators #techindustry #engineer #scientists #theaiconference

1

7

5

0

2K

cbergman retweeted

Zilliz

@zilliz_universe

almost 2 years ago

Monday Meetup is right around the corner! 🗣 Join us in SF on August 5 for exciting talks: 🔢 Using Ray Data for Multimodal Embedding Inference with @cbergman 📐 A Different Angle: Retrieval Optimized Embedding Models @marqo_ai 🛠 Building the Future of Neural Search: How to Train State-of-the-Art Embeddings with @mixedbreadai 🔗 Save your spot: https://t.co/3oOeYosewo #Meetup #AI #RAG #SFevents

zilliz_universe's tweet photo. Monday Meetup is right around the corner! 🗣 Join us in SF on August 5 for exciting talks:
🔢 Using Ray Data for Multimodal Embedding Inference with @cbergman
📐 A Different Angle: Retrieval Optimized Embedding Models @marqo_ai
🛠 Building the Future of Neural Search: How to Train State-of-the-Art Embeddings with @mixedbreadai

🔗 Save your spot: https://t.co/3oOeYosewo

#Meetup #AI #RAG #SFevents

0

7

5

1

551

Christy Bergman

@cbergman

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users