Dong-Sig Han @DongSigHan - Twitter Profile

4 months ago

One paper got accepted at CVPR! Learning Coordinate-based Convolutional Kernels for Continuous SE(3) Equivariant and Efficient Point Cloud Analysis Jaein Kim, Hee Bin Yoo, Dong-Sig Han, Byoung-Tak Zhang

0

7

0

629

DongSigHan retweeted

Accepted papers at TMLR @TmlrPub

6 months ago

Variational Online Mirror Descent for Robust Learning in Schrödinger Bridge Dong-Sig Han, Jaein Kim, HEE BIN YOO, Byoung-Tak Zhang. Action editor: Chris Maddison. https://t.co/DbCzZSwwHG #schrödinger #variational #wasserstein

0

5

1

6

1K

DongSigHan retweeted

François Chollet

@fchollet

over 1 year ago

Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute ) and 87.5% in high-compute mode (thousands of $ per task). It's very expensive, but it's not just brute -- these capabilities are new territory and they demand serious scientific attention.

fchollet's tweet photo. Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.

It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute ) and 87.5% in high-compute mode (thousands of $ per task). It's very expensive, but it's not just brute -- these capabilities are new territory and they demand serious scientific attention.

201

9K

2K

3K

2M

DongSigHan retweeted

Andrej Karpathy

@karpathy

almost 2 years ago

You can tell the RL is done properly when the models cease to speak English in their chain of thought

290

7K

378

1K

679K

Who to follow

Sungjae Cho

@SungjaeAI

PhD parasite @Mila_Quebec | Neural intelligence ⇄ Machine intelligence | My Bio: CompSci + Math → AI & CogSci → neural TTS → CompNeuro → Neuro + AI

Sangwoong Yoon

@WoongSSang

Assistant Professor @ UNIST. Previously a postdoc @ UCL and KIAS. PhD, MS, and BS @ Seoul National Univ.

Sangyun Lee

@sang_yun_lee

PhD student @CMU_ECE | ex-intern @nvidia | Generative models

DongSigHan retweeted

Thomas Wolf

@Thom_Wolf

almost 2 years ago

Strong picture!

36

2K

308

398

422K

DongSigHan retweeted

Seohong Park @seohong_park

about 2 years ago

Most works in offline RL focus on learning better value functions. So value learning is the main bottleneck in offline RL... right? In our new paper, we show that this is *not* the case in general! Paper: https://t.co/1lsLPxrdR9 Blog post: https://t.co/BYXKEb49hO A thread ↓

6

333

53

192

57K

DongSigHan retweeted

Gabriel Peyré

@gabrielpeyre

almost 2 years ago

The heat equation can be applied to diffuse a function on a surface, but it can also be applied to evolve the surface itself. https://t.co/DvBR583hza https://t.co/Xy3Xanvvdq

6

739

120

188

29K

DongSigHan retweeted

Gabriel Peyré

@gabrielpeyre

about 2 years ago

There are two ways to differentiate a function over the space of measures: vertically (Euclidean) and horizontally (Wasserstein). These correspond to locally modifying the mass amplitude and positions, respectively. The Wasserstein derivative is the usual (pointwise) gradient of the Euclidean derivative.

gabrielpeyre's tweet photo. There are two ways to differentiate a function over the space of measures: vertically (Euclidean) and horizontally (Wasserstein). These correspond to locally modifying the mass amplitude and positions, respectively. The Wasserstein derivative is the usual (pointwise) gradient of the Euclidean derivative.

6

899

144

516

70K

DongSigHan retweeted

Arena.ai

@arena

about 2 years ago

🔥Breaking News from Chatbot Arena @AnthropicAI Claude 3.5 Sonnet has just made a huge leap, securing the #1 spot in Coding Arena, Hard Prompts Arena, and #2 in the Overall leaderboard. New Sonnet has surpassed Opus at 5x the lower cost and competitive with frontier models GPT-4o/Gemini 1.5 Pro across the boards. Huge congrats to @AnthropicAI for this incredible milestone! Can't wait to see the new Opus & Haiku. Our new vision leaderboard is also coming soon! (More analysis below)

arena's tweet photo. 🔥Breaking News from Chatbot Arena

@AnthropicAI Claude 3.5 Sonnet has just made a huge leap, securing the #1 spot in Coding Arena, Hard Prompts Arena, and #2 in the Overall leaderboard.

New Sonnet has surpassed Opus at 5x the lower cost and competitive with frontier models GPT-4o/Gemini 1.5 Pro across the boards.

Huge congrats to @AnthropicAI for this incredible milestone! Can't wait to see the new Opus & Haiku. Our new vision leaderboard is also coming soon!

(More analysis below)

26

1K

201

228

178K

DongSigHan retweeted

François Chollet

@fchollet

about 2 years ago

This is a great paper. LLMs cannot plan and reason on their own -- they're pattern-matchers. But that doesn't mean they can't be useful for reasoning. They work best *in conjunction* with systems that can actually plan or reason -- such as symbolic planners.

21

1K

243

932

132K

DongSigHan retweeted

Gabriel Peyré

@gabrielpeyre

about 2 years ago

Oldies but goldies: R Jordan, D Kinderlehrer, F Otto, The Variational Formulation of the Fokker-Planck Equation. Showed that the heat equation is the Optimal Transport flow (i.e., the flow of particle systems) minimizing the Shannon-Boltzmann entropy. More general entropies lead to non-linear PDEs. Opened a whole field of research. https://t.co/pUDxIZc1AJ

2

446

72

151

25K

DongSigHan retweeted

Nathan Lambert

@natolambert

about 2 years ago

lol at Apple sneaking this RLHF gem into their "Apple foundation models" blog post and no one talked about it "We have developed two novel algorithms in post-training: (1) a rejection sampling fine-tuning algorithm with teacher committee, and (2) a reinforcement learning from human feedback (RLHF) algorithm with mirror descent policy optimization and a leave-one-out advantage estimator. We find that these two algorithms lead to significant improvement in the model’s instruction-following quality." What it means tomorrow in @interconnectsai

9

424

35

482

109K

DongSigHan retweeted

Andrej Karpathy

@karpathy

about 2 years ago

📽️ New 4 hour (lol) video lecture on YouTube: "Let’s reproduce GPT-2 (124M)" https://t.co/QTUdu8b0qh The video ended up so long because it is... comprehensive: we start with empty file and end up with a GPT-2 (124M) model: - first we build the GPT-2 network - then we optimize it to train very fast - then we set up the training run optimization and hyperparameters by referencing GPT-2 and GPT-3 papers - then we bring up model evaluation, and - then cross our fingers and go to sleep. In the morning we look through the results and enjoy amusing model generations. Our "overnight" run even gets very close to the GPT-3 (124M) model. This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar. Github. The associated GitHub repo contains the full commit history so you can step through all of the code changes in the video, step by step. https://t.co/BOzkxQ8at2 Chapters. On a high level Section 1 is building up the network, a lot of this might be review. Section 2 is making the training fast. Section 3 is setting up the run. Section 4 is the results. In more detail: 00:00:00 intro: Let’s reproduce GPT-2 (124M) 00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint 00:13:47 SECTION 1: implementing the GPT-2 nn.Module 00:28:08 loading the huggingface/GPT-2 parameters 00:31:00 implementing the forward pass to get logits 00:33:31 sampling init, prefix tokens, tokenization 00:37:02 sampling loop 00:41:47 sample, auto-detect the device 00:45:50 let’s train: data batches (B,T) → logits (B,T,C) 00:52:53 cross entropy loss 00:56:42 optimization loop: overfit a single batch 01:02:00 data loader lite 01:06:14 parameter sharing wte and lm_head 01:13:47 model initialization: std 0.02, residual init 01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms 01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms 01:39:38 float16, gradient scalers, bfloat16, 300ms 01:48:15 torch.compile, Python overhead, kernel fusion, 130ms 02:00:18 flash attention, 96ms 02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms 02:14:55 SECTION 3: hyperpamaters, AdamW, gradient clipping 02:21:06 learning rate scheduler: warmup + cosine decay 02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms 02:34:09 gradient accumulation 02:46:52 distributed data parallel (DDP) 03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU) 03:23:10 validation data split, validation loss, sampling revive 03:28:23 evaluation: HellaSwag, starting the run 03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro 03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA 03:59:39 summary, phew, build-nanogpt github repo

karpathy's tweet photo. 📽️ New 4 hour (lol) video lecture on YouTube:
"Let’s reproduce GPT-2 (124M)"
https://t.co/QTUdu8b0qh

The video ended up so long because it is... comprehensive: we start with empty file and end up with a GPT-2 (124M) model:
- first we build the GPT-2 network
- then we optimize it to train very fast
- then we set up the training run optimization and hyperparameters by referencing GPT-2 and GPT-3 papers
- then we bring up model evaluation, and
- then cross our fingers and go to sleep.
In the morning we look through the results and enjoy amusing model generations. Our "overnight" run even gets very close to the GPT-3 (124M) model. This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.

Github. The associated GitHub repo contains the full commit history so you can step through all of the code changes in the video, step by step.
https://t.co/BOzkxQ8at2

Chapters.
On a high level Section 1 is building up the network, a lot of this might be review. Section 2 is making the training fast. Section 3 is setting up the run. Section 4 is the results. In more detail:
00:00:00 intro: Let’s reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let’s train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
01:48:15 torch.compile, Python overhead, kernel fusion, 130ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperpamaters, AdamW, gradient clipping
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo

412

15K

2K

10K

2M

DongSigHan retweeted

Aran Komatsuzaki

@arankomatsuzaki

about 2 years ago

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Presents Mamba-2, which outperforms Mamba and Transformer++ in both perplexity and wall-clock time https://t.co/Sd3J3kPG5W

arankomatsuzaki's tweet photo. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Presents Mamba-2, which outperforms Mamba and Transformer++ in both perplexity and wall-clock time

https://t.co/Sd3J3kPG5W https://t.co/C2nAisXcoN

7

539

99

374

92K

Dong-Sig Han @DongSigHan

over 3 years ago

The Forward Forward Algorithm: Some Preliminary Investigations Geoffrey Hinton https://t.co/1zuHryxEfA

0

2

0

DongSigHan retweeted

James Thornton @JamesTThorn

over 3 years ago

For fellow bridge enthusiasts, check out an extension we carved out from the main based on iterative time reversals - Riemannian Diffusion Schrodinger Bridge - Accelerates Riemannian diffusions models - Facilities interpolation based on optimal transport https://t.co/NWcptBSPup

JamesTThorn's tweet photo. For fellow bridge enthusiasts, check out an extension we carved out from the main based on iterative time reversals - Riemannian Diffusion Schrodinger Bridge

- Accelerates Riemannian diffusions models
- Facilities interpolation based on optimal transport

https://t.co/NWcptBSPup https://t.co/hmkrWt6Q3a

0

22

5

3

0

DongSigHan retweeted

Lénaïc Chizat @LenaicChizat

over 3 years ago

Gradient Langevin dynamics generate samples from densities against the Lebesgue measure. What about densities against the Wiener measure (law of Brownian)? This is more tricky, but it can be done! Here's an example solving a "trajectory inference" problem. https://t.co/aoH7755Gxs

7

529

91

149

0

DongSigHan retweeted

Grant Sanderson

@3blue1brown

over 3 years ago

So close and yet so far

126

7K

349

283

0

Dong-Sig Han @DongSigHan

over 3 years ago

It is great to be at #NeurIPS2022! There are many papers that I would use in future research. I am also grateful that many people visited our posters yesterday.

DongSigHan's tweet photo. It is great to be at #NeurIPS2022! There are many papers that I would use in future research.

I am also grateful that many people visited our posters yesterday. https://t.co/ixZDs3Ozvz

0

5

0

DongSigHan retweeted

Jin-Hwa Kim

@jnhwkim

over 3 years ago

Thursday at 11 AM (CST), M.I.D. will be presented in Hall J from 11 AM to 1 PM. Look forward to seeing you! 🙌 "Mutual Information Divergence: A Unified Metric for Multimodal Generative Models" Hall J #900 https://t.co/tbw7z2lBja

jnhwkim's tweet photo. Thursday at 11 AM (CST), M.I.D. will be presented in Hall J from 11 AM to 1 PM. Look forward to seeing you! 🙌

"Mutual Information Divergence: A Unified Metric for Multimodal Generative Models"
Hall J #900
https://t.co/tbw7z2lBja https://t.co/hpJWn0CkNb

1

12

2

0

Dong-Sig Han

@DongSigHan

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users