CryptoRiderW 🌖 @cryptoriderw - Twitter Profile

Pinned Tweet

CryptoRiderW 🌖 @cryptoriderw

almost 4 years ago

Oops achievement unlocked. FatMan could not accept criticism 🤣

0

1

0

cryptoriderw retweeted

Jason Zhu

@GoSailGlobal

about 1 month ago

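In Stanford CS336, Tatsu gave a lecture on LLM architecture that took apart every mainstream LLM of the past 3 years to find their common template. The conclusion is striking: 90% of architecture choices have converged. Pick any open-source LLM and it is nearly identical to the others along these dimensions.

In the lecturer's words:
- 2024: everyone was cosplaying Llama 2
- 2025: the theme was "how to train without blowing up"
- 2026: the theme is "how to handle long context"

Below is the standard template for an open-source LLM in 2026. If you are training your own model, you can copy it directly.

[7 architecture choices that have converged]

1) Move LayerNorm out of the residual stream (pre-norm). The original Transformer put LN inside the residual stream; nearly all modern models move it outside. Reason: keep your residual stream clean, so gradients backpropagate more stably.

2) RMSNorm replaces LayerNorm. LayerNorm's mean subtraction and bias addition don't actually help much. Dropping them saves only 0.17% of FLOPs but up to 25% of runtime (the bottleneck is data movement; compute is secondary).

3) Drop all bias terms. Same logic as RMSNorm: it saves memory movement at the systems level.

4) Use SwiGLU or GeGLU as the activation. Nearly all modern models use a gated linear unit. The Llama family, Qwen, and Mistral use SwiGLU; Google models (Gemma, T5) use GeGLU. The difference is tiny; either works.

5) Use RoPE for position encoding. Essentially universal since 2024. How it works: rotate each pair of dimensions by a position-dependent angle so that the inner product depends only on relative position.

6) Serial Transformer blocks (not parallel). GPT-J and PaLM tried parallel blocks; the approach has largely been abandoned. Serial implementations are so well optimized that the small systems savings from parallel blocks aren't worth the loss in expressiveness.

7) Layer norm can be "sprinkled" around. Add LN wherever training is unstable: before attention, after attention, or both (double norm). Many modern models do this.

[5 hyperparameters that have converged]

1) Feedforward dimension / hidden dimension
- Non-GLU models: 4x
- GLU models: 8/3 ≈ 2.67x (GLU adds an extra matrix, so this keeps the total parameter count the same)
- Llama family: 3.5x
- T5 1.0 tried 64x; T5 1.1 went back to the standard. Don't copy it.

2) Number of heads × head dimension ≈ hidden dimension. Nearly all models follow this; T5 is one of the few exceptions.

3) Model aspect ratio (hidden dimension / number of layers) ≈ 100. Too deep and pipeline parallelism gets hard; too wide and expressiveness is limited. 100 is the balance point between systems constraints and expressiveness.

4) Vocab size. Monolingual models: around 30K (early GPT-2 style). Multilingual / general-purpose models: 100K-200K (GPT-4, Llama 3, and Gemma all fall in this range). Modern models are almost all the latter.

5) Weight decay is still widely used. But research finds that what it does in LLMs is really an optimizer intervention that lets you eventually converge to a deeper optimum; it has little to do with the "prevent overfitting" role you would expect. So don't turn it off just because "a single epoch can't overfit".

[3 life-saving stability tricks]

The worst fear in training a large model is the loss suddenly spiking mid-run, then going NaN and wiping out everything. Modern models use three tricks to prevent this.

1) Z-loss. The output softmax normalizer tends to blow up. Add a (log Z)² regularization term to keep Z close to 1. DCLM and Olmo both use it.

2) QK norm. Apply an LN to each of Q and K in attention before the matrix multiply, so the softmax input always stays at unit scale. The multimodal community adopted it first; now all large models add it.

3) Logit soft cap (Google models only). Cap attention logits with tanh. Gemma 2/3/4 all use it, but it costs a little performance; use with caution.

[2 new trends in attention]

1) GQA (Grouped Query Attention) is nearly universal. With original multi-head attention, the KV cache at inference time makes arithmetic intensity collapse to 1/h. GQA shares K and V while keeping multiple Qs; expressiveness is almost unaffected and inference cost is cut by 80%. Every large model headed for production deployment now uses GQA.

2) Alternating local + global attention. A new way to handle long context. Cohere Command A started it; Llama 4, Gemma 4, and Olmo 3 all use it now. For example, 1 of every 4 layers uses full attention and the other 3 use a sliding window that only looks at nearby tokens. It is more stable than a pure SSM and much cheaper than pure full attention. (Qwen 3.5 does a variant that swaps those 3 sliding-window layers for SSMs.)

To wrap up: if you are training your own LLM, the above is the 2026 "default configuration". No need to reinvent it; just copy it. If you only want to understand those modeling_xxx.py files on GitHub, this is enough to keep the jargon from scaring you.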
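A few of the items in the thread above can be made concrete with a minimal pure-Python sketch. It is not taken from the lecture or from any model's code. All function names are hypothetical, and the defaults (`eps`, `multiple_of=256`, `coeff=1e-4`, `cap=50.0`, and putting the full-attention layer last in each group of 4) are illustrative assumptions rather than values from the tweet.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm over one vector: divide by the root-mean-square, then scale.

    Unlike LayerNorm there is no mean subtraction and no bias term.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]


def ffn_params(d_model, d_ff, gated):
    """Weight count of a bias-free FFN: 2 matrices, or 3 when gated (GLU)."""
    return (3 if gated else 2) * d_model * d_ff


def glu_ffn_dim(d_model, multiple_of=256):
    """8/3 * d_model, rounded up to a multiple (the rounding is an assumption)."""
    return multiple_of * math.ceil(8 * d_model / (3 * multiple_of))


def z_loss(logits, coeff=1e-4):
    """Auxiliary penalty coeff * (log Z)^2, where Z = sum(exp(logits)).

    The penalty is zero exactly when Z == 1; log Z is computed stably.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return coeff * log_z ** 2


def soft_cap(logit, cap=50.0):
    """Squash a logit into (-cap, cap) with tanh; near-identity for small values."""
    return cap * math.tanh(logit / cap)


def attention_pattern(n_layers, period=4):
    """One full-attention layer per `period` layers; the rest use a sliding window."""
    return ["full" if (i + 1) % period == 0 else "sliding" for i in range(n_layers)]
```

For example, `ffn_params(3, 12, gated=False)` and `ffn_params(3, 8, gated=True)` give the same count, which is the parameter-matching argument behind the 8/3 ratio: three matrices at 8/3 of the width cost the same as two matrices at 4 times the width.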

29

3K

591

5K

534K

cryptoriderw retweeted

Orange AI

@oran_ge

about 2 years ago

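Exciting: Databricks released the open-source DBRX model last week, setting a new SOTA for open-source LLMs
- General benchmarks: surpasses Gemini 1.0 Pro and GPT-3.5 Turbo
- Coding: surpasses CodeLLaMA-70B, a model specialized for code generation
- Fine-grained MoE architecture: 132B parameters, 4 of 16 experts active, 32K context
Closed-source models are under a lot of pressure https://t.co/uV1XRU3Q8G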

3

113

25

94

31K

cryptoriderw retweeted

Yam Peleg

@Yampeleg

almost 3 years ago

The first model to beat 100% of ChatGPT-3.5. Available on Huggingface.

🔥 OpenChat_8192
🔥 105.7% of ChatGPT (Vicuna GPT-4 Benchmark)

Less than a month ago the world witnessed ORCA [1] become the first model ever to outpace ChatGPT on Vicuna's benchmark. Today, the race to replicate these results in open source comes to an end. Minutes ago OpenChat scored 105.7% of ChatGPT.

But wait! There is more! Not only did OpenChat beat Vicuna's benchmark, it did so pulling off a LIMA [2] move! Training was done using 6K GPT-4 conversations out of the ~90K ShareGPT conversations.

The model comes in three versions: the basic OpenChat model, OpenChat-8192, and OpenCoderPlus (code generation: 102.5% of ChatGPT).

This is a significant achievement considering that it's the first (released) open-source model to surpass the Vicuna benchmark. 🎉🎉

- OpenChat: https://t.co/lglHYQpo2A
- OpenChat_8192: https://t.co/XU9o3GaVsg (best chat)
- OpenCoderPlus: https://t.co/qwPCD8mXkg (best coder)
- Dataset: https://t.co/tXj34fv5Wp
- Code: https://t.co/WhS5dPq6ml

Congratulations to the authors!!

---

[1] - Orca: The first model to cross 100% of ChatGPT: https://t.co/vRyupCy7Tg
[2] - LIMA: Less Is More for Alignment - TL;DR: Using a small number of VERY high quality samples (1000 in the paper) can be as powerful as much larger datasets: https://t.co/58bo1qarSl

58

2K

437

2K

559K

cryptoriderw retweeted

FENG DONG

@middlefeng

about 3 years ago · San Jose

Watch to the end.

32

644

185

69

139K

cryptoriderw retweeted

fox hsiao

@pirrer

about 3 years ago

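How did OpenAI build ChatGPT, and what is it like to work there? We interviewed @ Trinkle, who took part in training ChatGPT, and asked him to tell us about the work behind ChatGPT and how he learned and grew his way into a job at OpenAI. https://t.co/CNeiEUzhDM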

7

381

92

164

75K

cryptoriderw retweeted

Tim Qian @Tim_Qian

about 3 years ago

Introducing https://t.co/mYBLXrANtB: Create. Use. Share. ChatGPT Prompts

39

553

188

384

447K

cryptoriderw retweeted

Mayo Oshin

@mayowaoshin

about 3 years ago

I built a GPT-4 'Warren Buffett' financial analyst to 'chat' with and analyze multiple PDF files (~1000 pages) across @elonmusk's Tesla 10-k annual reports (2020-2022) #gpt4 #openai #investing #stocks #finance

280

10K

1K

10K

4M

cryptoriderw retweeted

Jiayuan (JY) Zhang

@jiayuan_jy

about 3 years ago

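How to build a personal knowledge-base AI on top of ChatGPT. After several weeks of private beta, Copilot Hub is now officially launched 👇 https://t.co/ofSQR0JVOc Copilot Hub is a platform that helps you create intelligent knowledge bases and AI personas from your private data. You can create a custom ChatGPT in minutes from documents, websites, a Notion database, or other data sources. 🧵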

167

3K

1K

2K

679K

CryptoRiderW 🌖 @cryptoriderw

about 3 years ago

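@oran_ge Not everyone wants their work to become fodder for AI. Writers in particular may keep many of their ideas in their notes; if an app simply embeds them, it is essentially stealing the original author's ideas. This isn't about whether or not to follow the trend; it's about privacy and respect for the original work.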

1

7

0

761

cryptoriderw retweeted

DAIR.AI

@dair_ai

about 3 years ago

Top ML Papers of the Week (Mar 13 - Mar 19): - GPT-4 - FlexGen - NeRFMeshing - Resurrecting RNNs - An Overview of Language Models - Universal Prompt Retrieval for LLMs ...

12

893

131

522

225K

cryptoriderw retweeted

DAIR.AI

@dair_ai

about 3 years ago

Prompt Engineering Guide ICYMI, we recently launched the prompt engineering guide that makes it easier to stay up-to-date with prompt engineering techniques and papers. https://t.co/UrrKL5xHu6

39

1K

319

1K

313K

cryptoriderw retweeted

Jiayuan (JY) Zhang

@jiayuan_jy

about 3 years ago

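Running local inference of the LLaMA 7B model on an M2 MacBook Air 🤯 The output quality is still fairly poor for now, but inference is extremely fast. This is truly inference on consumer-grade hardware; the open-source ecosystem is moving so fast. The video is not sped up. https://t.co/A9cTddgiGd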

32

782

191

282

232K

cryptoriderw retweeted

Stanford Blockchain Club

@StanfordCrypto

about 3 years ago

1/ SBR Updates: We've just published Articles 5-7 of Volume 1, covering on-chain socials, crypto insider trading, and GameFi economics! (Easter eggs in 🐣 🧵👇) Thanks to our authors @bridge__harris @sophfuji, @sabina_beleuz, @0xCousinSY 📚 🎉🎉 https://t.co/gCboStLKHX

1

23

10

4

22K

cryptoriderw retweeted

Zain Kahn

@heykahn

about 3 years ago

GPT-4 will make you superhuman. But only if you know how to use it effectively. Here are 10 ways you can start using GPT-4 today:

500

36K

7K

31K

8M

cryptoriderw retweeted

宝玉

@dotey

about 3 years ago

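A friend was writing a prompt and wanted ChatGPT to produce output in a required format, but it kept getting it wrong, so I helped him revise it and it worked. BTW: OpenAI's Playground is very handy for debugging prompts https://t.co/tvROdhz0I7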

17

321

72

120

92K

cryptoriderw retweeted

yetone

@yetone

about 3 years ago

Hi everyone! OpenAI Translator now supports dark mode on all platforms. Thanks to tywtyw2002 and @lazy_static for their contributions! https://t.co/F2loftYH4A

31

318

51

41

190K

cryptoriderw retweeted

Harsh Makadia

@MakadiaHarsh

about 3 years ago

All Unbelievable examples of GPT-4:

131

5K

1K

4K

2M

cryptoriderw retweeted

Jiayuan (JY) Zhang

@jiayuan_jy

about 3 years ago

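OpenAI just released GPT-4. GPT-4 is a large multimodal model that accepts image and text input and produces text output. This thread will summarize some information about GPT-4 (including key points from the paper and hands-on experience). 🧵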

108

2K

758

704

609K

cryptoriderw retweeted

McQuaid @michaelgmcquaid

about 3 years ago

My list of the best #crypto traders to follow on CT: @trader1sz @Trader_XO @CanteringClark @Tradermayne @TraderKoz @Crypto_Chase @BobLoukas @BigCheds @Nebraskangooner @Pentosh1 @canuck2usa

11

25

3

12

19K

cryptoriderw retweeted

CallMeWhy

@PleaseCallMeWhy

over 3 years ago

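LlamaIndex (GPT Index) has hugely lowered the customization barrier for an AI noob like me... Just prepare a knowledge base, run a few lines of Python, and ChatGPT can answer questions using your own knowledge base. I recommend this article: https://t.co/HzzmeS9QrF. It includes the author's Colab Notebook; you'll get it at a glance, and it works out of the box.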

57

949

279

560

166K
