George Ang @gnap - Twitter Profile

22 days ago

@GlosPazura @ModelScope2022 Thx a lot for the information. Could you provide the spec of your inference environment? neofetch would be nice:)

0

40

George Ang

@gnap

22 days ago

@GlosPazura @ModelScope2022 Sorry for the inconvenience. Could you DM me how to reproduce your decode rate, such as which quantized GGUF you use and which quantized version of qwen 3.5 9B you are comparing with? Plus, the specific llama.cpp version that LM Studio uses would be great.

1

0

69

George Ang

@gnap

about 1 year ago

@flaneur2023 I feel like I've seen this kind of code somewhere before?

2

0

78

gnap retweeted

OpenBMB

@OpenBMB

over 1 year ago

💥 Introducing MiniCPM-o 2.6: An 8B size, GPT-4o level Omni Model runs on device
✨ Highlights:
~Match GPT-4o-202405 in vision, audio and multimodal live streaming
~End-to-end real-time bilingual audio conversation
~Voice cloning & emotion control
~Advanced OCR & video understanding
~Offline iPad-compatible multimodal live streaming
🔗 Try it out:
GitHub: https://t.co/gtRJoHOlfd
HF: https://t.co/IY9KgoOqSI
Demo: https://t.co/IzZuyz0qB1

29

695

160

509

98K

George Ang

@gnap

over 1 year ago

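Here's a joke I heard. A team was working on model quantization and the results were not great: the model would only generate a repeated "YES!". While everyone was discussing calibration sets, SQNR, and repetition penalties, one guy stared thoughtfully at the copy of "The Interpretation of Dreams" on the bookshelf. The next day he and a colleague who worked on TTS quit at lightning speed. Reportedly both of them jumped ship to a company that makes adult health products.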

0

5

0

369

gnap retweeted

Georgi Gerganov

@ggerganov

over 1 year ago

Here is the PR / tech blog: https://t.co/p1heIyecw1 I've tried to describe most of the interesting implementation details. I believe the performance is quite good and it should run nicely even on low-mid range hardware. Enjoy your local copilot in the terminal!

1

56

4

20

7K

George Ang

@gnap

about 2 years ago

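Most of YOCO's performance gain comes from the authors' RetNet from last year. But industry clearly prefers to adopt simple architectures and then rely on scaling laws to improve model quality. It has a "what you train, I recommend; what I train, I don't use" feel to it.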

0

1

0

427

George Ang

@gnap

about 2 years ago

@flaneur2023 @zdyxry Having acquired HashiCorp, they can combine into "horrible". :P

0

1

0

107

George Ang

@gnap

about 2 years ago

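In parallel sampling, because of how the predicted distribution is shaped, the probability of generating identical sequences is fairly high, especially for the output prefix. Today I reproduced in our internal engine the Copy-on-Write mechanism mentioned in the Paged Attention paper, splitting a page only when the samples produce different results. One thing I'm curious about: vllm's implementation also has a COW mechanism, but it appears to start copying from the last page of the prompt.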
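A minimal Python sketch of the lazy copy-on-write idea described above. This is an illustration under stated assumptions, not the internal engine's code and not vLLM's implementation: `Page`, `Sequence`, `PagedCache`, and `PAGE_SIZE` are hypothetical names, and token ids stand in for the K/V entries a real cache would hold. Forked samples share pages through a reference count, and a page is split only when a sample writes a token that differs from what a sibling already wrote in the same slot.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4  # tokens per page (toy value, assumed for illustration)


@dataclass
class Page:
    tokens: list = field(default_factory=list)  # stand-in for K/V slots
    refcount: int = 1                           # number of sequences sharing this page


@dataclass
class Sequence:
    pages: list = field(default_factory=list)  # page table: Page objects in logical order
    length: int = 0                             # number of valid tokens


class PagedCache:
    """Toy paged cache with lazy copy-on-write (hypothetical sketch)."""

    def __init__(self):
        self.copies = 0  # how many page splits were performed

    def fork(self, seq):
        # A forked sample shares every page of its parent.
        for page in seq.pages:
            page.refcount += 1
        return Sequence(pages=list(seq.pages), length=seq.length)

    def append(self, seq, token):
        offset = seq.length % PAGE_SIZE
        if offset == 0:
            seq.pages.append(Page())  # page boundary: start a private page
        page = seq.pages[-1]
        if offset < len(page.tokens):
            # A sibling has already written this slot.
            if page.tokens[offset] == token:
                seq.length += 1  # same result: keep sharing, no copy
                return
            if page.refcount > 1:
                # Divergence: split the page, copying only the common prefix.
                page.refcount -= 1
                page = Page(tokens=page.tokens[:offset])
                seq.pages[-1] = page
                self.copies += 1
            else:
                del page.tokens[offset:]  # sole owner: drop the stale tail
        page.tokens.append(token)
        seq.length += 1

    def tokens(self, seq):
        flat = [t for page in seq.pages for t in page.tokens]
        return flat[:seq.length]
```

With two samples forked from a six-token prompt, appending the same token to both leaves the last page shared, and the split happens only at the first differing token. The sketch tracks sharing only within pages that were already shared at fork time; pages allocated after the fork are always private.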

0

1

0

413

George Ang

@gnap

about 2 years ago

@scv119 Why not submit it to the community?

0

74

George Ang

@gnap

over 2 years ago

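Before the holiday I did a survey: the strongest PagedAttention kernel in the community is still TensorRT-LLM's implementation, which implements paged management directly on top of FlashAttention. Unfortunately it is not open source and is distributed as cubin. vllm's implementation is a conventional softmax(QK^T) fusion; its matrix multiplications do not use mma instructions, so it does not use Tensor Cores either.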
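To illustrate what paged management on top of a FlashAttention-style computation involves, here is a pure-Python sketch for a single query vector. It is not TensorRT-LLM's or vLLM's kernel: `paged_attention`, `dense_attention`, the block-table layout, and all parameter names are assumptions made for this example, and real kernels do this per tile on the GPU. The loop walks the pages in logical order through a block table and keeps a running max and denominator (the streaming softmax that FlashAttention relies on) instead of materializing all scores first.

```python
import math


def paged_attention(q, k_pages, v_pages, block_table, seq_len, page_size):
    """Single-query attention over a paged KV cache (hypothetical layout).

    k_pages[p][s] / v_pages[p][s] are the key / value vectors stored in
    slot s of physical page p; block_table[i] is the physical page that
    holds logical page i of the sequence.
    """
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * len(v_pages[block_table[0]][0])
    for logical_page, physical_page in enumerate(block_table):
        start = logical_page * page_size
        valid = min(page_size, seq_len - start)  # last page may be partly filled
        if valid <= 0:
            break
        for slot in range(valid):
            k = k_pages[physical_page][slot]
            v = v_pages[physical_page][slot]
            score = scale * sum(qi * ki for qi, ki in zip(q, k))
            new_max = max(running_max, score)
            # Rescale what was accumulated so far to the new max.
            correction = math.exp(running_max - new_max)
            weight = math.exp(score - new_max)
            denom = denom * correction + weight
            acc = [a * correction + weight * vi for a, vi in zip(acc, v)]
            running_max = new_max
    return [a / denom for a in acc]


def dense_attention(q, keys, values):
    """Reference: softmax(q . K^T / sqrt(d)) V on contiguous keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
```

Both functions should agree up to floating-point rounding when the block table maps the logical pages onto the same keys and values, including when the physical pages are stored out of order and the last page is only partly filled.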

1

3

0

1

642

George Ang

@gnap

over 2 years ago

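@ruanyf It looks like this is still about bringing AI to phone operating systems. Whether or not anyone shouts "All in", this is the focus of every vendor's OS updates this year.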

0

3

0

2K

gnap retweeted

AlexZ 🦀

@blackanger

over 2 years ago

This remark from Jensen Huang has also drawn plenty of objections; see the comments. This comment is one I largely agree with.

11

102

26

40

45K

George Ang

@gnap

over 2 years ago

@imwsl90 If the industry wraps a sin function around people's abilities when hiring, then no amount of accumulated experience helps.

0

573

George Ang

@gnap

over 2 years ago

@xiaohuggg Shouldn't the highlight be 5:00-5:05 pm? Giving in to the scaling law took only 5 minutes.

0

1

0

489

gnap retweeted

Saining Xie

@sainingxie

over 2 years ago

Here's my take on the Sora technical report, with a good dose of speculation that could be totally off. First of all, really appreciate the team for sharing helpful insights and design decisions – Sora is incredible and is set to transform the video generation community. What we have learned so far: - Architecture: Sora is built on our diffusion transformer (DiT) model (published in ICCV 2023) — it's a diffusion model with a transformer backbone, in short: DiT = [VAE encoder + ViT + DDPM + VAE decoder]. According to the report, it seems there are not much additional bells and whistles. - "Video compressor network": Looks like it's just a VAE but trained on raw video data. Tokenization probably plays a significant role in getting good temporal consistency. By the way, VAE is a ConvNet, so DiT technically is a hybrid model ;) (1/n)