Xinyao Niu @sirius_ctrl - Twitter Profile

2 months ago

chrisbarber's tweet:

thread of ui ideas for claude code, codex and cowork type products

warning long-ish, 31 tweets

lmk if there's one you particularly like

1/ what if the websearch tool in coding agents had a more thorough status display: https://t.co/NCrKDlT2vm

sirius_ctrl retweeted

Mr.RC｜𝟎𝐱𝐔

@MrRyanChi

3 months ago

https://t.co/TEUP3779I5

Xinyao Niu @sirius_ctrl

4 months ago

Agents will become the new front end of our digital world.

sirius_ctrl retweeted

nader dabit

@dabit3

5 months ago

https://t.co/q9anIlmWM0

sirius_ctrl retweeted

Lech Mazur

@LechMazur

over 1 year ago

All the data is here: https://t.co/rqhZOZfJl4 The top three best stories overall are now from R1 (linked there).

sirius_ctrl retweeted

leloy!

@leloykun

over 1 year ago

(Linear) Attention Mechanisms as Test-Time Regression

By now, you've probably already heard of linear attention, in-context learning, test-time scaling, etc...

Here, I'll discuss:

1. The unifying framework that ties them all together;
2. How to derive different linear attention variants from scratch; and
3. How to parallelize training linear attention models

sirius_ctrl retweeted

Riccardo Grazzi @riccardograzzi

over 1 year ago

LLMs can now track states, finally matching this cat! And we prove it. But how? 🧵👇 1/ Paper: https://t.co/aKvrqYtkWh with @julien_siems @jkhfranke @ZelaArber @FrankRHutter @MPontil

sirius_ctrl retweeted

Jack Parker-Holder

@jparkerholder

over 1 year ago

Introducing 🧞Genie 2 🧞 - our most capable large-scale foundation world model, which can generate a diverse array of consistent worlds, playable for up to a minute. We believe Genie 2 could unlock the next wave of capabilities for embodied agents 🧠.

Xinyao Niu @sirius_ctrl

about 2 years ago

Perhaps this is what translation tasks look like in the new era, and maybe this is the charm of large-scale pre-training. Perhaps, in the context of large-scale synthetic data, the critical factor is the underlying rule governing data generation, rather than the content itself? https://t.co/dHtkHEXyuS

sirius_ctrl retweeted

Bonnie Li

@bonniesjli

about 2 years ago

How do LLMs scale to a million-token context window? Ring Attention is a nice trick to parallelize a long sequence across devices and rotate them in a ring, with zero-overhead scaling.

In our new blog, we cover the tricks behind this magic. It looks like this (1/5🧵) https://t.co/AD1jPS0HjB
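The ring trick described in this tweet can be simulated in a single process. This is a minimal sketch, not the blog's implementation: it assumes non-causal attention, unscaled dot-product scores, equal-sized blocks, and plain Python lists standing in for device memory, and `ring_attention` and `full_attention` are hypothetical names. Each simulated device keeps its query block fixed, folds whichever key/value block it currently holds into an online (running-max) softmax, then passes that block to its neighbor; after `num_devices` steps every query has seen every key. On real hardware the rotation is a device-to-device transfer that can be overlapped with the block computation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def full_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference: ordinary softmax attention (unscaled scores, no mask)."""
    out = []
    for q_vec in q:
        scores = [dot(q_vec, k_vec) for k_vec in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v_vec[d] for w, v_vec in zip(weights, v)) / z
                    for d in range(len(v[0]))])
    return out


def ring_attention(q: Matrix, k: Matrix, v: Matrix, num_devices: int) -> Matrix:
    """Single-process simulation of ring attention (hypothetical helper)."""
    n = len(q)
    assert n % num_devices == 0, "sketch assumes equal-sized blocks"
    b = n // num_devices
    dim_v = len(v[0])
    # Device i owns query block i for the whole run ...
    q_blocks = [q[i * b:(i + 1) * b] for i in range(num_devices)]
    # ... and starts out holding key/value block i.
    kv_blocks: List[Tuple[Matrix, Matrix]] = [
        (k[i * b:(i + 1) * b], v[i * b:(i + 1) * b]) for i in range(num_devices)]
    # Online-softmax state per query: (running max, running normalizer, running weighted sum).
    state = [[(-math.inf, 0.0, [0.0] * dim_v) for _ in range(b)]
             for _ in range(num_devices)]

    for _ in range(num_devices):
        for dev in range(num_devices):
            k_blk, v_blk = kv_blocks[dev]
            for i, q_vec in enumerate(q_blocks[dev]):
                m, z, acc = state[dev][i]
                for k_vec, v_vec in zip(k_blk, v_blk):
                    s = dot(q_vec, k_vec)
                    new_m = max(m, s)
                    rescale = math.exp(m - new_m)  # exp(-inf) == 0.0 on the first key
                    w = math.exp(s - new_m)
                    z = z * rescale + w
                    acc = [a * rescale + w * x for a, x in zip(acc, v_vec)]
                    m = new_m
                state[dev][i] = (m, z, acc)
        # Rotate: each device passes its KV block to the next device in the ring.
        kv_blocks = [kv_blocks[(dev - 1) % num_devices] for dev in range(num_devices)]

    # Normalize and concatenate the per-device outputs in query order.
    return [[a / z for a in acc] for dev_state in state for _, z, acc in dev_state]
```

Because the online softmax only rescales running sums, the order in which a device sees the key/value blocks does not change the result beyond floating-point rounding, so `ring_attention(q, k, v, 4)` should match `full_attention(q, k, v)`. A causal language-model variant would additionally mask keys that come after each query.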

Xinyao Niu @sirius_ctrl

over 2 years ago

@FireworksAI_HQ May I ask how you got the FP8 version of Mixtral?

sirius_ctrl retweeted

Horace He

@cHHillee

over 2 years ago

Two additions to gpt-fast this week. The first one is an optimization to tensor-parallelism added by @foofoobuggy which improves our TP perf by 20-50%.

This gives us 200 => 330 tok/s for Llama-7B fp16 and 64 => 91 tok/s for Llama-70B int4 with *no* speculative decoding.

(1/4) https://t.co/P2ctlJQWC5

sirius_ctrl retweeted

Simon Boehm @Si_Boehm

over 3 years ago

I wrote the most naive CUDA matrix multiply and iteratively optimised it to ~80% of cuBLAS performance: https://t.co/zHBigTKhCB

sirius_ctrl retweeted

Dmitry Tuzoff @Tuzoff

over 2 years ago

Very cool dataset, BTW. Recently I tried ChatGPT 4 on Caribou Contest (online Canadian math Olympiad) tasks for Grade 2 and Grade 7-8 (I photographed each problem from the screen). To my surprise, it solved only 1 out of 8 for Grade 1 and 9 out of 14 for Grade 7-8.

The problem is that practically all of the problems for lower grades are visual, and almost all of the problems for upper grades are textual. Turns out GPT-4 is great at OCR but poorer at precise object classification and at abstracting higher-level concepts from images.

This can be a nice way to test multimodal models. I'll be happy if someone develops this further.

sirius_ctrl retweeted

Jim Fan

@DrJimFan

over 2 years ago

Instead of taking OAI's merger offer, Anthropic launched major updates for Claude 2.1🎉. I think the below chart is the most interesting: this is how all LLM papers that claim "long context" should report: error rates on "Beginning", "Middle", and "End".

There are a bunch of papers making wild claims, all the way up to "1B context tokens". Here's a friendly reminder that the 30-year-old LSTM literally supports infinite context. It's a meaningless number unless you show detailed evaluations at different locations in the context. LLMs tend to be "Lost in the Middle", i.e., they struggle to remember and reason on information in the middle section of the context window: https://t.co/8Lcr1NNf9h
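A report in the Beginning/Middle/End format takes only a few lines to produce from per-probe results. This is a minimal sketch, assuming a hypothetical eval log of (relative position of the probed fact in the context, answered correctly?) pairs; the function name and the equal-thirds split are assumptions, not the protocol behind the Claude 2.1 chart.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def error_rate_by_position(results: Iterable[Tuple[float, bool]]) -> Dict[str, float]:
    """Hypothetical helper: bucket eval results by where the probed fact sat.

    Each result is (relative_position, correct), where relative_position is
    in [0, 1] (0 = start of the context, 1 = end). Buckets are equal thirds,
    an assumption made for this sketch.
    """
    totals: Counter = Counter()
    errors: Counter = Counter()
    for pos, correct in results:
        if not 0.0 <= pos <= 1.0:
            raise ValueError(f"relative position out of range: {pos}")
        bucket = "Beginning" if pos < 1 / 3 else "Middle" if pos < 2 / 3 else "End"
        totals[bucket] += 1
        if not correct:
            errors[bucket] += 1
    # Only report buckets that actually received probes.
    return {b: errors[b] / totals[b] for b in ("Beginning", "Middle", "End") if totals[b]}
```

For example, `error_rate_by_position([(0.05, True), (0.5, False), (0.95, True)])` returns `{"Beginning": 0.0, "Middle": 1.0, "End": 0.0}`, which exposes a middle-of-context weakness that a single averaged error rate would hide.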

Claude 2.1 also claims "2x hallucination" - please take this with a BIG grain of salt. A while back, I expressed my concerns about Vectara's benchmarking protocol. Same concerns apply here too.

The trivial solution to achieve 0% hallucination is simply refusing to answer every query. One cannot claim victory here without a careful Safety vs Usefulness analysis. How many questions that Claude used to answer correctly are now rejected?
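The refusal loophole becomes obvious once the metrics are written out side by side. This is a minimal sketch, assuming a hypothetical `Outcome` record per query and defining hallucination rate as wrong answers over all queries (one possible definition, not Vectara's or Anthropic's protocol): a model that refuses everything scores a perfect 0% hallucination rate and also 0% accuracy, which is why the usefulness side has to be reported alongside it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Outcome:
    answered: bool                  # False means the model refused / said "I don't know"
    correct: Optional[bool] = None  # only meaningful when answered is True


def summarize(outcomes: List[Outcome]) -> Dict[str, float]:
    """Hypothetical metrics: every rate is taken over all queries, not just answered ones."""
    n = len(outcomes)
    answered = [o for o in outcomes if o.answered]
    wrong = sum(1 for o in answered if not o.correct)
    right = len(answered) - wrong
    return {
        "hallucination_rate": wrong / n,  # confident wrong answers
        "answer_rate": len(answered) / n,  # how often the model answers at all
        "accuracy": right / n,             # the usefulness side of the trade-off
    }
```

Comparing `answer_rate` and `accuracy` between two model versions is one way to ask the tweet's question: how many questions that used to be answered correctly are now rejected.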

In any case, kudos to Dario & the Anthropic team for assuring us a solid alternative during turmoil! 🩷https://t.co/ZvXXfAPAKD
