bilge @bilgeacun - Twitter Profile

Xuezhe Ma (Max) @MaxMa1987

5 months ago

After about 2 years, we are proud to release Gecko, an efficient architecture that improves upon Megalodon, with the capability of efficiently and inherently processing sequences with unlimited context length.

One of the most important ideas in Gecko is Adaptive Working Memory (AWM), implemented using a linear attention mechanism with a position-aware online softmax activation. Notably, AWM globally compresses information into memory, rather than discarding historical information through forgetting.

In a controlled head-to-head comparison with Llama2 and Megalodon, Gecko achieves better performance at the scale of 7B parameters and 2T training tokens. Gecko reaches a training loss of 1.68, vs. 1.67 for Llama2-13B, with half the number of parameters on 2T tokens.

Paper: https://t.co/hLJZ9VPnea
Code: https://t.co/UPNhjlvNq3

bilge @bilgeacun

about 1 year ago

😺

AI at Meta

@AIatMeta

about 1 year ago

CATransformers is a carbon-driven neural architecture and system hardware co-design framework. Using CATransformers, we discover greener CLIP models that achieve an average of 9.1% reduction potential in total lifecycle carbon emissions while maintaining accuracy (or increasing accuracy) and latency. This research is the first to look into carbon-driven neural architecture and system hardware co-design. It is enabled by a first-of-its-kind architectural carbon modeling tool – ACT, which we developed at FAIR. Check out: Our paper ➡️ https://t.co/uwyBQ1M2WF; code repository ➡️ https://t.co/TCTiqMZ5B2; and the additional carbon design tools and research artifacts in Sustainable AI ➡️ https://t.co/S0VH1DgVFS

bilgeacun retweeted

Aran Komatsuzaki

@arankomatsuzaki

about 2 years ago

Meta presents Is Flash Attention Stable? Finds that Flash Attention sees roughly an order of magnitude more numeric deviation as compared to Baseline Attention at BF16 when measured during an isolated forward pass https://t.co/zXtDpQ8Box

bilge @bilgeacun

about 2 years ago

Breaking my X fast to post this. Need ~2x faster LLM inference? Check this out! 👇

Mostafa Elhoushi

@m_elhoushi

about 2 years ago

Excited to present our latest research: 🦘LayerSkip! https://t.co/D8wQNH1VRM We run a subset of earlier layers of an LLM, & verify/correct using the remaining layers, to achieve up to 🚀2.16x speedup on Llama 7B @AkshatS07 @bilgeacun @bwasti @Ahhegazy77 @BeidiChen @CarolejeanWu
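The draft-then-verify idea in the tweet above can be illustrated with a toy greedy loop. This is a sketch under stated assumptions, not the LayerSkip implementation: `draft_next` and `full_next` are hypothetical stand-ins for "early layers only" and "all layers", both are toy arithmetic functions rather than a real LLM, and verification is done one token at a time instead of in a single batched pass.

```python
from typing import Callable, List

Token = int
NextFn = Callable[[List[Token]], Token]


def full_next(tokens: List[Token]) -> Token:
    """Hypothetical 'all layers' model: a deterministic toy next-token rule."""
    return (31 * sum(tokens) + len(tokens)) % 50


def draft_next(tokens: List[Token]) -> Token:
    """Hypothetical 'early layers' model: agrees with full_next most of the time."""
    guess = full_next(tokens)
    # Deliberately wrong whenever the context length is a multiple of 3.
    return guess if len(tokens) % 3 else (guess + 1) % 50


def greedy_decode(prompt: List[Token], next_fn: NextFn, max_new: int) -> List[Token]:
    """Plain greedy decoding, one call to next_fn per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(next_fn(tokens))
    return tokens


def speculative_decode(prompt: List[Token], draft: NextFn, full: NextFn,
                       max_new: int, k: int = 4) -> List[Token]:
    """Draft up to k tokens cheaply, then keep the longest prefix the full model agrees with."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new
    while len(tokens) < target_len:
        # 1) Draft a short continuation with the cheap model.
        proposed: List[Token] = []
        for _ in range(min(k, target_len - len(tokens))):
            proposed.append(draft(tokens + proposed))
        # 2) Verify each drafted token against the full model.
        for i, guess in enumerate(proposed):
            correct = full(tokens + proposed[:i])
            if guess != correct:
                # Accept the matching prefix, then take the full model's correction.
                tokens.extend(proposed[:i])
                tokens.append(correct)
                break
        else:
            # Every drafted token was accepted.
            tokens.extend(proposed)
    return tokens


prompt = [1, 2, 3]
out = speculative_decode(prompt, draft_next, full_next, max_new=10, k=4)
print(out == greedy_decode(prompt, full_next, 10))
```

Because every accepted token is checked against `full_next`, the output matches plain greedy decoding with the full model; any speedup in a real system would come from the draft being cheap and the verification being batched, neither of which this toy measures.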

bilgeacun retweeted

over 2 years ago

10 years of FAIR. 10 years of advancing the state of the art in AI through open research. We're celebrating the 10th anniversary of Meta's Fundamental AI Research team and continuing that legacy by sharing our work on three exciting new research projects today. Details below 🧵

bilgeacun retweeted

AI at Meta

@AIatMeta

about 3 years ago

Today, Meta researchers, together with the @MLCommons working group, are launching DataPerf, the first platform for building data & data-centric AI algorithm leaderboards. We're excited for how DataPerf will help to push the data-centric AI field forward ⬇️

bilgeacun retweeted

MLCommons @MLCommons

about 3 years ago

The future of #ML is data-centric! That’s why we built #DataPerf, the leaderboard for data. It is the 1st platform and community for data-centric competitions. Together we will break through data limitations and unlock better ML for the world https://t.co/GAKiFAKS6E

bilge @bilgeacun

over 3 years ago

@SashaMTL Same also, including all global operations since 2021 https://t.co/PqeOc1rQeR

bilge @bilgeacun

over 3 years ago

@msharmavikram @TheRegister @tomshardware @arstechnica @techradar @pcgamer Congrats! Hope to see you at ASPLOS!

bilge @bilgeacun

over 3 years ago

@SashaMTL You can find all of Meta's datacenter locations and the renewable energy projects that power them here: https://t.co/ITAUNSVXku

bilge @bilgeacun

over 3 years ago

@SashaMTL While I agree with the premise of this tweet (i.e. DC location does really matter), I think that LLaMA authors are being 'generous' by assuming they are emitting US avg CO2. All of Meta's datacenters are powered by renewable energy: https://t.co/ITAUNSVXku

bilgeacun retweeted

zeynep tufekci

@zeynep

over 3 years ago

Big earthquake in Southeast Turkey, populated area and at night—people will be caught asleep at home. Preliminary reports are M 7.8. Early photos already showed pancaked buildings. #DEPREMOLDU is the hashtag. (Or #deprem). Almost certainly needs global rescue team mobilization.

bilgeacun retweeted

zeynep tufekci

@zeynep

over 3 years ago

The 1999 Izmit earthquake killed ~18,000. Magnitude 7.4. Now, *two* earthquakes in Turkey, ten hours apart: M 7.8 and 7.7. Richter is a log scale. Each one is ~2.5 times bigger and ~four times stronger. Fault break seems to be hundreds of kilometers. All populated areas.😢
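The ratios quoted in the tweet above follow from the logarithmic definition of magnitude. A minimal sketch, assuming the standard relations that recorded amplitude scales as 10^ΔM and radiated energy as 10^(1.5·ΔM); the function names are illustrative, and the M 7.4 baseline is the 1999 event mentioned in the tweet:

```python
def amplitude_ratio(m_new: float, m_old: float) -> float:
    """Ratio of recorded wave amplitudes: one magnitude unit is a factor of 10."""
    return 10 ** (m_new - m_old)


def energy_ratio(m_new: float, m_old: float) -> float:
    """Ratio of radiated energy: one magnitude unit is a factor of 10 ** 1.5."""
    return 10 ** (1.5 * (m_new - m_old))


baseline = 7.4  # 1999 Izmit earthquake, per the tweet
for magnitude in (7.8, 7.7):
    amp = amplitude_ratio(magnitude, baseline)
    energy = energy_ratio(magnitude, baseline)
    print(f"M {magnitude}: {amp:.2f}x amplitude, {energy:.2f}x energy vs M {baseline}")
```

Under these assumptions the M 7.8 event comes out at roughly 2.5 times the amplitude and 4 times the energy of an M 7.4 event, which matches the figures in the tweet; the M 7.7 event is closer to 2 times and 2.8 times.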

bilgeacun retweeted

Benjamin C Lee @Lee_BenjaminC

over 3 years ago

Excited to share our ASPLOS'23 paper on carbon-aware datacenters. We study how renewable energy from diverse sources, energy storage, and workload scheduling can balance trade-offs between embodied and operational carbon. Congrats to @bilgeacun and team! https://t.co/pFIIpb35Bl

bilge @bilgeacun

over 3 years ago

@noman_bashir I will probably do the same in Mastodon. :D What server are you on there?

bilge @bilgeacun

over 3 years ago

Just when I was thinking about using Twitter more actively to write about research stuff, this place turned into a circus town. I guess I'll just continue not using it much as before. 🤐😅

bilge @bilgeacun

over 3 years ago

@msharmavikram Thanks Vikram! ☺️

bilge @bilgeacun

over 3 years ago

I'm looking for PhD research interns for summer 2023 to work in ML efficiency and sustainability areas. Reach out to me with your CV if interested!

bilge @bilgeacun

over 3 years ago

(I was trying to stay off of Twitter amid the chaos but had to post this..)

bilge @bilgeacun

over 3 years ago

Sad news. CS community lost a legend today. I was privileged to meet and talk with Fred Brooks @HLForum in 2018. #mythicalmanmonth
