After about 2 years, we are proud to release Gecko, an efficient architecture that improves upon Megalodon and can inherently and efficiently process sequences with unlimited context length.
One of the most important ideas in Gecko is Adaptive Working Memory (AWM), implemented using a linear attention mechanism with a position-aware online softmax activation. Notably, AWM globally compresses information into memory, rather than discarding historical information through forgetting.
In a controlled head-to-head comparison with Llama2 and Megalodon, Gecko achieves better performance at the scale of 7B parameters and 2T training tokens. Gecko reaches a training loss of 1.68, vs. 1.67 for Llama2-13B, with half the number of parameters, on 2T tokens.
Paper: https://t.co/hLJZ9VPnea
Code: https://t.co/UPNhjlvNq3
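For readers new to the term "online softmax" used above: a minimal sketch of the generic streaming softmax trick, which keeps a running maximum and a running normalizer so a softmax-weighted average can be updated one item at a time. This is an assumption-laden illustration, not Gecko's AWM; the position-aware variant is not described in this thread, and the function name below is hypothetical.

```python
import math


def online_softmax_weighted_sum(scores, values):
    """Softmax-weighted average of `values`, computed in a single streaming pass.

    Hypothetical helper for illustration only; expects non-empty inputs.
    """
    running_max = -math.inf  # largest score seen so far
    denom = 0.0              # sum of exp(score - running_max)
    acc = 0.0                # sum of exp(score - running_max) * value
    for s, v in zip(scores, values):
        new_max = max(running_max, s)
        # Rescale the old accumulators to the new reference maximum.
        scale = math.exp(running_max - new_max)
        w = math.exp(s - new_max)
        denom = denom * scale + w
        acc = acc * scale + w * v
        running_max = new_max
    return acc / denom


print(online_softmax_weighted_sum([1.0, 3.0, 2.0], [10.0, 20.0, 30.0]))
```

Because every exponent is taken relative to the running maximum, the state stays a fixed size and large scores do not overflow.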
CATransformers is a carbon-driven neural architecture and system hardware co-design framework. Using CATransformers, we discover greener CLIP models that achieve an average of 9.1% reduction potential in total lifecycle carbon emissions while maintaining (or increasing) accuracy and maintaining latency.
This research is the first to look into carbon-driven neural architecture and system hardware co-design. It is enabled by a first-of-its-kind architectural carbon modeling tool – ACT, which we developed at FAIR.
Check out:
Our paper ➡️ https://t.co/uwyBQ1M2WF;
code repository ➡️ https://t.co/TCTiqMZ5B2;
and the additional carbon design tools and research artifacts in Sustainable AI ➡️ https://t.co/S0VH1DgVFS
Meta presents Is Flash Attention Stable?
Finds that Flash Attention sees roughly an order of magnitude more numeric deviation as compared to Baseline Attention at BF16 when measured during an isolated forward pass
https://t.co/zXtDpQ8Box
Excited to present our latest research: 🦘LayerSkip!
https://t.co/D8wQNH1VRM
We run a subset of earlier layers of an LLM, & verify/correct using the remaining layers, to achieve up to 🚀2.16x speedup on Llama 7B
@AkshatS07 @bilgeacun @bwasti @Ahhegazy77 @BeidiChen @CarolejeanWu
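A toy sketch of the draft-then-verify idea described above, under loud assumptions: the "model" is a made-up stack of integer functions, and all names (`LAYERS`, `EXIT_LAYER`, `next_token`) are hypothetical. It only shows that drafting with early layers and correcting with the full stack reproduces the full model's greedy output; it does not measure speed or reflect the actual LayerSkip implementation.

```python
from typing import Callable, List, Tuple

Layer = Callable[[int], int]

# Hypothetical toy layers: the hidden state is just an int.
LAYERS: List[Layer] = [
    lambda h: h * 3 + 1,
    lambda h: h + 7,
    lambda h: h + (5 if h % 4 == 0 else 0),  # late layer that sometimes changes the answer
    lambda h: h,
]
VOCAB = 10
EXIT_LAYER = 2  # the draft runs only the first 2 layers


def next_token(tokens: List[int], num_layers: int) -> int:
    """Greedy next token using only the first `num_layers` layers."""
    h = sum((i + 1) * t for i, t in enumerate(tokens))
    for layer in LAYERS[:num_layers]:
        h = layer(h)
    return h % VOCAB


def generate_full(prompt: List[int], n: int) -> List[int]:
    """Baseline: every token comes from the full stack."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(next_token(tokens, len(LAYERS)))
    return tokens


def generate_self_speculative(
    prompt: List[int], n: int, draft_len: int = 4
) -> Tuple[List[int], int]:
    """Draft with early layers, then verify/correct with all layers."""
    tokens = list(prompt)
    target = len(prompt) + n
    accepted = 0
    while len(tokens) < target:
        # 1) Draft a few tokens cheaply with the early-exit model.
        draft: List[int] = []
        for _ in range(draft_len):
            draft.append(next_token(tokens + draft, EXIT_LAYER))
        # 2) Verify in order. A real system would check all drafted
        #    positions in one batched pass; here it is a plain loop.
        for guess in draft:
            if len(tokens) >= target:
                break
            truth = next_token(tokens, len(LAYERS))
            tokens.append(truth)  # always keep the full model's token
            if truth != guess:
                break  # first mismatch: discard the rest of the draft
            accepted += 1
    return tokens, accepted


out, accepted = generate_self_speculative([1, 2, 3], 20)
print(out == generate_full([1, 2, 3], 20), accepted)
```

The more drafted tokens the full stack accepts, the fewer full passes are needed per generated token, which is where a speedup would come from in a real system.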
10 years of FAIR.
10 years of advancing the state of the art in AI through open research.
We're celebrating the 10th anniversary of Meta's Fundamental AI Research team and continuing that legacy by sharing our work on three exciting new research projects today.
Details below 🧵
Today, Meta researchers, together with the @MLCommons working group, are launching DataPerf, the first platform for building data & data-centric AI algorithm leaderboards.
We're excited for how DataPerf will help to push the data-centric AI field forward ⬇️
The future of #ML is data-centric! That’s why we built #DataPerf, the leaderboard for data. It is the 1st platform and community for data-centric competitions. Together we will break through data limitations and unlock better ML for the world https://t.co/GAKiFAKS6E
@SashaMTL While I agree with the premise of this tweet (i.e. DC location does really matter), I think that LLaMA authors are being 'generous' by assuming they are emitting US avg CO2. All of Meta's datacenters are powered by renewable energy: https://t.co/ITAUNSVXku
Big earthquake in Southeast Turkey, populated area and at night—people will be caught asleep at home. Preliminary reports are M 7.8.
Early photos already showed pancaked buildings. #DEPREMOLDU is the hashtag. (Or #deprem). Almost certainly needs global rescue team mobilization.
The 1999 Izmit earthquake killed ~18,000. Magnitude 7.4.
Now, *two* earthquakes in Turkey, ten hours apart: M 7.8 and 7.7.
Richter is a log scale. Each one is ~2.5 times bigger and ~four times stronger. Fault break seems to be hundreds of kilometers.
All populated areas.😢
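A quick check of the log-scale arithmetic above, as a sketch: it assumes the comparison is between M 7.8 and the 1999 Izmit M 7.4, and uses the standard relations that shaking amplitude scales as 10^ΔM and released energy as 10^(1.5·ΔM). The function names are hypothetical.

```python
def amplitude_ratio(m_big: float, m_small: float) -> float:
    """How many times larger the measured amplitude is ("bigger")."""
    return 10 ** (m_big - m_small)


def energy_ratio(m_big: float, m_small: float) -> float:
    """How many times more energy is released ("stronger")."""
    return 10 ** (1.5 * (m_big - m_small))


print(round(amplitude_ratio(7.8, 7.4), 2))  # 2.51
print(round(energy_ratio(7.8, 7.4), 2))     # 3.98
```

The same functions give smaller ratios for the M 7.7 event, since its gap to 7.4 is 0.3 rather than 0.4.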
Excited to share our ASPLOS'23 paper on carbon-aware datacenters. We study how renewable energy from diverse sources, energy storage, and workload scheduling can balance trade-offs between embodied and operational carbon. Congrats to @bilgeacun and team!
https://t.co/pFIIpb35Bl
Just when I was thinking about using Twitter more actively to write about research stuff, this place turned into a circus town. I guess I'll just continue not using it much as before. 🤐😅