Thank you everyone for trying the Galactica model demo. We appreciate the feedback we have received so far from the community, and have paused the demo for now. Our models are available for researchers who want to learn more about the work and reproduce results in the paper.
Galactica is basically GPT-3 for science. It can write whitepapers, reviews, Wikipedia pages and code. It knows how to cite and how to write equations. It's kind of a big deal 1/ 🧵
Today a 120B model called “Galactica” is open-sourced by @paperswithcode. It’s capable of writing math notation, citations, code, chemical formulas, DNA sequences, etc. Here’s why I think Galactica is a huge milestone in open foundation models, scientific automation, and responsible AI: 🧵
The new language model for science https://t.co/EgXNqkoP37. After a few quick tries, it seems to generate professional text in the areas I am familiar with. And 7 years ago we were *joking* about ML writing papers!
This is just the first step on our mission to organize science. And there is a lot more work to be done. We look forward to seeing what the open ML community builds with the model.
🪐 Introducing Galactica. A large language model for science.
Can summarize academic literature, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.
Explore and get weights: https://t.co/jKEP8S7Yfl
Despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. Galactica is also significantly less toxic than other language models based on evaluations.
We have explored some of the latest progress, architectural improvements, and emerging new techniques for long-range modeling. We'll continue to keep track of the progress on long-range modeling and LRA. More threads like this coming soon! Follow @paperswithcode for more.
10/10
How well do machine learning models perform on long sequences?
This is a question of high interest in ML research, so let’s take a look at what we know so far.
1/10
Besides transformers, other types of models have been tested on LRA. Some of the top results are attained by S4 variants, which are based on state space models. A recent, improved S4 variant (Liquid-S4) attained results competitive with Mega (current SoTA).
9/10