Wild idea in this paper 🤯
How might we store knowledge affordably yet comprehensively? Memory³ proposes an intriguing method: compressing factual knowledge into a separate store. It introduces a third form of memory, in addition to the implicit knowledge stored in model parameters and the short-term working memory used during inference (context key-values).
👨‍🔧 LLMs struggle with inefficient knowledge storage and retrieval, leading to high training and inference costs. The paper aims to address this by introducing a more efficient memory format.
📌 Memory3 introduces explicit memory as a third memory format for LLMs, alongside model parameters (implicit memory) and context key-values (working memory). This explicit memory is implemented as sparse attention key-values, allowing for more efficient knowledge storage and retrieval.
📌 Defines a memory hierarchy for LLMs: plain text (RAG) → explicit memory → model parameters. As you move up this hierarchy, write cost increases while read cost decreases. The goal is to optimize knowledge placement across this hierarchy based on usage frequency.
📌 Memory3's architecture involves converting reference texts into explicit memories before inference. During inference, these memories are retrieved and integrated into self-attention layers. This design allows for smaller model size while maintaining performance.
📌 The explicit memory format is compressed aggressively to save space: only the first half of the attention layers serve as memory layers, grouped query attention reduces the number of key-value heads, and each key-value head keeps only 8 out of 128 tokens, chosen by attention weight.
📌 Training follows a two-stage approach: a warmup stage without explicit memory, followed by a continual training stage with explicit memory. This was necessary because training with explicit memory from the start rendered the memories useless.
📌 Introduces a "memory circuitry theory" to formalize the concept of knowledge in LLMs. It defines knowledge as circuits (equivalence classes of subgraphs) in the computation graph, categorizing them as specific or abstract knowledge.
📌 The Memory3 model achieved better performance than larger models and RAG models on various benchmarks, while maintaining higher decoding speed. It showed particular improvements in factuality and reduced hallucination.
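To make the token-sparsification step from the thread concrete (keeping 8 of 128 tokens per key-value head, chosen by attention weight), here is a minimal sketch in plain Python. It is not the paper's implementation: the function names `select_top_tokens` and `retained_fraction` are hypothetical, the per-token attention score is assumed to be given as one number per token, and the extra saving from grouped query attention is left out because the thread gives no figure for it.

```python
def select_top_tokens(keys, values, attn_weights, k=8):
    """Keep the k tokens with the highest attention weight for one head.

    keys, values, attn_weights are equal-length sequences, one entry per
    token. How the per-token weight is aggregated is an assumption here.
    Returns (kept_indices, kept_keys, kept_values) in original token order.
    """
    if not (len(keys) == len(values) == len(attn_weights)):
        raise ValueError("keys, values and attn_weights must have equal length")
    k = min(k, len(attn_weights))
    # Rank token positions by weight, highest first, and keep the top k.
    ranked = sorted(range(len(attn_weights)),
                    key=lambda i: attn_weights[i],
                    reverse=True)
    kept = sorted(ranked[:k])  # restore original token order
    return kept, [keys[i] for i in kept], [values[i] for i in kept]


def retained_fraction(layer_fraction=0.5, tokens_kept=8, tokens_total=128):
    """Fraction of key-values kept from layer and token selection alone.

    Ignores the grouped-query-attention reduction (factor not stated
    in the thread).
    """
    return layer_fraction * tokens_kept / tokens_total
```

With the numbers quoted above, half the layers times 8/128 tokens leaves about 3% of the original key-values before any head reduction is counted.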
🥁 Llama3 is out 🥁
8B and 70B models available today.
8k context length.
Trained with 15 trillion tokens on a custom-built 24k GPU cluster.
Great performance on various benchmarks, with Llama3-8B doing better than Llama2-70B in some cases.
More versions are coming over the next few months.
https://t.co/EkU9aIHdZE
Save the date! Join us on Monday the 29th to talk UI, with a presentation on Avalonia UI by @oaz 🤩
Sign up on #meetup: https://t.co/L2MdfRqVFI
#UI #OpenSource #dotnet #AvaloniaUI
🚀 @imihalcea dives into the future of AI with us! 🤖 Will he be eclipsed as a speaker by a superintelligent AI? 🌟 Don't miss the live session to solve this mystery! #IA #AGI 😜🔍
Can AI think like a philosopher?
Today, no. On that point I agree with @Enthoven_R.
But will it get there tomorrow? Very likely.
* Language is low bandwidth: less than 12 bytes/second. A person can read 270 words/minute, or 4.5 words/second, which is 12 bytes/s (assuming 2 bytes per token and 0.75 words per token). A modern LLM is typically trained with 1x10^13 two-byte tokens, which is 2x10^13 bytes. This would take about 100,000 years for a person to read (at 12 hours a day).
* Vision is much higher bandwidth: about 20MB/s. Each of the two optic nerves has 1 million nerve fibers, each carrying about 10 bytes per second. A 4-year-old child has been awake a total of 16,000 hours, which translates into 1x10^15 bytes.
In other words:
- The data bandwidth of visual perception is roughly 1.7 million times higher than the data bandwidth of written (or spoken) language (20MB/s divided by 12 bytes/s).
- In a mere 4 years, a child has seen 50 times more data than the biggest LLMs trained on all the text publicly available on the internet.
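The arithmetic behind these figures can be re-derived in a few lines. This is only a sketch that plugs in the numbers quoted in the post; the function name `bandwidth_estimates` and the 365-day year are my own assumptions.

```python
def bandwidth_estimates():
    """Recompute the reading-vs-vision estimates from the quoted inputs."""
    # Reading: 270 words/minute, 0.75 words per token, 2 bytes per token.
    words_per_second = 270 / 60                    # 4.5
    tokens_per_second = words_per_second / 0.75    # 6.0
    reading_bytes_per_s = tokens_per_second * 2    # 12.0

    # LLM training set: 1e13 tokens at 2 bytes each.
    llm_bytes = 1e13 * 2

    # Time for a person to read it at 12 hours a day (365-day year assumed).
    seconds_per_reading_day = 12 * 3600
    years_to_read = llm_bytes / reading_bytes_per_s / seconds_per_reading_day / 365

    # Vision: 2 optic nerves x 1 million fibers x 10 bytes/s per fiber.
    vision_bytes_per_s = 2 * 1_000_000 * 10        # 20 MB/s

    # A 4-year-old has been awake about 16,000 hours.
    child_bytes = 16_000 * 3600 * vision_bytes_per_s

    return {
        "reading_bytes_per_s": reading_bytes_per_s,
        "years_to_read": years_to_read,
        "vision_bytes_per_s": vision_bytes_per_s,
        "child_bytes": child_bytes,
        "bandwidth_ratio": vision_bytes_per_s / reading_bytes_per_s,
        "child_vs_llm": child_bytes / llm_bytes,
    }
```

Without rounding, the child's total comes out slightly above 1x10^15 bytes, so the child-to-LLM ratio lands a little above the rounded 50x quoted in the post.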
This tells us three things:
1. Yes, text is redundant, and visual signals in the optic nerves are even more redundant (despite being 100x compressed versions of the photoreceptor outputs in the retina). But redundancy in data is *precisely* what we need for Self-Supervised Learning to capture the structure of the data. The more redundancy, the better for SSL.
2. Most of human knowledge (and almost all of animal knowledge) comes from our sensory experience of the physical world. Language is the icing on the cake. We need the cake to support the icing.
3. There is *absolutely no way in hell* we will ever reach human-level AI without getting machines to learn from high-bandwidth sensory inputs, such as vision.
Yes, humans can get smart without vision, even pretty smart without vision and audition. But not without touch. Touch is pretty high bandwidth, too.
🎉 The next meetup takes place on Tuesday, March 19, with two sessions: AI 🤖 and Monads 🥳! Save the evening ✨ Details and registration coming very soon.
@JMDeruty @imihalcea This will let us navigate between respect for historical accuracy and the aspiration to a more inclusive and diverse representation, without compromising either one.
@JMDeruty @imihalcea It is therefore imperative to remain critical of the models and how they are used, while continuing to educate people about their potential risks and biases.
Like @AndrewYNg, I have observed a definite shift in the prevalent discourse about AI at Davos:
- Few people still talk about existential risk, and few people believe that current technology, even scaled up, will present an existential risk.
- Everyone agrees that open source AI platforms are a good thing for cultural and linguistic diversity, local sovereignty, education, science, and businesses.
- Everyone agrees that regulating AI-powered products can be useful in certain areas (health, transportation, etc).
- The debate is still open on whether AI research and development and open source AI platforms should be regulated.
- Many people are worried about a new flood of AI-powered political disinformation. Industry-wide standards for content authentication are needed.
- AI has become the most talked-about topic.