Roee Hendel

6 months ago

1/4 🚀Introducing Jamba2, a memory-efficient open source model family built for total enterprise reliability and steerability.

AI21Labs's tweet photo. 1/4 🚀Introducing Jamba2, a memory-efficient open source model family built for total enterprise reliability and steerability. https://t.co/OoitXAELoU

1

26

9

2

3K

RoeeHendel retweeted

9 months ago

1/5 Releasing Jamba Reasoning 3B under Apache 2.0: Hybrid SSM-Transformer architecture that tops accuracy & speed across record context lengths. e.g. 3-5X faster than Llama 3.2 3B and Qwen3 4B at 32K tokens.

AI21Labs's tweet photo. 1/5 Releasing Jamba Reasoning 3B under Apache 2.0: Hybrid SSM-Transformer architecture that tops accuracy & speed across record context lengths. e.g. 3-5X faster than Llama 3.2 3B and Qwen3 4B at 32K tokens. https://t.co/2fgZ5d9OMY

5

202

28

78

410K

RoeeHendel retweeted

AVN nominated adult performer. Content creator. https://t.co/qCJnXTaJH5

almost 2 years ago

We released the #Jamba 1.5 open model family: - 256K #contextwindow - Up to 2.5X faster on #longcontext in its size class - Native support for structured JSON output, function calling, digesting doc objects & generating citations https://t.co/tebBJW09c5 #AI #LLM #AI21Jamba

AI21Labs's tweet photo. We released the #Jamba 1.5 open model family:

- 256K #contextwindow
- Up to 2.5X faster on #longcontext in its size class
- Native support for structured JSON output, function calling, digesting doc objects & generating citations

https://t.co/tebBJW09c5

#AI #LLM #AI21Jamba https://t.co/hdl62pRsZq

105

415

96

127

165K

Who to follow

denismartireal

@denismartireal

Adi Haviv

@adihaviv

CS Ph.D. Candidate at @TelAvivUni. Researching #NLProc and Computer Vision.

Itay Itzhak

@Itay_itzhak_

NLProc, deep learning, and machine learning. Ph.D. student @TechnionLive and @HebrewU

RoeeHendel retweeted

about 2 years ago

Introducing Jamba, our groundbreaking SSM-Transformer open model! As the first production-grade model based on Mamba architecture, Jamba achieves an unprecedented 3X throughput and fits 140K context on a single GPU. 🥂Meet Jamba https://t.co/f2XZFOQbxh 🔨Build on @huggingface

AI21Labs's tweet photo. Introducing Jamba, our groundbreaking SSM-Transformer open model!

As the first production-grade model based on Mamba architecture, Jamba achieves an unprecedented 3X throughput and fits 140K context on a single GPU.

🥂Meet Jamba https://t.co/f2XZFOQbxh

🔨Build on @huggingface https://t.co/WGOjJMiOoE

35

1K

240

473

333K

RoeeHendel retweeted

AK

@_akhaliq

about 2 years ago

AI21 Labs presents Jamba SSM-Transformer open model production-grade model based on Mamba architecture, Jamba achieves an unprecedented 3X throughput and fits 140K context on a single GPU.

_akhaliq's tweet photo. AI21 Labs presents Jamba

SSM-Transformer open model

production-grade model based on Mamba architecture, Jamba achieves an unprecedented 3X throughput and fits 140K context on a single GPU. https://t.co/DcLp2dg5uJ

8

700

107

302

86K

over 2 years ago

@yalishandi @_akhaliq We haven't explored this in depth, but it's a promising direction. Studies linking ICL to SGD could offer clues. In related tests, ICL sometimes acts like an empirical risk minimizer, fitting random examples, while other times ignoring them. Further exploration is needed.

0

1

0

18

RoeeHendel retweeted

AK

@_akhaliq

over 2 years ago

In-Context Learning Creates Task Vectors paper page: https://t.co/RdQiiHkA3I In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set S to find a best-fitting function f(x) in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query x and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing S into a single task vector theta(S) and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.

_akhaliq's tweet photo. In-Context Learning Creates Task Vectors

paper page: https://t.co/RdQiiHkA3I

In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set S to find a best-fitting function f(x) in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query x and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing S into a single task vector theta(S) and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.

11

563

110

364

102K

over 2 years ago

@_akhaliq 🙏 @_akhaliq

0

2

0

595

about 3 years ago

@LouisKirschAI Great work! Did you release the code?

0

70

over 3 years ago

@kushal_tirumala Is it possible that the different baseline values simply arise from the fact that larger models have an overall better language modeling capability, rather than memorization? It would be interesting to check the memorization value of the "special batch" prior to training on it.

0

36