Meta just dropped MobileLLM-Pro on Hugging Face
a 1B foundational language model in the MobileLLM series, designed to deliver high-quality, efficient on-device inference across a wide range of general language modeling tasks
two variants of the model: a pre-trained base model along with quantized checkpoints for CPU and accelerator inference, as well as an instruction-tuned version, showing competitive performance against models in this size range on tasks like tool calling, question answering, rewriting and summarization
MobileLLM-Pro base achieves impressive pre-training results, outperforming Gemma 3 1B and Llama 3.2 1B by an average of 5.7% and 7.9%, respectively, on reasoning, knowledge, and long-context retrieval benchmarks. This performance is achieved by pre-training on less than 2T fully open-source tokens.
1/n Introducing CoSMoEs, a set of Compact Sparse Mixture of Experts at on-device scale (https://t.co/cwotxm9cS0).
In CoSMoEs, we explore how to enable Sparse Mixture of Experts for on-device inference, focusing on quality, memory, and latency.
This work is done with my amazing co-authors @AkshatS07 @erniecyc @Chinnadhurai @Ahhegazy77 and @AdithyaSagarSci
Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch.
Can we align our model to better suit a given inference-time procedure?
We answer this affirmatively; check out the thread below.
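For readers who have not met Best-of-N: a minimal sketch of the inference-time procedure itself, not of the alignment method the thread describes. `sample_response`, `reward`, and the toy stand-ins below are hypothetical names standing in for a policy model and a reward model.

```python
import random
from typing import Callable, List


def best_of_n(prompt: str,
              sample_response: Callable[[str, random.Random], str],
              reward: Callable[[str, str], float],
              n: int = 4,
              seed: int = 0) -> str:
    """Draw n candidate responses and return the one the reward scores highest."""
    rng = random.Random(seed)
    candidates: List[str] = [sample_response(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_policy(prompt: str, rng: random.Random) -> str:
    # "Samples" a response by appending a random number of exclamation marks.
    return prompt + " " + "!" * rng.randint(0, 5)


def toy_reward(prompt: str, response: str) -> float:
    # Pretends that more exclamation marks means a better response.
    return float(response.count("!"))


if __name__ == "__main__":
    print(best_of_n("hello", toy_policy, toy_reward, n=8, seed=1))
```

The mismatch the thread points at is visible here: the policy is trained on single samples, but what the user sees is the output of `best_of_n`.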
Nice to see @ai4bharat showcase its work before @satyanadella! AI4Bharat is here to push the boundaries of open-source AI/ML/NLP for Indian languages. To the moon!
@srija_anand and @MiteshKhapra looking spiffy! :)
Introducing Content-Adaptive Tokenizer (CAT)! An image tokenizer that adapts token count based on image complexity, offering flexible 8x, 16x, or 32x compression! Unlike fixed-length tokenizers, CAT optimizes both representation efficiency and quality. Importantly, we use just captions (no pixels!) to guide tokenization, enabling adaptive representation for text-to-image generation.
Big shout out to collaborators @AIatMeta: @violet_zct @liliyu_lili @LukeZettlemoyer @imisra_ @michiyasunaga @kushal_tirumala
Paper: https://t.co/64O9EYHcEp
More details in 🧵
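To get a feel for what the three compression levels mean for sequence length, here is a rough sketch. It assumes the compression factor is the per-side spatial downsampling and that each latent position counts as one token; both are assumptions for illustration, and `latent_token_count` is a hypothetical helper, not part of CAT.

```python
def latent_token_count(height: int, width: int, compression: int) -> int:
    """Number of latent positions if each image side is downsampled by `compression`.

    Assumption: one token per latent position (not confirmed by the announcement).
    """
    if compression not in (8, 16, 32):
        raise ValueError("compression must be 8, 16, or 32")
    if height % compression or width % compression:
        raise ValueError("image sides must be divisible by the compression factor")
    return (height // compression) * (width // compression)


if __name__ == "__main__":
    for factor in (8, 16, 32):
        print(factor, latent_token_count(512, 512, factor))
```

Under these assumptions a 512x512 image maps to 4096, 1024, or 256 positions, so each step up in compression cuts the sequence length by 4x.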
Want to know how reward model generalizability/cross-lingual alignment relates to a world-famous French food critic?
Listen to the 2min podcast generated by NotebookLM on @zhaofeng_wu's #EMNLP2024 paper!
Chernoff bounds characterize large deviations of a RV (from its mean). On the other hand, they are outperformed by the simple Markov's inequality when considering small deviations of a *non-negative* RV.
Can we get the best of both worlds? 🧵
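A quick numerical illustration of the trade-off, using an Exponential(1) variable (mean 1) as an example chosen here, not one taken from the thread. For this distribution the optimized Chernoff bound has the closed form a * e^(1 - a) for a > 1, obtained by minimizing e^(-t a) / (1 - t) over 0 < t < 1.

```python
import math


def markov_bound(a: float, mean: float = 1.0) -> float:
    """Markov: P(X >= a) <= E[X] / a for non-negative X (capped at 1)."""
    return min(1.0, mean / a)


def chernoff_bound_exp1(a: float) -> float:
    """Optimized Chernoff bound on P(X >= a) for X ~ Exponential(1).

    The bound e^{-t a} / (1 - t) is minimized at t = 1 - 1/a when a > 1;
    for a <= 1 the best choice is t -> 0, which gives the trivial bound 1.
    """
    if a <= 1.0:
        return 1.0
    return a * math.exp(1.0 - a)


def true_tail_exp1(a: float) -> float:
    """Exact tail P(X >= a) = e^{-a} for X ~ Exponential(1)."""
    return math.exp(-a)


if __name__ == "__main__":
    for a in (1.5, 3.0, 10.0):
        print(f"a={a}: markov={markov_bound(a):.4f} "
              f"chernoff={chernoff_bound_exp1(a):.4f} "
              f"true={true_tail_exp1(a):.6f}")
```

Near the mean (a = 1.5 or 3) Markov gives the smaller bound; far out in the tail (a = 10) Chernoff wins by orders of magnitude.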
Zyphra is proud to release Tree Attention, a fast inference method for extremely large sequence lengths
• 8x faster inference speed vs. Ring Attention
• 2x less peak memory
• low data communication volumes
Paper: https://t.co/yf5VNRze6W
Code: https://t.co/Th6Fg8eFEr
A 🧵
🚨New paper!🚨
Self-Taught Evaluators
- Llama 3-70B trained w/ synthetic data *only*
- Iteratively finds better judgments in training
- Best LLM-as-a-Judge model on RewardBench (88.3, 88.7 w/ maj vote)
- Outperforms bigger models or human labels
https://t.co/NUKgmyEv61
🧵(1/4)
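The "maj vote" number presumably comes from sampling several judgments per example and keeping the most common verdict. A minimal sketch of that aggregation step, assuming each sampled judgment has already been reduced to a label such as "A" or "B" (the label format and the `majority_vote` helper are illustrative, not taken from the paper):

```python
from collections import Counter
from typing import Sequence


def majority_vote(verdicts: Sequence[str]) -> str:
    """Return the most common verdict; ties go to the verdict seen first."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(verdicts).most_common(1)[0][0]


if __name__ == "__main__":
    sampled = ["A", "B", "A", "A", "B"]  # hypothetical sampled judgments
    print(majority_vote(sampled))
```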
🚨New paper!🚨
Meta-Rewarding LMs
- LM is actor, judge & meta-judge
- Learns to reward actions better by judging its own judgments (assigning *meta-rewards*)
- Improves acting & judging over time without human labels
... beats Self-Rewarding LMs
https://t.co/zcZ7er3yK7
🧵(1/6)
📣 Exciting news! @SliceXAI announces ELM (family of Efficient Language Models), a new, decomposable #LLM architecture that delivers models with best-in-class performance in terms of quality, throughput & memory.
Blog: https://t.co/3svUWqQjfC
Check out Zhaofeng's work from his internship with us!
TL;DR A reward model trained on language S preference data could be used to align a language T LLM. This sometimes works even better than using a reward model trained on language T preference data.
Hi #NLProc!! If you are at #EMNLP2023 and are excited about Novel Ideas in Learning-to-Learn through Interaction, join us for an exciting series of invited talks and a lineup of presentations.
In-person attendees can join us in **Leo** at the venue.
https://t.co/GBL6UGjEKH