@llllvvuu You're right, compute_delta is imprecise about where to cut: it should trim to the last EOS before subtracting (otherwise the <|im_end|>\n separator gets dropped on ChatML templates). We'll tighten the post. TRL already handles this correctly.
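A minimal sketch of the trimming fix described above, on plain lists of token ids. The real compute_delta from the post is not reproduced here, so the signature, the helper trim_to_last_eos, and the toy ids (1 for <|im_start|>, 2 for <|im_end|> as EOS, 3 for "\n") are all assumptions made for illustration.

```python
def trim_to_last_eos(ids, eos_id):
    """Drop everything after the last EOS token (hypothetical helper)."""
    for i in range(len(ids) - 1, -1, -1):
        if ids[i] == eos_id:
            return ids[: i + 1]
    return ids  # no EOS found: leave the prefix untouched


def compute_delta(prefix_ids, full_ids, eos_id):
    """Tokens to append after a rollout that stopped at EOS (assumed signature)."""
    prefix = trim_to_last_eos(prefix_ids, eos_id)
    if full_ids[: len(prefix)] != prefix:
        raise ValueError("full_ids does not start with the trimmed prefix")
    return full_ids[len(prefix):]


# Toy ChatML-like ids (assumed): each turn is rendered as
# <|im_start|> ... <|im_end|> "\n"
IM_START, IM_END, NEWLINE = 1, 2, 3
prefix_ids = [IM_START, 10, NEWLINE, 11, IM_END, NEWLINE]
full_ids = prefix_ids + [IM_START, 12, NEWLINE, 13, IM_END, NEWLINE]

# Subtracting the untrimmed prefix loses the "\n" after <|im_end|>
naive_delta = full_ids[len(prefix_ids):]
# Trimming to the last EOS first keeps the separator in the delta
delta = compute_delta(prefix_ids, full_ids, eos_id=IM_END)
```

In this toy case naive_delta starts at <|im_start|>, while delta starts with the "\n" that follows <|im_end|>, which is what a rollout that stopped at EOS still needs appended.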
OpenEnv already ships 🚢 with a ready-to-deploy RLM environment on free HF Spaces
Drop "Attention Is All You Need", write code that spawns parallel LLM calls → ✅ answer in 4.2s
Run GRPO (TRL) → model learns to write that search strategy itself
👀 @lateinteraction @a1zhang
@willccbb we have recently added these methods in TRL if you want to try them out: https://t.co/AaNqLtHlDv and also https://t.co/ZLyrEoRg8I
Earlier this month, Apple introduced Simple Self-Distillation: a fine-tuning method that improves models on coding tasks just by sampling from the model and training on its own outputs with plain cross-entropy
and… it's already supported in TRL, built by @krasul. you can really feel the pace of development in the team 🐎
paper by @onloglogn, @richard_baihe, @UnderGroundJeg, Navdeep Jaitly, @trebolloc, @YizheZhangNLP at Apple 🍎
how it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. no labels or verifier needed
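a rough sketch of the two ingredients named above (truncated sampling at T_train, then plain cross-entropy on the sampled token), in pure Python on a toy logits list. this is not the TRL trainer or the paper's code: the function names, the top_k-then-top_p ordering, and the toy numbers are assumptions for illustration only.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def truncated_distribution(logits, temperature, top_k, top_p):
    """Temperature-scale, keep the top_k tokens, then the smallest nucleus reaching top_p."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in order)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i] / k_mass  # nucleus mass measured after top_k renormalization
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample_token(logits, temperature, top_k, top_p, rng):
    """Draw one token id from the truncated distribution."""
    dist = truncated_distribution(logits, temperature, top_k, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def cross_entropy(logits, target):
    """Plain cross-entropy of the target token under the untempered model."""
    return -math.log(softmax(logits)[target])


# Toy next-token logits over a 4-token vocabulary (made-up numbers)
logits = [2.0, 1.0, 0.0, -1.0]
rng = random.Random(0)
token = sample_token(logits, temperature=1.0, top_k=3, top_p=0.8, rng=rng)
loss = cross_entropy(logits, token)  # the self-generated token is the only "label"
```

in a real run the sampled completions become the fine-tuning dataset and the loss is averaged over all completion tokens; here a single token stands in for that.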
you can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://t.co/zizfISD6bq
or benchmark a checkpoint with the eval script:
https://t.co/mKlafTyKSe
one neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. even very noisy samples still help
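one way to see where the product could come from, as a numerical check under idealized assumptions that are mine and not necessarily the paper's argument: suppose the fine-tuned model exactly matches the distribution it was sampled from at T_train, with no top_k/top_p truncation. decoding that model at T_eval then matches decoding the original logits at T_train × T_eval. the logits and temperatures below are made up.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0]  # toy base-model logits (assumed)
t_train, t_eval = 1.5, 0.6

# Idealized student: its log-probs equal the T_train sampling distribution
p_train = softmax(logits, t_train)
student_logits = [math.log(p) for p in p_train]

# Decode the student at T_eval vs. the base model at T_train * T_eval
p_two_step = softmax(student_logits, t_eval)
p_direct = softmax(logits, t_train * t_eval)
```

the two distributions agree up to floating-point error, because softmax ignores the constant shift introduced by taking log-probabilities.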
want to dig deeper?
paper: https://t.co/aj1ZAcr8Mw
trainer docs: https://t.co/TNVz93kZi9
Today, we released PEFT v0.19.0 and it's a big one. Not only did we add 9 new PEFT methods, the release also contains a bunch of improvements to make PEFT more useful. Check the thread for details:
France is about to pass a law punishing support for the genocide in Palestine! 🇫🇷🇵🇸
just kidding. it’s actually a proposal to restrict criticism of Israel, in the so-called country of human rights and free speech.
@SandrineRunel, I call on you to vote against the Yadan law.
check out this new notebook by @krasul on TimesFM 2.5, Google's time series foundation model which is now supported in transformers
zero-shot forecasting, quantile predictions, LoRA fine-tuning, and forecasting with exogenous covariates
https://t.co/aUKP813nIw
Good news! Ulysses Sequence Parallelism from the Snowflake AI Research and DeepSpeed teams has been integrated into @huggingface Trainer, Accelerate and TRL.
For extensive details please see this writeup:
https://t.co/2xDWUk8p3V
Thanks a lot to @krasul for helping make it happen, and to the others on the HF team who helped with the integration.
@m_sirovatka @jackminong feel free to check out liger-kernels, we have added support for a lot of RL losses there. there's also this open PR: https://t.co/CKnAcEqjiT