SkyRL is great work extending @verl_project with environments for agent tasks. It leverages the SGLang multi-turn/tool-calling feature recently added to verl: https://t.co/HKzQchut4G
HuggingFace released a nice blog post about the current state of VLMs
Here's a summary, covering recent trends, specialized capabilities, agents, video LMs, new alignment techniques, and HF's fav VLMs [1/8]
Recent trends:
We present a comprehensive exploration and analysis of reinforcement learning from human feedback (RLHF) in modern flow-based video diffusion models.
It consists of 4 parts.
Paper: https://t.co/VX0mg9HgTG
Project Page: https://t.co/zKMv3SNMvf
(1/n)
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
Game (Honkai: Star Rail)
Claude 3.5 Computer Use can help complete Honkai: Star Rail daily tasks, accurately locating and interacting with in-game elements.
Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, and streaming inference code in PyTorch, Rust and MLX. More details below 🧵 ⬇️
Paper: https://t.co/mMInmjiBIC
Repo: https://t.co/PFak47FMrm
HuggingFace: https://t.co/bqG4IS0ntg