I'm working on a Flux dev-based model that can relight a photo conditioned on time of day (e.g. 6AM, 7AM) without changing the background, unlike IC-Light and LBM.
LTX-2.3 OmniCine V1 LoRA
- Anatomy fix
- Director controls:
- Better lip-sync and facial nuance + it finally stops burnt-in subtitles from ruining the video.
- Objects and characters don't warp when things get fast or chaotic.
- Handles 2D Anime, 3D CGI, photorealism.
https://t.co/i2eNhXLck8
Khala 1.0 just dropped — a music generation model from the Central Conservatory of Music in Beijing. Paper, code, weights, and demo all open-sourced.
I gave a talk there recently on ACE-Step and got an early look at Khala. Excited to see it officially out. Open-source music gen is thriving.
💻 https://t.co/iYQt9e1mMy
📝 https://t.co/fqwqtvHfP1
🎧 https://t.co/XAxqLEYGft
I've been working on a bigger AI VFX pipeline and needed audio-driven vid2vid lip sync for @LTXStudio LTX 2.3. Couldn't find a workflow for it, so I built this one in @ComfyUI.
More examples, free guide and free workflows below! 👇
We open-sourced the code and model for UniRelight! 🎉
Given an input video and a target lighting configuration, our method jointly predicts a relit video and its corresponding albedo.
Code: https://t.co/4zF94saWvo
Model: https://t.co/d8i66UyvhU
Wan2.2 again.
SwiftI2V: Efficient 2K I2V video gen with 21GB VRAM.
- uses 200x less GPU-time than CineScale
- exact image fidelity
- decoupled processing
no models yet.
https://t.co/UmfRrwq3IY
Another test with the LTX 2.3 vid2vid lip sync workflow. I've been finding the inpainting mode works more reliably overall, so I'd actually recommend turning it on even for close-ups.
Yet another amazing-looking IC LoRA for LTX 2.3 lands on the scene.
It's v2v and text-prompted. It does editing, removal, replacement, and restyle.
Personally, I would REALLY like to know if it can handle a first frame as a reference. I'm guessing now though.
https://t.co/8Ymjmd0KQl
https://t.co/T2HPMrbtpy
New video model: a 15B-parameter, 40-layer Transformer that jointly processes text, video, and audio via self-attention only. No cross-attention, no multi-stream complexity. Achieves an 80.0% win rate vs Ovi 1.1 and 60.9% vs LTX 2.3.
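A rough sketch of what "self-attention only, no cross-attention" can look like: tokens from all three modalities are concatenated into one sequence and every token attends to every other. This is a toy illustration, not the model's actual code. The function names, the identity Q/K/V projections, and the single head are all assumptions made to keep it short.

```python
import math

def self_attention(tokens):
    """Toy single-head self-attention; Q = K = V = input (assumption, no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this query against every token in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of all token vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out

def single_stream(text, video, audio):
    """Concatenate modality tokens into one sequence, then apply plain self-attention."""
    sequence = text + video + audio  # one stream, so no separate cross-attention is needed
    return self_attention(sequence)

if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]
    video = [[0.5, 0.5], [1.0, 1.0]]
    audio = [[0.0, 0.0]]
    for row in single_stream(text, video, audio):
        print([round(x, 3) for x in row])
```

Because the three token lists share one sequence, a text token can attend directly to video and audio tokens in the same layer, which is the simplification the post is pointing at.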
1/2 Qwen3.5 is here. The next frontier of Native Multimodal Agents is open. 🚀
We are thrilled to release Qwen3.5-397B-A17B, our flagship open-weight vision-language model. Built for the future of coding, reasoning, and seamless multimodal interaction.
Key Highlights:
Inference Efficiency: A massive 397B total parameters, but only 17B active—delivering flagship power at a fraction of the cost.
Hybrid Architecture: Innovative Gated Delta Networks (Linear Attention) + Sparse MoE for extreme speed.
True Multimodality: Exceptional performance across GUI interaction, video comprehension, and agentic workflows.
Global Scale: Qwen3.5 now supports over 200 languages.
Empowering developers and enterprises to build smarter, faster, and more versatile AI agents.
OpenCode + MLX + Qwen3.5-397B-A17B-4bit.
Video is sped up 8x, but the goal is to show that it works!
This was unimaginable just a few months ago.
The MLX team is pushing like crazy and M5 Ultra will do the rest 🚀
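For a sense of scale, here is a back-of-envelope calculation using only the figures quoted above (397B total parameters, 17B active, 4-bit weights). It assumes weights dominate memory and ignores quantization scales, the KV cache, and activations, so real usage will be higher. The function names are made up for this sketch.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    return active_b / total_b

def weight_memory_gb(params_b: float, bits_per_weight: int) -> float:
    """Weight storage in decimal GB: billions of params * bits / 8 bits per byte."""
    return params_b * bits_per_weight / 8

if __name__ == "__main__":
    print(f"active per token: {active_fraction(397, 17):.1%}")
    print(f"4-bit weights: {weight_memory_gb(397, 4)} GB")
    print(f"16-bit weights: {weight_memory_gb(397, 16)} GB")
```

So roughly 4.3% of the parameters are active per token, and the 4-bit weights alone come to about 198.5 GB, which is why this only fits on machines with very large unified memory.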
Capybara? 14B model for T2V, T2I, TV2V, TI2I.
- based on HunyuanVideo1.5;
- byt5-small, Glyph-SDXL-v2, SigLIP;
- 480p-1080p; 16.7GB model, 5GB VAE.
mostly for video editing.
https://t.co/N34iJ4gC0K
Self-Refining Video Sampling: inference-time method using a video generator as its own refiner to correct physics and motion.
No retraining needed; scores >70% human preference; validated on Wan2.2 & Cosmos.
https://t.co/NGdxcTUNeX