To top off this week, there's now a paper about DEMON published on @arxiv.
A lot of great insights there - especially for researchers from the AI audio-gen domain.
Here's the video covering some basics, and the paper is in the tweet below.
Next week, on Tuesday (Jun 2), we are hosting our webinar with @RyanOnTheInside, author of the paper, so you can ask any questions.
Built on ACE-Step, this lets you perform AI-generated music with synth hardware, almost like playing an instrument
Hardware knobs shape DiT's initial noise, and the music is generated in real time through streaming
A new kind of music is emerging
Live music is the future
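To make "hardware knobs shape DiT's initial noise" concrete, here is a minimal sketch of one way a knob could steer a starting noise vector: spherical interpolation (slerp) between two seeded Gaussian noise vectors. This is an assumption for illustration only, not DEMON's or ACE-Step's actual mechanism. The function names, the 0..127 MIDI-style knob range, and the vector length are all hypothetical.

```python
import math
import random


def gaussian_noise(n, seed):
    """Return n standard-normal samples from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def slerp(a, b, t):
    """Spherically interpolate between vectors a and b for t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding just outside acos's domain.
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    if omega < 1e-6:
        # Vectors are (almost) parallel; nothing to interpolate.
        return list(a)
    s = math.sin(omega)
    weight_a = math.sin((1.0 - t) * omega) / s
    weight_b = math.sin(t * omega) / s
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


def knob_to_noise(knob_value, noise_a, noise_b):
    """Map a hypothetical 7-bit knob value (0..127) to a blended noise vector."""
    t = max(0, min(127, knob_value)) / 127.0
    return slerp(noise_a, noise_b, t)


# Hypothetical usage: two fixed seeds act as the "ends" of the knob's travel.
noise_a = gaussian_noise(256, seed=1)
noise_b = gaussian_noise(256, seed=2)
initial_noise = knob_to_noise(64, noise_a, noise_b)
```

Slerp is used here instead of a straight linear blend because it keeps the blended vector's norm close to that of the endpoints, which is a common choice when interpolating diffusion noise. Whether a real-time system does this is not stated in the posts above.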
I just released this open source project built on @ACEStep_Music. DEMON: Diffusion Engine for Musical Orchestrated Noise. It lets you play ACEStep like a musical instrument, remixing songs and loops with feedback that approaches real-time.
It's essentially StreamDiffusion, but instead of Stable Diffusion it is ACEStep1.5, and instead of images it is full songs. It runs on 3090/4090/5090 GPUs. Built with the @DaydreamLiveAI team, who helped with testing and building the demo. We are hosting it if you want to try it without installing. For full details, links, and the writeup, please see the pinned project page.
🚀 DiffSynth-Studio now supports training DiT-based musical models! To kick things off, we’re dropping 4 Instrument-Enhancement LoRAs for ACE-Step-v1.5-XL: https://t.co/kSUoQAQoN2
Differential LoRA Training to boost target instruments with high fidelity: 🎸 Guitar | 🎹 Piano | 🥁 Drums | 🎼 Accompaniment
🎧 Listen to the Drum demo below & try it out for yourself: https://t.co/XRPspqJla5
Can we transform offline audio diffusion into real-time streaming interactive instruments?
Yes!
Presenting Live Music Diffusion Models: a new paradigm for taking your favorite open models into live performance, right on your own laptop! 🎵🎵
🧵
Stable Audio 3, explained in 5 figures.
It’s a family of open-weight models for generating instrumental music and sound effects.
The models are fast, support editing, and are trained on licensed and Creative Commons audio.
👾 https://t.co/e8qhZpVv2w
🏋️‍♂️ https://t.co/aRLGCXGBNr
Khala 1.0 just dropped — a music generation model from the Central Conservatory of Music in Beijing. Paper, code, weights, and demo all open-sourced.
I gave a talk there recently on ACE-Step and got an early look at Khala. Excited to see it officially out. Open-source music gen is thriving.
💻 https://t.co/iYQt9e1mMy
📝 https://t.co/fqwqtvHfP1
🎧 https://t.co/XAxqLEYGft
We released Diffusers 0.38.0, and it's packed with new pipelines and several library-related improvements 🔥
A bunch of new pipelines, including audio 🎼
* Ace-Step 1.5
* LongCat-AudioDiT
* Ernie-Image
And more!
Next up, we added support for:
* Flash Attention 4
* Loading with FlashPack
* Ring Anything as a new backend for context parallelism
Last but not least, we added an example on how to profile a DiffusionPipeline and potentially improve its performance.
Enjoy 🧨
ACE‑Step 1.5 is a community‑owned model. It is the fastest‑growing open‑source music model and the best local music generation model for co‑building an ecosystem. I personally have drawn a great many inspiring ideas from it. Thank you all sincerely.
https://t.co/DgXiWSEi4W
It depends on your actual needs and aesthetic preferences. I don't think the model is incapable of producing this style; rather, there's a mismatch between how you describe it and how the model interprets it. Simply input "ambient", use the format tool to rewrite the caption to get a detailed description, then have Claude or ChatGPT reference that format to rewrite what you want.
@ElevenLabs for anyone interested:
1,800 credits per minute when creating an AI track.
On the standard $5 plan that = approx. 7 songs at 3 minutes per song.
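A quick worked check of that arithmetic, as a sketch. The 1,800 credits per minute and the 3-minute song length come from the post. The plan's actual credit allowance is not stated there, so the code only derives what 7 songs would imply and takes any allowance as a caller-supplied, hypothetical parameter.

```python
CREDITS_PER_MINUTE = 1_800  # figure quoted in the post
SONG_MINUTES = 3            # song length assumed in the post


def credits_per_song(minutes=SONG_MINUTES, rate=CREDITS_PER_MINUTE):
    """Credits consumed by one song of the given length."""
    return minutes * rate


def credits_for_songs(count, minutes=SONG_MINUTES, rate=CREDITS_PER_MINUTE):
    """Credits needed to generate `count` songs."""
    return count * credits_per_song(minutes, rate)


def songs_for_allowance(allowance_credits, minutes=SONG_MINUTES,
                        rate=CREDITS_PER_MINUTE):
    """Whole songs that fit in a credit allowance (hypothetical input)."""
    return allowance_credits // credits_per_song(minutes, rate)


print(credits_per_song())    # 5400 credits for one 3-minute song
print(credits_for_songs(7))  # 37800 credits for the ~7 songs mentioned
```

So "approx. 7 songs" corresponds to roughly 37,800 credits. Pass the plan's real allowance to `songs_for_allowance` to check the claim against current pricing.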
First of all, this model isn't for everyone. There's plenty of documentation and community tutorials on GitHub, and we're now in an era where coding isn't required at all. It has never been a barrier for those who are eager to explore.
The ComfyUI workflow has its limitations, which means many capabilities cannot be properly supported. The choice of workflow tool matters a lot. Even when I suggest new features to the ComfyUI team, getting them implemented is extremely difficult.
There are 20 companies doing passable furniture music, which is about 80% of most folks' Spotify playlists.
This tech will be open source, local, and free in 13 months.
There is no moat for any music platforms.
None.
AI music with an audience of 1.