@ZDi____ Awesome, you should do a lot more patchification of the mel spectrogram then use a diffusion head like VAR and you'll have a great long-duration tts model.
Sonauto is now Treblo!
https://t.co/MF6pc9fx7W
Your account, your songs, and your shared links all still work. We're the same four people working towards the same goal.
Check out our blog post for more details: https://t.co/9Y0mPF0SU0
@epstein_stylist @difficultyang @Smit_Chaudhary3 The problem is the ML research classifier is invisible. I have no way of knowing if the classifier has improved because they don't tell me when they turn on Loboto-Mythos mode, it just happens silently.
@difficultyang @Smit_Chaudhary3 And what about those of us working on powerful models for other domains -- those can't and never will turn anyone into paperclips -- but get caught in the "frontier" filter all the same? (or maybe we don't, but we're stuck constantly looking over our shoulders now)
Unironically evil behavior. How can we even tell whether mythos has decided our LLM music model counts as "frontier LLM research"? We know models can be overzealous about declaring requests are unsafe/bio-uplift/etc, but now it includes ML research and won't even tell us?
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community
also the fact that this is on purpose not visible to the user is crazy
An AR model takes ~2N FLOPs per token/timestep (forward pass, N = parameter count). A diffusion model is the above *times the number of steps* (or even more due to bidirectional attention). Unfortunately AR mogs diffusion in training *and* inference compute efficiency.
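Rough back-of-envelope sketch of the inference side of that claim. Assumptions (mine, not measured): N is the parameter count, the AR model uses a KV cache so each token costs about 2N forward FLOPs, every diffusion step runs a full forward pass over all positions, and attention FLOPs are ignored. The function names and the example numbers are hypothetical.

```python
def ar_forward_flops(n_params: int, n_tokens: int) -> int:
    """Approximate forward FLOPs to generate n_tokens autoregressively.

    Assumes ~2 FLOPs per parameter per token and ignores attention cost.
    """
    return 2 * n_params * n_tokens


def diffusion_forward_flops(n_params: int, n_tokens: int, n_steps: int) -> int:
    """Approximate forward FLOPs to generate n_tokens with n_steps denoising steps.

    Assumes each step is a full forward pass over all n_tokens positions,
    so the cost is the AR estimate times the number of steps.
    """
    return ar_forward_flops(n_params, n_tokens) * n_steps


if __name__ == "__main__":
    # Illustrative numbers only: 1B params, 1000 tokens, 50 denoising steps.
    n_params, n_tokens, n_steps = 1_000_000_000, 1_000, 50
    ar = ar_forward_flops(n_params, n_tokens)
    diff = diffusion_forward_flops(n_params, n_tokens, n_steps)
    print(f"AR:        {ar:.3e} FLOPs")
    print(f"Diffusion: {diff:.3e} FLOPs ({diff // ar}x)")
```

Under these assumptions the ratio is just the step count, so fewer-step samplers shrink the gap but don't remove it.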
Most researchers agree that autoregression is best when memory bandwidth is cheap and diffusion is best when FLOPS are cheap. They also admit the future of compute is all FLOPS because memory scaling is hard and scaling FLOPS is easy. So why not go all in on diffusion????
@jchencxh The problem with pixel prediction (noise) hits hard in my domain (audio is dominated by stuff we don’t care about). Re: “if there’s more signal, keep learning from it” you can do that with latent prediction too with stuff like hierarchical prediction (newest vjepa?)