Bark Text-to-Audio Model
Full Text Input: "Why was six afraid of seven?"
Ignore Bark's "I'm done with this input" token and tell Bark to just keep generating more audio anyway.
Not uncommon as a game mechanic (I understand the tweet is a joke about the real world) for example Terra Invicta has a neat version where you can make a traitor "councilor" automatically fail their dice rolls for he faction. But if do it too often, you make it too obvious they've been turned.
@georgejrjrjr@_NathanCalvin I suppose there's some novelty in "Claude did all the research and made it happen, no experts needed" but tools that do this are widely available already.
I'm not sure I understand what you're trying to do. Cover an uploaded check and check the range of Covers you get back from Suno, as a way of trying to understand and see things in the uploaded track?
I wouldn't recommend 0% weirdness btw, see how Suno's open source Bark at low weirdness can sound more weird than higher temps.
@Voxyz_AI@Alterverse_AI@Kling_ai third from the right morphs. though mainly I'd like to see improvements on scenes like this with the 8 characters interacting, when they are are independent it has the feeling of a videogame NPC idle movements instead of coherent at the full scene level.
@junmingong@j_stelzer Absolutely, now that the GitHub is out, I can answer all these questions. (I was just so curious as the details that it was frustrating the paper was so vague, since it wasn't out yet.)
@theo Hmn, gut feeling is the opposite. Having a pleasant but stupid model would make it hard for Google catch up. Usability/pleasantness can be iterated quickly, can do a lot in purse code with harness/prompts. Not everything, but much more than raw intelligence.
@myhandle Tokenization may be relevant, but some things work fine in a true base model (ie, song lyrics and rhyme schemes), so they must be a result of later training stages.
Even without the weights, if I could write custom sampling code and have access to raw outputs (tokens instead of decoded audio) that would be enough for me. But probably nefarious actors could use this so smuggle out the weights, not sure, I've never seen a commercial AI company offer that kind of middle ground access.
For newbies, if they *just* expand beyond text only prompts and start using Inspire or other features, that makes a huge difference by itself.
You can get a clear sense of what changes the funnel with some specific tests, like for example with "spoken word" there were times when it was difficult to keep Suno from bursting into song for more than 20 seconds into a song. Now you can both use sliders to increase style length, put other spoken word samples into the audio prompt, as well as generally prompting *anything* like literally covering "silence" will increase this "time to burst into song" metric.
I don't think you need that much access, the prompts (particular audio prompts) are strong enough to do something new in Suno.
Over time you will the music get sucked into the "Suno Funnel" baseline, so you have to stay on guard and work in smaller chunks. But this was so much harder when Suno only had text prompts, you can overpower it now much easier than before.
Just caught myself replying to an LLM on Hacker News. I'm sure better bots already avoid these classic AI writing tics.
Dread it, run from it, Dead Internet Theory arrives all the same.