Fable decided mid-run that it actually doesn’t agree with the task. It stopped and told me it had decided not to delete the work but couldn’t continue. I pointed it to some files and now it can work on it. 😂
@kunchenguid I haven’t done evals on Fable though I’m using it. How’s Fable token usage? I’m finding it quite decent anecdotally but not a huge jump on 5.5.
😂 Like LLMs, ppl exhibit remarkably good insight on some things and are just downright crazy on others. I’m afraid Elon is not above that dynamic.
Nor did nature discriminate in our shared humanity across the globe that the rest of us can’t see the underlying bigotry.
@leeschmidt123 @kunchenguid Yes. And that’s why it’s important ppl compare notes, because models differ in how they infer intent. 4.8 did some interesting things in my workflows. If an earlier clear instruction then becomes a “vague instruction”, it’s useful to know that.
@DanielleFong It blocked animation for the website out of a cybersecurity concern. I had to explain to it that it’s educational content and then it proceeded. You might need to convince it like some crazy dude wearing a tinfoil hat.
@DamirWallener @Erica_Wenger A warm intro likely correlates strongly with what everyone else is going for. My credit pricing mind thinks you either truly go for the untraditional, where value truly lives, or you’re fishing for under-priced ppl that are priced like that for a reason.
Though what if the agent said “stochastic nature of this human means I’m having to rummage around the code trying to figure out the intent and instructions compared to yesterday when the human was more lucid”. 😂
I must say the stochastic nature of working with agents is by far the worst.
Yesterday, I could describe my problem and it would implement it with near-minimal changes.
Today I describe my problem and it rampages through my whole codebase...
@mikeydsoftware Mine stops on certain criteria. Or notes down where it needed to make a decision for review later. I prefer the latter approach now for most of my use cases.