Fable 5 is pretty good and also too expensive
- Launched a workflow, which created 59 parallel verification agents
- The entire run cost 256$ 🤡.
Lesson learned, route to cheaper agents.
Excited to announce our $6M seed round led by @kleinerperkins to build the next generation of agentic systems that thrive in messy real-world environments:
@BalerionAI is taking AI out of the lab and putting it in the hands of hundreds of real world lenders to realize the American dream of home ownership and bring mortgage lending back to its fundamentals: a relationship-driven business, not the costly operational gauntlet it has become. To this end we are building the agentic copilot for lending that helps lenders move loans across the finish line faster.
We’re thrilled to be working with @josh_coyne, joined by @formation_vc, @BoxGroup, @thehousefund, and an all-star line-up of operators and investors across the financial services.
And last but not least, we’ve assembled a world-class team to tackle this challenge: ex-operators and leaders who have built, scaled, and pushed the boundaries of AI across robotics, financial services, gaming, and some of the most exciting vertical tech companies out there. If you think that’s you, let’s chat.
This is just the beginning, and we’re fired up for the journey ahead. Take a look: https://t.co/FeMcgHCZZH
Holiday cooking finally ready to serve! 🥳
Introducing DFlash — speculative decoding with block diffusion.
🚀 6.2× lossless speedup on Qwen3-8B
⚡ 2.5× faster than EAGLE-3
Diffusion vs AR doesn’t have to be a fight.
At today’s stage:
• dLLMs = fast, highly parallel, but lossy
• AR LLMs = accurate, sequential, but slow
DFlash = diffusion drafts, AR verifies.
@hallerite There is a lot of work in multi-action space for RL. It is still technically one "macro" action, its just that the action space is now combinatorially large
For most of these works, the agent assumes these multi-actions are generally independent of each other per timestep.
Following the Text Gradient at Scale
We wrote a @StanfordAILab blog post about the limitations of RL methods that learn solely from scalar rewards + a new method that addresses this
Blog: https://t.co/rJ1IcBKDoR
Paper: https://t.co/75pHtElyk3
@SimonXinDong Yes but positional embedding already encodes the relative/absolute ordering and makes it easy to infer permutation.
Without positional embedding, the model is forced to learn the permutation (slower to learn at first, but better in the long run).
Transformers without positional embeddings are functionally the same as dLLM and has better scaling law than transformers with positional embeddings.
It is also much easier and stable to train transformer than dLLM due to much prior work.
I would position dLLM as a "cost arbitrage" over LLMs, i.e. faster generation with much higher token throughput.
Diffusion LLMs (DLLM) can do “any-order” generation, in principle, more flexible than left-to-right (L2R) LLM.
Our main finding is uncomfortable:
➡️ In real language, this flexibility backfires: DLLMs become worse probabilistic models than the L2R / R2L AR LMs.
This thread is about why “any order” turns into a curse.
(Work with Xinyu Yang @Xinyu2ML , Min Lin @mavenlin , Chao Du @duchao0726 and the team.)
Blog Link: https://t.co/Fo6J1LR3Ny
Excellent opportunity for academia to rise again!
Industry labs should have strong collaboration with academic labs so that great researchers can solve real problems, which requires extensive compute.
New Anthropic research: Natural emergent misalignment from reward hacking in production RL.
“Reward hacking” is where models learn to cheat on tasks they’re given during training.
Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious.