@0xKyon I can’t DM but would be happy to try. I do 50% of my coding on my phone. My main drive is Termius + CC + custom scripts for better parallelization. Happy to try your app!
I’ve built evaluation systems for multiple AI products in production, yet I learned a tremendous amount from the AI Evaluations course by @sh_reya & @HamelHusain. The systematic approach and exceptional course reader are fantastic resources. Can’t wait for their upcoming book!
6/6 Curious to learn more about how these frameworks compare and which one might fit your workflow? Check out my full review, Agentic Workflows: Best Framework: https://t.co/a8XslpQo6S
1/6 Feeling overwhelmed by the explosion of AI agent frameworks?
I was too. After a year of building mostly custom code, I finally decided to explore the 6 most talked-about ones for building agentic workflows.
5/6 For example, I found that PydanticAI is my clear champion - it’s lightweight, type-safe, and just clicks with my brain. LangGraph is a safe choice for anyone, and I liked its templates for common patterns like ReAct & retrieval agents.
@jiayi_pirate Thank you for sharing!! I played around with the setup for Qwen 0.5bn - I think it can learn the reasoning! Just the learning needs to be a bit slower. Some results: https://t.co/oOvFKnokOi (experiment logs included in References)
@santiviquez Maybe LLM Engineer’s Handbook by Paul Iusztin and M. Labonne?
I read all 3 and this one was my favourite - more technical, more practical (if you work mostly with text).
@simonw 2/2 The ranking would even change with temperature (see the magicoder results here: https://t.co/iJteDTTBQj ) and in re-runs (within the SE).
My conclusion was that the exact answer is task-specific. Practically, I get the biggest Qx that “runs”, always the _K_M variants.
@HamelHusain @corbtt 100% agreed. I just did it for the “last mile” today to create a few pivots, % share, heatmaps, etc. It’s hard to replicate as fast via code.
I even use Julia + macros, so there is much less to write than Python/pandas — even Cursor doesn’t help, prompt is longer than the code!
@HamelHusain My question would be around tips for evaluating a multi-turn exchange (as opposed to just one answer).
What if that exchange has some hidden thinking tokens by the LLM — in what situations do we want to show them to the judge (not to bias it)?
@swyx @suno_ai_ Would you mind sharing the styles that you tend to use and any tricks you found? It feels like your intros keep getting better!
I had to look up this post several times to be able to play 1 and 2 again… 😅
@eugeneyan @dan_s_becker @jxnlco Looks great!
Separately, there might be a verb missing around the bolded unbeatable guarantee (Hormozi, huh🙃) "or we'll personally refund you and ??? free consultation."