Work is becoming a compression problem.
The important question is: which bits still need to come from you?
That set gets smaller as models get better.
Find those bits.
Push the rest into the model.
@mgratzer I wonder if we can use skills as a guided template for human learning more broadly. Skills are like a Neo downloading kung fu for AI, but what if it could also be the same thing for humans with AI as the tutor
@fchollet True for a subset of SaaS (mission critical, infra heavy, etc.). But much of SaaS = encoding best practices in a GUI workflow sold as a subscription. What happens when 95% of best practices can be encoded into skills with 0 defensibility?
@LakshyAAAgrawal@gepa_ai Underappreciated work. Do you find rubric grading effective even for X/100 type scores vs 0/1? Is there a concern that the scores are uncalibrated and you discard βgoodβ solutions
@abcampbell Bloomberg provides an officially supported Python API for terminal users (https://t.co/m8AId3lIjg). Caveat is data has to stay on your computer. Workbench just acts like a code-assist tool to run BQL/BDP/BDH queries on demand. Hence desktop app only. Can't run this on the cloud.
@GavinSBaker Both work similarly behind the scenes: an agentic loop w/ tool use. For Claude, the tools hook into the official Microsoft add-ins API for 3P devs. Copilot probably has privileged access to native APIs.
My Q is whether Microsoft will nerf 3P APIs to gain an edge over time.
@corbtt Interesting - intuitively you are capitalizing on the generation/verification gap and specifically making verification even easier by framing as a comparative problem. Does this help for domains where you have robust verification like math/coding?
@natolambert Even original v3 was distilling from both R1 and applied RL (though maybe not direct RLVR?). One apparent difference is maybe the lack of <thinking></thinking> tokens, but even that line blurs if the final responses are growing longer.
@natolambert@benthompson Fundamental problem is misaligned incentives and thus potential erosion of user trust. You see this in the tension for optimizing user experience vs. monetization with Google - do you put the most monetizable links vs. the most relevant links at the top?