While I'm waiting for Fable to come back, I started a new side project. Mostly because I want to get better at agentic coding, but also because it's an idea I've always wanted to build. So far we have a spec and a landing page, so it's a good start. Check it out:
https://t.co/fkPm6iBdP1
People who say that having tests lets AI handle any amount of software complexity make no sense to me. Where did the tests come from? How do you know they're adequate? Are you strictly working on known evals and code migrations? How have you not just moved the problem?
@phosphenq Agents are already really good at tasks where you can easily define and produce ground truth. This mechanism doesn't solve the hard part, which is getting that ground truth in the first place. The challenge is still making the task verifiable.
@LLMJunky @Continuum_Code @zeeg 1. They could not have built them without stealing data. If their stealing data is fine, then distillation is fine.
2. Can you explain the strategy here? Because this doesn't make much sense.
@signulll I don't know if I've been conditioned, but my brain now turns off as soon as I'm aware I'm reading something written by AI. It registers as not worth reading, even if the ideas came from a person.
On one hand, I agree that vague notions of "quality" are unhelpful and too subjective. On the other hand, good code means something. A good system is built so it can scale in complexity, and so other developers can jump in and contribute without needing to understand the whole system and with low risk of breaking unrelated pieces. The problem is that this is really hard to prove without evidence from the system existing in the wild.
@neerajjj6785 Microservices solve exactly one technical problem: deployment management for distributed systems.
They solve a lot of organizational problems, though. I don't care how many users you have; I want to know how many developers are working on the project and how often they turn over.
@daniel_mac8 It's kind of important to know what cheating means in this context. It means the model either faked passing the tests or copied the solution from the internet. "Discarded" is arguably OK, but the results are still unreliable, since the AI was probably more likely to cheat on tasks it would otherwise have failed.
@sama This all sounds wrong and counter to OpenAI's stated mission, regardless of who picks the winners and losers. If safety is a problem, YOU could hire people to red-team and test. What you're doing is having people pay you for a product you explicitly aren't sure is safe.