@maxvonhippel @Defected_Saint I'm with you @maxvonhippel! I've hired a lot in high-growth tech companies. It's better for both parties to be as honest and upfront as possible, and to cut the process short if it becomes evident there's no fit. Win-win: fewer problems in the future, less time wasted.
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://t.co/NQ7IfEtYk7
@aarondotdev Did you read the paper?
It studied junior devs *learning* a new library. The ones who used AI to understand concepts scored just as well as the no-AI group. The ones who copy-pasted blindly didn't. The conclusion isn't "AI is bad", it's that you need to "think while you use AI".
@erroneous_input @ns123abc @sama The Pentagon agreed to Anthropic's terms (i.e. "no use for fully autonomous weapons or mass surveillance of Americans") but they added the redline "except bulk data analysis", which is the same as saying they can use it for anything, because any AI system does bulk data analysis...
@vdub12 @ns123abc @sama AI is not a military weapon!
AI could *support* a much bigger system to create fully autonomous weapons... But clearly Anthropic does not have access to all these other components.
@vdub12 @JoelStransky @ns123abc @sama If you think the AI is a military weapons system, you are very wrong. The AI can support the creation of autonomous military weapons, but it is far from being a weapon on its own.
It would be the same as saying a bullet is a weapon if you have no gun.
It will significantly increase my opinion of @Anthropic if they do not back down, and honorably eat the consequences.
(For those who are not aware, so far they have been maintaining the two red lines of "no fully autonomous weapons" and "no mass surveillance of Americans". Actually a very conservative and limited posture, it's not even anti-military.
IMO fully autonomous weapons and mass privacy violation are two things we all want less of, so in my ideal world anyone working on those things gets access to the same open-weights LLMs as everyone else, and exactly nothing on top of that. Of course we won't get anywhere close to that world, but if we get even 10% closer to that world that's good, and if we get 10% further that's bad)
CC @DarioAmodei
https://t.co/RSzHnXqWFL
Excited to share an example of the many projects we're driving @SnorkelAI around enterprise-specific environments and benchmarks - including detail on:
- Domain-specific, enterprise env & tool development
- Persona simulation for multi-turn eval
- Nuanced rubrics
& more!
Excited to share a preview of @SnorkelAI's new Agentic Coding benchmark - testing models on realistic, multi-step software engineering tasks in fully sandboxed execution environments across a calibrated range of task domains and difficulties, inspired by our work with the @terminalbench team!
With a top pass@5 score of 58% (Opus 4.5) - this new benchmark challenges the notion running wild on X right now that LLMs have "solved" software engineering.
And, with unit tests plus both final-output and trajectory-level rubrics, it's already giving us & partners insights into where coding agents fail. Excited to share more here shortly!
Link to benchmark & release post in 🧵👇
@nicolasmelo @Yuchenj_UW Anthropic's valuation is 350Bn after the last Azure/Nvidia investment, trending to 500Bn according to some sources. Their YoY growth is also higher.
@emollick I find Opus 4.5 superior as it creates high-quality Google Slides / PowerPoint decks that I can modify directly.
Having to update prompts to iterate is a nightmare.
Don't say: "vLLM is fastest" or "Ollama is easiest."
Wrong framing.
The real answer isn't about features - it's about matching serving philosophy to your constraints.
Local prototype vs. production scale vs. complex workflows = completely different frameworks.