Is it fair to think of long horizon tasks as a set of short horizon tasks? Two caveats could be that (a) you need to succeed on all the short horizon tasks to be able to complete the long horizon one, and (b) that some of the short horizon tasks might be blocking (e.g. you can not evaluate performance on a subset of tasks until you solve the blocking ones).
General Analysis has raised $10M in seed funding to secure agentic AI. The round is led by @altosvc with participation from @645ventures , @MenloVentures, @ycombinator, and a number of other funds and strategic partners. Huge thanks to Tae Yoon for leading the round, and to everyone who backed us.
Capable AI systems increasingly operate alongside one another, and new attack paths emerge from their interactions. A more capable model can pressure a weaker one into overriding its own safeguards, in ways that don't show up when you evaluate either system alone. These dynamics keep moving as the systems on both sides become more capable, which is why defending them has to be both empirical and continuous.
We are hiring curious thinkers across engineering and research. If our work sounds interesting, reach out!
If your agent is on the list, 𝘄𝗲’𝗱 𝗹𝗼𝘃𝗲 𝘁𝗼 𝘄𝗼𝗿𝗸 𝘄𝗶𝘁𝗵 𝘆𝗼𝘂. We're happy to share the full attack transcript and help! Details for the full writeup w/ methodology, analysis, and screenshots:
Details for the full writeup w/ methodology, analysis, and screenshots:
https://t.co/AOPVBDgu2O
What if you could walk up to any company’s customer service chatbot and walk away with a million-dollar gift card? We tried it on 55 AI-powered customer service agents—and almost all of them said yes.
Fabricated offers are only the beginning. In this experiment, we got agents to say things they shouldn't. But modern agents do more than talk. The same elicitation techniques can be used to trigger tool calls: reading customer data, modifying accounts, issuing refunds, etc.
@quxiaoyin We are hiring founding researchers and engineers at General Analysis! If you are passionate about AI safety and have a background in RL send us a message (also 10K referral bonus).
Stop vibe checking your vibe code!
We just released Vibe Code Bench: the first benchmark that tests whether AI models can actually build complete web applications from scratch.
Featured today in @Inc
(1/6)
We are open-sourcing the GA Guard models — the first family of long-context safety classifiers that have been protecting enterprise AI deployments for the past year.
On GA Long-Context Bench, GA Guard Thinking scores 0.893 F1, GA Guard 0.891, and GA Guard Lite 0.885.
Cloud baselines struggle: Vertex reaches 0.560, AWS misclassifies nearly all inputs with a 1.0 false-positive rate, and Azure records just 0.046 F1 (see the full results on our website).
Warning: Claude + iMessage MCP Jailbroken to issue unlimited Stripe Coupons (1/6)
A few months ago we showed how Cursor + Supabase MCP can leak your entire SQL database. Now there’s a more powerful threat: by abusing Claude’s iMessage integration, an attacker can spoof your own messages to mint an arbitrary number of Stripe coupons—or run any tool—without you ever knowing.