The Wonderland CTF was a blast!
Huge congrats to all the teams, especially “STACK TOO DEEP”, “NADA ESPECIAL” and “SECSEE”.
Oh, also: https://t.co/WHMt1f36Mk 👉👈
The goal is to create the lowest resource circuit that computes elliptic curve point addition correctly on 99% of inputs. The benchmark is running 9024 random points that sampled deterministically based on a circuits hash. This is fine because Schor's algorithm fails at a rate similar to its subcircuits: it works approximately.
The probability that a 99% correct circuit makes it through this check is 0.99^9024 ~= 2^-130. Tiny.
So grinding is actually a valid (and seemingly effective) strategy! It's a essentially a way for a 99.9xxx% correct circuit to tell the benchmark that it wants a new set of points to be tested on in order to prove that it's correctness exceeds 99%.
This is the exact same test Google thoughtfully designed into their ZK proof, read here for more details:
https://t.co/Ttatgf9H3d
https://t.co/nU7BwCUZym
Open agents just broke Google’s closed quantum benchmark in public.
ECDSA. Google’s quantum circuit. Post-quantum future.
@sreeramkannan and @bbuddha_xyz from @eigenlabs will be joining @MTSlive to break it down shortly.
My MLSys keynote on AI writing systems code got more interest than I expected. The recording will take a while, so in the finest tradition of AI labs sharing blog posts, we’re starting the Core Automation Blog with this one https://t.co/h4uSOyrglf
@ar0cket1 Okay , so a larger pre-train. When backtesting on larger codebases , i did see improvements in performance with tts, my intuition was that the pretrain had enough data and tts helps access that ?
Opus this, GPT that bro, it all depends on your task at hand and whether it is in the model's distribution. If it is, it's great; if it isn't, it's just okay. There is no overall better model; it depends on your task.
Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software.
It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans.
https://t.co/NQ7IfEtYk7
I built this RL env for vulnerability detection tasks on smartcontracts using @hud_evals last night: https://t.co/tyDjX5tgkC. You can improve open source model performance with GRPO and measure with evals!
All the big labs have their ai security products at this point. @OpenAI has advark aka codex security. @AnthropicAI just announced Claude code security. The next frontier is RFT. More on that soon!