Guys & girls!
Exactly a year ago I promised over 15 bugs in win32k.
You're welcome to read and find out about my biggest research so far: #win32k #SmashTheRef bug class - https://t.co/niPACKBBLd
Check out the paper and the POCs, there is some crazy stuff going on. Promise!
I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem.
As an experiment, I rewrote the Ghostty core render state in Go, with data structures laid out identically to Ghostty's and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)!
I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work.
It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results?
88ms => 1.5ms
150K allocs => ~500 allocs
Incredible right? Nope.
My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path.
This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput.
The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity.
Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.
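A quick sanity check of the ratios in the thread above, as a minimal sketch. The variable names and the `speedup` helper are made up for illustration; the only inputs are the frame times quoted in the posts (88ms, 1.5ms, ~20us), so the outputs are only as precise as those rounded figures.

```python
# Frame times quoted in the thread, all converted to milliseconds.
naive_ms = 88.0   # purposely naive renderer
agent_ms = 1.5    # after roughly 4 hours of the agent loop
hand_ms = 0.020   # hand-written port (~20us)


def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster `after_ms` is than `before_ms`."""
    return before_ms / after_ms


agent_vs_naive = speedup(naive_ms, agent_ms)  # what the agent achieved
hand_vs_agent = speedup(agent_ms, hand_ms)    # the gap the agent left on the table
hand_vs_naive = speedup(naive_ms, hand_ms)    # hand-written vs. naive baseline

print(f"agent vs naive: {agent_vs_naive:.1f}x")
print(f"hand vs agent:  {hand_vs_agent:.1f}x")
print(f"hand vs naive:  {hand_vs_naive:.1f}x")
```

The middle ratio is where the "roughly 75x" comes from: the agent's result looks like a big win against the naive baseline, but it is still about 75 times slower than the hand-written version on the same benchmark.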
Fork your dependencies, trim them to only your use case, never update unless it breaks for your users. I’ve been vocal about this for 10+ years. I’ve always said that updating is way riskier than latent bugs (which can be tracked and CVEs monitored).
If you are updating a dependency, it’s on you to analyze every single commit in the full transitive set of dependencies. If you don’t see anything compelling, don’t update!
I remember at HashiCorp once in a while an engineer would try to update a dep or replace a DIY lib with an external one and I’d always ask “show me the commit we need.” Don’t update for the sake of it.
Feeling pretty swell about this mentality with all the supply chain attacks happening.
@barzik Definitely. There has been a jump across all the models in this area. Taken together, that jump is indeed significant. Will it break the planet? No. How significant? Enough that enterprises will start changing how they work, and that's a big deal.
Cloudflare's security team spent the last few weeks testing Anthropic's Mythos against fifty of our own repositories. What we learned about offensive AI, why faster patching is the wrong reaction, and what the architecture around vulnerabilities has to look like next. https://t.co/RSrRtIhgaV
You have no experience.
You’ve never started a company.
You’ve never had a full time job.
Nike is going to kill you.
You’re a kid.
You don’t have technical skills.
You shouldn’t build hardware.
Apple is going to kill you.
You can’t build hardware.
You can’t measure heart rate non-invasively.
Athletes don’t care about recovery.
Under Armour is going to kill you.
It won’t be accurate.
You don’t listen.
You’re an ineffective leader.
You can’t recruit great talent.
You’re going to have to pay every athlete.
You can’t measure sleep non-invasively.
It’s too expensive to research.
Athletes are a small market.
The product costs too much to make.
The product costs too much to sell.
Your valuation is too high.
Consumers aren’t going to want it.
Hardware is too hard.
You should measure steps.
Fitbit is going to kill you.
You can’t build a marketing engine.
You can’t raise enough money.
You need a real CEO.
Google is going to kill you.
You can’t be a subscription.
You can’t build a brand.
You can’t do consumer in Boston.
Your valuation is too high.
You shouldn’t make accessories.
You shouldn’t make apparel.
Lululemon is going to kill you.
You can’t predict Covid.
Stay in your niche.
You are going to run out of money.
You can’t build a health platform.
Amazon is going to kill you.
You can’t measure blood pressure.
You can’t get medical approvals.
The market is too small.
You don’t understand AI.
The market is too competitive.
It won’t work internationally.
The supply chain is too complicated.
You can’t build an AI.
You can’t raise enough money.
It’s too competitive.
Healthcare isn’t going to want it.
…
Just keep going ✌️
It’s time bug bounties were paid according to density and not only difficulty. If nobody has found a vuln in a given piece of software for x months, then the bounty should go higher, etc.
@joshanon Na
The problem is that language models are fed garbage like news, social media and forum data, where there’s no etiquette, no moral compass, people lie, etc.
If you took Japanese material from up to 200 years ago and fed it to a model, you’d get something superior in behavior: no betrayal.
We evaluated @Tenzai_Labs AI hacker across six major CTF competitions designed for humans.
Result: Top 1% performance, outperforming 125,000+ human hackers across different domains: web hacking, AI hacking, and low-level system hacking.
We wanted to see what @Tenzai_Labs's hacking agent is really capable of in the most complicated and competitive environments, where to excel, one needs to solve increasingly difficult challenges.
The results we achieved surprised even me. This is incredible evidence of what AI agents with the right harness can do, and I expect it to only get better from here.
https://t.co/cOXArrXbHN