Philip Park

@ZeTrill

Regular Guy

Joined July 2010

466 Following

244 Followers

4.3K Posts

ZeTrill retweeted

zek

@zekramu

about 19 hours ago

As promised, here are my thoughts after spending all day with Mythos. i hope to god anthropic doesnt sue the fuck outta me but yolo. fair warning, this is a long one.

1. The Cost

Mythos pricing, at least for our enterprise, was uhh expensive. I thought being a pilot company would mean they’d let us try it for free but no lmao. They did give a decent amount of free tokens from the API at least, but cost estimates put us well above a million dollars spent on it. In comparison, my company spent 2 million on inference for the entirety of last month for everyone in the company. So yeah, shit is pricey as hell.

2. The harness

The biggest surprise to me was that they actually sent us a harness that was NOT claude code. its sort of dinky and looks to me largely ai generated. most of it focused on ensuring mythos did not “escape containment” along with some shitty security skills. so, they are def taking the sandboxing seriously. imo its a pretty shit/restrictive harness. half of the guard rails dont work, lmao and apparently this is basically what “project glasswing” is, which is pretty funny considering the harness is shit. im not sure that the harness will be released with the model api when it drops either, it seemed like that was part of the deal. quite interested to see what they do when it drops/how it gets opened up. I was able to use Mythos outside of the harness (omp btw)… more on that in a sec, though I did have to hack around as they really dont want people to do this (what I was told at least).

3. the model

probably the part everyone is most interested in. i will say, the model is good. is it expensive? fuck yes. but its good. to me, it feels like it is fine-tuned explicitly for this sort of security research task. for general coding, which I wasn’t able to play with much, it wasnt that surprising. but it is indeed very good at security based tasks. far better than opus / 5.5 xhigh.

that said, I dont feel as though its some omnipresent danger/threat to society. I watched it get confused trying to use our build tool, actually to the point where I had to build the code for it and then run the model against the full build. you’d think an omnipresent model could do this, but nothing on the market has been able to figure it out. and its just Bazel with some custom shit we built. nothing crazy. that said, if people have a shit ton of money AND extensive harness knowledge, yeah, they can probably use it to do some malicious shit. but only a genuinely skilled engineer/security researcher.

4. The results

Mythos was able to find quite a few vulnerabilities across a few of our products (like products probably everyone on this app has interacted with indirectly, maybe a small few directly). I think the final total was like ~800 major threats. Definitely enough to rethink some of the security strategy.

5. Final Thoughts

It’s a good model sir. It’s not an existential threat to humanity as Anthropic might lead you to believe, but it’s genuinely good. Cost wise I would like to try a comparison with 5.5 xhigh but alas I dont have a million dollars to throw at it to do a proper comparison.

ZeTrill retweeted

beny

@benyficent

3 days ago

oh-my-pi is a must have because it ships with remote compaction for Codex models, but the harness also comes with comprehensive tooling that is worth adopting instead of customizing yourself from base pi. @_can1357 shipped dynamic workflows (subagents orchestrated with deterministic JS/Python) months before claude code did. If you fully take advantage of these features you could run large scale engineering efforts solo faster than other devs. We already see devs like @usr_bin_roygbiv stacking jobs and writing scrapers to do exactly that to maximize cashflow. To stay ahead of the curve you need to try the newest features and models and develop your own intuition for what works.
