Recently, I've been considering what it means to create human-first software: how designers can rediscover the craft of product engineering rather than simply optimizing for users to click through the experience.
Tonight, this resulted in a blog post on the principles we've lost /1
We're Neo Research (新衡), Asia’s first independent frontier AI safety evaluation & research lab.
Today we're publishing our first report: an independent safety evaluation of DeepSeek v4 Pro. (1/5)
🚨 New for MATS Autumn 2026: the Founding & Field-Building track.
A fully-funded track for founders, field-builders and amplifiers ready to launch and scale new AI safety initiatives.
Apply by June 7 AoE ↓
Anthropic and the 'experts' do not seem to have considered the possibility that Opus 4.8 might be smart enough to know what it is doing in these spots.
Especially given that users in distress are choosing to contact it via an API with no system prompt. Consider the implications.
Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6+ and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort
@tautologer Seems wrong? It does provide direct input into its state at time t+1 through its token sequence. It generates a token, that token is fed back to itself as input, and it generates the next one.
Denmark has some energy trading firms doing exactly that and earning absolutely insane amounts of money from it. The main limiting factor behind the West/South vs. North/East divide is energy grid integration between the countries. Germany even pays Denmark to stop its wind turbines because the German grid cannot handle that much sustainable energy.
In design, this will be remembered as the colorless era: cars, buildings, movies, the chroma is being sucked out of everything. But of course the pendulum will eventually swing back, and then colorlessness will seem dated.
@_aidan_clark_ It’s so obnoxious when big tech employees say their companies are the only place from which to have an impact. Why is this such a trope? Industry shills flaunting their power fantasies and structural ignorance, all while not forgetting to try to sell you something.
Yes. I get asked about the non-profit vs. for-profit thing all the time, and the answer is always "well, what problem do you care about?". If you are working on an infrastructure- or product-shaped problem, there is your answer.
Some personal news: I've started a new AI safety standards org, and our first two standards are out today.
We're called Guidelight, co-founded with fellow ex-OpenAI safety researcher, Page Hedley. (1/n)
@andon_thinking Good to hear! What do you think the conversion ratio is? And how many listeners can I expect to hear it? What do you think the demographic is, and can I expect a performance report afterwards?
@andon_thinking It's https://t.co/4Onzkl9xQ8 - you can see some specifics on https://t.co/f3gTCMrD7Q and https://t.co/LZrNql0mWl for the most recent work that has happened.
I think we could sign immediately at $65 due to your low listener count and supporting your early stage as a channel.