"But here is what we found when we tested: We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug." https://t.co/yBTiiMq1Xy
this is actually insane
> be tech guy in australia
> adopt cancer riddled rescue dog, months to live
> not_going_to_give_you_up.mp4
> pay $3,000 to sequence her tumor DNA
> feed it to ChatGPT and AlphaFold
> zero background in biology
> identify mutated proteins, match them to drug targets
> design a custom mRNA cancer vaccine from scratch
> genomics professor is “gobsmacked” that some puppy lover did this on his own
> need ethics approval to administer it
> red tape takes longer than designing the vaccine
> 3 months, finally approved
> drive 10 hours to get rosie her first injection
> tumor halves
> coat gets glossy again
> dog is alive and happy
> professor: “if we can do this for a dog, why aren’t we rolling this out to humans?”
one man with a chatbot and $3,000 just outperformed the entire pharmaceutical discovery pipeline.
we are going to cure so many diseases.
I don't think people realize how good things are going to get
How I think about "security":
The goal is to minimize the divergence between the user's intent and the actual behavior of the system.
"User experience" can also be defined in this way. Thus, "user experience" and "security" are not separate fields. However, "security" focuses on tail risk situations (where the downside of divergence is large), and specifically tail risk situations that come about as a result of adversarial behavior.
One thing that becomes immediately obvious from the above definition, is that "perfect security" is impossible. Not because machines are "flawed", or even because humans designing the machines are "flawed", but because "the user's intent" is fundamentally an extremely complex object that the user themselves does not have easy access to.
Suppose the user's intent is "I want to send 1 ETH to Bob". But "Bob" is itself a complicated meatspace entity that cannot be easily mathematically defined. You could "represent" Bob with some public key or hash, but then the possibility that the public key or hash is not actually Bob becomes part of the threat model. There is also the possibility of a contentious hard fork, in which case the question of which chain represents "ETH" becomes subjective. In reality, the user has a well-formed picture of these topics, which gets summarized by the umbrella term "common sense", but these things are not easily mathematically defined.
Once you get into more complicated user goals - take, for example, the goal of "preserving the user's privacy" - it becomes even more complicated. Many people intuitively think that encrypting messages is enough, but the reality is that the metadata pattern of who talks to whom, and the timing pattern between messages, etc, can leak a huge amount of information. What is a "trivial" privacy loss, versus a "catastrophic" loss?
If you're familiar with early Yudkowskian thinking about AI safety, and how simply specifying goals robustly is one of the hardest parts of the problem, you will recognize that this is the same problem.
Now, what do "good security solutions" look like?
This applies to:
* Ethereum wallets
* Operating systems
* Formal verification of smart contracts or clients or any computer programs
* Hardware
* ...
The fundamental constraint is: anything that the user can input into the system is fundamentally far too low-complexity to fully encode their intent. I would argue that the common trait of a good solution is: the user is specifying their intention in multiple, overlapping ways, and the system only acts when these specifications are aligned with each other.
Examples:
* Type systems in programming: the programmer first specifies *what the program does* (the code itself), but then also specifies *what "shape" each data structure has at every step of the computation*. If the two diverge, the program fails to compile.
* Formal verification: the programmer specifies what the program does (the code itself), and then also specifies mathematical properties that the program satisfies
* Transaction simulations: the user specifies first what action they want to take, and then clicks "OK" or "Cancel" after seeing a simulation of the onchain consequences of that action
* Post-assertions in transactions: the transaction specifies both the action and its expected effects, and both have to match for the transaction to take effect
* Multisig / social recovery: the user specifies multiple keys that represent their authority
* Spending limits, new-address confirmations, etc: the user specifies first what action they want to take, and then, if that action is "unusual" or "high-risk" in some sense, the user has to re-specify "yes, I know I am doing something unusual / high-risk"
In all cases, the pattern is the same: there is no perfection, there is only risk reduction through redundancy. And you want the different redundant specifications to "approach the user's intent" from different "angles": e.g. the action, its expected consequences, its expected level of significance, an economic bound on the downside, etc.
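To make the redundancy pattern concrete, here is a minimal Python sketch of a post-assertion check. Everything in it is hypothetical and only for illustration: the `Transfer` and `ExpectedEffects` structures, the toy `simulate` function, and the balance dictionary are assumptions, not any real wallet or Ethereum API. The user states the action once, states its expected effects separately, and the system only acts when the two agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """The action the user asks for (hypothetical structure)."""
    sender: str
    recipient: str
    amount: int  # integer units, e.g. wei


@dataclass(frozen=True)
class ExpectedEffects:
    """A second, independent statement of intent: what should change."""
    sender_delta: int
    recipient_delta: int
    max_total_outflow: int  # economic bound on the downside


def simulate(balances: dict[str, int], tx: Transfer) -> dict[str, int]:
    """Toy simulation: return the balances that would result from tx."""
    after = dict(balances)
    after[tx.sender] = after.get(tx.sender, 0) - tx.amount
    after[tx.recipient] = after.get(tx.recipient, 0) + tx.amount
    return after


def execute_if_aligned(
    balances: dict[str, int], tx: Transfer, expected: ExpectedEffects
) -> dict[str, int]:
    """Apply tx only when the simulated outcome matches every expectation.

    Returns the new balances on success, or the original balances untouched
    when the action and the stated expectations diverge.
    """
    after = simulate(balances, tx)
    sender_delta = after[tx.sender] - balances.get(tx.sender, 0)
    recipient_delta = after[tx.recipient] - balances.get(tx.recipient, 0)
    aligned = (
        sender_delta == expected.sender_delta
        and recipient_delta == expected.recipient_delta
        and -sender_delta <= expected.max_total_outflow
        and after[tx.sender] >= 0
    )
    return after if aligned else balances


balances = {"alice": 10, "bob": 0}
expected = ExpectedEffects(sender_delta=-3, recipient_delta=3, max_total_outflow=5)

ok = execute_if_aligned(balances, Transfer("alice", "bob", 3), expected)
# A mistyped amount (30 instead of 3) contradicts the stated expectations.
blocked = execute_if_aligned(balances, Transfer("alice", "bob", 30), expected)
```

Neither specification is trusted on its own here: a wrong amount in the action is caught by the expected deltas, and a wrong expectation simply blocks the action rather than silently changing it.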
This way of thinking also hints at the right way to use LLMs. LLMs done right are themselves a simulation of intent. A generic LLM is (among other things) like a "shadow" of the concept of human common sense. A user-fine-tuned LLM is like a "shadow" of that user themselves, and can identify in a more fine-grained way what is normal vs unusual.
LLMs should under no circumstances be relied on as a sole determiner of intent. But they are one "angle" from which a user's intent can be approximated. It's an angle very different from traditional, explicit, ways of encoding intent, and that difference itself maximizes the likelihood that the redundancy will prove useful.
One other corollary is that "security" does NOT mean "make the user do more clicks for everything". Rather, security should mean: it should be easy (if not automated) to do low-risk things, and hard to do dangerous things. Getting this balance right is the challenge.
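As a rough illustration of "easy for low-risk, hard for dangerous", the sketch below maps an action to the amount of friction it deserves. The names (`Friction`, `required_friction`) and the two risk signals (new address, daily spending limit) are assumptions chosen for the example, not a description of any particular wallet.

```python
from enum import Enum


class Friction(Enum):
    AUTO = "auto"            # low risk: proceed without prompting
    CONFIRM = "confirm"      # one risk signal: a single explicit confirmation
    RECONFIRM = "reconfirm"  # several signals: user must restate the intent


def required_friction(
    amount: int,
    recipient: str,
    known_recipients: set[str],
    daily_limit: int,
    spent_today: int,
) -> Friction:
    """Scale the amount of friction with how unusual the action looks."""
    new_address = recipient not in known_recipients
    over_limit = spent_today + amount > daily_limit
    if new_address and over_limit:
        return Friction.RECONFIRM
    if new_address or over_limit:
        return Friction.CONFIRM
    return Friction.AUTO


known = {"bob"}
routine = required_friction(1, "bob", known, daily_limit=10, spent_today=2)
unusual = required_friction(1, "mallory", known, daily_limit=10, spent_today=2)
risky = required_friction(50, "mallory", known, daily_limit=10, spent_today=2)
```

The routine payment goes through with no extra clicks, while the extra steps are reserved for the cases where the downside of a divergence from intent is largest.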
Spent the last month doing real work with coding agents on a production codebase (~50k LOC added). Yes, the productivity jump is real. No, "build anything from a prompt" is not how this works.
Shared my experience as a good old long-form blog post https://t.co/HkGF8o8mC8
Don't think of LLMs as entities but as simulators. For example, when exploring a topic, don't ask:
"What do you think about xyz"?
There is no "you". Next time try:
"What would be a good group of people to explore xyz? What would they say?"
The LLM can channel/simulate many perspectives but it hasn't "thought about" xyz for a while and over time and formed its own opinions in the way we're used to. If you force it via the use of "you", it will give you something by adopting a personality embedding vector implied by the statistics of its finetuning data and then simulate that. It's fine to do, but there is a lot less mystique to it than I find people naively attribute to "asking an AI".
Databricks held a webinar on agentic AI with 3 large enterprise CIOs / AI specialists.
My punchline takeaway is that agentic AI is very early and very difficult to implement.
This was an opportunity for these customers to boast about their AI progress and no one could cite a tangible example or even an initiative.
Highlights
-Meeting kicked off with “Tell me what AI application you are most excited about” and the responses were Waymo and inbox cleanup. Really? Not a single internal, corporate example to highlight?
-Databricks employee was great / honest. Comments suggested Agentic AI is in the early R&D stage – can write code for the agents but still very difficult to test them. Also made a very interesting point: code written on mainframes for a specific process is still running 40-50 years later. AI agent code has a half life of 3-6 months, so agents need to be retrained frequently to keep them functional. [That makes sense to me. Think of how people’s job functions evolve even if ever so slightly over time – maybe you need to use a different portal, corporate policy changes on returns, etc.].
-Described AI agents as similar to children – can do some amazing things but also very unruly and can do crazy things. Agents also get more unruly the larger they get / the more complicated the task they attempt.
-The customer representatives highlighted they are basically nowhere w/ AI Agents. AI zeitgeist spurred them all to consolidate / clean data so they can test AI. But the most tangible cases mentioned were simply document summarization / querying.
This is just a point in time but does seem like AI Agents are nowhere close to working right now.
The SaaS companies are all working on them, are arguably the best application coders out there, and are struggling to get it to work.
PLTR is the only one that seems to have success but their approach doesn’t seem that agentic – they just go into a customer, collect all the data, and set up an application the customer can use (portal that tells you which planes need which parts to be replaced preventatively, etc.).
Probably bullish for SNOW / Databricks as everyone continues to get their data in order, bullish for application SaaS, and probably bearish on margin for AI infra (ie. slow your roll).
Tons of people who have no ability to think for themselves.
They want to be told what to think by someone they trust, and then get mad at anyone who disagrees with what they were told.
@balajis ai agents are good for generating throwaway code. but over the lifetime of most production systems, engineers spend far more hours reviewing PRs, troubleshooting issues, and figuring out how to add features. ai agents can assist with making all of these easier.
If you actually believed this then you'd be morally bankrupt for working at a company looking to make it happen.
That leaves only a few actual reasons for saying something like this:
1) You believe you're a part of the few, specially chosen, wise people who should have this power and can guide it fairly (you're not and you can't)
2) You want daddy government to come in and give you a monopoly through discriminatory lawfare because see number one
3) You're unabashedly evil
So which is it?
This is a really fascinating read.
There is no doubt some big FUD about AI, but it is also going to realign so many jobs and it's worth understanding how to navigate it.
https://t.co/KwVbrFWUwp
I just read this WSJ article on why Europe's tech scene is so much smaller than the US's and China's.
I'm afraid that, like most articles on this topic, it largely misses the mark.
Which in itself illustrates a key reason why Europe is lagging behind: when you fail to understand the root causes of an issue, you have zero chance to solve it.
What makes me competent to speak on this topic?
Back in the late 2000s and early 2010s, I founded and led HouseTrip, which at the time was one of Europe's top startups. We were the first startup in history in which all of Europe's top 3 VC investors invested.
So I have a pretty intimate knowledge of the European entrepreneurship ecosystem and what it takes to create and grow a tech company in Europe.
We were pretty promising as a startup. In fact as promising as it can possibly get.
We had a similar concept to Airbnb (with some notable differences I won't bore you with), except we created the company 1 year before they did. Which means we were the first-mover - globally - with a multi-billion-euro concept, strong financial backing by the 3 top investors in Europe and, at some point, a team of 250 people with some of the brightest minds in tech in Europe. Everything we needed to succeed.
And yet we didn't succeed: ultimately we were essentially crushed by our American competitor Airbnb in our home turf - Europe - and we had no choice but to sell ourselves to another American company, Tripadvisor.
Believe me, I've reflected long and hard on how that could have happened. In fact after I left the company in 2015 I even spent 3 months in isolation in the Annapurna mountains in Nepal to reflect full time on exactly that 😅
And I then moved to China, where I spent the next 8 years and where I had the chance to study their ecosystem to understand why they're successful and Europe isn't.
So all in all, I think I have some degree of legitimacy to comment on this topic.
The WSJ article says that Europe lags behind due to the usual suspects, the reasons you constantly hear about: too much regulation, fragmented European markets, limited access to financing, a culture that isn't conducive to the startup grind, etc.
Some of those are true, but imho all are secondary.
Take excessive regulation, for instance, which gets mentioned all the time. If it were such a hindrance to startups, why would American startups succeed in Europe - like Airbnb in our case - and European startups not? We all face the same regulations 🤷
Or take fragmented markets. Same question: how could US startups successfully conquer these fragmented EU markets when European startups can't?
Because that's the real elephant in the room, and really the story of the European tech scene since the advent of the internet: US startups have shown a remarkable ability to capture European markets despite the supposed barriers, making many of the "usual suspects" explanations for Europe's tech struggles very unconvincing.
In other words, logically, any explanation where both US and European startups face identical barriers fails to address the fundamental difference in outcomes we consistently observe.
Based on my experience, the key problem faced by European startups can be summarized in one word: patriotism.
There is virtually none in Europe, and more than anything that's what's killing EU startups, or preventing them from developing.
It used to drive me absolutely nuts at HouseTrip. What a startup needs first and foremost, especially a consumer-facing startup like we were, is marketing, to become famous.
At first, when I created the company and before Airbnb was even a thing, I used to pitch the company to the media and the general response I would get was almost one of contempt, as in "why would I belittle myself to write about your startup? And furthermore, who would be stupid enough to stay in an apartment when there are hotels? You guys have no future..."
And then Airbnb got launched and the American media started their thing, hyping the company like it was the greatest innovation since sliced bread, like they were national heroes, giving them hundreds of millions in free publicity.
That's when European media started to take notice. Not of us, god forbid, but of Airbnb. The concept was promoted by Silicon Valley, see... so now it was valid.
So I went back to pitch HouseTrip to European media. This time around I was met with a different kind of contempt: "So you guys are like Airbnb? Why would we cover a European copycat when we can just write about the real American original?" Luckily I'm not violent but let's say those moments really tested my civility 😅
All in all, we arrived in the absolutely grotesque situation where, despite Airbnb not having yet set foot in Europe, they were already a cultural phenomenon there, promoted by European media, for free, when the European original - yours truly - had to spend millions on paid marketing (mostly to Google and Facebook, American companies) to achieve a small fraction of the brand recognition.
Which means that, insanely, Airbnb was probably doing more business in Europe than we did before even opening an office there, simply on the back of the free publicity they were getting. How on earth can you even compete with that?
This dynamic was at play with general European elites too. I remember very clearly having dinner next to a legendary European entrepreneur and investor - who I won't name, a man who supposedly, on paper, is dedicating his life to furthering the European tech ecosystem. We naturally got to talk about HouseTrip and he literally told me, and this is an exact quote: "you know I don't really like copycats, they really hurt the European ecosystem." Another big test for my civility that night...
And even if we had been a copycat, so what? That's how China got started, there's nothing to be ashamed of. You need to learn to walk before you can run.
In fact if you study the history of innovation you'll find that every major tech power, including the US, started by imitating and adapting others' innovations before developing their own.
Speaking of China, again a country that I know in depth from having lived there for 8 years after HouseTrip, I've come to the conclusion that patriotism, a deeply rooted mindset of sovereignty, is truly the magic ingredient behind their success.
Contrary to popular belief, they don't do it in a stupid way by just banning competition. Those cases are actually very rare and only occur if the companies in question violate Chinese law in pretty egregious ways.
Most of the time it's the exact contrary: they welcome foreign companies and competition, but create conditions where local alternatives can thrive alongside them, giving Chinese users and businesses legitimate options to choose domestic champions.
Which means you end up with, for instance, Apple doing well in China but simultaneously allowing the rise of Huawei or Xiaomi. Or Tesla doing well in China but simultaneously allowing the rise of BYD or Nio. Etc.
And China is, interestingly, more comparable to the EU than most people realize. It is, again contrary to popular belief, extremely decentralized when it comes to doing business, with various provinces competing against each other much the same way EU countries compete against each other.
But they do it in such a way where, again, the overarching sense of Chinese sovereignty never gets sacrificed at the altar of provincial competition. And where the ultimate goal is to develop Chinese champions which can successfully compete on the global stage.
So there you have it, the dirty little secret behind Europe's lag. We're essentially witnessing a "colonization of the minds" whereby Europe has structurally internalized its technological inferiority, celebrating American startups while dismissing its own homegrown companies.
Why does this barely ever get talked about? Think about it: do you seriously think that the Wall Street Journal would start advocating for, essentially, policies hostile to American tech dominance?
Much better to focus on the usual red herrings like too much regulation or fragmentation which, conveniently, would primarily result in clearing obstacles for American tech giants to dominate European markets even further, rather than nurturing homegrown competitors. This article is, in itself, an illustration of the "colonization of the minds".
DOLLAR INFLATION IS GLOBAL TAXATION
There is just a fundamental misconception about what the USA actually is.
It is the seat of a global financial empire (or was). It makes its money by printing it. For example, it printed $1.25T for just one purchase back in 2010.
Do you know how many Nikes you’d have to sell to make a trillion dollars?!?
The reason the American Empire had the right to print a trillion dollars at its whim is because it was managing the global financial system for everyone. So it had the right to impose global taxation via dollar inflation. Every time it printed $1T, that was divided by the 6B+ direct and indirect dollar holders worldwide, not just the 330M Americans.
In other words: the whole world paid the US to run the empire. And got diluted down (aka taxed) every time the US printed another dollar. That’s why Milton Friedman said inflation is taxation without legislation.
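The per-holder arithmetic behind that claim can be sketched in a few lines of Python. This takes the post's round figures at face value ($1T printed, 6B dollar holders worldwide, 330M Americans) and assumes, purely for illustration, that the dilution is split equally per head; real dollar holdings are far from equal, so treat the outputs as orders of magnitude only.

```python
# Figures quoted in the post, taken at face value.
printed_usd = 1_000_000_000_000   # $1T printed
global_holders = 6_000_000_000    # direct and indirect dollar holders
us_residents = 330_000_000        # Americans

# Equal-split assumption (a simplification, not how holdings are distributed).
per_global_holder = printed_usd / global_holders
per_us_resident = printed_usd / us_residents
spread_factor = global_holders / us_residents

print(f"Spread worldwide: ${per_global_holder:,.2f} per holder")
print(f"Borne by Americans alone: ${per_us_resident:,.2f} per person")
print(f"Worldwide base is {spread_factor:.1f}x larger")
```

Under these assumptions the cost is roughly $167 per holder worldwide, versus roughly $3,030 per person if Americans alone had carried it, which is the gap the argument rests on.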
Anyway — the USA could impose global taxation via dollar inflation because it was a stable jurisdiction for multinational business, due to Delaware. It issued visas for scientists and workers from all over the world. It didn’t have psychos bombing cars, shooting CEOs, or blocking roads. It advocated free trade, respected property rights, and only sanctioned rogue states.
Now all that is gone. The US is no longer a neutral arbiter of the rules-based order that it once set up for its own benefit. And so it’s going to lose that money-printing power. But before it goes, you should understand what it’s losing.
Because the money-printing business model was way, way, way more profitable than working for a living. Yes, it had long-term negative consequences, but in the short and medium-term (meaning: multiple decades) it had absolutely no peer.
Exiting the money-printing business to return to the abandoned manufacturing business is likely not even possible because of the societal instability such a sharp decline in living standards will cause.
Anyway, the people who ran the American Empire could have spent this money more wisely. They could have taken far better care of their own citizens, and dropped fewer bombs on non-citizens. Then they wouldn’t be at this juncture.
But *how* it spent the trillions is distinct from the indisputable fact that the US printed trillions in the first place. So the American Empire as an entity wasn’t being ripped off by the world, especially by countries like Vietnam.
Quite the contrary.