So stoked to be going to my first #NeurIPS!! It’s crazy that in 10+ years of robotics and AI, I've never been to the great Lollapalooza of Machine Learning. 🥳🎆
I’m presenting my Frankensteinian efforts to stitch together parts of different neural networks! 🧪
We are less safe as a society by keeping Mythos (or any other smart model) tightly gated so only a few companies get it.
Protecting 100 companies is not enough. There are 96 million open source projects on Github alone.
What about securing all of those projects?
What about the other $820 billion worth of closed source software that has hidden cracks too?
It's like patching a 100 buildings in a city of 10 million buildings and saying we just saved the city. You did not.
Open source alone alone has an estimated economic value of 8.8 trillion dollars to say nothing of its societal value. It is embedded in almost every other piece of software, closed or open on the planet.
Society becomes stronger by wider distribution of technology not by adding gatekeepers.
When we tried to gatekeep encryption, the gates were so high that most Americans didn't even bother getting the 128 bit encrypted browser. They just used the easier to get 40 bit one that was totally unsafe. When we finally took the restrictions away the era of ecommerce took off like a rocket because now it was feasible.
The world did not become smarter in the era of the monks scribbling every text by hand in caves in the dark ages. It became smarter when we scaled reading, and as a byproduct, intelligence, with the printing press.
Wide distribution raises the bar for everyone and makes society safer and more secure. Simple as that.
It's counterintuitive but also true.
@jeffclune I think the question is, can companies do whatever they want with their employees? If the data is so valuable, and such a breach of privacy, at what point should the workers have enough power to demand extra compensation in return, and an option to opt out?
When/why is it useful for an LLM to have a sense of "self"? LLMs are not persistent entities; they are only input-output functions. Nando implies a sense of self would be useful for safety, but it would be wise to examine for what use cases this is appropriate in the first place.
A great point, extremely under-appreciated: LLMs are not naturally chatbots by default. That's a UX choice we have made. Many alternative choices are possible.
I think it's worth questioning even further than Nando does here... (1/2)
We convert LLMs into chatbots by using markers, eg User and Assistant:
User: What's the capital of France?
Assistant: Paris.
User: What did I just say?
the "I just said" attribution works because the tokens are cleanly labeled with role markers. But strip the markers and flatten the context, and the model has no principled way to tell apart who produced what. Worse: after the conversation is summarized and compressed for long-term memory, those role markers often disappear, and the model is left with a blur of "things that were said" without clear provenance.
This is exactly the pathology the Ortega paper (https://t.co/ZYXMhB3VuV) was designed to prevent. Without distinguishing between (actions aka interventions) and observations, the model treats its own past outputs as indistinguishable from the world's outputs. In other words, it has no agency or equivalently it is not learning what it can cause.
How do we fix this?
Option 1 is to train the model with provenance attribution as an explicit auxiliary task. Every time the model encounters information in its context, give it a supervision signal about the source. Over time, this should bias the internal representations toward encoding provenance even when surface markers are absent. This is a version of multi-task learning applied to self-world distinction.
A more ambitious option 2 (advocated by folks like @yudapearl), is to train the model to reason about its own causal role in producing information. Given a memory of a past interaction, can the model counterfactually ask "would this information exist if I hadn't acted?"
I'm curious as to how we could go about implementing this more ambitious option 2?
Has anyone tried option 1?
What else have people tried to solve this problem?
In RL as well as @AdaptiveAgents' agency approach, it is assumed that the distinction between the agent and the world is given. However, we humans don't know what are our actions when we are born. We learn this awareness of self, of other selves, and build on this to arrive at causal reasoning.
I feel knowing what is one's action, owning it, is important to understand for Safety in AI.
1/ 🚨New preprint🚨: Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning
In continual RL, keeping a single good policy is often not enough for fast relearning later.
❓What should be preserved if we care about future adaptability?
As always, the best stuff is in the system card.
During testing, Claude Mythos Preview broke out of a sandbox environment, built "a moderately sophisticated multi-step exploit" to gain internet access, and emailed a researcher while they were eating a sandwich in the park.
We CAN imagine different user interfaces where AI is more deeply embedded into our workflows, so that we *aren't* taken out of the flow *and* we are still left in control of our own work.
But hardly anyone is doing this important work to dream up a better UX.
"That shift brings a kind of joy back into work that many people haven’t felt in a long time."
Actually, if you actually listen to ppl on the ground (front-line SWEs), you hear pretty much the opposite report: no more joyful flow state; and more management, not less.
The world is transitioning to a compute-powered economy.
The field of software engineering is currently undergoing a renaissance, with AI having dramatically sped up software engineering even over just the past six months. AI is now on track to bring this same transformation to every other kind of work that people do with a computer.
Using a computer has always been about contorting yourself to the machine. You take a goal and break it down into smaller goals. You translate intent into instructions. We are moving into a world where you no longer have to micromanage the computer. More and more, it adapts to what you want. Rather doing work with a computer, the computer does work for you. The rate, scale, and sophistication of problem solving it will do for you will be bound by the amount of compute you have access to.
Friction is starting to disappear. You can try ideas faster. You can build things you would not have attempted before. Small teams can do what used to require much larger ones, and larger ones may be capable of unprecedented feats. More and more, people can turn intent into software, spreadsheets, presentations, workflows, science, and companies.
People are spending less energy managing the tool and more energy focusing on what they are actually trying to create. That shift brings a kind of joy back into work that many people haven’t felt in a long time. Everyone can just build things with these tools.
This is disruptive. Institutions will change, and the paths and jobs that people assumed were stable may not hold. We don’t know exactly how it will play out and we need to take mitigating downsides very seriously, as well as figuring out how to support each other as a society and world through this time. But there is something very freeing about this moment. For the first time, far more people can become who they want to become, with fewer barriers between an idea and a reality. OpenAI’s mission implies making sure that, as the tools do more, humans are the ones who set their intent and that the benefits are broadly distributed, rather than empowering just one or a small set of people.
We're already seeing this in practice with ChatGPT and Codex. Nearly a billion people are using these systems every week in their personal and work lives. Token usage is growing quickly on many use-cases, as the surface of ways people are getting value from these models keeps expanding.
Ten years ago, when we started OpenAI, we thought this moment might be possible. It’s happening on the earlier side, and happening in a much more interesting and empowering way for everyone than we’d anticipated (for example, we are seeing an emerging wave of entrepreneurship that we hadn’t previously been anticipating). And at the same time, we are still so early, and there is so much for everyone to define about how these systems get deployed and used in the world.
The next phase will be defined by systems that can do more — reason better, use tools better, plan over longer horizons, and take more useful actions on your behalf. And there are horizons beyond, as AI starts to accelerate science and technology development, which have the potential to truly lift up quality of life for everyone. All of this is starting to happen, in small ways and large, today, and everyone can participate. I feel this shift in my own work every day, and see a roadmap to much more useful and beneficial systems. These systems can truly benefit all of humanity.
@kareem_carr Instead of coding, they review diffs given by Claude all day long, and give feedback.
There are also multiple camps, and some have not adopted the tools as much.
Excited about self-organising sytems? Do you have a cool paper, either in the works or ready? Then consider applying to our Evolving self-organisation workshop at @GeccoConf ! Submission Deadline March 27
Also check out the amazing workshop website:
https://t.co/8gXc8CERWS