Fine-tuning sequence-to-expression deep-learning models on personal genome and transcriptome data. A step toward individualized regulatory genomics. 🤖🧬 @GenomeBiology
https://t.co/dqBRJ0MkE2
🚀 We are introducing PerturbPair (with @TakaKud0) — a platform that combines parallel Perturb-seq and optical pooled screening (OPS/PerturbView) in primary cells to systematically map at massive scale how genetic perturbations reshape cellular states across modalities.
With wonderful collaborators @TakaKud0, @AnaMeireles, @AntRios, @jchuetter, @MinOta, @ORozenblattRosen, @LeviAGarraway, @KGeiger, @avtarsingh, @jkpritch, and Aviv Regev.
Paper link: https://t.co/fnSUymW95s
@bratton If intelligence and consciousness are independent qualities/dimensions of objects, to me it is almost definitionally more profound (well, at least more interesting) bc it means there is a greater variety of realizable systems wrt those dims.
Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology.
The new model delivers state of the art performance on protein interactions, especially antibodies, a critical modality for therapeutics.
We have designed and validated miniprotein binders and single chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity.
We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures.
ESMFold2 is built on a state of the art language model that has been trained on billions of protein sequences.
A world model of protein biology emerges through language modeling.
We’ve used the techniques of mechanistic interpretability developed to understand large language models to understand the concepts ESM uses to represent proteins.
The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction, that reflects and mirrors the understanding of protein biology developed through a century of empirical science.
This understanding emerges without prior knowledge, just from language modeling of protein sequences.
Language models are becoming a powerful substrate to understand and program biology.
The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient based search with the model was able to discover high-affinity protein binders.
I'm excited by the potential this has to accelerate basic science and the understanding of proteins. And especially for the new avenues it opens up for therapeutic design and medicine.
Following up on the suggestion from Will Sawin, here is an illustration of the new configurations that disprove Erdős' unit distance conjecture (made with the help of ChatGPT 5.5 Thinking).
Ever wanted to talk to a gene regulatory network? Ever wonder what they could do in a different context, embodiment, space?
New #preprint! @YanboZhang3
"Language Game: Talking to Non-Human Systems"
https://t.co/JW3bGFRmVh
An early step in our program of developing ways to talk to organs, cells, molecular networks, and far weirder kinds of agents.
New #preprint, @PigozziFederico:
https://t.co/hJe7b14hVm
"The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents"
"A hallmark of life on Earth is the ability of agents to exert causal power and be drivers of subsequent events. This is key to cognition at all scales. Causal emergence, measuring the degree to which an agent exerts unique predictive power on its future, is one consequence of causal power. Indeed, recent discoveries have shown that biological agents, even minimal ones, increase their causal emergence after learning new memories. However, there is a major knowledge gap regarding how causally emergent artificial agents are. We focused on Reinforcement Learning (RL) of neural-network agents across an array of environmental conditions, encompassing different algorithms, agent architectures, and six environments arranged on a complexity spectrum. For consistency, we computed the causal emergence of their latent-space representations over their lifetimes. We used the recently proposed ΦID to estimate causal emergence and tested how it related to learning performance. Our results suggested a Causally Emergent Alignment Hypothesis: successful agents exhibited causal emergence that was consistently predictive of final reward early in training and whose representational dynamics aligned with reward improvement in most tasks. This idea suggests that causal emergence may be a previously undisclosed axis of reorganization of neural representations in RL agents, with the potential to establish causal relationships and interventions that will lead to better RL agents. Our work also highlights the alignment between causal emergence and learning as another way biological and artificial creatures compare."
Insight as a toy model for mystical experiences.
Our new paper shows how Aha!, insight, Eureka, and mystical experiences can be modelled as realizations varying in intensity and size of the network affected. A small Aha! might restructure a local problem representation, whereas a mystical experience destabilizes a much larger self/world model.
Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see.
@eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)
Lately I've been seeing a very interesting major shift.
Large, man-made things that used to be designed and planned like architecture are now being designed and built like manufactured products:
Ships and data centers.
Historically, these systems were "architected".
What does that mean? For the sake of brevity, I'm going to be overly reductive.
There are 4 major "CAD" companies that people use to design and plan "big assemblies with lots of parts". 3 focus on manufacturing (Siemens, PTC, Dassault -- actual CAD), 1 focuses on architecture (Autodesk -- called BIM).
Historically, ships were "architected". To this day, the person who is responsible for the design and manages the build of a ship or submarine is called a "Naval Architect".
When software came along, ships mostly either stayed on paper (ouch!) or made their way into the same software as buildings -- architecture-oriented CAD (BIM).
Similarly, data centers have been designed and planned as buildings. This is somewhat understandable if you consider them to be one-offs, as they've often historically been. Thus, they too have lived entirely in the BIM/architecture world -- until now.
We're seeing two massive surges occur simultaneously: the AI boom demanding more more more data centers, and the defense boom demanding more more more ships.
To go from bespoke build (architecture) to modular, repeatable, scaled production, I've been seeing data center companies and maritime companies make a massive push:
All of them are migrating all of their designs away from BIM/architecture software (Autodesk) and onto manufacturing software (Siemens, PTC, Dassault).
We're seeing a migration away from a "bespoke, architected" built world to a more "modular, repeatable, scalable" built world.
To achieve the scale of product volume that their customers now demand, companies building ships and data centers have now moved to standardize and modularize their products so they can achieve economies of scale, allowing their systems and subsystems to be mass manufactured with consistency and reliability across different locations. This is needed so that they can be built quickly, repeatably, with the expectation that their subsystems have reliable interoperability and composability.
Gorgeous eruption that has become a halo CME on its way to Earth (likely arriving on December 9). Quite strong shock, given that proton flux has started to rise.
New paper with @robertchisciure!
"Cognition all the way down 2.0: neuroscience beyond neurons in the diverse intelligence era"
https://t.co/e6SI2WGlN4
"This paper formalizes biological intelligence as search efficiency in multi-scale problem spaces, aiming to resolve epistemic deadlocks in the basal “cognition wars” unfolding in the Diverse Intelligence research program. It extends classical work on symbolic problem-solving to define a novel problem space lexicon and search efficiency metric. Construed as an operationalization of intelligence, this metric is the decimal logarithm of the ratio between the cost of a random walk and that of a biological agent. Thus, the search efficiency measures how many orders of magnitude of dissipative work an agentic policy saves relative to a maximal-entropy search strategy. Empirical models for amoeboid chemotaxis and barium-induced planarian head regeneration show that, under conservative (i.e., intelligence-underestimating) assumptions, even ‘simple’ organisms are from two-hundred- to sextillion-fold more efficient in problem space exploration. In this sense, the deep insights of neuroscience are not about neurons per se, but about the policies and patterns of physics and mathematics that function as a kind of “cognitive glue” binding parts toward higher levels of collective intelligence in wholes of highly diverse composition and origin. Therefore, our synthesis argues that the “mark of the cognitive” is perhaps better sought in the measurable efficiency with which living systems, from single cells to complex organisms, traverse energy and information gradients to tame combinatorial explosions, one problem space at a time."
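The metric described in the abstract (the decimal logarithm of the ratio between the cost of a random walk and the cost of the agent) can be sketched in a few lines. This is only an illustration of that one sentence, not the paper's implementation: the function name `search_efficiency`, its parameter names, and the unit costs used below are assumptions chosen to reproduce the "two-hundred-fold" and "sextillion-fold" figures quoted above.

```python
import math


def search_efficiency(random_walk_cost: float, agent_cost: float) -> float:
    """Orders of magnitude of cost saved by the agent relative to a random walk.

    Hypothetical helper: log10(random_walk_cost / agent_cost), following the
    abstract's verbal definition. Both costs must be in the same units.
    """
    if random_walk_cost <= 0 or agent_cost <= 0:
        raise ValueError("costs must be positive")
    return math.log10(random_walk_cost / agent_cost)


# Illustrative ratios only (agent cost normalized to 1):
two_hundred_fold = search_efficiency(200.0, 1.0)   # about 2.3 orders of magnitude
sextillion_fold = search_efficiency(1e21, 1.0)     # about 21 orders of magnitude
equal_cost = search_efficiency(5.0, 5.0)           # 0: no better than a random walk
```

Under this reading, a value of 0 means the agent does no better than random search, and each additional unit is a tenfold saving.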
@srikosuri As a transplant from tech to bio, I totally agree. The root of the similarity is whether you're running experiment loops to understand something, and many more analogies blossom from there.