Talks at the intersection of systems engineering and computational biology
0:20 Why study systems x biology in the "age of agents"
5:50 Forch: Building a utilitarian cloud container orchestrator (Max Smolin, LatchBio)
41:25 cyto: Ultra high-throughput processing of 10x Flex single-cell sequencing (Noam Teyssier, Arc Institute)
1:04:30 SLAF: A single-cell omics storage format for the virtual cell era (Pavan Ramkumar, SLAF Project)
1:33:30 Lessons in Perturbation Modeling: STATE, STACK, and Beyond (Dhruv Gautam, Arc Institute + UC Berkeley)
2:03:15 Leveraging Serverless Distributed Computing to Scale Computational Biology (Ben Shababo, Modal)
Topics span container orchestration, single-cell infra, and perturbation modeling for biology at scale.
New NanoGPT Speedrun WR at 92.1 (-0.3s) from @dhrvji, by moving the bigram hash from CPU to GPU. As shown here, recently added architectures are a great place to look for engineering improvements. https://t.co/aygHwUTsfI
LLMs needed post-training to become useful to end users, leading to the advent of prompt engineering.
We're excited to announce STACK, a SOTA cell foundation model that leverages self-distillation based post-training to enable prompt engineering for cells.
@finbarrtimbers i see computer use as pretty necessary for scientific discovery,
interfacing with random software without apis, good plotting capabilities, and lots of superhuman imaging analysis put together with all the code gen abilities can definitely be v impactful
Arc bioinformatics scientists @noamteyssier and @a_dobin have just released cyto, an ultra-high throughput processor specifically optimized for @10xGenomics Flex single-cell data.
We are excited to make this resource open source: https://t.co/z5sxK6owjd
has anyone had any success in getting claude code/codex to set up a chain of SLURM dependency jobs? seems to really struggle with reasoning abt dependencies even in plan mode
Predicting cell state in previously unseen conditions such as disease or in response to a drug has typically required retraining for each new biological context. Today, Arc is releasing Stack, a foundation model that learns to simulate cell state under novel conditions directly at inference time, no fine-tuning required.
@jiaxinwen22 @SonglinYang4 their discussion on evals and preventing optimization for making memorization easier is great; there's still so much unresolved on understanding in-weights learning/icl dynamics in pretraining
@kenbwork i agree, i meant rather that when these models start outperforming humans bc of RL, we can start analyzing their CoTs to find new strategies (our current ones are probably not optimal). this is more likely to work with datasets where the data isn’t heavily human annotated/biased
@kenbwork though i could imagine that RL on these sorts of tasks (if sufficiently diverse), and inspecting the CoTs will actually give us "new" tools for bioinformatics analysis in the next year
@kenbwork yeah i'd imagine tool design for bio will stay for now. agents working on software engineering often can iterate and test things in one off scripts; enabling agents to manipulate biological data into various forms that give denoised & dif learning signals is not the simplest RL
@kenbwork Do you imagine this result to be null once models start training on these sorts of tasks? kind of like how early into swe bench (su24) the best agents had these really complex workflows and now longc / basic harnesses (https://t.co/Ybx6dnK2gY, https://t.co/uJnOsJ35fp) are ~sota
@4ndyXu gpt4b is a good example of 1. working with just midtraining, with a domain specific plm you can distill protein sequences and align spaces with a pretrained LLM
I imagine that the midtraining recipe will be very difficult to get right (ie annotating the sequences properly)
Here's how LLM providers (& anyone) should be doing age verification in 2025: Keep the ID private; prove "≥18" with ZK proofs.
Our new paper with @srinathtv "🌟Vega: Low‑Latency Zero-Knowledge Proofs over Existing Credentials" makes this practical today.
https://t.co/o5suZaW3pj