Josh Pacini

Verified account

@joshpacini

ceo @ valinor

Joined November 2023

1.8K Following

1K Followers

1.3K Posts

2 days ago

Biological datasets from living patients are rare yet essential for AI progress in healthcare. Really exciting work from the Crownlands team!

2 days ago

Human biology matters. Scientists and AI need human data to understand health and disease. Crownlands is open sourcing Gateway 4M, the largest single-cell tissue dataset ever released from living humans, to advance research on brain aging and neurodegeneration.

17

124

31

48

28K

0

12

1

0

261

2 days ago

Imagine watching Prometheus hand humanity fire and worrying about wood’s forward multiple

0

5

0

0

82

10 days ago

Railroads and steel? Overhyped. It’s all just circular revenue. Smart money is in horses.

0

4

0

0

167

11 days ago

Recruiting ML scientists to join us in modeling patient biology is the most fun I’ve ever had professionally

@valinordiscover

11 days ago

Thrilled to share that Manuel Tran is joining Valinor! Manuel joins us from Roche and TUM, where he spent his time as an ML scientist building generative models for various modalities like histopathology, transcriptomics, and proteomics.

Among others, Manuel led the development of three models that are worth calling out:

LoReTTa is a self-supervised pre-training algorithm that enables a single multimodal transformer to operate across heterogeneous data types — images, text, audio, genomics, transcriptomics, and proteomics — even when the training data never contained all modality combinations simultaneously. This is a critical bottleneck in healthcare, where complete multi-modal patient records are rarely available. Validated on cancer molecular data from 7,030 TCGA patients, it outperformed GPT, BERT, and CLIP on survival prediction across all modality combinations, including those entirely absent from training.

HistoGPT is a vision-language model for dermatopathology report generation that operates on full gigapixel WSIs rather than the ≤1024px patches earlier generative models were limited to. It couples a pathology vision encoder to BioGPT via Flamingo-style gated cross-attention, keeping the backbone frozen and training only the XATTN blocks. It does zero-shot tumor subtype/thickness/margin prediction and exposes gradient×attention saliency maps that localize each generated token back to tissue.

Phoenix is the more recent and ambitious model: single-cell spatial transcriptomics predicted from H&E via latent flow matching. It’s a 1.2B-parameter conditional flow-matching transformer (DiT-style, with an MLP-Mixer autoencoder for the gene latent) conditioned on pathology foundation model image features, trained on 22.2M Xenium cell-image/expression pairs. The notable result is generalization: with cohort-level train/test splits, it transfers zero-shot to unseen organs and tissues and improves Spearman correlation by 35–173% over baselines that otherwise collapse to the mean. It scales to a 9,544-patient TCGA atlas and fine-tunes cleanly to sarcoma and mouse PDAC.

Manuel’s deep expertise in pathology foundation models, generative architectures, and getting models to actually generalize across the messy reality of clinical data will be critical as we scale our multimodal virtual patient models.

0

7

3

0

1K

0

9

1

0

974

17 days ago

@kylekuzma How do I invest in the fund

0

4

0

0

333

17 days ago

It will become increasingly obvious over time who outsources their writing to AI and who doesn’t

1

5

0

0

4K

17 days ago

@jehovahscript But does it beat other Ferraris that cost $500k?

1

0

0

0

52

17 days ago

Kuz joining a16z or Anthropic by EOY

17 days ago

It’s funny….every AI startup deck claims a data moat. 5% actually have one. Would your data be impossible to replicate even if a competitor raised $500M tomorrow? If yes cool you have a business.

90

953

27

164

217K

1

18

0

4

147K

20 days ago

Setting the claude credit spend limit to unlimited for your best engineer https://t.co/5u8LoeYLa8

2

18

1

1

951

20 days ago

@mattturck Podcasting and venture will be the only viable careers don’t worry

0

1

0

0

11

20 days ago

Podcasts are resilient to the rise of superintelligence because even after it’s achieved we will still want to hear from everyone who built it

22 days ago

Why AI Progress Suddenly Feels Real - my conversation with @yanndubs, who co-leads the Post-Training Frontiers team at @OpenAI

00:00 - Intro
01:30 - Why recent AI progress feels like a step function
04:13 - Model reliability & the emotional rollercoaster of shipping GPT-5.5
07:33 - How OpenAI structures vertical and horizontal teams
09:49 - Improving model efficiency and test-time compute
12:32 - Yann's journey from Switzerland to OpenAI
15:37 - Reasoning in 2026: Real-world utility vs verifiable rewards
18:34 - GPT-5.5 Thinking vs Pro: Scaling test-time compute
20:09 - How reasoning models become more efficient
23:23 - Pre-training scaling and overcoming the data wall
27:03 - Multimodal data, synthetic data, and embodied AI
31:05 - Demystifying mid-training and post-training
37:21 - Does RL create new capabilities in AI?
38:53 - The challenges and frontier of scaling RL
43:09 - Is building AI models a craft or a strict science
48:21 - How AI models generalize across different domains
54:18 - How reinforcement learning cures AI hallucinations
56:04 - Negative generalization and conflicting instructions
58:05 - Can RL scale to law, medicine, and the broader economy?
1:00:19 - The evaluation bottleneck and Model as a Judge
1:04:21 - Continuous AI progress & continual learning
1:08:49 - Will foundation models eat the agent harness
1:11:23 - Why startups should focus on the last mile of AI

8

392

39

777

99K

1

7

0

0

1K

21 days ago

@_sholtodouglas https://t.co/dol4m2vqex

21 days ago

We thought AI would be everyone’s personal Aristotle but turns out it’s just Dwarkesh

0

13

1

3

1K

0

8

0

0

671

21 days ago

We thought AI would be everyone’s personal Aristotle but turns out it’s just Dwarkesh

21 days ago

New blackboard lecture w @reinerpope

How do chips actually work – starting with basic logic gates, and working up to why GPUs, TPUs, FPGAs, and the human brain each look the way they do.

0:00:00 – Building a multiply-accumulate from logic gates
0:16:20 – Muxes and the cost of data movement
0:25:59 – How systolic arrays work
0:39:00 – Clock cycles and pipeline registers
0:51:40 – FPGAs vs ASICs
1:03:14 – Cache vs scratchpad
1:07:16 – Why CPU cores are much bigger than GPU cores
1:11:49 – Brains vs chips
1:15:22 – A GPU is just a bunch of tiny TPUs

Look up Dwarkesh Podcast on YouTube/Spotify/etc to watch. Enjoy!

94

6K

725

7K

925K

0

13

1

3

1K

22 days ago

We taught sand to think and now it’s recreating the visions of John and Ezekiel

Alvaro Lozano-Robledo

22 days ago

Following up on the suggestion from Will Sawin, here is an illustration of the new configurations that disprove Erdos' unit distance conjecture (made with the help of ChatGPT 5.5 Thinking). https://t.co/V0yfOy4pV3

114

3K

245

1K

2M

0

9

0

1

455

22 days ago

SpaceX, Cerebras, & OpenAI/Anthropic have made it obvious that the surest path to building something generational is to pursue something extremely difficult to the point of absurdity

1

40

3

5

15K

23 days ago

Now is the best time in history to be alive https://t.co/AZxmVou0B4

1

18

0

2

4K
