Delighted to share our latest work deciphering the landscape of chromatin accessibility and modeling the DNA sequence syntax rules underlying gene regulation during human development! https://t.co/zIUjPy6ZLz. Read on for more 🧵 [1/16]
@benoitbruneau @WJGreenleaf TBX motifs also show up as significantly enriched in the ATAC marker peaks of an aCM cluster in the heart and an excitatory neuron cluster in the brain (Supp note 2); they are indeed important motifs for development
@benoitbruneau @WJGreenleaf Hi Benoit, good eye! We actually did identify T-box motifs, but because of motif similarity they were lumped into the "HD: MEIS/TGIF" motif family in Fig 3e (with a significant number of instances in the heart!). We could resolve this with finer motif clustering
@CarlosEAlvare17 These SNV-overlapping motifs generally have high PhyloP conservation scores. We haven’t looked at enhancer conservation specifically, but it’s definitely an interesting question that could be explored further!
@CarlosEAlvare17 Great question! We do highlight two SNVs associated with adult-onset diseases that appear to have a fetal origin (Fig 7c-d), including one SNV linked to asthma that disrupts an NRF1 motif in fetal liver macrophages.
Our single-cell atlas, trained models, accessibility tracks, per-basepair contrib scores, and annotated motif instances (and more) are available here: https://t.co/juprP34mNi. Analysis code & tutorials are available here: https://t.co/KLpk8WAbNM [16/16]
Big shoutout to pkgs that were essential: ChromBPNet, the seq-to-accessibility model (@panushri25); Fi-NeMo, to identify predictive motif instances (@austintwang); tangermeme, for in silico experiments (@jmschreiber91); & BPCells, for efficient genomic plots (@bn_parks) [15/16]
Last, we found that putative causal noncoding variants for various traits were enriched in CREs in a cell type specific manner, and we used our models to predict how SNVs alter motifs and thus accessibility, providing a mechanistic interpretation of variant effects: [14/16]
Deep learning models can identify not only sequences predictive of accessibility but also those that *negatively* impact it. We found a small set of negative motifs (in peaks!) that were extremely abundant in every cell type and enriched near nucleosome dyads and peak flanks [13/16]
We also found examples of "soft" syntax, where motifs synergize across longer distances (<150 bp), potentially reflecting active or passive competition of TFs with nucleosomes mediating cooperativity: [12/16]
Our models were able to predict synergistic effects at exactly the motif syntax described for the Coordinator motif by @seungsookim10 & the Wysocka lab, where X-ray crystallography showed that DNA stabilizes weak contacts between TWIST1 & ALX4, which directs mesenchymal gene programs: [11/16]
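Variant effect prediction boils down to scoring the reference and alternate sequences with the same model and comparing the outputs. A minimal sketch, assuming a model that returns predicted counts (so a log2 fold change is meaningful); `toy_counts` and its motif string are hypothetical stand-ins for a trained ChromBPNet model, and the helper names are ours.

```python
import math


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with the base at 0-based pos changed from ref to alt."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_log2fc(model, seq: str, pos: int, ref: str, alt: str) -> float:
    """log2(predicted counts for alt allele / predicted counts for ref allele)."""
    ref_pred = model(seq)
    alt_pred = model(apply_snv(seq, pos, ref, alt))
    return math.log2(alt_pred / ref_pred)


def toy_counts(seq: str) -> float:
    """Hypothetical stand-in for a trained model that returns predicted counts:
    a baseline of 10, doubled for each copy of the made-up motif TGACTCA."""
    return 10.0 * 2 ** seq.count("TGACTCA")


seq = "AAATGACTCAAAA"
print(variant_log2fc(toy_counts, seq, 4, "G", "T"))  # -1.0 (the SNV destroys the motif)
```

A negative score flags a variant predicted to reduce accessibility; pairing it with per-base contribution scores at the variant is what ties the effect to a specific motif.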
We found 100s of composite motifs, so we used our models to systematically define effects of motif syntax (spacing/orientation) on synergy in silico. We found dozens of "hard" syntax cases, where synergy relies on strict motif position, likely due to direct interactions: [10/16]
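The spacing scan can be framed as an interaction term: insert motif A, motif B, and both into a neutral background, then compare f(AB) - f(A) - f(B) + f(background). A minimal sketch, assuming the model is any callable mapping a sequence string to a scalar; `toy_model`, its motif strings, and its 6 bp spacing rule are hypothetical stand-ins for a trained model, not results from the thread.

```python
def insert_motif(seq: str, motif: str, pos: int) -> str:
    """Overwrite seq[pos:pos+len(motif)] with motif (length is preserved)."""
    return seq[:pos] + motif + seq[pos + len(motif):]


def synergy(model, background, motif_a, motif_b, pos_a, pos_b):
    """Interaction term f(AB) - f(A) - f(B) + f(background).

    0 means the motifs contribute additively to the model output;
    > 0 means the pair does better together than expected (cooperativity).
    """
    f_bg = model(background)
    f_a = model(insert_motif(background, motif_a, pos_a))
    f_b = model(insert_motif(background, motif_b, pos_b))
    f_ab = model(insert_motif(insert_motif(background, motif_a, pos_a), motif_b, pos_b))
    return f_ab - f_a - f_b + f_bg


def toy_model(seq: str) -> float:
    """Hypothetical stand-in for a trained model (e.g. predicted log counts)."""
    score = float(seq.count("GATA") + seq.count("TTGC"))
    # Made-up "hard syntax" rule: bonus only if TTGC starts exactly 6 bp after GATA.
    for i in range(len(seq) - 9):
        if seq[i:i + 4] == "GATA" and seq[i + 6:i + 10] == "TTGC":
            score += 2.0
    return score


background = "C" * 30
# Scan the offset d between motif starts (d >= 4 so the motifs never overlap).
scores = {d: synergy(toy_model, background, "GATA", "TTGC", 5, 5 + d) for d in range(4, 12)}
print(max(scores, key=scores.get))  # 6
```

Orientation could be scanned the same way by also inserting the reverse complement of each motif; a sharp peak at a single offset is the "hard" syntax signature, whereas a broad decay with distance would look more like "soft" syntax.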
We used this map of motif instances in every cell type to identify ubiquitous and cell type specific motifs, and found that a small set of ubiquitous, CG-rich motifs tended to occur in promoters, while cell type specific motifs predominated at distal & intronic regions: [9/16]
Clustering these motifs, we assembled a lexicon of 508 unique motifs that influence accessibility during development, and mapped them back to peaks to automatically annotate predictive motif instances (putative TF binding sites) in open chromatin in every cell type [8/16]
We used model interpretation techniques (DeepLIFT & TF-MoDISco) to score the contribution of every nucleotide to accessibility and to find recurrent sequence patterns predictive of local chromatin accessibility, and these patterns turn out to resemble TF binding motifs! [7/16]
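One simple way to split motifs into ubiquitous and cell type specific sets is to count the fraction of cell types in which each motif has annotated instances. This is a sketch under assumptions: the 0.9 cutoff, `min_hits`, and the motif and cell type names are hypothetical, and it does not reproduce the promoter vs. distal comparison, which would additionally need each instance's genomic annotation.

```python
from collections import defaultdict


def classify_motifs(instances, n_cell_types, ubiquitous_frac=0.9, min_hits=1):
    """Label each motif 'ubiquitous' or 'cell type specific'.

    instances: iterable of (motif, cell_type) pairs, one per annotated instance.
    A motif is 'ubiquitous' if it has >= min_hits instances in at least
    ubiquitous_frac of the n_cell_types cell types.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for motif, cell_type in instances:
        counts[motif][cell_type] += 1
    labels = {}
    for motif, per_cell_type in counts.items():
        n_present = sum(1 for c in per_cell_type.values() if c >= min_hits)
        frac = n_present / n_cell_types
        labels[motif] = "ubiquitous" if frac >= ubiquitous_frac else "cell type specific"
    return labels


hits = [("motif_A", ct) for ct in ("heart", "liver", "brain")] + [("motif_B", "heart")]
print(classify_motifs(hits, n_cell_types=3))
# {'motif_A': 'ubiquitous', 'motif_B': 'cell type specific'}
```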
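Mapping a motif back onto sequence can be illustrated with classic position weight matrix (PWM) scanning. Note that this is a swapped-in, simpler technique: Fi-NeMo, the tool credited in this thread, calls instances from contribution scores rather than by scoring raw sequence against a PWM. The "GATA" log-odds matrix and the threshold are hypothetical, and only the forward strand is scanned.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as a (length, 4) array; non-ACGT characters become zero rows."""
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = ALPHABET.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


def scan_pwm(x: np.ndarray, pwm: np.ndarray, threshold: float):
    """Return (start, score) for each forward-strand window scoring >= threshold.

    x: (L, 4) one-hot sequence; pwm: (k, 4) log-odds matrix.
    """
    k = pwm.shape[0]
    hits = []
    for start in range(x.shape[0] - k + 1):
        score = float((x[start:start + k] * pwm).sum())
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical log-odds matrix: +2 for the consensus base, -1 otherwise.
pwm = np.full((4, 4), -1.0)
for i, base in enumerate("GATA"):
    pwm[i, ALPHABET.index(base)] = 2.0

x = one_hot("TTGATACCGATT")
print(scan_pwm(x, pwm, threshold=8.0))  # [(2, 8.0)]
```

Lowering the threshold to 5.0 also admits the one-mismatch window "GATT" at position 8, which shows how the cutoff trades sensitivity for specificity.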
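DeepLIFT is easiest to see on a linear model, where a base's contribution reduces to weight × (input - reference) and the contributions sum exactly to the change in output relative to the reference ("summation-to-delta"). The sketch below uses a hypothetical toy linear model with random weights and an all-zero reference; it is not the deep network, reference choice, or TF-MoDISco pattern discovery used in the thread.

```python
import numpy as np

ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as a (length, 4) array; non-ACGT characters become zero rows."""
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = ALPHABET.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out


def linear_model(x: np.ndarray, w: np.ndarray, b: float = 0.0) -> float:
    """Hypothetical toy model: a position-specific linear score over one-hot input."""
    return float((w * x).sum() + b)


def linear_deeplift(x: np.ndarray, w: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Per-position contributions w * (x - ref), summed over the 4 channels.

    For a linear model this is exactly what DeepLIFT assigns.
    """
    return (w * (x - ref)).sum(axis=1)


rng = np.random.default_rng(0)
x = one_hot("ACGTTGCA")
w = rng.normal(size=x.shape)  # hypothetical weights, not a trained model
ref = np.zeros_like(x)        # all-zero reference (an assumption)
contrib = linear_deeplift(x, w, ref)
# Summation-to-delta: contributions add up to f(x) - f(ref); the bias cancels.
print(np.isclose(contrib.sum(), linear_model(x, w) - linear_model(ref, w)))  # True
```

For a deep network the same idea is applied layer by layer, and stretches of high-contribution bases recurring across many peaks are what a pattern-discovery step like TF-MoDISco clusters into motifs.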
What are the DNA sequences that drive accessibility in each cell type? We trained ChromBPNet models - deep learning models tasked with predicting chromatin accessibility at basepair resolution from sequence alone, accounting for Tn5 bias. These models work remarkably well: [6/16]
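Sequence-to-accessibility models such as ChromBPNet take DNA sequence as input, usually as a one-hot matrix. A minimal sketch of that encoding, assuming an A/C/G/T channel order (the helper name `one_hot` is ours, not part of ChromBPNet's API); it covers only the input representation, not the model or its Tn5 bias correction.

```python
import numpy as np

# Channel order A, C, G, T is an assumed convention; any other character
# (e.g. N) is encoded as an all-zero row.
ALPHABET = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as a (length, 4) float32 array."""
    out = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = ALPHABET.find(base)
        if j >= 0:  # find() returns -1 for characters outside ACGT
            out[i, j] = 1.0
    return out


x = one_hot("ACGTN")
print(x.shape)  # (5, 4)
```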
Do these CREs drive activity in vivo? We inspected the VISTA enhancers (validated in reporter mice), and our data suggested previously unappreciated activity of some enhancers in the liver! Our data resolved one enhancer to be erythroblast-specific, confirmed w/ histology: [5/16]