2019 Aug 5. doi: 10.1038/s41592-019-0502-z, Nucleic Acids Research 47(14):7235-7246, Aug 22 2019. doi: 10.1093/nar/gkz538, Molecular Biology and Evolution. We use SCINET to analyze the human cortex, reconstructing interactomes for the major cell types of the adult human brain. We show in simulation that CaMMEL accurately distinguishes between mediating and pleiotropic genes unlike existing methods. The algorithm is the first for this problem with provable guarantees. Genome Research 24(3):475-86, Dec 5, 2013. For this potential to be realized, statistical and biological tasks must be integrated at all levels, including study design, experiment planning, model building and refinement, and data interpretation. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. Our results suggest a general strategy for deciphering cis-regulatory elements by systematic large-scale experimental manipulation, and provide quantitative enhancer activity measurements across thousands of constructs that can be mined to generate and test predictive models of gene expression, Association studies provide genome-wide information about the genetic basis of complex disease, but medical research has focused primarily on protein-coding variants, owing to the difficulty of interpreting noncoding mutations. We study the relationship between recombination rate and gene regulatory domains, defined by a gene and its linked control elements. Chromatin variability shows genetic inheritance in trios, correlates with genetic variation and population divergence, and is associated with disruptions of transcription factor binding motifs. Sep 3, 2015; Nature Biotechnology 33(8):825-6. 
Many biologically important RNA structures are conserved in evolution leading to Lastly, new massively parallel reporter experiments can systematically validate regulatory predictions. Massively parallel reporter assays (MPRAs) enable nucleotide-resolution dissection of transcriptional regulatory regions, such as enhancers, but only few regions at a time. We find a coordinated downregulation of synaptic plasticity genes and regulatory regions, and upregulation of immune response genes and regulatory regions, which are targeted by factors that belong to the ETS family of transcriptional regulators, including PU.1. Four mRNAs display highly efficient stop codon readthrough, and these mRNAs have a UGA stop codon immediately followed by CUAG (UGA_CUAG) that is conserved throughout vertebrates. Overall, HiDRA provides a high-throughput, high-resolution approach for dissecting regulatory regions and driver nucleotides, Onuchic, Lurie, Carrero, Pawliczek, Patel, Rozowsky, Galeev, Huang, Altshuler, Zhang, Harris, Coarfa, Ashmore, Bertol, Fakhouri, Yu, Kellis, Gerstein, Milosavljevic, To assess the impact of genetic variation in regulatory loci on human health, we constructed a high-resolution map of allelic imbalances in DNA methylation, histone marks, and gene transcription in 71 epigenomes from 36 distinct cell and tissue types from 13 donors. We find that most differences between the readthrough repertoires of the two species arose from readthrough gain or loss in existing genes, rather than birth of new genes or gene death; that readthrough-associated RNA structures are sometimes gained or lost while readthrough persists; that readthrough is more likely to be lost at TAA and TAG stop codons; and that readthrough is under continued purifying evolutionary selection in mosquito, based on population genetic evidence. Bioinformatics. The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. 
Notably, AD-associated genetic variants are specifically enriched in increasing-level enhancer orthologues, implicating immune processes in AD predisposition. We test ~7 million accessible DNA fragments in a single experiment, by coupling accessible chromatin extraction with self-transcribing episomal reporters (ATAC-STARR-seq). Manolis Kellis is a Professor of Computer Science at MIT, an Institute Member of the Broad Institute of MIT and Harvard, a member of the Computer Science and Artificial Intelligence Lab at MIT, and head of the MIT Computational Biology Group (compbio.mit.edu). Here, we analyze patterns of observed-to-expected mutation counts across 505 whole cancer genomes, and find that genomic features missing from our mutation-rate model likely operate on a megabase length scale. However, until recently, their experimental dissection by directed regulatory motif disruption has remained unfeasible at the genome scale, due to the technological lag in large-scale DNA synthesis. Manolis Kellis Dissecting disease mechanism. As a way to address some of these challenges, here we introduce ACTIONet, a comprehensive framework that combines archetypal analysis and network theory to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. Our imputed datasets provide the most comprehensive human regulatory region annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information. The strongest disease-associated changes appeared early in pathological progression and were highly cell-type specific, whereas genes upregulated at late stages were common across cell types and primarily involved in the global stress response. Our deconvolution model estimates contributions from tumor and non-tumor sources, enabling more precise interpretation of differentially-expressed genes and pathways. 
Juul, Madsen, Guo, Bertl, Hobolth, Kellis, Pedersen, Understanding the mutational processes that act during cancer development is a key topic of cancer biology. More broadly, deep learning can serve as a guiding principle to organize both hypothesis-driven research and exploratory investigation. In particular, we demonstrate that specific lincRNAs are transcriptionally regulated by key transcription factors in these processes such as p53, NFkappaB, Sox2, Oct4 (also known as Pou5f1) and Nanog. predict consensus secondary structures in multiple alignments by combining The most widely appreciated role of DNA is to encode protein, yet the exact portion of the human genome that is translated remains to be ascertained. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. This problem is further exacerbated by the fact that different event cost assignments yield different sets of optimal reconciliations. Existing methods utilize genotypic data and summary statistics to identify putative disease genes, but cannot distinguish pleiotropy from causal mediation and are limited by overly strong assumptions about the data. Using both simulation and real data, we show that mixEHR outperforms previous methods and reveals meaningful multi-disease insights, Park, Sarkar, He, Davila-Velderrain, De Jager, Kellis. Manolis Kellis, Ph.D. 2009 May 7;459(7243):108-12. The novel coding genes are additionally supported by single-nucleotide variant evidence indicative of continued purifying selection in the human lineage, coding-exon splicing evidence from new GENCODE transcripts using next-generation transcriptomic datasets, and mass spectrometry evidence of translation for several new genes. 
Here we present a combined experimental and computational approach, Systematic high-resolution activation and repression profiling with reporter tiling using MPRA (Sharpr-MPRA), that allows high-resolution analysis of thousands of regions simultaneously. ChromHMM provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state. We detected a decrease in structure in translated regions and identified the ribosome as a major remodeler of RNA structure in vivo. Although individual modifications have been linked to the activity levels of various genetic functional elements, their combinatorial patterns are still unresolved and their potential for systematic de novo genome annotation remains untapped. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Lin, Carlson, Crosby, Matthews, Yu, Park, Wan, Schroeder, Gramates, St, Roark, Wiley, Kulathinal, Zhang, Myrick, Antone, Celniker, Gelbart, Kellis. We examined epigenomic data, allelic activity, motif conservation, regulator expression, and gene coexpression patterns, with the aim of dissecting the regulatory circuitry and mechanistic basis of the association between the FTO region and obesity. Neuronal activity causes the rapid expression of immediate early genes that are crucial for experience-driven changes to synapses, learning, and memory. We infer for each disorder group disease gene networks with preferential cell-type specific activity that can aid the design and interpretation of cell-type resolution experiments. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. 
Our results provide a reference annotation that can inform directed experimental and computational studies in Drosophila and related species, and provide a model for systematic data integration towards the comprehensive genomic and functional annotation of any genome, including the human. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease, Gjoneska, Pfenning, Mathys, Quon, Kundaje, Tsai, Kellis, Alzheimer's disease (AD) is a severe age-related neurodegenerative disorder characterized by accumulation of amyloid-beta plaques and neurofibrillary tangles, synaptic and neuronal loss, and cognitive decline. Across six major brain cell types, we identified transcriptionally distinct subpopulations, including those associated with pathology and characterized by regulators of myelination, inflammation, and neuron survival. These results reveal a central role of RNA structure dynamics in gene regulatory programs. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. May 27, 2019. pii: e201900303. Epub 2009 Feb 1, PLoS Comput Biol. This type of mistake is characteristic of bioinformaticians who lack a biological (or biochemical) background. Jungreis, Chan, Waterhouse, Fields, Lin, Kellis. and to interpret genetic variants. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. However, reference organismal interactomes do not capture the tissue- and cell type-specific context in which proteins and modules preferentially act. 
Summary statistics of genome-wide association studies (GWAS) teach causal relationship between millions of genetic markers and tens and thousands of phenotypes. This CoR brings together researchers at CSAIL working across a broad swath of application domains. Our results suggest that DSB formation is a physiological event that rapidly resolves topological constraints to early-response gene expression in neurons, GTEx Consortium; Ardlie, Deluca, Segrè, Sullivan, Young, Gelfand, Trowbridge, Maller, Tukiainen, Lek, Ward, Kheradpour, Iriarte, Meng, Palmer, Esko, Winckler, Hirschhorn, Kellis, MacArthur, Getz, Shabalin, Li, Zhou, Nobel, Rusyn, Wright, Lappalainen, Ferreira, Ongen, Rivas, Battle, Mostafavi, Monlong, Sammeth, Melé, Reverter, Goldmann, Koller, Guigó, McCarthy, Dermitzakis, Gamazon, Im, Konkashbaev, Nicolae, Cox, Flutre, Wen, Stephens, Pritchard, Tu, Zhang, Huang, Long, Lin, Yang, Zhu, Liu, Brown, Mestichelli, Tidwell, Lo, Salvatore, Shad, Thomas, Lonsdale, Moser, Gillard, Karasik, Ramsey, Choi, Foster, Syron, Fleming, Magazine, Hasz, Walters, Bridge, Miklos, Sullivan, Barker, Traino, Mosavel, Siminoff, Valley, Rohrer, Jewell, Branton, Sobin, Barcus, Qi, McLean, Hariharan, Um, Wu, Tabor, Shive, Smith, Buia, Undale, Robinson, Roche, Valentino, Britton, Burges, Bradbury, Hambright, Seleski, Korzeniewski, Erickson, Marcus, Tejada, Taherian, Lu, Basile, Mash, Volpi, Struewing, Temple, Boyer, Colantuoni, Little, Koester, Carithers, Moore, Guan, Compton, Sawyer, Demchok, Vaught, Rabiner, Lockhart, Ardlie, Getz, Wright, Kellis, Volpi, Dermitzakis, Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. Shi, Kasumova, Michaud, Cintolo-Gonzales, Martínez, Ohmura, Mehta, Chien, Frederick, Cohen, Plana, Johnson, Flaherty, Sullivan, Kellis, Boland. 
The motif analysis automatically identified 72 genome-wide elements, including most known regulatory motifs and numerous new motifs. Recombination rate valleys show increased DNA methylation, reduced double-stranded break initiation, and increased repair efficiency, specifically in the lineage leading to the germ line. Dec 1, 2017. doi.org/10.1101/219428, Nature Communications 10(1):4902, Oct 25 2019. doi: 10.1038/s41467-019-12780-8, Genome Research, Sep 19, 2019, gr.246462.118, Nature Methods. Ernst, Kheradpour, Mikkelsen, Shoresh, Ward, Epstein, Zhang, Wang, Issner, Coyne, Ku, Durham, Kellis*, Bernstein*. More specifically, we show that a single amino acid, arginine, is the major contributor to codon usage bias differences across domains of life. The variation of recombination rate at both fine and large scales cannot be fully explained by DNA sequences alone. Our results suggest continued turnover in regulatory regions, with at least an additional 4% of the human genome subject to lineage-specific constraint. Within these lie novel and challenging machine learning problems serving science, social science and computer science. We study correlated activity patterns of these elements to infer a functional regulatory network, which we use to predict putative functions for new genes, reveal stage-specific and tissue-specific regulators, and infer predictive models of gene expression. 