
2015 Annual Science Report

Massachusetts Institute of Technology Reporting  |  JAN 2015 – DEC 2015

Early Animals: The Genomic Origins of Morphological Complexity

Project Summary

Understanding the origins of life’s complexity here on Earth is paramount to finding it elsewhere in the universe. The fossil record indicates that complexity on Earth arose in a near geological moment – the famous Cambrian explosion – about 525 million years ago. However, molecular sequence analyses indicate that complex animals actually arose nearly 200 million years before they make their first appearance in the fossil record. This disparity between the advent of morphological complexity and its appearance in the fossil record motivates an interesting question: why is it that we cannot detect complex life here on Earth for nearly 200 million years? And if we cannot detect it on Earth, what hope would we have on another distant Earth-like planet? Our research is focused on addressing this question by trying to obtain a better understanding of what encodes morphological complexity in the genome. Our research suggests that a group of non-coding RNA genes, microRNAs, might be instrumental for the advent and maintenance of complexity in animals, and therefore sequencing the genomes and the transcriptomes (the expressed component of the genome) from carefully chosen taxa might allow us to better understand the biology of animals that predated the Cambrian explosion.

4 Institutions
3 Teams
1 Publication
0 Field Sites

Project Progress

We have hypothesized that a more complete understanding of the origins and evolution of morphological complexity must take into consideration, in addition to the protein-coding repertoire of a genome, the non-coding repertoire, i.e., the non-coding RNAs that are functional for the expression of phenotype, but functional in a regulatory role rather than a structural role. One group of genes that has been intensely studied over the past 20 years is the microRNAs (miRNAs), small non-coding RNA gene products that help control the translation of more conventional mRNAs, ultimately fine-tuning the amount of protein gene product per cell per unit time. However, despite the importance of these molecules for normal development and cellular physiology, determining what is and what is not a miRNA has been difficult. This difficulty in distinguishing a proper miRNA from the myriad other types of non-coding RNAs has precluded a proper understanding of the evolution of miRNAs, both in the context of the sequence evolution of the miRNAs themselves and in the context of the evolution of the phenotype of the organisms whose genomes house the miRNAs.

To better understand the evolution of miRNAs, our research team evaluated over 7,000 miRNA entries deposited in the public repository miRBase. The aims were to quantify and better qualify the characteristics of proper miRNAs, to evaluate the number of false positive entries in the database, to evaluate the molecular evolution of miRNA sequences, and to understand the gains and losses of miRNA genes through time across the animal kingdom. We showed that fewer than a third of the 1,881 human miRBase entries, and only approximately 16% of the 7,095 metazoan miRBase entries, are robustly supported as miRNA genes. Furthermore, we showed that the human repertoire of miRNAs has been shaped by periods of intense miRNA innovation, that the functional products of miRNA genes show a very different tempo and mode of sequence evolution than the remainder of the sequence, and that the informational content of miRNA gene sequences is entirely predictable from this understanding of the mutational propensity of miRNA sequences.

Because of the extremely high rate of false positive sequences described as miRNAs in miRBase, we established a new open-access database to catalog this set of robustly supported miRNAs, and introduced a new nomenclature system for this set of genes to highlight phylogenetic relationships between gene sequences both within the same species and across different species. Thus, although this database complements the efforts of miRBase, it differs from it by imposing an evolutionary hierarchy upon this curated and consistently named repertoire.

Now that the near-complete repertoire of miRNAs is curated and deposited in an open-access database, the fundamental hypothesis that miRNAs have a crucial role in shaping the evolution of morphological complexity can be tested and explored in much more detail. We have now sequenced several genomes and transcriptomes from various types of animals, including complex animals such as arthropods, chaetognaths, and vertebrates, and animals whose morphology suggests secondary simplification, including flatworms and rotifers. We have preliminary evidence that miRNA repertoire complexity is primitive for animals, and that secondary simplification is accompanied by secondary losses of miRNA genes.