  1. Genomics Meets Geology

    The last third of the last century could be called the decades of molecular biology. Biologists turned the spotlight of chemistry on biological black boxes and began to understand how cells and inheritance function at a molecular level. The iconic capstone of this work was the sequencing of the human genome.

    But Steven A. Benner, a biological chemist at the University of Florida and member of the NASA Astrobiology Institute (NAI), is turning that trend toward reductionism on its head.

    “While all the biologists are rushing to become molecular biologists and chemists,” he says, “here we are chemists trying to become natural historians.”

    Benner and his colleagues have not left organic chemistry behind. Instead they’re combining chemistry, geological history and paleontology in an approach aimed at better understanding how life on Earth works now and how it evolved.

    When researchers discover a new protein, they want to know what proteins it resembles and how it works. Generally, they look for amino acid sequence similarity between the new protein and known proteins in a database. If they find a good match – indicating an ancestral relationship, or homology, between the two proteins – they predict that their new protein might function like the known protein it matches.
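
    The database lookup described above can be sketched in a few lines. This is a minimal illustration, not the method any particular search tool uses: it assumes the sequences are already aligned to equal length, and the names `percent_identity`, `best_match`, `DATABASE` and `QUERY`, along with the toy sequences, are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, gap-free columns in which the two residues match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def best_match(query: str, database: dict) -> tuple:
    """Return (name, identity) for the database entry most similar to the query."""
    scored = ((name, percent_identity(query, seq)) for name, seq in database.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: three "known" proteins and one newly found sequence.
DATABASE = {
    "known_protein_A": "MKTAYIAKQR",
    "known_protein_B": "MKSAYLGKQR",
    "known_protein_C": "GGHWPLNDES",
}
QUERY = "MKTAYLAKQR"

if __name__ == "__main__":
    name, identity = best_match(QUERY, DATABASE)
    print(f"best match: {name} ({identity:.0f}% identity)")
```

    A high score here says only that two sequences look alike. Whether that resemblance carries over to function is a separate question, which is the point of the leptin story that follows.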

    Take the case of leptin, the so-called obesity protein.

    Following the discovery in the mid-1990s that a mutation disabling the leptin protein caused experimental mice to become obese, pharmaceutical companies pulled out their wallets. Researchers thought that therapeutically manipulating leptin protein levels in humans might affect weight gain. Leptin entered Phase 2 trials as a human anti-obesity drug. But six years later, we still have no magic pill to avoid gaining weight. What went wrong?

    Nothing, Benner says. “This illustrates the difficulties with assuming that if two proteins are related by common ancestry, then they must function in analogous ways.” Bad assumption, he asserts. “Perhaps one hundred million dollars in research and development money has been targeted at leptin based on the view that just because the loss of leptin in mice leads to a rotund rodent, then rotund humans must be deficient in leptin,” Benner says.

    Benner’s analysis of leptin evolution might have saved money. Published in 1998, the analysis showed that the sequence of leptin evolved rapidly in the lineage leading to primates after primates diverged from rodents. Rapid evolution in the sequence of a protein implies a change in the behavior of a protein. And change in the behavior of a protein implies a change in how it functions. “At the very least,” Benner noted, “one must select a primate as a model for humans to do pre-clinical testing, not a rodent model.”

    Some pharmaceutical companies took Benner’s advice. Here was an example in which a historical view of biomolecular structure had an impact on the practice of biomedical research. “The NASA Astrobiology Institute specializes in the history of life,” Benner notes. “But few expected a historical view to be important to the practical medical sciences.”

    The simple – and most commonly used – models for studying the evolution of proteins assume that a protein is just a chain of amino acids, like a necklace of colored beads arranged on a silk string.

    Cells do build proteins as strings of amino acids, but a working protein is more complicated. The strings of amino acids fold into more or less stable three-dimensional shapes. This folding brings amino acids from different, sometimes distant, parts of the string close together, where they function together. So how a protein folds turns out to be crucial to how it functions.

    The speed with which different proteins evolve, and the patterns that this evolution follows, hold a key to deducing fold and function, Benner says. The difference between how proteins really behave and how simple linear models would predict they behave actually contains a signal about how the proteins fold and how they function.

    To extract this information, Benner takes a multidisciplinary approach that combines computers with chemistry, biology, paleontology and history. “We begin by reconstructing the history of the protein family from the sequences of the descendant proteins,” he says. The process is very similar to the way that ancient languages are reconstructed from their descendant languages, or the way in which one might infer a parent’s appearance by examining the parent’s children.
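
    One textbook way to infer an ancestor from its descendants is Fitch parsimony, sketched below. The article does not say which reconstruction method Benner’s group uses, so this stands in for the general idea only. The tree shape, the species labels and the four-residue sequences are hypothetical, as are the names `fitch_sets` and `reconstruct_root`.

```python
def fitch_sets(tree, leaf_states):
    """Bottom-up Fitch pass for one alignment column.

    `tree` is either a leaf name (str) or a pair (left, right) of subtrees.
    Returns the set of residues that are most parsimonious at this node.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}
    left, right = tree
    left_set = fitch_sets(left, leaf_states)
    right_set = fitch_sets(right, leaf_states)
    common = left_set & right_set
    # Shared residues win; otherwise the ancestor stays ambiguous.
    return common if common else left_set | right_set


def reconstruct_root(tree, alignment):
    """Infer the root sequence column by column; ambiguity is shown as [XY]."""
    length = len(next(iter(alignment.values())))
    ancestor = []
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        candidates = sorted(fitch_sets(tree, column))
        if len(candidates) == 1:
            ancestor.append(candidates[0])
        else:
            ancestor.append("[" + "".join(candidates) + "]")
    return "".join(ancestor)


# Hypothetical toy data: two primate and two rodent sequences.
TREE = (("human", "chimp"), ("mouse", "rat"))
ALIGNMENT = {
    "human": "MKLV",
    "chimp": "MKLI",
    "mouse": "MRLV",
    "rat": "MRLV",
}

if __name__ == "__main__":
    print(reconstruct_root(TREE, ALIGNMENT))
```

    In this toy case the second position cannot be resolved from four descendants alone, while the fourth position is resolved because the rodent branch agrees with one of the primates. Probabilistic methods refine such calls, but the bottom-up logic is the same.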

    In the late 1980s, Benner was joined by Prof. Gaston Gonnet, from the Swiss Federal Institute of Technology, and together they built a bioinformatics programming workbench called DARWIN (Data Analysis and Retrieval With Indexed Nucleic acid-peptide sequences). DARWIN offered a high-level programming environment in which scientists could ask questions about genomic sequences without knowing much computer science, a situation perfect for biologists wishing to exploit the information in genomic sequence databases.

    Benner then partnered with a commercial company called EraGen to build a commercial product to further empower biologists wishing to use genomic sequences. Called the “Master Catalog,” the product holds an evolutionary history for every family of protein sequences known to date. Within these families is captured information about changing function (such as the rapid evolution displayed by the leptin family), conserved function, and distant homologs. By pre-computing all of this information from genomic data, the Master Catalog allows biologists to begin using evolutionary models to analyze gene sequences.

    “Once you have the history of a family of proteins,” Benner remarks, “the fun can begin, especially if you know something about the natural history of the life that contains the proteins.” Here, paleontology and geologic history become part of the story. “And every protein family provides a story that forms a hypothesis that can guide experiments,” Benner says.

    Consider just one. About 38 million years ago, Earth’s climate changed dramatically. The Drake Passage between South America and Antarctica opened, oceans mixed, and the globe cooled and dried.

    Prairies and savannahs appeared for the first time, huge swaths of land with gritty, low-quality food, the grasses. And grasses, Benner says, provide such poor-quality nutrition that grazing mammals cannot digest them without some help.

    “A cow does not really eat grass,” Benner says. “She collects it, and then feeds the grass to bacteria in her first stomach. Then the cow eats the bacteria.”

    Bringing molecular evolution into the picture, Benner notes that a close look at the cow genome shows a period of rapid change, and the emergence of enzymes to digest bacteria, about 38 million years ago. “When we see a protein with unknown function rapidly changing in the ancestor of the cows 38 million years ago, we immediately suspect that it may have something to do with this dramatic change in physiology.”

    NAI member Monica Riley, a microbiologist specializing in molecular evolution at The Marine Biological Laboratory in Woods Hole, Massachusetts, has used the Master Catalog to explore the evolution of proteins in the common gut bacterium E. coli. Riley was one of the scientists who recognized that pieces of proteins, or modules, might have independent evolutionary history. DARWIN and the Master Catalog can take researchers a step beyond looking only at amino acid sequences, Riley says.

    “Master Catalog has two things,” Riley says. “It uses DARWIN, which is superb for accurate distant relationships and it breaks proteins into subunits, modules.”

    Riley came up with some surprises using Master Catalog. “We found that although some proteins are very simple and have a single function… there are other proteins that are double or triple size, a result of past gene fusion. And what is fused to what is different from organism to organism.” Both ancestral proteins may continue to function in the single, larger new protein, or one function may be lost.

    Such gene fusions could confuse researchers using simpler databases, Riley says. “If you have proteins 1, 2 and 3, 1 and 2 might be joined in one organism, 2 and 3 in another and 1 and 3 in another. So, if you’re looking for sequence similarities, you’re going to find 1s in two of those, 2s in two of those and 3s in two of those. And it’s going to be very confusing unless you realize that you have fused genes.” Using the Master Catalog’s protein module information helps ferret out gene fusions by highlighting similarities between modules in the fused protein and modules in two different ancestral proteins.
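
    Riley’s three-protein example can be worked through in code. The sketch below assumes that each protein has already been split into labeled modules. The labels, the protein names and the helpers `module_index` and `fusion_report` are hypothetical and do not reflect the Master Catalog’s actual interface.

```python
from collections import defaultdict

# Hypothetical module annotations mirroring Riley's example:
# modules 1+2 fused in one organism, 2+3 in another, 1+3 in a third.
PROTEINS = {
    "orgA_protein": ["mod1", "mod2"],
    "orgB_protein": ["mod2", "mod3"],
    "orgC_protein": ["mod1", "mod3"],
}


def module_index(proteins):
    """Map each module label to the set of proteins that contain it."""
    index = defaultdict(set)
    for name, modules in proteins.items():
        for module in modules:
            index[module].add(name)
    return index


def fusion_report(proteins):
    """For each multi-module protein, list which other proteins share each module."""
    index = module_index(proteins)
    report = {}
    for name, modules in proteins.items():
        if len(modules) < 2:
            continue  # a single-module protein is not a fusion candidate
        report[name] = {m: sorted(index[m] - {name}) for m in modules}
    return report


if __name__ == "__main__":
    for protein, hits in fusion_report(PROTEINS).items():
        print(protein, hits)
```

    A whole-sequence search would report all three proteins as similar to one another. Broken down by module, the report shows that each pair shares exactly one module, which is the signature of fused genes.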

    The Master Catalog now contains evolutionary histories of about 50,000 families of sequence modules, described using a combination of sequence alignment, an evolutionary tree (with approximate dates assigned to the branch points), reconstructed ancestral amino acid sequences and either a predicted folding pattern or an actual folding pattern if one has been experimentally determined, Benner says.

    “Now you’re able to do a kind of grand synthesis, taking advantage of the fact that genomic sequences have their history very transparently written in them. You want to combine that molecular history with the natural history, which is known from the geological records and the paleontological records. When you’re dealing with animals and plants, of course, that fossil record for the last 500 million years is very good and getting better,” he says.

    “I think the broadest thing one can take home from this,” says Harvard paleontologist Andrew Knoll, “is that fields that I think many people would have thought lay at opposite ends of the biological spectrum ten years ago – that is molecular evolution and paleontology – are coming into very close and creative contact.

    “I’m not sure it changes dramatically what I do when I take a hammer and go out to a rock outcrop,” he continues, “but I think, intellectually, the integration of these several fields is very important.”

    What’s Next

    Where does this all lead? Benner does not set his sights low. He calls the next step the Phanerozoic Project, named after the Phanerozoic eon, which stretches from about 542 million years ago to the present. During this half billion years, multicellular organisms evolved and flourished on Earth. “Phanerozoic,” from the Greek, means “visible life,” referring to the fossils left by multicellular organisms.

    “What the Phanerozoic Project is supposed to do is build a model of the evolution of life on Earth for the last 500 million years,” Benner says.

    Although he and his former students have this field pretty much to themselves today, Benner doesn’t expect to go it alone. “The Phanerozoic Project,” he says, “will sooner or later be a civilization-wide enterprise.”