The emergence of life as the ultimate problem in Biocomplexity








Nigel Goldenfeld

Biocomplexity Theme Leader
Institute for Genomic Biology,
University of Illinois at Urbana-Champaign

The Institute for Genomic Biology's Biocomplexity Theme recently received two substantial grants that probe life's origins. The National Science Foundation's prestigious Frontiers in Integrative Biological Research (FIBR) program has awarded nearly $5,000,000 for a project entitled "The emergence of life: from geochemistry to the genetic code", which is to be performed by researchers at a consortium of research institutions, including Professors Woese, Goldenfeld and Luthey-Schulten at UIUC. The Department of Energy has also awarded $900,000 to Woese, Goldenfeld and Luthey-Schulten for research into the evolution of the cell's translation apparatus. In this article, Nigel Goldenfeld, the Biocomplexity Theme Leader, describes some of this work and how it fits into the Theme's overall mission.


Put a big enough group of ostensibly sensible, rational people together, and stir: what do you get? The all-too-common answer is: "the madness of crowds". Observed in financial market crashes, riots, political systems, music fashions, perhaps even academic fads, cooperative phenomena arise whenever there are sufficient communication channels open for the behavior of individuals to be subsumed by collective effects. The madness of crowds doesn't just apply to people: we see its effects in animals (think "herd mentality") and even at the microbial level, as we shall see below. All of this is an example of the burgeoning science of biocomplexity, the collision between traditional biology and modern dynamical systems theory.

Complex systems are characterized by the presence of strong fluctuations, unpredictable and nonlinear dynamics, multiple scales of space and time, and frequently some form of emergent structure (riots, herds, ...). The individual components of complex systems are so tightly coupled that they cannot usefully be analyzed in isolation, rendering irrelevant traditional reductionist approaches to science, obscuring causal relationships, and distinguishing complexity from mere complication. Biological complexity, or biocomplexity, arises from the inclusion of active components, nested feedback loops, and multiple layers of system dynamics, and is relevant to numerous aspects of the biological, medical and earth sciences, including the dynamics of ecosystems, societal interactions, and the functioning of organisms.

In the natural development of the sciences, issues of complexity are sensibly postponed until they can no longer be avoided. Thus, it is no surprise that numerous disciplines from science, engineering, biology and medicine are facing fundamental obstacles of a similar nature, as their natural intellectual development inevitably encounters the barrier of complexity. At the same time, a number of factors raise the stakes for a successful and useful understanding of complexity issues, including society's increasing dependence on complex communication networks, the enormous public interest in tackling diseases of complexity, such as cancer, and the challenge of comprehending the dynamics of the biosphere, thus ensuring an appropriate response to global climate change.

The Biocomplexity Theme at the IGB focuses on a triad of inter-connected problem areas that we feel represents the heart of future biology, namely microbial ecology, evolution and systems biology. Microbial ecology, because of the pervasive but poorly understood influence of microbes on the biosphere; evolution, because without the evolutionary context, it is impossible to understand the reasons for the biological forms and relationships observed; and systems biology, because understanding requires explicit and quantitative accounting for the interactions between an organism and its environment, the relationship between the cells and the organism, and the relationship between the myriad genetic and metabolic circuits acting within cells. Some examples of the sorts of questions that are being addressed are:

  1. Systems biology: How can we understand the universal characteristics of living systems? What are the predominant patterns of gene expression? Available evidence suggests common patterns across eukaryotes, bacteria and archaea: what are the common architectural characteristics of biological networks across the three domains of life? What characterizes the deviations from the common architecture and how do these deviations reflect the different environments and evolutionary history of organisms?

  2. Microbial ecology: How can we capture complex ecosystem dynamics? Microbes dominate biogeochemical cycles on both the global and local scale. Typically, no more than 1% have been successfully cultivated: beyond 16S rRNA phylotyping, how can their spatial context, abundance, community structure and population dynamics be characterized and understood in a high-throughput manner? How can their metabolic activity be understood in relation to its coupling to biogeochemical cycles? How do we begin to analyze true genomic diversity, when existing metagenomic studies are limited to extreme and homogeneous environments? How do we integrate genomic information with precision proteomics of natural environments?

  3. Evolution: How did life evolve so quickly from abiotic processes on the early Earth? Perhaps the key conceptual question of biology is to explain the emergence of living organisms from early geochemistry. What were the key physical processes that led to self-organization of early metabolism and self-reproducing molecules? Can we make statistical predictions of the likely core metabolic pathways that emerge from known geochemistry? Can these ideas be extended to extra-terrestrial environments, such as Mars or Europa, and guide remote sensing of microbial or other life?

Interestingly, it is impossible to address each of these questions in isolation. For example, in studying microbial ecology, one must identify microorganisms from portions of their DNA, because culturing is very difficult. As Carl Woese was the first to show, the pattern of mutations in the DNA allows one to reconstruct the evolutionary relationship between the organisms: when put together with the ecological context, a compelling portrait emerges of the way in which the environment has shaped the community of organisms, and the way in which the organisms have adapted to, or even shaped, the environment. This, in turn, requires detailed modeling of the metabolic activity of the microbes, the chemical processes going on in the ecosystem, and the way in which molecules and energy are exchanged between the microbes and the rest of the ecosystem. The IGB's Biocomplexity Theme is unique in its approach to creating the experience and capability to perform this kind of integrated science from the genome up to the environment.

During the past year, Carl Woese, Zan Luthey-Schulten and Nigel Goldenfeld began to collaborate on a fundamental set of problems related to the emergence of life, using this methodology. Because the earliest forms of life must have been much simpler than anything recognizable today as life, and must have emerged out of the environment's geochemistry, we must dive straight into the relationship between geochemistry and metabolism at the scale of atoms (i.e. concrete objects or "Its"), the emergence of information processing technology at the molecular scale ("Bits"), and finally the genesis of cellular life ("critters" or "Crits"). To hijack one of John Archibald Wheeler's memorable slogans, we can term this program "From Its to Bits to Crits".

In taking this approach to the origin of life, the characteristics of the earliest organisms become very important: do they contain clues about the origin of life that we have not yet teased out?  Astronomers have long understood that by peering out at the farthest galaxies and stars, they are effectively looking back into time, receiving light rays that have been travelling for billions of years.  Is there a "time machine" that biologists can use to look back towards the origin of life?

It turns out that there is.  Every cell contains machines, called ribosomes, that act as factories, making proteins from raw materials following instructions laid down by the cell's DNA.  The ribosome itself has to be created, and the instructions for it are naturally found in DNA.  However, as organisms have evolved over time, the DNA sequence for the ribosome has become altered by mutations.  Mostly these are not in any essential part of the DNA sequence, but by examining the precise sequence of the ribosomal DNA, it is possible to reconstruct the evolutionary relationship of different organisms to one another.  The first person to do this was Carl Woese at the University of Illinois, who is co-PI with Goldenfeld and others off-campus on the Frontiers in Integrative Biological Research grant that is funding the research into the emergence of life.  By building the "tree of life", he discovered that there are three families of organisms, or Domains of Life: Eukaryotes (organisms such as animals and plants), Bacteria and Archaea (both microbes, historically not distinguished from one another, but genetically as different from each other as they are from eukaryotes!).  All life descended from what some scientists call the Last Universal Common Ancestor, the root of the tree of life.  Although the tree of life goes far back in time, perhaps three billion years, it does not go far enough.  The earliest organism, or perhaps community of organisms, at the root of the tree of life already had much of the modern cellular apparatus.  Can we peer back even further, to life's true origins?

Carl Woese and Nigel Goldenfeld at the University of Illinois believe that we can.  They have found a clue to some of the earliest features of living organisms, by looking at the genetic code itself.  The genetic code is aptly named: it tells the cellular translation machinery how to interpret the sequence of nucleotides along the genome and translate the sequence into a sequence of amino acids, thereby building proteins.  Shortly after the genetic code was cracked in the early 1960s, Woese noticed that the code was not random, but had a built-in tendency to correct mistakes.  Imagine that you are trying to decode a message, but make a mistake, perhaps by misreading your codebook.  Wouldn't it be marvellous if the code had been cleverly constructed so that a wrongly-decoded word actually was very close to the correct word?  Then, by looking at the context, you would realise your mistake and still be able to interpret the message.  Woese noticed that the genetic code had this feature: amino acids with similar chemical properties were coded in a similar way.  The particular chemical property which seemed to best characterize the amino acids was measured for each acid by Woese, and termed the "polar requirement".  There matters rested for nearly thirty years, until in 1991 David Haig and Laurence Hurst at Oxford University discovered that the genetic code was not merely error-resistant but that it was hard to invent alternative genetic codes that would be more error-resistant than the one we have.  Haig and Hurst, and later other workers, used a computer to generate thousands (and later millions) of synthetic genetic codes, all of which were, a priori, equally acceptable from the biochemical standpoint as the actual genetic code.  Was there anything special about the genetic code we have that distinguished it from the computer-generated synthetic codes?  Our genetic code turned out to be perfectly typical, and indistinguishable from the computer-generated synthetic genetic codes, except when errors were measured with respect to Woese's polar requirement.  When evaluated in this way, the actual genetic code stood out like a sore thumb as being the best possible code, or more accurately, "one in a million".
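For readers who would like to see the flavor of such a computer experiment, here is a minimal sketch in Python.  It is not the published calculation, only an illustration under stated assumptions: the polar requirement values are the commonly tabulated ones, quoted here without checking against the original measurements; the synthetic codes are made by shuffling the twenty amino acids among the codon blocks of the standard code, leaving the stop codons alone; and the error measure is simply the mean squared change in polar requirement over all single-letter misreadings.  The function names (mean_squared_error, shuffled_code, fraction_better) are made up for this example.

```python
import random

BASES = "UCAG"
# Standard genetic code, codons in the order UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_CODE = dict(zip(CODONS, AMINO))

# ASSUMPTION: polar requirement values as commonly tabulated; verify before serious use.
POLAR_REQUIREMENT = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}


def mean_squared_error(code, prop):
    """Mean squared change in `prop` over all single-base misreadings.

    Misreadings that start from or land on a stop codon are ignored.
    """
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbor = codon[:pos] + base + codon[pos + 1:]
                other = code[neighbor]
                if other == "*":
                    continue
                total += (prop[aa] - prop[other]) ** 2
                count += 1
    return total / count


def shuffled_code(code, rng):
    """Synthetic code: permute the 20 amino acids among the existing codon blocks."""
    amino_acids = sorted(set(code.values()) - {"*"})
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    relabel = dict(zip(amino_acids, permuted))
    relabel["*"] = "*"  # stop codons stay where they are
    return {codon: relabel[aa] for codon, aa in code.items()}


def fraction_better(code, prop, n_samples, seed=0):
    """Fraction of synthetic codes with a smaller error than `code`."""
    rng = random.Random(seed)
    actual = mean_squared_error(code, prop)
    better = sum(
        mean_squared_error(shuffled_code(code, rng), prop) < actual
        for _ in range(n_samples)
    )
    return better / n_samples


if __name__ == "__main__":
    print("error of the actual code:",
          mean_squared_error(STANDARD_CODE, POLAR_REQUIREMENT))
    print("fraction of synthetic codes that do better:",
          fraction_better(STANDARD_CODE, POLAR_REQUIREMENT, 10000))
```

Increasing the number of samples sharpens the estimate of how rare the actual code is.  Swapping in a different chemical property for the polar requirement is a one-line change to the table, which is how one would check the claim that the code looks unremarkable by other measures.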

This astonishing finding cannot possibly be a coincidence, and indicates that the genetic code was optimized by evolution before the root of the tree of life.  Moreover, the fact that it has been optimized with respect to Woese's polar requirement is a further clue as to what selection pressures were acting on the code, directly or indirectly, during its evolution.

The alert reader will, at this point, be rather puzzled.  How can a code evolve?  As a code is changed randomly, the messages read from the code, i.e. the proteins of life, will become increasingly scrambled, and the organism will malfunction, almost certainly fatally.  In fact, it was just this argument that led Francis Crick, the co-discoverer of the molecular structure of DNA, to presume that the code was just some "frozen accident", inherited by all organisms from an early form of life.  Goldenfeld and former graduate student Kalin Vetsigian (now at Harvard), however, have discovered that it is possible for codes and organisms to evolve together cooperatively, and that this happens especially effectively through a mechanism known as "horizontal gene transfer".

As the name suggests, horizontal gene transfer involves cells sharing genes with each other, rather than having genes develop in distinct lines unique to each organism.  Present day microbes, and presumably early organisms too, use horizontal gene transfer pervasively, in place of sex, to mix genes, thereby creating novel combinations of genes that can generate new functionality.  Now it appears that the genetic code evolved this way, very early on in life's history, even before the root of the tree of life.  In some sense, then, the genetic code is a fossil or perhaps an echo of the origin of life, just as the cosmic microwave background is a sort of echo of the Big Bang.

If the genetic code has its origins so early on in the evolution of life, then working backwards from the genetic code might make connections with the chemical reactions that must have been important for early life. Zan Luthey-Schulten is working with Shelley Copley (University of Colorado), Harold Morowitz (George Mason University and Santa Fe Institute) and Eric Smith (Santa Fe Institute) to try to identify plausible pathways of chemical substrates and catalysts that could have formed the earliest self-reproducing metabolic processes. Remarkably, the scenario which seems the simplest, and at the same time the one most relevant to core metabolism, generates molecules that could plausibly play a part in precursors to the genetic code. Moving still further back in time, George Cody (Carnegie Institution of Washington) is attempting to perform high pressure and temperature experiments that simulate the environment of deep sea hydrothermal vents. This locale is an exciting one for biologists: there is a rich variety of minerals, turbulent mixing, and plenty of heat to activate reactions. Many biologists consider the deep sea vents the most plausible location for the origin of life. Furthermore, there is the exciting prospect that similar vents on other worlds, such as Jupiter's moon Europa, may be the host to extra-terrestrial microbial life - and the best place to look for it in the future.

Last, but not least, our research activity will provide new insights into the origin of life on Earth that deserve to be communicated at a non-technical level to the general public. Accordingly, we have devised an integrated outreach program that emanates from the research activity, and involves underserved undergraduates, graduate students, K-12 teachers, and ultimately the public-at-large, through a Writer-in-Residence program. All of this fits into the bigger picture of creating a course of study and program in biocomplexity at the University of Illinois at Urbana-Champaign.

Our studies are undoubtedly small, incomplete but necessary steps towards a full understanding of the emergence of life - the ultimate biocomplexity problem. We are very pleased to be making them at The Institute for Genomic Biology at the University of Illinois.

Urbana, Dec 2005
