Scientists at New York University’s Center for Genomics and Systems Biology, the American Museum of Natural History, Cold Spring Harbor Laboratory, and the New York Botanical Garden have created the largest genome-based tree of life for seed plants to date.

Their findings, published today in the journal PLoS Genetics, plot the evolutionary relationships of 150 different species of plants based on advanced genome-wide analysis of gene structure and function.

This new approach, called “functional phylogenomics,” allows scientists to reconstruct the pattern of events that led to the vast number of plant species and could help identify genes used to improve seed quality for agriculture.

“Ever since Darwin first described the ‘abominable mystery’ behind the rapid explosion of flowering plants in the fossil record, evolutionary biologists have been trying to understand the genetic and genomic basis of the astounding diversity of plant species,” said Rob DeSalle of the Sackler Institute for Comparative Genomics.

“Having the architecture of this plant tree of life allows us to start to decipher some of the interesting aspects of evolutionary innovations that have occurred in this group,” he added.

The research, performed by members of the New York Plant Genomics Consortium, was funded by the National Science Foundation (NSF) Plant Genome Program to identify the genes that caused the evolution of seeds, a trait of important economic interest. The group selected 150 representative species from all of the major seed plant groups to include in the study.

The species range from flowering varieties – peanuts and dandelions, for example – to non-flowering cone plants like spruce and pine. The sequences of the plants’ genomes – all of the biological information needed to build and maintain an organism, encoded in DNA – were either culled from pre-existing databases or generated from live specimens collected in the field and at the New York Botanical Garden in the Bronx.

“Previously, phylogenetic trees were constructed from standard sets of genes and were used to identify the relationships of species,” said Gloria Coruzzi, a professor at New York University. “In our novel approach, we create the phylogeny based on all the genes in a genome, and then use the phylogeny to identify which genes provide positive support for the divergence of species,” she added.

With new algorithms developed at the Museum and NYU and the processing power of supercomputers at Cold Spring Harbor Laboratory and overseas, the sequences – nearly 23,000 sets of genes (specific sections of DNA that code for certain proteins) – were grouped, ordered, and organized in a tree according to their evolutionary relationships. Algorithms that determine similarities of biological processes were used to identify the genes underlying species diversity. The results support major hypotheses about evolutionary relationships in seed plants.

The most interesting finding is that gnetophytes, a group that consists mostly of shrubs and woody vines, are the most primitive living non-flowering seed plants – present since the late Mesozoic era, the “age of dinosaurs.” They are situated at the base of the evolutionary tree of seed plants.

“Genes required for the production of small RNA in seeds were at the very top of the list of genes responsible for the evolution of flowering plants from cone plants,” said Rob Martienssen, a professor at Cold Spring Harbor Laboratory.

The data and software resources generated by the researchers are publicly available and will allow other comparative genomic researchers to exploit plant diversity to identify genes associated with a trait of interest or agronomic value.

These studies could have implications for improving the quality of seeds and, in turn, agricultural products ranging from food to clothing.