Improving Data-Analysis Capabilities for Wheat Breeders

Crop breeders use genetic and genomic data to develop new crop varieties with desired traits such as yield, disease resistance, drought tolerance, and baking quality. The International Maize and Wheat Improvement Center (CIMMYT), the world’s primary source of breeding material for wheat and corn (maize), has amassed a huge database of information about millions of wheat genotypes.

GEMS Informatics, a joint initiative of which MSI is a part, worked with CIMMYT to analyze a subset of its database, the 2012-2024 spring and durum wheat international nursery dataset. The project had three goals:

  • Build a Python code repository with a scalable infrastructure
  • Resolve naming discrepancies identified in the CIMMYT data
  • Provide CIMMYT with harmonized pedigree datasets and query capabilities via an application programming interface (API)

The project used MSI resources in several ways:

  • Nathan Carlson (MSI Operations group) set up the server and wrapped the queries for the API
  • Dr. Kevin Silverstein (MSI Bioinformatics group) and Dr. Jeff Thompson (U-Spatial) developed the Python code
  • Considerable computing time on MSI’s Agate supercomputing cluster was used to pre-compute pairwise genetic relatedness coefficients across the variety collection
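
To give a sense of what a pairwise relatedness calculation involves, here is a minimal Python sketch of the textbook kinship coefficient computed recursively from a pedigree. It is an illustration only, not the project's code: the toy pedigree and the names `PEDIGREE`, `depth`, and `kinship` are hypothetical, and the sketch assumes unrelated, non-inbred founders. Calculations for inbred crop lines such as wheat may rest on different assumptions.

```python
from functools import lru_cache
from itertools import combinations_with_replacement

# Hypothetical toy pedigree: variety -> (parent 1, parent 2).
# Founders have no recorded parents.
PEDIGREE = {
    "A": (None, None),
    "B": (None, None),
    "C": ("A", "B"),
    "D": ("A", "B"),
    "E": ("C", "D"),
}


@lru_cache(maxsize=None)
def depth(name):
    """Generations separating a variety from its most distant founder."""
    parents = [p for p in PEDIGREE[name] if p is not None]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def kinship(a, b):
    """Kinship coefficient, assuming unrelated, non-inbred founders."""
    if a is None or b is None:
        return 0.0  # an unknown parent contributes no known relatedness
    if a == b:
        p1, p2 = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(p1, p2))
    # Recurse through the parents of the deeper individual, which
    # cannot be an ancestor of the other one.
    if depth(a) < depth(b):
        a, b = b, a
    p1, p2 = PEDIGREE[a]
    return 0.5 * (kinship(p1, b) + kinship(p2, b))


# Pre-compute every pair once, including each variety with itself.
pairwise = {
    (x, y): kinship(x, y)
    for x, y in combinations_with_replacement(sorted(PEDIGREE), 2)
}
```

The number of pairs grows with the square of the number of varieties, which is why a collection of millions of genotypes calls for pre-computation on a cluster rather than on-demand calculation.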

A story about this project appears on the GEMS Informatics blog: “Pedigree analysis tools illuminate ancestry of over 8.5M wheat varieties in CIMMYT’s international nursery.”

GEMS Informatics is a joint initiative led by the College of Food, Agricultural and Natural Resource Sciences and the Research Computing group in the Research and Innovation Office at the University of Minnesota. The initiative produces research-ready scientific data, enables others to create it, and turns that data into actionable information for farmers, scientists, governments, and companies. MSI, which is part of Research Computing, provides computing resources and expertise for GEMS Informatics research.
