Research Accomplishment Reports 2011

Computational Methods for mRNA Transcriptome from RNA-Seq Data

J.N. MacLeod
Department of Veterinary Sciences

 

Non-Technical Summary

We are entering an era of "personalized genomics" in which individualized medical care will be based on a patient's clinical history, presenting clinical status, and the specific nucleotide sequence of her/his genome. Rapid innovation in next-generation sequencing (NextGen) technologies, based on massively parallel nucleotide sequencing-by-synthesis without dideoxy-based chain termination, is translating this vision into a practical reality.

However, individual genomic DNA data is only the beginning; personalized transcriptomics, the genome-wide measurement of gene expression and alternative splice variants in patients, will give even greater insight into the functional basis of disease. Biopsy samples, for example, will not only be assessed microscopically, but can also have their mRNA transcriptome determined and analyzed. Today it is possible to use NextGen technology to sequence the transcriptome of a tissue sample for around $1000.

The biological analysis of these data, however, remains an unsolved problem. New computational resources and analytical methods must be developed in order to make the benefits of individualized medicine feasible. In a January 2009 editorial entitled "RNA-Seq: a revolutionary tool for transcriptomics" published in Nature Reviews Genetics (Wang et al. 2009), Wang, Gerstein, and Snyder from Yale University discuss how RNA-seq technology has the transformative potential of specifically defining the starts and ends of exons and transcripts, resolving the extent of spliced heterogeneity, and capturing the quantitative dynamics of the transcriptome.

However, the critical need for "computationally simple methods" to analyze these massive datasets was recognized as a major challenge for RNA-seq technology. This project targets that challenge directly.

2011 Project Description

Outputs completed during the reporting period were four talks at three different scientific meetings (Orthopaedic Research Society, Havemeyer Equine Genomics Workshop, and the 4th International Symposium on Animal Functional Genomics).

An additional scientific seminar was given at the Virginia Polytechnic Institute, Blacksburg, Virginia.

A public seminar for a non-scientific audience of horse industry professionals was given at the 2011 Thoroughbred Pedigree, Genetics, and Performance Conference in Lexington, Kentucky.

Three peer-reviewed scientific papers were published. A public workshop entitled "Genomic Annotation and Functional Modeling" was organized and presented (agenda below). There was an emphasis on equine genomics, but the underlying concepts and public database resources are relevant to most species.

Genomic Annotation and Functional Modeling Workshop
Maxwell H. Gluck Equine Research Center

Tuesday, November 15, 2011, Gluck Center Auditorium
Session 1: Genomic Annotation
  • 8:00-8:15 Welcome and Introductions.
  • 8:15-8:30 Aims of the Workshop.
  • 8:30-9:30 Genomic annotation and pipelines at NCBI.
  • 9:30-10:00 Functional annotation of the horse genome.
  • 10:00-10:30 Discussion: What can the community do to assist with genomic annotation? What resources are needed? How can we contribute effectively?
  • 10:30-11:00 Break
  • 11:00-11:30 Discussion: summary and formulation of action points for genomic annotation.
  • 11:30-12:00 Strategies for modeling functional genomics datasets.
  • 12:00-1:00 Lunch

Tuesday, November 15, 2011, Gluck Center Auditorium
Session 2: Functional Modeling of Omic Data Sets
  • 1:00-2:00 Approaches for functional modeling: GO enrichment, pathways & interaction analysis. Bringing it all together.
  • 2:00-3:00 Introduction to GO for functional modeling: what the user needs to know.
  • 3:00-3:45 Getting GO, adding GO and tools for functional enrichment.

Wednesday, November 16, 2011, 246 Barnhart Building
Computer Laboratory - Data Analysis and Tutorials
  • 8:30-9:00 Strategy for working on your own data & tutorials.
  • 9:00-12:00 Working with your own data or teaching set examples.

Online Genomic Annotation Resources and Examples
  1. Resources and information for Structural Annotation at NCBI
  2. Overview of functional modeling strategy
  3. Converting accessions between databases
  4. GO browsers
  5. Functional modeling examples
  6. Summary of Tools for gene expression analysis
  7. Websites & References used in this workshop

2011 Impact

Research that determined the primary DNA base sequence of the horse genome was completed in 2007 and 2008. This major accomplishment is in many ways just a beginning. Distributed within the 2.7 billion bases of DNA that compose the equine genome are approximately 20,000 protein-encoding genes. The structure of these 20,000 genes, which tissues express which genes, when the genes are expressed, and how much they are expressed are functional parameters studied by many scientists working on equine health and disease.

The goal of this project is to analyze equine gene structure and expression while developing a workflow for mRNA transcriptome capture, analysis, and comparison via high-throughput sequencing protocols supported by novel data-driven algorithms. The impact of the algorithms and computational methods will be a data-driven processing pipeline that is applicable to any mRNA transcriptome, requires only a reference genome, and does not depend on a priori gene structure annotations.
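As a rough illustration of what "without dependence on a priori gene structure annotations" can mean in practice, the toy sketch below infers candidate splice junctions purely from how reads align to a reference genome. It is not the project's algorithm. It assumes reads have already been aligned by a splice-aware aligner and are described by SAM-style CIGAR strings, where an N operation marks a skipped (intronic) region. The function names, the input tuples, and the min_reads threshold are hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# SAM-style CIGAR operations: a length followed by a one-letter op code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_alignment(chrom, start, cigar):
    """Yield (chrom, intron_start, intron_end) for each N op in one alignment.

    `start` is the 1-based leftmost reference position of the alignment.
    Coordinates of the yielded intron are 1-based and inclusive.
    """
    pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # Skipped reference region: a candidate intron.
            yield (chrom, pos, pos + length - 1)
            pos += length
        elif op in "MD=X":
            # These operations consume reference bases.
            pos += length
        # I, S, H and P do not advance the reference position.


def count_junctions(alignments, min_reads=2):
    """Count reads supporting each junction; keep those with enough support."""
    counts = Counter()
    for chrom, start, cigar in alignments:
        counts.update(junctions_from_alignment(chrom, start, cigar))
    return {junction: n for junction, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    # Made-up alignments for demonstration only.
    toy_alignments = [
        ("chr1", 100, "50M200N50M"),
        ("chr1", 120, "30M200N70M"),
        ("chr1", 500, "100M"),
        ("chr2", 10, "20M5I30M100N50M"),
    ]
    for junction, support in count_junctions(toy_alignments).items():
        print(junction, support)
```

In this sketch, exon boundaries emerge from the read data alone: a junction is reported only when several independent reads skip the same reference interval, and no gene model is consulted at any point.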

Molecular mechanisms that determine cell and tissue phenotypes, whether normal or pathological, are governed either directly or indirectly by differential patterns of gene expression. The biological and biomedical impact will be qualitative (structural) and quantitative (expression level) mRNA data on the entire equine transcriptome.

2011 Publications

Kikuchi M, Nakano Y, Nambo Y, Haneda S, Matsui M, Miyake Y, MacLeod JN, Nagaoka K, Imakawa K. Production of Calcium Maintenance Factor Stanniocalcin-1 (STC1) by the Equine Endometrium During the Early Pregnant Period. Journal of Reproduction and Development, 57:203-211, 2011.

Vanderman KS, Tremblay M, Zhu W, Shimojo M, Mienaltowski MJ, Coleman SJ, MacLeod JN. Brother of CDO (BOC) Expression in Articular Cartilage. Osteoarthritis and Cartilage, 19:435-438, 2011.

Go YY, Bailey E, Cook DG, Coleman SJ, MacLeod JN, Chen KC, Timoney PJ, Balasuriya UB. Genome-Wide Association Study Among Four Horse Breeds Identifies a Common Haplotype Associated with the In Vitro CD3+ T Cell Susceptibility/Resistance to Equine Arteritis Virus Infection. Journal of Virology, [Epub ahead of print], 2011.

Hestand MS, Zeng Z, Wang K, Coleman SJ, Liu J, MacLeod JN. Transcriptome-Level Characteristics of Tissue Specific Exon Splicing. Ninth International Equine Genome Workshop. 2011. (abstract)

Coleman SJ, Zeng Z, Hestand MS, Liu J, MacLeod JN. Analysis of Unannotated Equine Transcripts Identified by RNA-Sequencing. Ninth International Equine Genome Workshop. 2011. (abstract)

MacLeod JN, Coleman SJ, Hestand MS, Liu J. Complexity of the Equine mRNA Transcriptome. 4th International Symposium on Animal Functional Genomics. 2011. (abstract)