Pancreatic ductal adenocarcinoma (PDAC) is a heterogeneous disease, and current PDAC categorization is not granular enough to offer much insight into the biologically and functionally important changes that occur during disease progression, which hinders the development of effective treatment regimens. Genomic and epigenomic data have been used to subcategorize PDAC into more meaningful and functionally important groups. These data can be used to understand the unique biological and molecular tumor signatures that can be targeted for treatment or preventative education. The inability to finely stratify PDAC has contributed to the lack of early detection biomarkers, the minimal effectiveness of current treatments, and the unsuccessful implementation of personalized medicine approaches.
Dr. Rick Jansen (Masonic Cancer Center; MSI PI) and co-PI Dr. Sarah Munro (Bioinformatics Analyst and Core Bioinformatics Team Manager, MSI) are working on a project called “Identifying Gene Interacting Networks in PDAC Using Deconvolution and Integrative Multi-omics Analysis” that will use tissue samples from 32 pancreatic cancer patients to expand characterization of the subset that contains a comprehensive and complete set of genetic and epigenetic data and that also includes detailed annotation with clinical history. The immediate goals of the project are to use the tumor samples to identify key gene expression, miRNA, and methylation markers that are important to the progression of PDAC. Multi-omics approaches include non-negative matrix factorization (NMF), support vector machine (SVM), principal component analysis (PCA), and integrated directed random walk (iDRW). Applying these methods to combined -omics datasets has led to improved reproducibility compared to single-marker analyses, because the methods are tested using a larger number of samples collected from multiple research studies, which provides a type of built-in validation. They also provide a filtering process that narrows down the markers that need to be tested individually. However, these methods still have several limitations, including false-positive predictions, missing information on contributing risk factors, and non-standardized ways to infer pathway activity. Even so, a key finding across these -omics studies is that important common pathways and core mutations occur during PDAC development and progression. KRAS and TP53 are commonly mutated genes in PDAC. Patients with a mutation in at least one of these genes have a significantly poorer prognosis and disease-free survival. TP53 expression also adds value to sequencing data, helping to differentiate PDAC subtypes and predict PDAC prognosis.
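Of the multi-omics approaches listed above, NMF is perhaps the simplest to illustrate. The following is a minimal sketch, not the project's implementation: it factors a small non-negative matrix into two low-rank non-negative factors using the classic multiplicative update rules. The toy matrix (rows as features, columns as samples), the rank, and all function names are hypothetical, and only the Python standard library is used.

```python
import random

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (n x m) as W (n x rank) times H (rank x m), all entries >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical toy data: two blocks of co-varying features across four samples.
v = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 6.0, 6.0],
]
w, h = nmf(v, rank=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

In this kind of factorization, the columns of `W` can be read as feature signatures and the rows of `H` as per-sample loadings on those signatures, which is what makes NMF a candidate for subtype discovery.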
The project’s main objective is to use modified versions of recently developed statistical methods (such as Deep IDA, SIDANet, and HIP) to identify interacting effects of genomic markers across multiple bulk genomic datasets.
Weighted gene correlation analysis will also be applied independently to each available data type from this study: RNA-Seq, miRNA-Seq, and DNA methylation data. This method, also known as weighted gene co-expression network analysis (WGCNA), involves transforming the biomolecular expression data into a network and then identifying groups of highly correlated genes. Using these groups of correlated genes, the researchers can then test for statistical differences within the patient cohort using clinical metadata such as disease stage, age, and survival. WGCNA is most commonly performed using RNA-Seq data; this project will also apply it to methylation and miRNA data by linking these respective data sets to relevant genes. The goal is to use multiple layers of biomolecular evidence to robustly identify possible mechanisms that influence clinical outcomes.
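The two core steps described above, building a weighted network and then grouping highly correlated genes, can be sketched as follows. This is a simplified illustration rather than the WGCNA procedure itself: it uses the soft-threshold adjacency |cor|^beta, but it replaces the topological-overlap and hierarchical-clustering steps of the WGCNA R package with a plain connected-components grouping. The gene names, expression values, `beta`, and `cutoff` are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted network: a_ij = |cor(gene_i, gene_j)| ** beta."""
    genes = list(expr)
    adj = {g: {} for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            adj[gi][gj] = a
            adj[gj][gi] = a
    return adj

def modules_from_adjacency(adj, cutoff=0.5):
    """Group genes linked by adjacency >= cutoff (a stand-in for WGCNA's clustering)."""
    seen, modules = set(), []
    for gene in adj:
        if gene in seen:
            continue
        seen.add(gene)
        stack, component = [gene], []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbor, weight in adj[current].items():
                if weight >= cutoff and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        modules.append(sorted(component))
    return modules

# Hypothetical expression profiles: one list of values per gene, one value per sample.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [4.8, 1.3, 4.1, 2.2, 2.9],
}
adjacency = soft_threshold_adjacency(expr)
modules = modules_from_adjacency(adjacency)
print(modules)
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones, which is why the network is "weighted" rather than built from a hard correlation cutoff. Each resulting module could then be summarized per patient and tested against clinical metadata.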
The study will also analyze bulk RNA-seq data from formalin-fixed, paraffin-embedded (FFPE) samples from 17 patients for whom separate bulk RNA-seq data were previously generated and deposited in The Cancer Genome Atlas (TCGA). The researchers use three different deconvolution methods to compare cell type proportions for these paired data. They use the R package edgeR to adjust for batch effects when comparing mean expression values of key pancreatic cancer genes. They selected KRAS, TP53, SMAD4, and CDKN2A because these are the genes most commonly mutated in pancreatic cancer. They also selected ADAM9, BMI1, CBX4, and CDH1 because multiple studies have validated that these genes have altered expression in pancreatic cancer compared with adjacent normal tissue. They perform a survival analysis with the significant gene set to categorize the patients into high-risk and low-risk groups and test the concordance of patient categorization across the paired samples.
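The article does not name the three deconvolution methods, so the following is only a generic sketch of the idea behind reference-based deconvolution: a bulk expression profile is modeled as a non-negative mixture of cell-type signature profiles, and the mixing weights are estimated by non-negative least squares (solved here with projected gradient descent). The signature values, marker names, and cell types are hypothetical.

```python
def deconvolve(signature, bulk, n_iter=500):
    """Estimate cell-type fractions f >= 0 that minimize ||signature @ f - bulk||^2.

    signature: one row per gene, one column per cell type.
    bulk: one expression value per gene.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # A step of 1 / (squared Frobenius norm) is small enough to guarantee convergence.
    step = 1.0 / sum(value * value for row in signature for value in row)
    weights = [0.0] * n_types
    for _ in range(n_iter):
        residual = [
            sum(s * w for s, w in zip(row, weights)) - b
            for row, b in zip(signature, bulk)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights

# Hypothetical signature matrix: columns are tumor cells, T cells, B cells.
cell_types = ["tumor", "T cell", "B cell"]
signature = [
    [10.0, 1.0, 1.0],   # tumor marker
    [1.0, 10.0, 1.0],   # T-cell marker
    [1.0, 1.0, 10.0],   # B-cell marker
    [5.0, 5.0, 5.0],    # housekeeping gene
]
# Simulate a bulk sample from known fractions, then try to recover them.
true_fractions = [0.6, 0.3, 0.1]
bulk = [sum(s * f for s, f in zip(row, true_fractions)) for row in signature]
estimated = deconvolve(signature, bulk)
print(dict(zip(cell_types, (round(f, 3) for f in estimated))))
```

Running the same estimator on both samples from a patient would give two sets of proportions that can be compared directly. Real bulk data are noisy, so recovery would not be exact as it is in this noise-free simulation.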
The project has the following objectives:
- Define important PDAC clinical outcome categorizations using RNA expression and methylation profiles. Genome-wide methylation and gene expression marker profiles have been created to subtype disease. These profiles have been associated with behavioral factors such as smoking and obesity and with clinical outcomes such as survival. Multi-omics approaches and WGCNA can be used to identify key gene/marker networks associated with risk factor subgroups of pancreatic cancer.
- Deconvolute bulk RNA-seq data to determine cell type distributions. The researchers want to estimate cell type distributions within their bulk samples to determine whether, for example, different distributions of T and B cells are associated with survival differences across their sample data.
- Determine variations in gene expression across paired pancreatic cancer samples. The researchers are interested in determining if key pancreatic cancer genes vary across two different samples of the same tumor. The long-term goal of this objective is to harness the power of genomic analysis to optimize biopsy decisions and ultimately improve patient outcomes.
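For the paired-sample comparison, one simple way to quantify whether two samples from the same patient yield the same risk category is sketched below. The article does not say how concordance is tested, so the median split and the use of Cohen's kappa are assumptions made for illustration, and the risk scores are invented.

```python
from statistics import median

def risk_groups(scores):
    """Label each patient high or low risk by a median split of the scores."""
    cutoff = median(scores)
    return ["high" if score > cutoff else "low" for score in scores]

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same patients.

    Undefined (division by zero) if expected agreement is exactly 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical risk scores for the same six patients from two paired samples.
ffpe_scores = [0.2, 1.5, 0.9, 2.1, 0.4, 1.8]
tcga_scores = [0.3, 1.4, 1.2, 1.9, 0.5, 0.8]

ffpe_groups = risk_groups(ffpe_scores)
tcga_groups = risk_groups(tcga_scores)
agreement = sum(a == b for a, b in zip(ffpe_groups, tcga_groups)) / len(ffpe_groups)
kappa = cohen_kappa(ffpe_groups, tcga_groups)
print(f"raw agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```

Kappa is reported alongside raw agreement because a median split puts half the patients in each group, so two unrelated samples would still agree about half the time by chance.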
This project recently received a DSI Small Seed Grant. The Seed Grant program is intended to promote, catalyze, accelerate, and advance U of M-based data science research so that U of M faculty and staff are well prepared to compete for longer-term external funding opportunities. The program was updated in Summer 2024 to include three focus areas: Foundational Data Sciences; Digital Health and Personalized Health Care Delivery; and Agriculture and the Environment. The award types are Rapid Response Grants and the new Awards for DSI Faculty Fellowship and Data Sets (Data as an Asset).
This project falls under the Digital Health and Personalized Health Care Delivery focus area.
Image description: Graphical abstract of the project.