Analysing RNA-sequence data for pancreatic ductal adenocarcinoma tissue samples to identify potential biomarkers

Date
2023-09
Publisher
University of the Witwatersrand, Johannesburg
Abstract
Pancreatic ductal adenocarcinoma (PDAC) accounts for approximately 90% of pancreatic cancers and is the fourth leading cause of cancer-related death, with a five-year survival rate of less than 10%. Patients are typically asymptomatic until the disease is detected at a metastatic stage, which contributes substantially to the high mortality rate. This study explored PDAC and its two main subtypes, classical and basal-like, in depth through bioinformatic analysis. Bioinformatics is a computational approach to evaluating biological data through the analysis of omics data, including gene expression and proteomic sequence data. A workflow consisting of programmes and web tools was used to analyse PDAC RNA-sequence data. The sample sets were grouped according to tumour status, stage, and subtype. The workflow began with quality control using FastQC and Trimmomatic. Sequencing reads were then aligned with HISAT2, and read counts were generated with HTSeq. The main component of the workflow was differential gene expression analysis, which identified differentially expressed genes (DEGs), that is, genes whose expression differed significantly between the compared conditions. Weighted gene co-expression network analysis (WGCNA) was used to identify the hub genes involved in regulating the biological network. Lastly, in silico validation with available web tools was performed to support the findings of the workflow. The identified tumour genes included S100A11, PKM, GPRC5A, LAMC2 and ITGA2, which may serve as universal biomarkers because the samples were drawn from data generated from individuals in eight different countries. KRT13 and IL6 were identified in the advanced stage, and their roles in cancer progression were explored in this study. In the basal-like subtype, CAV1, DCVLD2 and TGFB2 were identified as genes that contribute to treatment resistance. Genes dysregulated in both the basal-like subtype and the advanced stage, including WNT3A, TP63, KRT13 and IGF2BP, were analysed to evaluate the link between subtype and stage.
Co-expression analysis revealed hub genes for tumour (KIF4A, SPAG5, RRM2 and AURKA), the basal-like subtype (BUB1, DEPDC1 and KIF14) and the classical subtype (PTPRN and CAMK2B). A machine learning model applied to the DEGs yielded recall, precision and accuracy scores above 94% for every sample condition. These potential biomarkers all have significant roles in promoting cancer progression, aggressiveness and resistance. Because the DEGs were classified as tissue or blood (extracellular vesicle) biomarkers, they may support a less invasive screening method for PDAC. However, further wet laboratory validation is required for these biomarkers.
Description
A dissertation submitted in fulfilment of the requirements for the degree Master of Science, to the Faculty of Science, School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg, 2023.
Keywords
Pancreatic cancer, Biomarkers, RNA-sequence, Bioinformatic tools, UCTD
Citation
Jamal, Khadija Sanober. (2023). Analysing RNA-sequence data for pancreatic ductal adenocarcinoma tissue samples to identify potential biomarkers. [Master's dissertation, University of the Witwatersrand, Johannesburg]. https://hdl.handle.net/10539/41905