Bioinformatic pipelines for transcriptome analyses: understanding gene expression in black South Africans with systemic sclerosis
Date
2019
Authors
Mpangase, Phelelani Thokozani
Abstract
The rate of raw sequence production through Next-Generation Sequencing (NGS) has been growing exponentially due to improved technology and reduced costs. This has enabled researchers to answer many biological questions through “multi-omics” data analyses. Even though such data promises new insights into how biological systems function and into many disease mechanisms, computational analyses performed on such large datasets come with pitfalls. In many cases, analyses of raw sequencing data are computationally intensive and involve a combination of many bioinformatic applications, which often require different file formats between analysis steps. Bioinformatic and computational pipelines can overcome these issues and the tedious repetitive tasks associated with the analyses of sequencing data, and they facilitate reproducibility of results and sharing of workflows for common analyses.
The aim of this study was to develop robust, portable and reproducible bioinformatic pipelines for the automation of RNA sequencing (RNA-seq) data analyses. Using Nextflow as a workflow management system and Singularity for application containerisation, two bioinformatic workflows have been developed: rnaSeqCount (https://github.com/phelelani/nf-rnaSeqCount) for mapping raw RNA-seq reads to a reference genome and quantifying abundance of identified genomic features for differential gene expression analyses, and rnaSeqMetagen (https://github.com/phelelani/nf-rnaSeqMetagen) for performing metagenomic analyses on RNA-seq data. RNA-seq data from black South African patients with systemic sclerosis (SSc) and from unaffected individuals, taken from the study by Frost et al. (2018), was used to illustrate the value of the workflows. SSc is a rare autoimmune disorder in which abnormalities in the vascular and immune systems result in the fibrosis of the connective tissue, skin and internal organs.
The RNA-seq data validated the usefulness of the workflows and provided biological insights
into SSc in black South African populations through differential gene expression, pathway and metagenomic analyses. A number of genes were down-regulated in the affected skin of SSc patients, consistent with findings from other studies. These genes play potential roles in the identified down-regulated pathways associated with SSc, including the “toll-like receptor” and “chemokine signaling” pathways. Metagenomic analyses provided a taxonomic classification of the de novo assembled unmapped reads, in which more than one species belonging to the Arthrobacter, Bacillus, Brachybacterium, Dietzia and Pseudarthrobacter genera was present in the SSc patients but not in the unaffected individuals. Bioinformatic and computational pipelines for RNA-seq data analysis, from quality control (QC) to sequence alignment and comparative analyses, will reduce analysis time and increase the accuracy and reproducibility of findings, promoting transcriptome and meta-analysis research.
Description
A Thesis submitted to the Faculty of Health Sciences, University of the Witwatersrand,
Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy.
Johannesburg, 2019
Citation
Mpangase, Phelelani Thokozani, Bioinformatic pipelines for transcriptome analyses: understanding gene expression in black South Africans with systemic sclerosis, University of the Witwatersrand, Johannesburg, <http://hdl.handle.net/10539/29858>