Bioinformatic pipelines for transcriptome analyses: understanding gene expression in black South Africans with systemic sclerosis

Date

2019

Authors

Mpangase, Phelelani Thokozani

Abstract

The rate of raw sequence production through Next-Generation Sequencing (NGS) has been growing exponentially due to improved technology and reduced costs. This has enabled researchers to answer many biological questions through “multi-omics” data analyses. Even though such data promise new insights into how biological systems function and into many disease mechanisms, computational analyses of such large datasets come with pitfalls. In many cases, analyses of raw sequencing data are computationally intensive and involve a combination of many bioinformatic applications, which often require different file formats between analysis steps. Bioinformatic and computational pipelines can overcome these issues and the tedious, repetitive tasks associated with the analysis of sequencing data, and facilitate reproducibility of results and sharing of workflows for common analyses. The aim of this study was to develop robust, portable and reproducible bioinformatic pipelines for the automation of RNA sequencing (RNA-seq) data analyses. Using Nextflow as a workflow management system and Singularity for application containerisation, two bioinformatic workflows were developed: rnaSeqCount (https://github.com/phelelani/nf-rnaSeqCount) for mapping raw RNA-seq reads to a reference genome and quantifying the abundance of identified genomic features for differential gene expression analyses, and rnaSeqMetagen (https://github.com/phelelani/nf-rnaSeqMetagen) for performing metagenomic analyses on RNA-seq data. The RNA-seq data of black South African patients affected with systemic sclerosis (SSc) and unaffected individuals from the study by Frost et al. (2018) were used to illustrate the value of the workflows.
SSc is a rare autoimmune disorder in which abnormalities in the vascular and immune systems result in fibrosis of the connective tissue, skin and internal organs. The RNA-seq data validated the usefulness of the workflows and provided biological insights into SSc in black South African populations through differential gene expression, pathway and metagenomic analyses. A number of genes were down-regulated in the affected skin of SSc patients, supporting findings from other studies. These genes play potential roles in the identified down-regulated pathways associated with SSc, including the “toll-like receptor” and “chemokine signaling” pathways. Metagenomic analyses provided a taxonomic classification of the de novo assembled unmapped reads, showing that more than one species belonging to the Arthrobacter, Bacillus, Brachybacterium, Dietzia and Pseudarthrobacter genera were present in the SSc patients but not in the unaffected individuals. Bioinformatic and computational pipelines for RNA-seq data analysis, from QC to sequence alignment and comparative analyses, will reduce analysis time and increase the accuracy and reproducibility of findings, promoting transcriptome and meta-analysis research.

Description

A Thesis submitted to the Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, 2019

Citation

Mpangase, Phelelani Thokozani, Bioinformatic pipelines for transcriptome analyses: understanding gene expression in black South Africans with systemic sclerosis, University of the Witwatersrand, Johannesburg, <http://hdl.handle.net/10539/29858>
