Automatic speech feature extraction using a convolutional restricted Boltzmann machine
Date
2017
Authors
Anderson, David John
Abstract
Restricted Boltzmann Machines (RBMs) are a statistical learning concept that can
be interpreted as Artificial Neural Networks. They are capable of learning, in an
unsupervised fashion, a set of features with which to describe a data set. Connected
in series, RBMs form a model called a Deep Belief Network (DBN), learning abstract
feature combinations from lower layers. Convolutional RBMs (CRBMs) are a variation
on the RBM architecture in which the learned features are kernels that are convolved
across spatial portions of the input data to generate feature maps identifying if a feature
is detected in a portion of the input data. Features extracted from speech audio data
by a trained CRBM have recently been shown to compete with the state of the art
for a number of speaker identification tasks. This project implements a similar CRBM
architecture in order to verify previous work, as well as gain insight into Digital Signal
Processing (DSP), Generative Graphical Models, unsupervised pre-training of Artificial
Neural Networks, and Machine Learning classification tasks. The CRBM architecture
is trained on the TIMIT speech corpus and the learned features verified by using them
to train a linear classifier on tasks such as speaker genetic sex classification and speaker
identification. The implementation is quantitatively proven to successfully learn and
extract a useful feature representation for the given classification tasks.
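
As an illustration only, the idea of convolving learned kernels across the input to produce feature maps can be sketched as below. This is a minimal toy under stated assumptions, not the dissertation's implementation: the function names, the one-dimensional input, the hand-picked kernel weights, and the biases are all hypothetical, and steps such as training and pooling are omitted. Each feature map entry is the probability that a hidden unit is active for one window of the input, computed as a sigmoid of the kernel response plus a bias.

```python
import math


def sigmoid(x):
    """Logistic function mapping an activation to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def crbm_feature_maps(visible, kernels, hidden_biases):
    """Slide each kernel over the input and return P(h = 1 | v) per position.

    Hypothetical helper: kernels are applied as a sliding dot product
    (cross-correlation) over every window that fits inside the input.
    """
    maps = []
    for kernel, bias in zip(kernels, hidden_biases):
        width = len(kernel)
        positions = len(visible) - width + 1
        feature_map = []
        for j in range(positions):
            # Kernel response for the window starting at position j.
            activation = bias + sum(kernel[i] * visible[j + i] for i in range(width))
            feature_map.append(sigmoid(activation))
        maps.append(feature_map)
    return maps


# Toy input and weights, chosen by hand purely for illustration.
signal = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
kernels = [[0.0, 4.0, 0.0], [2.0, -2.0, 2.0]]
biases = [-2.0, 0.0]

maps = crbm_feature_maps(signal, kernels, biases)
for index, feature_map in enumerate(maps):
    print(index, [round(p, 3) for p in feature_map])
```

In this toy, the first kernel responds strongly wherever a window is centred on a 1 in the signal, so its feature map is high at those positions and low elsewhere. In a trained CRBM the kernel weights would be learned from data rather than set by hand.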
Description
A dissertation submitted to the Faculty of Science, University of
the Witwatersrand, in fulfillment of the requirements for the degree
of Master of Science, 2017
Citation
Anderson, David John (2017) Automatic speech feature extraction using a convolutional restricted Boltzmann machine, University of the Witwatersrand, Johannesburg, <http://hdl.handle.net/10539/26165>