Helen Wright*, Michiel Postema, and Vered Aharonson

Towards a voice-based severity scale for
Parkinson’s disease monitoring
https://doi.org/10.1515/cdbme-2024-2168

Abstract: The unified Parkinson’s disease rating scale, used
to monitor disease progression, is based on visual assessments
of motor symptoms. Vocal manifestations of Parkinson’s disease
differ from the motor ones, specifically in their rate of change
with disease severity. As such, a different scale is needed to
provide voice measures of disease severity. This study employed
a dataset of voice-quality features from repeated recordings of
Parkinson’s disease patients. The rating-scale scores were grouped
into a three-category scale and a 12-category scale, and the
changes of all voice features across the categories were evaluated
using one-way analysis-of-variance and support vector regression.
Significant changes and marked non-linear increasing or decreasing
trends were shown for all features in the three-category scale.
Significant changes and trends were also obtained in the
12-category scale, but only for the score ranges of the mild and
severe categories. The findings imply a potential for voice-based
monitoring of the early and late severity stages of Parkinson’s
disease that could be used continuously by patients and provide
timely warnings of deterioration.

Keywords: Vocal features, disease monitoring, regression,
UPDRS.

1 Introduction

Currently available treatments for Parkinson’s disease (PD) cannot
cure the disease, but can alleviate symptoms and improve the
patient’s quality of life. The efficacy of all treatments, however,
might be improved by timely detection of a change in
disease severity [1]. A growing body of evidence highlights
discernible alterations in the vocal patterns of PD patients [2].

*Corresponding author: Helen Wright, School of Electrical and
Information Engineering, University of the Witwatersrand,
Johannesburg, 1 Jan Smuts Laan, 2001 Braamfontein, South
Africa.
e-mail: helen.wright@wits.ac.za
Michiel Postema, Department of Biomedical Technology, Faculty
of Medicine and Health Technology, Tampere University, Tampere,
Finland and School of Electrical and Information Engineering,
University of the Witwatersrand, Johannesburg, Braamfontein,
South Africa.
Vered Aharonson, School of Electrical and Information Engineer-
ing, University of the Witwatersrand, Johannesburg, South Africa
and University of Nicosia Medical School, Nicosia, Cyprus.

A method able to identify these changes would be beneficial
to both patients and clinicians, allowing small changes to be
assessed conveniently and noninvasively, and guiding person-
alised treatments and interventions. To achieve this, a quantita-
tive description of the perceived changes in PD speech patterns
through explainable low-level vocal features must be devel-
oped. This work lays the foundation for a voice-based Parkin-
son’s disease severity scale by mapping selected vocal feature
values against unified Parkinson’s disease rating scale (UP-
DRS) scores and assessing what relationships may be iden-
tified. In this way, the voice changes that occur as the disease
worsens can be seen, tracked and quantified.

2 Materials and Methods

The data used for this study were taken from the UCI
Parkinson’s Disease Telemonitoring Dataset [3]. The com-
plete dataset comprises features extracted from 5923 sustained
vowel phonations recorded from 42 PD patients at weekly in-
tervals over a period of 6 months. The chosen features include
voice quality measures which are traditionally used in speech
therapy for PD [4]. Additional signal features were calculated
to provide further insight into the complexity of the recorded
signals [5]. A correlation analysis revealed high correlations
between a number of features (correlation coefficients > 0.7),
so a subset of five features was used in the current analysis.
The harmonics-to-noise ratio (HNR) is a measure of the amount
of non-periodic noise in a recorded signal. The pitch period
entropy (PPE) quantifies how well a stable pitch can be
maintained. The recurrent period density entropy (RPDE) is an
indicator of the voice signal’s deviation from exact periodicity.
The five-point period perturbation quotient of the jitter
(Jitter:PPQ5) is a measure of the cycle-to-cycle variability of
the fundamental frequency of the voice. The three-point amplitude
perturbation quotient of the shimmer (Shimmer:APQ3) is a measure
of the cycle-to-cycle variability of the amplitude of the voice
signal.

DE GRUYTER Current Directions in Biomedical Engineering 2024;10(4): 686-689

Open Access. ©2024 The Author(s), published by De Gruyter. This work is licensed under the
Creative Commons Attribution 4.0 International License.

H. Wright et al., Parkinson’s disease vocal severity assessment

The UCI dataset includes UPDRS values, which were determined
through patient examination at the beginning, midpoint
and end of the data collection and recording process.
To estimate the missing weekly UPDRS values, linear interpolation
was used. This association of a UPDRS value with each
recording is a unique feature of this dataset, making it
particularly suitable for the current preliminary longitudinal
analysis. Our study makes use of the motor UPDRS scores only,
taken from Part III of the UPDRS assessment, since this part
of the assessment includes speech-related questions. This
UPDRS score range will be referred to as UPDRS(III) in the
rest of this paper.
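
As an illustration of how a weekly score can be estimated from three clinical examinations by linear interpolation, consider the sketch below. The function name `interpolate_updrs`, the examination days and the scores are hypothetical, and clamping outside the examined range is an assumption rather than something stated in the text.

```python
def interpolate_updrs(day, exam_days, exam_scores):
    """Linearly interpolate a UPDRS score for `day` from examination scores.

    `exam_days` must be sorted in ascending order. Days outside the examined
    range are clamped to the nearest examination score (an assumption).
    """
    if day <= exam_days[0]:
        return exam_scores[0]
    if day >= exam_days[-1]:
        return exam_scores[-1]
    points = list(zip(exam_days, exam_scores))
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if d0 <= day <= d1:
            return s0 + (s1 - s0) * (day - d0) / (d1 - d0)

# Hypothetical patient examined at baseline, three months and six months.
exam_days = [0, 90, 180]
exam_scores = [20.0, 23.0, 29.0]

# One estimated score per weekly recording.
weekly = [interpolate_updrs(d, exam_days, exam_scores) for d in range(0, 181, 7)]
print(interpolate_updrs(45, exam_days, exam_scores))  # prints 21.5
```

Because the interpolated scores are generally non-integer, any later grouping into score categories has to define how fractional values are assigned to a category.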

Two experiments were conducted to investigate how
speech feature values vary with increasing UPDRS(III) scores.
The range of UPDRS(III) scores in the dataset was from 5 to
40. In Experiment 1, this range was divided into three cate-
gories: early stage (scores from 5 - 15), middle stage (scores
from 16 - 32), and late stage (scores from 33 - 41). These cate-
gories were first proposed in [6, 7] and are based on perceptual
evaluation of PD speech recordings. In Experiment 2, the UP-
DRS(III) range was divided into 12 categories, with each cat-
egory representing a three-point interval. This interval corre-
sponds to the minimum number of data points required to cre-
ate distinct categories that can still be linked to the ones used
in Experiment 1, considering the available range of measurements
in the dataset. Categories 1 to 4 (scores 5 - 16) in Experiment 2
correspond to the early stage in Experiment 1. Categories
5 to 9 (scores 17 - 31) correspond to the middle stage in
Experiment 1. Categories 10 to 12 (scores 32 - 41) correspond
to the late stage in Experiment 1. This division of the
UPDRS(III) range did not result in perfect UPDRS alignment
between the two sets of experimental groups: a score of
16 falls into the "Early" category instead of the "Middle"
category, and a score of 32 falls into the "Late" category
instead of the "Middle" category. This discrepancy was
allowed for this analysis as it preserved the integrity of the
three-point intervals.
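
The two groupings and their misalignment can be made concrete with a short sketch. The function names are hypothetical, the stage boundaries are those quoted in the text, and flooring non-integer scores and clamping to the last category are assumptions made here for illustration.

```python
def stage_3(score):
    """Experiment 1 stage, using the score boundaries quoted in the text."""
    if score < 16:
        return "early"
    if score < 33:
        return "middle"
    return "late"

def category_12(score, lowest=5, width=3, n_categories=12):
    """Experiment 2 category (1..12) from three-point intervals starting at 5.

    Flooring fractional scores and clamping to the valid category range
    are assumptions of this sketch.
    """
    index = (int(score) - lowest) // width + 1
    return max(1, min(n_categories, index))

def stage_from_category(category):
    """Stage that the text assigns to each Experiment 2 category."""
    if category <= 4:
        return "early"
    if category <= 9:
        return "middle"
    return "late"

# Integer scores for which the two groupings disagree.
mismatches = [s for s in range(5, 41)
              if stage_3(s) != stage_from_category(category_12(s))]
print(mismatches)  # prints [16, 32]
```

The two mismatching scores are exactly the ones the text identifies, which follows directly from starting the three-point intervals at a score of 5.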

Box-and-whisker plots were drawn for each category.
These plots were used to define the minimum and maximum values
of the distributions. Any data points lying outside those limits
were identified as outliers and removed from further analysis.
The data distributions were portrayed using violin plots
and their mean, median, and standard deviation were calculated.
These descriptive statistics were compared across the
different UPDRS(III) categories using one-way analysis-of-variance
with Bonferroni correction, for all features. Statistically
significant changes, defined by a p-value of less than
0.05, were sought in each feature between the different
UPDRS(III) categories. Regression analysis, using support vector
regression (SVR), was conducted in both experiments to identify
trends in the distribution of each feature value with increasing
UPDRS(III) scores.
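
A minimal sketch of the outlier removal and category comparison is given below. Several details are assumptions, since the text does not state them: the whisker limits are taken as Tukey fences at 1.5 times the interquartile range, the comparisons are made between adjacent categories only, and the Bonferroni factor is the number of such comparisons. The function names and the toy HNR-like values are invented for illustration.

```python
from statistics import quantiles
from scipy.stats import f_oneway

def remove_outliers(values, k=1.5):
    """Drop points outside the box-plot whiskers (Tukey fences at k * IQR)."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if low <= v <= high]

def adjacent_anova(groups, alpha=0.05):
    """One-way ANOVA between each pair of adjacent categories.

    p-values are multiplied by the number of comparisons (Bonferroni)
    and capped at 1. Returns a list of (corrected_p, is_significant).
    """
    n_tests = len(groups) - 1
    results = []
    for a, b in zip(groups, groups[1:]):
        p = f_oneway(a, b).pvalue
        p_corrected = min(1.0, p * n_tests)
        results.append((p_corrected, p_corrected < alpha))
    return results

# Toy feature values per category, invented for illustration only.
early = [21.0, 22.0, 23.0, 22.5, 21.5, 35.0]  # 35.0 lies outside the fences
middle = [18.0, 18.5, 19.0, 17.5, 18.2]
late = [17.9, 18.4, 18.8, 17.6, 18.3]

cleaned = [remove_outliers(g) for g in (early, middle, late)]
results = adjacent_anova(cleaned)
for (p, significant), pair in zip(results, ("early-middle", "middle-late")):
    print(pair, round(float(p), 4), bool(significant))
```

Capping the corrected p-value at 1 is a common convention for Bonferroni-adjusted values and would explain entries of exactly 1.0 in a table of corrected p-values.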

3 Results

In Experiment 1, all features showed statistically significant
differences between the low and middle categories, evidenced
by p-values < 0.01. Three out of the five features showed
statistically significant differences between the middle and high
categories, namely the shimmer, PPE and RPDE; the jitter and
HNR features did not. Regression curves were drawn between
the categories to identify trends in feature values with increasing
UPDRS(III) score. The harmonics-to-noise ratio (Figure
1a) showed a decreasing trend. Pitch period entropy (Figure
3a), jitter (Figure 4a) and shimmer (Figure 5a) all showed
increasing trends. The recurrent period density entropy showed
a trend that increases from the low to the middle category and
then decreases from the middle to the high category (Figure 2a).

Fig. 1: HNR feature value distributions and regression curves.
(a) shows the distribution of HNR values across three categories
of UPDRS(III) scores. (b) shows the distribution across 12
UPDRS(III) categories. In both cases, SVR curves show decreasing
trends.

Fig. 2: RPDE feature value distributions and regression curves. (a)
shows the distribution of RPDE values across three categories of
UPDRS(III) scores. (b) shows the distribution across 12 UPDRS(III)
categories. In both cases, SVR curves show trends that first
increase, then decrease.

Fig. 3: PPE feature value distributions and regression curves.
(a) shows the distribution of PPE values across three categories
of UPDRS(III) scores. (b) shows the distribution across 12
UPDRS(III) categories. In both cases, SVR curves show increasing
trends.

Fig. 4: Jitter:PPQ5 feature value distributions and regression
curves. (a) shows the distribution of jitter values across three
categories of UPDRS(III) scores. (b) shows the distribution across 12
UPDRS(III) categories. In both cases, SVR curves show increasing
trends.

Fig. 5: Shimmer:APQ3 feature value distributions and regression
curves. (a) shows the distribution of shimmer values across three
categories of UPDRS(III) scores. (b) shows the distribution across 12
UPDRS(III) categories. In both cases, SVR curves show increasing
trends.

The use of a smaller analysis window in Experiment 2
allowed feature changes to be analysed over a finer resolution
of 12 UPDRS(III) categories. The p-values of the changes
in feature values between adjacent pairs of the twelve categories
are given in Table 1. None of the features analysed showed
significant changes between all adjacent categories. Instead,
they exhibited a fluctuating pattern with increasing and
decreasing values. The harmonics-to-noise ratio, pitch period
entropy and recurrent period density entropy showed statistically
significant differences between the first two and last three
categories. The jitter and shimmer features showed minor
fluctuations in the middle categories. Regression curves plotted
for all features showed the same trends as those from
Experiment 1, while also capturing the small fluctuations between
categories. The harmonics-to-noise ratio shows a gradually
decreasing trend (Figure 1b). The pitch period entropy (Figure 3b),
jitter (Figure 4b) and shimmer (Figure 5b) show gradually
increasing trends. The recurrent period density entropy increases
in the early categories and then decreases in the later
categories, with minor fluctuations in the middle categories
(Figure 2b).

4 Discussion and Conclusions

When the UPDRS(III) range was divided into three categories,
of low, middle and high severity, each category showed a wide
range of values, with overlap between categories. This is true
for all the features analysed. However, the presence of
statistically significant differences between these distributions
suggests that these changes are measurable and, if observed over
time, may provide a method of monitoring long-term deterioration.
The large analysis windows used in this experiment,
however, may obscure slighter changes within each category.
The use of a smaller analysis window, as in Experiment 2, allows
the changes to be observed at a finer resolution. The results
highlight and confirm the gradual nature of the changes to
vocal features, and provide evidence that smaller UPDRS(III)
analysis windows are better for detailed analysis.

Tab. 1: Statistical p-values calculated for Experiment 2. Each column compares two adjacent UPDRS(III) categories.

Feature name    1 - 2    2 - 3    3 - 4    4 - 5    5 - 6    6 - 7    7 - 8    8 - 9    9 - 10   10 - 11  11 - 12
HNR             < 0.01   0.018    1.0      0.82     0.025    0.00048  1.0      0.073    0.0013   < 0.01   1.0
PPE             < 0.01   1.0      1.0      < 0.01   0.11     1.0      1.0      0.0034   0.84     < 0.01   1.0
RPDE            < 0.01   1.0      0.00013  1.0      1.0      1.0      0.00011  0.0013   0.0091   0.094    1.0
Jitter:PPQ5     < 0.01   1.0      1.0      < 0.01   0.11     1.0      1.0      < 0.01   0.03     0.0045   1.0
Shimmer:APQ3    < 0.016  0.19     1.0      0.001    < 0.01   < 0.01   1.0      1.0      1.0      0.17     1.0

The combined findings of the two experiments show a trade-off
which must be considered when implementing these voice
features as biomarkers of changes in PD. The low, middle and
high stages are easy to interpret clinically and show consistent
differences between stages, but are too broad to show
small changes, for example those due to treatment. Increasing the
number of stages to 12 reveals smaller severity changes, but
only for the low and high severity stages, while the middle
severity stages showed no change or fluctuating changes.

Regression analysis conducted in both experiments
showed the changes in each feature with increasing UPDRS(III)
score. The support vector regression techniques used
for this are effective for handling non-linear relationships and
are able to capture local trends. They capture both the feature
value changes and the gradient of the trends, which characterises
the relationship between the features and disease severity.
The non-linear nature of the trends further
highlights the complexity of the relationship between PD
progression and speech. These findings align with previous
works which compared feature values between PD patients
and healthy control subjects, and extend them by plotting the
longitudinal changes. They also support the use of speech in the
early- and late-stage assessment of PD severity.
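
How support vector regression can recover a non-linear trend and its local gradient is illustrated by the sketch below. It is not the authors' configuration: the RBF kernel, the hyperparameter values and the synthetic rise-then-fall data (loosely imitating the RPDE pattern) are all assumptions made for illustration, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data, invented for illustration: a feature that first rises and
# then falls with the UPDRS(III) score, plus measurement noise.
rng = np.random.default_rng(0)
scores = np.linspace(5, 40, 120)
feature = 0.55 - 0.0004 * (scores - 22) ** 2 + rng.normal(0, 0.01, scores.size)

# RBF-kernel SVR; the kernel and hyperparameters are assumptions of this sketch.
model = SVR(kernel="rbf", C=10.0, gamma=0.01, epsilon=0.005)
model.fit(scores.reshape(-1, 1), feature)

# Evaluate the fitted trend on integer scores and estimate its local slope.
grid = np.arange(5, 41).reshape(-1, 1)
curve = model.predict(grid)
gradient = np.gradient(curve, grid.ravel())
peak_score = int(grid[np.argmax(curve), 0])
print("trend peaks near a UPDRS(III) score of", peak_score)
```

The sign of `gradient` indicates where the fitted trend is increasing or decreasing, which is the kind of information read from the regression curves in the figures.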

While this work supports the use of voice for the moni-
toring of PD progression, it is as yet unverified. Repeating the
analysis on additional datasets would serve to verify the find-
ings and would also confirm the feature value ranges observed
here. Alternative UPDRS(III) groupings could also be investi-
gated. This would allow the optimal grouping to be identified,
especially for the middle severity ranges, and allow for better
alignment between the different experiment groups. This work
has focused on the phonatory aspects of speech, extracted
from sustained vowel phonations. Including other vocal fea-
tures, extracted from alternative speaking tasks, would allow
the changes in those features to be identified. These could
provide additional insight into the reported vocal changes and
may also be used as biomarkers.

A limitation of this study is the size of the dataset and
a lack of healthy control data for comparison. Extending the
analysis to a larger dataset would verify the findings reported
here and allow the feature values and changes to be better
quantified. A comparison with healthy control data would in-
dicate the vocal deterioration and confirm the differences re-
ported.

Author Statement
Research funding: This work was not supported financially.
Conflict of interest: Authors state no conflict of interest.
Ethical approval: The research related to human data use complied
with all the relevant national regulations and institutional
policies and has been approved by the University of the
Witwatersrand human research ethics committee (clearance
certificate number M221107).

References

[1] Armstrong MJ, Okun MS. Diagnosis and treatment of Parkin-
son disease: a review. JAMA 2020;323:548–60.

[2] Ma A, Lau KK, Thyagarajan D. Voice changes in Parkinson’s
disease: what are they telling us? J Clin Neurosci 2020;72:1–
7.

[3] Tsanas A, Little M. Parkinson’s Disease Telemonitoring
Dataset. UCI Machine Learning Repository, 2009.
Available from https://archive.ics.uci.edu/dataset/189/parkinsons+telemonitoring

[4] Raphael LJ, Borden GJ, Harris KS. Speech Science Primer:
Physiology, Acoustics, and Perception of Speech. Philadel-
phia: Lippincott Williams & Wilkins; 2007.

[5] Tsanas A, Little M, McSharry P, Ramig L. Accurate telemonitoring
of Parkinson’s disease progression by non-invasive
speech tests. IEEE Trans Biomed Eng 2010;57:884–93.

[6] Martínez-Martín P, Rodríguez-Blázquez C, Alvarez M,
Arakaki T, Arillo VC, Chaná P, et al. Parkinson’s disease
severity levels and MDS-Unified Parkinson’s Disease Rat-
ing Scale. Parkinsonism Relat Disord 2015;21(1):50–4.

[7] Sakar BE, Serbes G, Sakar CO. Analyzing the effectiveness
of vocal features in early telediagnosis of Parkinson’s dis-
ease. PLoS One 2017;12:e0182428.
