Using big data for corporate brand analysis on the internet
Date
2020
Authors
Nkongolo, Mike
Abstract
From a marketing perspective, a company’s reputation is a valuable, intangible
asset. This reputation greatly influences the company’s image. As a result, consumers
choose companies with a positive reputation and are willing to pay more for
their products or services. A positive corporate reputation can create a competitive
advantage and barriers to entry for competitors (Zhang et al., 2019). The drive to improve a
company’s image has created new interest in Reputation Analysis. However, Reputation
Analysis systems that focus on revealing a company’s image on the internet
do not effectively address several problems, such as automatic and real-time Data
Collection, Feature Extraction, Classification, and Visualization. Hence, this work
investigates the use of Machine Learning (ML) and Natural Language Processing
(NLP) to solve these types of problems. Artificial Neural Networks (ANN), Sentistrength,
and Bag-Of-Words (BOW) are introduced as classifiers. Accuracy, Precision,
and Empirical error metrics are used to evaluate the
framework’s performance, as in Jadav and Vaghela (2016). In general, the main difficulties
in using the Feature Extraction and classification approach for Reputation
Analysis are minimizing False Positives (FP) and False Negatives (FN) and maximizing
Accuracy (Rehman et al., 2019).
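As an illustration of the metrics named above, the following minimal sketch computes Accuracy, Precision, and an empirical error rate from FP, FN, and the other confusion counts. The abstract does not define Empirical error, so the sketch assumes the usual meaning (the fraction of misclassified examples in the evaluated sample). The labels and the names `Confusion`, `count_confusion`, `accuracy`, `precision`, and `empirical_error` are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # positive comments predicted positive
    fp: int  # negative comments predicted positive (False Positives)
    tn: int  # negative comments predicted negative
    fn: int  # positive comments predicted negative (False Negatives)


def count_confusion(y_true, y_pred, positive="positive"):
    """Tally the four confusion counts for a binary sentiment task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return Confusion(tp, fp, tn, fn)


def accuracy(c):
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c):
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def empirical_error(c):
    # Assumed definition: share of misclassified examples in the sample.
    total = c.tp + c.fp + c.tn + c.fn
    return (c.fp + c.fn) / total if total else 0.0


# Toy labels, invented for illustration only.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]

c = count_confusion(y_true, y_pred)
print(c, accuracy(c), precision(c), empirical_error(c))
```

Under this assumed definition, reducing FP and FN raises Accuracy and lowers the empirical error by the same amount, because the two measures sum to one on the same sample.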
This work describes a Brand/Reputation Analysis framework that uses sentiment
contexts (retrieved from the web) to perform automated Reputation Analysis. The
framework has four stages. The first performs web crawling based on a query
specified by the user. The second locates relevant information within textual
data using Named Entity Recognition (NER). The third records the relevant information
in a database for Feature Extraction and classification. In the last stage, the framework
is used for Reputation Analysis. The training datasets came from the Wits marketing
team, the Sentistrength lexicon, and ClueWeb09. In testing the computational
framework, ANN and Sentistrength achieved results competitive with the
work of Shukri et al. (2015), Jadav and Vaghela (2016), and Rasool et al.
(2019). The results show that ANN achieved more than 90% Accuracy in separating
positive from negative comments in textual data. In particular, the project performs
sentiment analysis (SA) on Wits online content. As such, this research investigates
the use of sentiment analysis on online content from a university.
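The four stages described above can be sketched end to end as follows. This is a toy outline under stated assumptions, not the dissertation's implementation: the crawler is replaced by a hypothetical in-memory dictionary (`FAKE_WEB`), NER is approximated by a whole-word match on the queried entity, the database is an in-memory SQLite table, and classification uses a tiny invented lexicon (`LEXICON`) rather than the Sentistrength lexicon or an ANN. All function names are hypothetical.

```python
import re
import sqlite3

# Hypothetical stand-in for crawled pages; a real crawler would fetch from the web.
FAKE_WEB = {
    "https://example.org/a": (
        "Wits offers excellent research programmes. "
        "Registration at Wits was terrible this year."
    ),
    "https://example.org/b": "The weather was pleasant today.",
}

# Tiny illustrative lexicon (assumption); not the Sentistrength lexicon.
LEXICON = {"excellent": 1, "great": 1, "good": 1,
           "terrible": -1, "poor": -1, "bad": -1}


def crawl(query):
    """Stage 1: return (url, text) pairs whose text mentions the user's query."""
    return [(u, t) for u, t in FAKE_WEB.items() if query.lower() in t.lower()]


def extract_mentions(text, entity):
    """Stage 2: crude stand-in for NER; keep sentences naming the entity."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(rf"\b{re.escape(entity)}\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


def store(conn, rows):
    """Stage 3: record the relevant sentences in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS mentions (url TEXT, sentence TEXT)")
    conn.executemany("INSERT INTO mentions VALUES (?, ?)", rows)
    conn.commit()


def classify(sentence):
    """Stage 4: label a sentence by summing lexicon scores of its tokens."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = sum(LEXICON.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def run(query):
    conn = sqlite3.connect(":memory:")
    rows = [(url, s) for url, text in crawl(query)
            for s in extract_mentions(text, query)]
    store(conn, rows)
    stored = conn.execute(
        "SELECT sentence FROM mentions ORDER BY rowid").fetchall()
    conn.close()
    return [(s, classify(s)) for (s,) in stored]


results = run("Wits")
for sentence, label in results:
    print(label, "|", sentence)
```

Keeping each stage as a separate function mirrors the staged description: the lexicon scorer in `classify` could be swapped for a trained classifier without touching the crawling, extraction, or storage steps.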
Description
A dissertation submitted in fulfillment of the requirements for the degree of Master of Science in the School of Computer Science and Applied Mathematics, Faculty of Science, University of the Witwatersrand, Johannesburg, 2020