Using big data for corporate brand analysis on the internet

Date

2020

Authors

Nkongolo, Mike

Abstract

From a marketing perspective, a company’s reputation is a valuable, intangible asset that greatly influences the company’s image. As a result, consumers choose companies with a positive reputation and are willing to pay more for their products or services. A positive corporate reputation can create a competitive advantage and barriers to entry for competitors (Zhang et al., 2019). The drive to improve a company’s image has created new interest in Reputation Analysis. However, Reputation Analysis systems that focus on revealing the company’s image on the internet are not effective in solving various problems, such as automatic and real-time Data Collection, Feature Extraction, classification, and Visualization. Hence, this work investigates the use of Machine Learning (ML) and Natural Language Processing (NLP) to solve these types of problems. Artificial Neural Networks (ANN), SentiStrength, and Bag-Of-Words (BOW) are introduced as classifiers. The Accuracy, Precision, and Empirical error metrics are used to measure the framework’s performance, as in Jadav and Vaghela (2016). In general, the main difficulties in using the Feature Extraction and classification approach for Reputation Analysis are to minimize false positives (FP) and false negatives (FN) and to maximize Accuracy (Rehman et al., 2019). This work describes a Brand/Reputation Analysis framework that uses sentiment contexts (retrieved from the web) to perform automated Reputation Analysis. The framework has four stages. The first performs web crawling based on a query specified by the user. The second locates relevant information within textual data using Named Entity Recognition (NER). The third records relevant information in a database for Feature Extraction and classification. In the last stage, the framework is used for Reputation Analysis. The training datasets came from the Wits marketing team, the SentiStrength lexicon, and ClueWeb09.
In testing the computational framework, ANN and SentiStrength achieved results competitive with those reported by Shukri et al. (2015), Jadav and Vaghela (2016), and Rasool et al. (2019). The results revealed that ANN achieved more than 90% Accuracy in separating positive from negative comments in textual data. In particular, the project performs sentiment analysis (SA) on Wits online content. As such, this research investigates the use of sentiment analysis on online content from a university.
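
As an illustration only, the sketch below shows how a bag-of-words sentiment classifier can be scored with Accuracy and Precision derived from TP, FP, TN, and FN counts. It is not the dissertation's implementation: the tiny word lists, the sample comments, and all function names are hypothetical, and a simple word-count rule stands in for the ANN and SentiStrength classifiers named in the abstract.

```python
from collections import Counter

# Hypothetical toy lexicon (an assumption for this sketch, not the SentiStrength lexicon).
POSITIVE = {"excellent", "good", "great", "helpful"}
NEGATIVE = {"bad", "poor", "slow", "terrible"}


def bag_of_words(text):
    """Lowercase, split on whitespace, strip surrounding punctuation, count tokens."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return Counter(tok for tok in tokens if tok)


def classify(text):
    """Return 1 (positive) if positive words outnumber negative words, else 0."""
    counts = bag_of_words(text)
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    return 1 if pos > neg else 0


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all comments that were labelled correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Fraction of predicted-positive comments that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up comments with hand-assigned labels (1 = positive, 0 = negative).
COMMENTS = [
    ("Great lecturers and helpful staff.", 1),
    ("Registration was slow and terrible.", 0),
    ("Good campus, but poor parking and bad wifi.", 0),
    ("Not bad at all.", 1),
    ("Excellent price, terrible service, good luck.", 0),
]

y_true = [label for _, label in COMMENTS]
y_pred = [classify(text) for text, _ in COMMENTS]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"accuracy={accuracy(tp, fp, tn, fn):.2f} precision={precision(tp, fp):.2f}")
# prints: accuracy=0.60 precision=0.50
```

The fourth and fifth comments show the two error types the abstract refers to: "Not bad at all." is a false negative because word counting ignores negation, and the mixed comment is a false positive because two positive words outweigh one negative word.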

Description

A dissertation submitted in fulfillment of the requirements for the degree of Master of Science in the School of Computer Science and Applied Mathematics, Faculty of Science, University of the Witwatersrand, Johannesburg, 2020
