Knowledge-driven language modelling for text embedding and semantic similarity

Bhana, Nimesh

Files

Nimesh Bhana 2371061 Research Report.pdf (3.23 MB)

Date

2023

Authors

Bhana, Nimesh

Abstract

Language models such as BERT have grown in popularity because they can be pre-trained and then perform robustly on a wide range of Natural Language Processing tasks. Often seen as an evolution of traditional word embedding techniques, they produce text representations that are useful for tasks such as semantic similarity. However, state-of-the-art models often have high computational requirements and lack the global context or domain knowledge required for complete language understanding. To address these limitations, this work investigates the benefits of incorporating knowledge into the fine-tuning stages of BERT. An existing Knowledge-enabled BERT (K-BERT) model, which enriches sentences with triples from a Knowledge Graph, is adapted for the English language and extended to inject contextually relevant information into sentences. Given the appropriate knowledge, K-BERT outperforms USE and SBERT, similar models suited for text embedding and semantic similarity. Performance is evaluated on the STS-B and ag_news_subset datasets. Knowledge ablation studies indicate that injected knowledge introduces noise. When this noise is minimised, statistically significant performance improvements are observed for knowledge-driven tasks. The results show that, given an appropriate task, modest injection of relevant, high-quality knowledge performs best. However, achieving successful integration autonomously is non-trivial.
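As a rough illustration of the two ideas the abstract names (enriching a sentence with Knowledge Graph triples and comparing text embeddings for semantic similarity), the following minimal Python sketch may help. It is not the dissertation's implementation: the toy graph `TOY_KG`, the helper names `inject_triples` and `cosine_similarity`, the bracketed triple format, and the `max_triples_per_entity` cap are all hypothetical, and the whitespace-based entity matching is a deliberate simplification.

```python
import math

# Hypothetical toy knowledge graph: entity -> list of (relation, object) pairs.
# A real system would query a large Knowledge Graph instead.
TOY_KG = {
    "Johannesburg": [("located_in", "South Africa")],
    "BERT": [("is_a", "language model")],
}


def inject_triples(sentence, kg, max_triples_per_entity=1):
    """Insert up to `max_triples_per_entity` triples after each matched token.

    Matching is exact and whitespace-based (an assumption made for brevity),
    so a token with trailing punctuation will not match an entity.
    """
    enriched = []
    for token in sentence.split():
        enriched.append(token)
        # Cap the number of injected triples to limit knowledge noise.
        for relation, obj in kg.get(token, [])[:max_triples_per_entity]:
            enriched.append(f"[{relation} {obj}]")
    return " ".join(enriched)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(inject_triples("BERT was trained in Johannesburg", TOY_KG))
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

In this sketch the enriched sentence would then be passed to an encoder, and the resulting embeddings compared with `cosine_similarity`. Setting `max_triples_per_entity=0` disables injection entirely, which is one simple way to picture the kind of knowledge ablation the abstract mentions.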

Description

A dissertation submitted in fulfilment of the requirements for the degree of Master of Science to the Faculty of Science, School of Computer Science and Applied Mathematics, University of Witwatersrand, Johannesburg, 2023

URI

https://hdl.handle.net/10539/35731

Collections

ETD Collection
