Image captioning via multimodal embeddings

dc.contributor.author: Algu, Shikash
dc.date.accessioned: 2023-11-10T06:28:59Z
dc.date.available: 2023-11-10T06:28:59Z
dc.date.issued: 2022
dc.description: Research Report submitted in partial fulfilment of the requirements for the degree of Master of Science by coursework and research report in Artificial Intelligence to the Faculty of Science, University of the Witwatersrand, Johannesburg.
dc.description.abstract: Image captioning is an ongoing problem in computer vision that aims to generate semantically and syntactically correct captions. Vanilla image captioning models fail to capture the structural relationships between objects present in images. To address this, scene graphs (knowledge graphs) that describe the relationships between objects have been incorporated into captioning models and have improved results. However, current image captioning models do not combine image features and scene graphs in a common latent space before generating captions. Graph convolutional neural networks have been designed to capture dependency information and are showing promising results in computer vision. This research aimed to investigate whether including scene graph and image features in a multimodal layer improves image captioning models. Results show that including scene graph features improves image captioning performance on the standard image captioning evaluation metrics. Qualitative analysis shows that including scene graphs improves how captions describe the structural relationships between objects.
dc.description.librarian: PC(2023)
dc.faculty: Faculty of Science
dc.identifier.uri: https://hdl.handle.net/10539/36947
dc.language.iso: en
dc.school: Computer Science and Applied Mathematics
dc.subject: Image captioning
dc.subject: Multimodal embeddings
dc.title: Image captioning via multimodal embeddings
dc.type: Dissertation
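The abstract describes encoding scene graphs with a graph convolutional network and combining the result with image features in a common latent space before captions are generated. The following is a minimal, hypothetical sketch of that fusion step only, not the report's implementation. Assumptions: a single standard GCN layer with symmetric normalisation D^-1/2 (A + I) D^-1/2, mean pooling of node embeddings, and fusion by element-wise sum followed by tanh. All function names, the toy scene graph, and the weights are illustrative, and the caption decoder is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (rows of a times columns of b)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalisation of a GCN layer."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, node_feats: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalized_adjacency(adj), node_feats), weight)
    return [[max(0.0, x) for x in row] for row in propagated]


def mean_pool(rows: Matrix) -> Vector:
    """Average node embeddings into a single graph-level vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def project(vec: Vector, weight: Matrix) -> Vector:
    """Linear map of one vector into the shared latent space."""
    return matmul([vec], weight)[0]


def multimodal_embedding(image_feat: Vector, adj: Matrix, node_feats: Matrix,
                         w_gcn: Matrix, w_img: Matrix, w_graph: Matrix) -> Vector:
    """Fuse image and scene-graph features in one latent space (hypothetical fusion)."""
    graph_vec = mean_pool(gcn_layer(adj, node_feats, w_gcn))
    img_latent = project(image_feat, w_img)
    graph_latent = project(graph_vec, w_graph)
    # Assumption: element-wise sum followed by tanh; concatenation is a common alternative.
    return [math.tanh(a + b) for a, b in zip(img_latent, graph_latent)]


if __name__ == "__main__":
    # Toy scene graph: 0 = "man", 1 = "riding", 2 = "horse"; edges man-riding, riding-horse.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    node_feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    w_gcn = [[0.2, -0.1, 0.3], [0.1, 0.4, -0.2]]          # 2 -> 3 node features
    w_img = [[0.1, 0.0], [0.0, 0.1], [0.2, -0.2], [0.05, 0.05]]  # 4-d image -> 2-d latent
    w_graph = [[0.3, 0.1], [-0.2, 0.2], [0.1, 0.1]]       # 3-d graph -> 2-d latent
    image_feat = [1.0, 2.0, 0.5, 0.0]
    print(multimodal_embedding(image_feat, adj, node_feats, w_gcn, w_img, w_graph))
```

In a trained model the weight matrices would be learned jointly with the caption generator; summing the two projections forces both modalities into the same latent dimensionality, whereas concatenation would keep them separate and leave the mixing to later layers.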
Files
Original bundle
Name: ALGU Shikash 2373769 MSc CWRR research report.pdf
Size: 2.45 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.43 KB
Format: Item-specific license agreed to upon submission