Classification of web resident sensor resources using latent semantic indexing and ontologies
Discovery of web resident sensor resources plays a crucial role in realising the Sensor Web, whose vision is a web of sensors that can be discovered and manipulated in real time. Discovering relevant web sensor resources remains a current research challenge. The approach proposed here is a modified Latent Semantic Indexing (LSI) that uses an ontology to classify web resident resources found in geospatial web portals. This research introduces a new method for improving an information retrieval algorithm: the vector decomposition is influenced by including a formal representation of the knowledge of the domain of interest, with the aim of biasing the retrieval to better classify the resources of interest. The method uses the domain knowledge expressed in the ontology to improve knowledge extraction, using the concept definitions and relationships in the ontology to create semantic links between documents. The clusters formed by the modified algorithm are analysed, and performance is measured by evaluating the inter-cluster distances and the similarity measures within each cluster. Distances are expressed as Euclidean distances between vectors in an n-dimensional latent space. The research focuses on how prior domain knowledge improves the clustering when k-means is used as the partitioning algorithm. It is observed that the modified extraction algorithm can isolate a group of documents that are used to populate the knowledge base, resulting in improved storage of the documents that occur in the geospatial portal. Results obtained with the combination of ontology and LSI show that clusters are better separated and that homogeneous clusters of more specific themes can be formed by hierarchical clustering.
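The pipeline outlined above (ontology-informed LSI, k-means partitioning, Euclidean distances in latent space) can be illustrated with a minimal sketch on toy data. The abstract does not state how the ontology enters the vector decomposition, so the sketch assumes one simple possibility: appending one weighted row per ontology concept to the term-document matrix before the SVD. The vocabulary, the concept groupings, and the helper names (`add_ontology_rows`, `lsi_embed`, `kmeans`) are hypothetical, and the deterministic farthest-point initialisation is a simplification chosen for reproducibility, not part of the described method.

```python
import numpy as np

def add_ontology_rows(term_doc, vocab, concepts, weight=1.0):
    """Append one row per concept: the weighted sum of its member terms' rows.
    ASSUMPTION: this is only one possible way to inject ontology knowledge."""
    index = {term: i for i, term in enumerate(vocab)}
    rows = [weight * term_doc[[index[t] for t in terms]].sum(axis=0)
            for terms in concepts.values()]
    return np.vstack([term_doc, np.array(rows)])

def lsi_embed(matrix, k):
    """Document coordinates in a k-dimensional latent space (truncated SVD)."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[:k].T * s[:k]

def kmeans(points, n_clusters, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < n_clusters:
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[nearest.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([points[labels == c].mean(axis=0) if np.any(labels == c)
                            else centroids[c] for c in range(n_clusters)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy corpus: six documents, each mentioning exactly one term,
# so no two documents share any vocabulary.
vocab = ["temperature", "humidity", "rainfall", "road", "traffic", "vehicle"]
concepts = {  # hypothetical ontology: concept -> member terms
    "WeatherObservation": ["temperature", "humidity", "rainfall"],
    "TransportObservation": ["road", "traffic", "vehicle"],
}
term_doc = np.eye(6)  # rows are terms, columns are documents

augmented = add_ontology_rows(term_doc, vocab, concepts)
docs_latent = lsi_embed(augmented, k=2)
labels, centroids = kmeans(docs_latent, n_clusters=2)

# Euclidean distance between the two cluster centroids in latent space.
inter_cluster = float(np.linalg.norm(centroids[0] - centroids[1]))
print("labels:", labels)
print("raw cosine(d0, d1):", cosine(term_doc[:, 0], term_doc[:, 1]))
print("latent cosine(d0, d1):", cosine(docs_latent[0], docs_latent[1]))
print("inter-cluster distance:", inter_cluster)
```

In this toy setting the first two documents share no terms, so their raw cosine similarity is zero, yet the shared concept row places them at the same point in latent space. This is only meant to show the kind of semantic link an ontology can create between documents; it is not evidence for the results reported here.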