An assessment of deviations from conditional independence in binary data fusion
Date
2011-07-07
Authors
Smit, Elsabe
Abstract
Data fusion is a data integration technique that provides a way to combine information from different sources through a set of common characteristics (variables), thereby creating a single, all-inclusive data source. The success of a fusion largely depends on the accuracy of the underlying assumptions about the relationship between the common variables and the variables unique to each individual data source. The most common model used to fuse data is based on the assumption of conditional independence, which states that the variables unique to each data set (say Y and Z) are independent given the common variables (say X). This analysis evaluates data fusion procedures for binary data under the assumption of conditional independence, and assesses how deviations from this assumption influence the success of the fusion. The degree of conditional independence present in the data is quantified using a function of entropy, namely the conditional mutual information. The impact of the deviation from conditional independence on the success of the fusion is evaluated using the results from a number of different statistical tests, such as the Chi-square goodness-of-fit test and the 3T-test for a correlation structure, in relation to the level of conditional independence in the data.
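As an illustration of the measure named in the abstract, the conditional mutual information I(Y; Z | X) is zero exactly when Y and Z are independent given X, and it grows as the data move away from conditional independence. The following is a minimal sketch, not the procedure used in the thesis: it assumes a simple plug-in (relative frequency) estimate from observed (x, y, z) triples and reports the result in bits. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(samples):
    """Plug-in estimate of I(Y; Z | X) in bits from (x, y, z) tuples.

    Assumption: probabilities are estimated by relative frequencies;
    the original work may use a different estimator or log base.
    """
    samples = list(samples)
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")

    # Joint and marginal counts needed for the estimate.
    n_xyz = Counter(samples)
    n_xy = Counter((x, y) for x, y, _ in samples)
    n_xz = Counter((x, z) for x, _, z in samples)
    n_x = Counter(x for x, _, _ in samples)

    cmi = 0.0
    for (x, y, z), c in n_xyz.items():
        # Term: p(x,y,z) * log2( p(x) p(x,y,z) / (p(x,y) p(x,z)) ).
        # The 1/n factors cancel inside the ratio, so counts suffice.
        ratio = c * n_x[x] / (n_xy[(x, y)] * n_xz[(x, z)])
        cmi += (c / n) * math.log2(ratio)
    return cmi


# Toy binary data: Y and Z independent given X (all combinations equally often).
independent = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

# Toy binary data: Z always equals Y, a maximal deviation for binary Y and Z.
dependent = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]

cmi_independent = conditional_mutual_information(independent)
cmi_dependent = conditional_mutual_information(dependent)
```

On these toy inputs the estimate is 0 bits for the conditionally independent data and 1 bit for the fully dependent data, which is the largest value possible when Y and Z are binary. Values between these extremes would indicate partial deviation from conditional independence.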