TY - GEN
T1 - Computing information gain for spatial data support
AU - Hong, Tao
AU - Samal, Ashok
AU - Soh, Leen Kiat
PY - 2008
Y1 - 2008
N2 - Widespread use of GPS devices and an explosion of remotely sensed geospatial images, along with cheap storage devices, have resulted in vast amounts of data. More recently, with the advent of wireless technology, a large number of sensor networks have been deployed to monitor many human, biological and natural processes. This poses a challenge in many data-rich application domains. The problem now is how best to choose the datasets to solve specific problems. Some of the datasets may be redundant, and their inclusion in analysis may not only be time-consuming but may also lead to erroneous conclusions. We propose the concept of data support as the basis for efficient, cost-effective and intelligent use of geospatial data in order to reduce uncertainty in the analysis and consequently in the results. Data support is defined as the process of determining the information utility of a data source to help decide which one to include or exclude to improve cost-effectiveness in existing data analysis. In this article we use mutual information as the basis for computing data support. The concept of mutual information is defined in information theory as a measure to compute information gain or loss between two disjoint datasets. We use this to compute the optimal datasets in specific applications. The effectiveness of the approach is demonstrated using an application in the hydrological analysis domain.
AB - Widespread use of GPS devices and an explosion of remotely sensed geospatial images, along with cheap storage devices, have resulted in vast amounts of data. More recently, with the advent of wireless technology, a large number of sensor networks have been deployed to monitor many human, biological and natural processes. This poses a challenge in many data-rich application domains. The problem now is how best to choose the datasets to solve specific problems. Some of the datasets may be redundant, and their inclusion in analysis may not only be time-consuming but may also lead to erroneous conclusions. We propose the concept of data support as the basis for efficient, cost-effective and intelligent use of geospatial data in order to reduce uncertainty in the analysis and consequently in the results. Data support is defined as the process of determining the information utility of a data source to help decide which one to include or exclude to improve cost-effectiveness in existing data analysis. In this article we use mutual information as the basis for computing data support. The concept of mutual information is defined in information theory as a measure to compute information gain or loss between two disjoint datasets. We use this to compute the optimal datasets in specific applications. The effectiveness of the approach is demonstrated using an application in the hydrological analysis domain.
KW - Information gain/loss
KW - Spatial data support
UR - http://www.scopus.com/inward/record.url?scp=70449730071&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449730071&partnerID=8YFLogxK
U2 - 10.1145/1463434.1463502
DO - 10.1145/1463434.1463502
M3 - Conference contribution
AN - SCOPUS:70449730071
SN - 9781605583235
T3 - GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems
SP - 431
EP - 434
BT - Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM GIS 2008
T2 - 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM GIS 2008
Y2 - 5 November 2008 through 7 November 2008
ER -