TY - JOUR
T1 - A dissimilarity function for geospatial polygons
AU - Joshi, Deepti
AU - Soh, Leen Kiat
AU - Samal, Ashok
AU - Zhang, Jing
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation under Grants No. 0219970 and 0535255. We would like to thank David Marx for his valuable insight and feedback. We would also like to extend our thanks to Lei Fu, Bill Waltman, and Tao Hong for their assistance in data processing and preparation for this research.
Publisher Copyright:
© 2013, Springer-Verlag London.
PY - 2014/10/1
Y1 - 2014/10/1
N2 - Similarity plays an important role in many data mining tasks and information retrieval processes. Most of the supervised, semi-supervised, and unsupervised learning algorithms depend on using a dissimilarity function that measures the pair-wise similarity between the objects within the dataset. However, traditionally most of the similarity functions fail to adequately treat all the spatial attributes of the geospatial polygons due to the incomplete quantitative representation of structural and topological information contained within the polygonal datasets. In this paper, we propose a new dissimilarity function known as the polygonal dissimilarity function (PDF) that comprehensively integrates both the spatial and the non-spatial attributes of a polygon to specifically consider the density, distribution, and topological relationships that exist within the polygonal datasets. We represent a polygon as a set of intrinsic spatial attributes, extrinsic spatial attributes, and non-spatial attributes. Using this representation of the polygons, PDF is defined as a weighted function of the distance between two polygons in the different attribute spaces. In order to evaluate our dissimilarity function, we compare and contrast it with other distance functions proposed in the literature that work with both spatial and non-spatial attributes. In addition, we specifically investigate the effectiveness of our dissimilarity function in a clustering application using a partitional clustering technique (e.g., k-medoids) using two characteristically different sets of data: (a) irregular geometric shapes determined by natural processes, i.e., watersheds and (b) semi-regular geometric shapes determined by human experts, i.e., counties.
AB - Similarity plays an important role in many data mining tasks and information retrieval processes. Most of the supervised, semi-supervised, and unsupervised learning algorithms depend on using a dissimilarity function that measures the pair-wise similarity between the objects within the dataset. However, traditionally most of the similarity functions fail to adequately treat all the spatial attributes of the geospatial polygons due to the incomplete quantitative representation of structural and topological information contained within the polygonal datasets. In this paper, we propose a new dissimilarity function known as the polygonal dissimilarity function (PDF) that comprehensively integrates both the spatial and the non-spatial attributes of a polygon to specifically consider the density, distribution, and topological relationships that exist within the polygonal datasets. We represent a polygon as a set of intrinsic spatial attributes, extrinsic spatial attributes, and non-spatial attributes. Using this representation of the polygons, PDF is defined as a weighted function of the distance between two polygons in the different attribute spaces. In order to evaluate our dissimilarity function, we compare and contrast it with other distance functions proposed in the literature that work with both spatial and non-spatial attributes. In addition, we specifically investigate the effectiveness of our dissimilarity function in a clustering application using a partitional clustering technique (e.g., k-medoids) using two characteristically different sets of data: (a) irregular geometric shapes determined by natural processes, i.e., watersheds and (b) semi-regular geometric shapes determined by human experts, i.e., counties.
KW - Dissimilarity function
KW - Polygonal clustering
KW - Polygons
KW - Regionalization
KW - Spatial data mining
UR - http://www.scopus.com/inward/record.url?scp=84908134709&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84908134709&partnerID=8YFLogxK
U2 - 10.1007/s10115-013-0666-2
DO - 10.1007/s10115-013-0666-2
M3 - Article
AN - SCOPUS:84908134709
SN - 0219-1377
VL - 41
SP - 153
EP - 188
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
IS - 1
ER -