A Privacy Preservation Masking Method to Support Business Collaboration
Abstract
This paper introduces Dimensionality Reduction-Based Transformation (DRBT), a privacy-preserving masking method to support business collaboration. The method relies on the intuition behind random projection to mask the underlying attribute values subject to cluster analysis. Using DRBT, data owners can find a solution that meets privacy requirements while still guaranteeing valid clustering results. DRBT was validated on five real datasets. The major features of this method are: a) it is independent of the distance-based clustering algorithm used; b) it has a sound mathematical foundation; and c) it does not require CPU-intensive operations.
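The random projection idea behind the method can be illustrated with a minimal sketch. This is not the paper's DRBT procedure: the Gaussian random matrix, the 1/sqrt(k) scaling, the synthetic data, and the names `random_projection` and `euclidean` are all assumptions made for illustration. It shows only that projecting attribute vectors onto fewer random dimensions hides the original values while keeping pairwise Euclidean distances roughly intact, which is what distance-based clustering depends on.

```python
import math
import random


def random_projection(rows, k, seed=0):
    """Project d-dimensional rows onto k dimensions with a Gaussian random matrix.

    Illustrative assumption: entries are drawn from N(0, 1) and the result is
    scaled by 1/sqrt(k), so squared distances are preserved in expectation.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    # d x k random matrix; in a masking setting it would be kept secret.
    r = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(row[i] * r[i][j] for i in range(d)) for j in range(k)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: 20 objects with 200 numeric attributes each.
data_rng = random.Random(42)
original = [[data_rng.uniform(0.0, 1.0) for _ in range(200)] for _ in range(20)]

# Masked version with 100 attributes; values no longer match the originals.
masked = random_projection(original, k=100, seed=7)

# Compare a few pairwise distances before and after masking.
for i, j in [(0, 1), (2, 9), (5, 17)]:
    before = euclidean(original[i], original[j])
    after = euclidean(masked[i], masked[j])
    print(f"pair ({i}, {j}): original={before:.3f} masked={after:.3f}")
```

The printed distances should be close but not identical. In this sketch, lowering `k` hides more of the original structure but distorts distances more, which is the kind of trade-off between privacy and clustering validity that the abstract refers to.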