A Co-occurrence Based Approach for Mining Overlapped Co-clusters in Binary Data

Abstract


Co-clustering is a type of clustering that simultaneously clusters the objects and the attributes of a data matrix. Although general clustering techniques find non-overlapping co-clusters, finding possible overlaps between co-clusters can reveal embedded patterns in the data that disjoint clusters cannot discover. The overlapping co-clustering approaches proposed in the literature focus on finding global overlapped co-clusters, so they might overlook interesting local patterns that are not necessarily identified as global co-clusters. Discovering such local co-clusters increases the granularity of the analysis, so more specific patterns can be captured. This is the objective of the present paper, which proposes the new Overlapped Co-Clustering (OCoClus) method for finding overlapped co-clusters in binary data, including both global and local patterns. OCoClus is a non-exhaustive method based on the co-occurrence of attributes and objects in the data. Another novelty of the method is that it is driven by an objective cost function that can automatically determine the number of co-clusters. We evaluate the proposed approach on publicly available real and synthetic datasets and compare the results with a number of baselines. Our approach shows better results than the baseline methods on synthetic data and demonstrates its efficacy on real data.
Keywords: Co-clustering, Overlapped co-clusters, Binary data
Published
29/11/2021
SANTOS, Yuri Santa Rosa Nassar dos; SANTIAGO, Rafael; PEREGO, Raffaele; SCHALY, Matheus Henrique; ALVARES, Luis Otávio; RENSO, Chiara; BOGORNY, Vania. A Co-occurrence Based Approach for Mining Overlapped Co-clusters in Binary Data. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 10., 2021, Online. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2021. ISSN 2643-6264.