Graph Condensation for Text Classification

  • René Vieira Santin (USP)
  • Diego Minatel (USP)
  • Nícolas Roque dos Santos (USP)
  • Solange Oliveira Rezende (USP)

Abstract

TextGCN is a graph-based model that achieves strong text classification performance by effectively capturing corpus-level relationships between documents and words. However, its scalability is limited by the high computational cost of processing large graphs, particularly in resource-constrained environments. One way to address this limitation is Graph Condensation (GCond), which generates much smaller synthetic graphs while preserving key information. GCond has achieved performance comparable to that of the original graphs when combined with other graph neural network architectures, such as GCN, SGC, and GraphSAGE, but its application to TextGCN remains unexplored in the literature. This paper addresses this gap by proposing the integration of TextGCN and GCond for scalable text classification. We experimentally evaluated our proposal on three benchmark datasets using three metrics (accuracy, memory usage, and training time) and compared performance on the full original graphs against various levels of graph reduction, with the smallest condensed graph retaining only 0.02% of the nodes. The experimental results indicate that the reduced graphs perform similarly to the originals: in the best case, the most condensed synthetic graph generated by GCond was up to 20 times faster to train and consumed approximately 315,000 times less memory than the original graph, with only a two-percentage-point drop in accuracy.
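For context on the condensation step mentioned in the abstract: GCond's core mechanism is gradient matching, i.e., a small set of synthetic node features (together with a learned adjacency) is optimized so that a model's training gradients on the condensed graph track its gradients on the original graph. The sketch below illustrates only that gradient-matching idea in plain NumPy, using a logistic-regression surrogate on toy features; it deliberately omits the learned adjacency, TextGCN's document-word graph, and all other specifics of the paper, and every name, size, and hyperparameter in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "original" data: 1,000 node feature vectors with binary labels.
n, d, n_syn = 1000, 16, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)

# Condensed set: 10 learnable feature vectors with fixed, balanced labels
# (1% of the original nodes, in the spirit of GCond's reduction ratios).
X_syn = 0.1 * rng.normal(size=(n_syn, d))
y_syn = np.tile([0.0, 1.0], n_syn // 2)

def logreg_grad(w, Xb, yb):
    """Gradient of the mean logistic loss w.r.t. the weights w."""
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    return Xb.T @ (p - yb) / len(yb)

def match_loss(X_s, probes):
    """Average squared distance between real and synthetic gradients."""
    return np.mean([np.sum((logreg_grad(w, X_s, y_syn)
                            - logreg_grad(w, X, y)) ** 2) for w in probes])

probes = rng.normal(size=(20, d))  # fixed weights, used only for evaluation
loss_before = match_loss(X_syn, probes)

lr = 0.1
for _ in range(500):
    w = rng.normal(size=d)         # fresh random model, as in gradient matching
    p = 1.0 / (1.0 + np.exp(-(X_syn @ w)))
    diff = logreg_grad(w, X_syn, y_syn) - logreg_grad(w, X, y)
    # Analytic gradient of ||g_syn - g_real||^2 w.r.t. the synthetic features.
    r, s = p - y_syn, p * (1.0 - p)
    gX = (2.0 / n_syn) * (np.outer(r, diff) + np.outer((X_syn @ diff) * s, w))
    X_syn -= lr * gX               # move synthetic features to match gradients

loss_after = match_loss(X_syn, probes)
print(f"gradient-matching loss: {loss_before:.4f} -> {loss_after:.4f}")
```

After optimization, a model trained only on the tiny synthetic set produces gradients close to those obtained on the full data, which is what lets condensed graphs stand in for the originals during training.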
Published
29/09/2025
SANTIN, René Vieira; MINATEL, Diego; SANTOS, Nícolas Roque dos; REZENDE, Solange Oliveira. Graph Condensation for Text Classification. In: BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 35., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 306-320. ISSN 2643-6264.