Feature Selection through Biclustering to Identify Specific Language Impairment

  • Marta D. M. Noronha Pontifícia Universidade Católica de Minas Gerais
  • Luis E. Zárate Pontifícia Universidade Católica de Minas Gerais


Difficulty expressing oneself verbally, a condition known as specific language impairment, affects nearly 7% of children worldwide. Its diagnosis is complex and involves specialists such as speech therapists and pediatricians. The dataset used in this work has many attributes and imbalanced classes, both of which can harm knowledge discovery. We used biclustering to identify clusters that characterize children with speech problems and children with typical development. We propose selecting attributes through the significance analysis of biclusters. This improved the F-score and accuracy of models generated using 90% of the instances in the training dataset, compared with results on the original data.
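
The idea of selecting attributes from significant biclusters can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state the significance criterion, so the sketch assumes that biclusters have already been found by some biclustering algorithm and that a bicluster counts as significant when its rows are enriched for one class under a hypergeometric tail test. The function names, the threshold `alpha`, and the toy data are all hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n rows are drawn from N rows, K of which belong to the class."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


def select_features(biclusters, labels, alpha=0.05):
    """Return the sorted union of columns of biclusters enriched for some class.

    biclusters: list of (row_indices, column_indices) pairs (assumed given).
    labels: class label of each row (e.g. 1 = impaired, 0 = typical development).
    alpha: assumed significance threshold, not taken from the paper.
    """
    N = len(labels)
    selected = set()
    for rows, cols in biclusters:
        for c in set(labels):
            K = sum(1 for y in labels if y == c)        # class size in the dataset
            k = sum(1 for r in rows if labels[r] == c)  # class size in the bicluster
            if hypergeom_tail(k, len(rows), K, N) < alpha:
                selected.update(cols)
                break
    return sorted(selected)


# Toy, imbalanced data: 5 rows of class 1 followed by 15 rows of class 0.
labels = [1] * 5 + [0] * 15
biclusters = [
    ([0, 1, 2, 3], [2, 5]),      # only class 1 rows: enriched
    ([0, 5, 6, 7], [1, 2]),      # mixed rows: not enriched
    (list(range(5, 15)), [7]),   # only class 0 rows: enriched
]
print(select_features(biclusters, labels))  # [2, 5, 7]
```

Only the columns of the first and third biclusters are kept, because the second bicluster mixes both classes in roughly the proportion expected by chance. A classifier would then be trained on the reduced attribute set.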

Keywords: Biclustering, Classification, Data mining, Speech signal processing, Specific language impairment


NORONHA, Marta D. M.; ZÁRATE, Luis E. Feature Selection through Biclustering to Identify Specific Language Impairment. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 11., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 121-128. ISSN 2763-8944. DOI: https://doi.org/10.5753/kdmile.2023.232858.