A methodology for automatic composition of polyphonic music
Abstract
Automatic music composition poses challenges beyond those of image or video synthesis, requiring precise temporal modeling and the synchronization of multiple instrumental voices, each with its own dynamics. This study presents three Generative Adversarial Network (GAN) architectures (the Improvisation, Composition, and Hybrid models) for the automated generation of polyphonic notation for multiple instruments. Using a corpus of over 100,000 symphonic pieces in Wave, MIDI, and MusicXML formats (including a dataset curated by the primary author), the models are trained on MIDI features extracted after all Wave files were converted to MIDI. Evaluation with quantitative intra- and inter-instrument metrics, complemented by a musician-centric study, shows that the models can compose music from scratch, produce coherent four-measure excerpts, and support human-machine co-composition by transforming single-instrument melodies into complete multi-instrument arrangements.
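As a concrete illustration of the symbolic input such models consume, the following minimal sketch loads a MIDI file into a stacked multitrack pianoroll tensor and slices a four-measure excerpt. It assumes the Pypianoroll package cited in the references, a hypothetical file name example.mid, and 4/4 time; it illustrates the pianoroll representation in general, not the authors' exact feature pipeline.

# A minimal sketch, assuming the Pypianoroll package (Dong et al., 2018,
# cited below) and a hypothetical input file "example.mid"; it illustrates
# the multitrack pianoroll representation, not the authors' exact pipeline.
import numpy as np
import pypianoroll

# Parse the MIDI file into one pianoroll track per instrument.
multitrack = pypianoroll.read("example.mid")

# Keep note on/off information only, discarding velocities, and pad all
# tracks to a common length so they can be stacked.
multitrack.binarize()
multitrack.pad_to_same()

# Stack the per-instrument pianorolls into one tensor of shape
# (num_tracks, num_timesteps, 128 MIDI pitches).
tensor = multitrack.stack().astype(np.float32)

# Slice a four-measure excerpt, assuming 4/4 time: 4 bars x 4 beats
# x timesteps per beat (the file's resolution).
steps = 4 * 4 * multitrack.resolution
excerpt = tensor[:, :steps, :]
print(excerpt.shape)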
Keywords:
computer music, neural networks, machine learning
References
Groove dataset. [link], 2019. Accessed: 29-09-2019.
Maestro dataset. [link], 2019. Accessed: 29-09-2019.
Cheston, H., Bance, R., and Harrison, P. Deconstructing jazz piano style using machine learning. arXiv preprint arXiv:2504.05009 , 2025.
Chowdhury, S. R., Biswas, S., Nandy, S., Maity, S. K., and Chatterjee, D. Music generation using deep learning. Power Devices and Internet of Things for Intelligent System Design, 2025.
Chu, H., Urtasun, R., and Fidler, S. Song from PI: A musically plausible network for pop music generation. arXiv preprint arXiv:1611.03477, 2016.
Chuan, C.-H. and Herremans, D. Modeling temporal tonal relations in polyphonic music through deep networks with a novel image-based representation. In Thirty-second AAAI conference on artificial intelligence, 2018.
Dhar, A. and Victor, A. Neural harmony: Advancing polyphonic music generation and genre classification through LSTM-based networks. In 2024 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT). IEEE, pp. 1–6, 2024.
Dong, H.-W., Hsiao, W.-Y., and Yang, Y.-H. Pypianoroll: Open source Python package for handling multitrack pianorolls, 2018.
Garaudé, A. d. Méthode complète de chant: oeuv. 40. A la Classe de Chant de l’Auteur, 1811.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative adversarial networks. arXiv preprint arXiv:1406.2661, 2014.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems. pp. 5767–5777, 2017.
Harte, C., Sandler, M., and Gasser, M. Detecting harmonic change in musical audio. In Proceedings of the 1st ACM workshop on Audio and music computing multimedia. ACM, pp. 21–26, 2006.
Herremans, D. and Chew, E. MorpheuS: generating structured music with constrained patterns and tension. IEEE Transactions on Affective Computing, 2017.
Mascarenhas, M. 120 Músicas Favoritas Para Piano. Irmãos Vitale, 1961.
McCallum, M. C. Unsupervised learning of deep features for music segmentation. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 346–350, 2019.
Mogren, O. C-RNN-GAN: Continuous recurrent neural networks with adversarial training. arXiv preprint arXiv:1611.09904, 2016.
Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 , 2015.
Raffel, C. Learning-based methods for comparing sequences, with applications to audio-to-midi alignment and matching. Ph.D. thesis, Columbia University, 2016.
Raffel, C. MIDI dataset. [link], 2019. Accessed: 29-09-2019.
Richard, G., Lostanlen, V., Yang, Y.-H., and Müller, M. Model-based deep learning for music information research: Leveraging diverse knowledge sources to enhance explainability, controllability, and resource efficiency [special issue on model-based and data-driven audio signal processing]. IEEE Signal Processing Magazine 41 (6): 51–59, 2024.
Saito, M., Matsumoto, E., and Saito, S. Temporal generative adversarial nets with singular value clipping. In Proceedings of the IEEE International Conference on Computer Vision. pp. 2830–2839, 2017.
Vondrick, C., Pirsiavash, H., and Torralba, A. Generating videos with scene dynamics. In Advances In Neural Information Processing Systems. pp. 613–621, 2016.
Yang, L.-C., Chou, S.-Y., and Yang, Y.-H. MidiNet: A convolutional generative adversarial network for symbolic-domain music generation. arXiv preprint arXiv:1703.10847, 2017.
Yu, L., Zhang, W., Wang, J., and Yu, Y. SeqGAN: Sequence generative adversarial nets with policy gradient. In Thirty-First AAAI Conference on Artificial Intelligence, 2017.
Published
2025-09-29
How to Cite
RIBEIRO, Angelica Abadia Paulista; ROSA, João Luís Garcia. A methodology for automatic composition of polyphonic music. In: SYMPOSIUM ON KNOWLEDGE DISCOVERY, MINING AND LEARNING (KDMILE), 13., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 73-80. ISSN 2763-8944. DOI: https://doi.org/10.5753/kdmile.2025.247513.
