NounBank.DS: a Lexical Repository of Nominal Frames from Stock Market Tweets in Brazilian Portuguese
Abstract
This paper describes NounBank.DS, a project that annotates the argument structure of predicate-noun instances in DANTEStocks, a dependency-analyzed corpus of stock market tweets in Brazilian Portuguese. NounBank.DS is part of a larger effort to add semantic annotation layers to DANTEStocks. Together with the other projects in that effort, it is intended to support the development of better tools for the automatic analysis of stock market tweets. We describe the project in detail, including its specifications and the process used to create the resource.
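To make "argument structure for instances of predicate nouns" concrete, the sketch below models a nominal frame (a roleset with numbered roles) and one annotated noun occurrence, plus a simple consistency check. It is an illustration under stated assumptions, not the NounBank.DS format. It assumes PropBank/NomBank-style labels (numbered `ARG0`, `ARG1`, … roles and `ARGM-` modifiers). The roleset ID `alta.01`, its role descriptions, the example tweet, and the names `Roleset`, `NounInstance`, `check_instance` and `render` are all hypothetical. It also operates on a plain token list rather than on the dependency trees of DANTEStocks.

```python
from dataclasses import dataclass, field


@dataclass
class Roleset:
    """A nominal frame (hypothetical layout): one sense of a predicate noun and its numbered roles."""
    roleset_id: str
    roles: dict[str, str]  # role label -> short description


@dataclass
class NounInstance:
    """One occurrence of a predicate noun in a tokenized tweet."""
    tokens: list[str]
    predicate_index: int  # token position of the predicate noun
    roleset_id: str
    arguments: dict[str, list[int]] = field(default_factory=dict)  # label -> token indices


def check_instance(inst: NounInstance, frames: dict[str, Roleset]) -> list[str]:
    """Return a list of problems; an empty list means the annotation is consistent with its frame."""
    frame = frames.get(inst.roleset_id)
    if frame is None:
        return [f"unknown roleset {inst.roleset_id!r}"]
    problems = []
    for label, span in inst.arguments.items():
        # Numbered roles must be declared by the frame; ARGM- modifiers are allowed for any frame.
        if label not in frame.roles and not label.startswith("ARGM-"):
            problems.append(f"{label} is not defined for {frame.roleset_id}")
        if any(i < 0 or i >= len(inst.tokens) for i in span):
            problems.append(f"{label} span {span} is out of range")
        if inst.predicate_index in span:
            problems.append(f"{label} overlaps the predicate")
    return problems


def render(inst: NounInstance) -> dict[str, str]:
    """Map each argument label to its surface text."""
    return {label: " ".join(inst.tokens[i] for i in span)
            for label, span in inst.arguments.items()}


# Hypothetical frame for "alta" ("rise") and a hypothetical tweet fragment.
frames = {"alta.01": Roleset("alta.01", {"ARG1": "thing rising", "ARG2": "amount risen"})}
tweet = NounInstance(
    tokens=["alta", "de", "3%", "da", "PETR4", "hoje"],
    predicate_index=0,
    roleset_id="alta.01",
    arguments={"ARG2": [1, 2], "ARG1": [3, 4], "ARGM-TMP": [5]},
)
print(check_instance(tweet, frames))  # []
print(render(tweet))  # {'ARG2': 'de 3%', 'ARG1': 'da PETR4', 'ARGM-TMP': 'hoje'}
```

Keeping the frame lexicon separate from the annotated instances mirrors the split between a lexical repository of frames and the corpus annotations that point into it, so one frame can be checked against many occurrences.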
Published: 2025-09-29
How to Cite
BARBOSA, Bryan K. S.; DI FELIPPO, Ariani. NounBank.DS: a Lexical Repository of Nominal Frames from Stock Market Tweets in Brazilian Portuguese. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 29-41. DOI: https://doi.org/10.5753/stil.2025.37811.
