DANTEStocks-AMR Under Construction: Advances and Challenges in Semantic Annotation of Financial Tweets

  • Gabriel Ceregatto UFSCar
  • Ariani Di Felippo UFSCar

Abstract


Abstract Meaning Representation (AMR) is a formalism widely used in Natural Language Processing (NLP) to represent the meaning of utterances as directed graphs. This work presents the pioneering annotation of DANTEStocks-AMR, a corpus of 4,048 financial tweets in Portuguese previously annotated with Universal Dependencies (UD). The AMR graphs are being built semi-automatically, adapting both the original English guidelines and the existing Portuguese guidelines to the specificities of the corpus and leveraging its UD annotations. The paper discusses the corpus-specific features that required adjustments to the AMR model and presents statistics for DANTEStocks-AMR (v. 1), in which one quarter of the tweets are annotated.

Published
2025-09-29
CEREGATTO, Gabriel; DI FELIPPO, Ariani. DANTEStocks-AMR Under Construction: Advances and Challenges in Semantic Annotation of Financial Tweets. In: BRAZILIAN SYMPOSIUM IN INFORMATION AND HUMAN LANGUAGE TECHNOLOGY (STIL), 16., 2025, Fortaleza/CE. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 608-617. DOI: https://doi.org/10.5753/stil.2025.37863.