Challenges of Natural Language Processing
Abstract
This article presents the challenge of Natural Language Processing, and of Portuguese in particular, within Computer Science and its disciplines. Questions related to language processing are tied to the challenges of access to knowledge, of information management over large volumes of data, and of the complex, interdisciplinary problems of computationally modeling artificial, natural, and socio-cultural systems. Processing the Portuguese language is a crucial requirement for Brazilian citizens' participatory and universal access to knowledge. In this scenario, Computing is called upon to play the leading role.
References
Abreu, S. C. et al. (2007) Summit: um corpus anotado com informações discursivas visando sumarização automática. In: V TIL, 2007, Rio de Janeiro. Congresso da SBC, 2007 (to appear).
Abreu, S. C.; Vieira, R. (2006) Learning Portuguese Discourse-new References. In: IFIP 19th World Computer Congress, TC-12 IFIP AI 2006 Stream. Berlin: Springer, 2006. v. 217. p. 267-276.
Aluísio, S.M. et al. (2003) The Lacio-Web Project: overview and issues in Brazilian Portuguese corpora creation. In: Proceedings of Corpus Linguistics, Vol. 16, pp.14-21.
Arcoverde, J.M.A.; Nunes, M.G.V.; Scardua, W. (2006) Using noun phrases for local analysis in automatic query expansion. Cross Language Evaluation Forum – CLEF 2006. Alicante, ES. Taller Digital, pp.1-4.
Bick, E. (2000). The Parsing System Palavras - Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus: Aarhus University Press.
Chaves, M.S.; Strube de Lima, V.L. (2004) Looking for Similarity between Ontological Structures. In: Branco, A.; Mendes, A.; Ribeiro, R. (Org.). Language Technology for Portuguese: shallow processing tools and resources. Lisboa: Edições Colibri, v.1, pp.1-14.
Chklovski, T. and Pantel, P. (2004) VerbOcean: Mining the Web for Fine-Grained Semantic Verb Relations. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-04). Barcelona.
Cimiano, P., Hotho, A., Staab, S. (2005) Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis. Journal of Artificial Intelligence Research (JAIR) 24: 305-339.
Coelho J. C. B. et al. (2006) Resolving Nominal Anaphora. In: PROPOR - Workshop for Processing of Portuguese Language, Itatiaia. Lecture Notes in Artificial Intelligence 3960. Berlin: SPRINGER. pp. 160-169.
Dias-da-Silva, B.C.; Di Felippo, A.; Hasegawa, R. (2006) Methods and Tools for Encoding the WordNet.Br Sentences, Concept Glosses, and Conceptual-Semantic Relations. In: PROPOR - Workshop for Processing of Portuguese Language, Itatiaia. Lecture Notes in Artificial Intelligence 3960. Berlin: SPRINGER. pp.120-130.
Dorr, B.J.; Jordan, P.W.; Benoit, J.W. (2000). A Survey of Current Paradigms in Machine Translation. In: M. Zelkowitz (Ed.) Advances in Computers, Vol.49, pp.1-68. Academic Press, London.
Fellbaum, C. (Ed.) (1998) Wordnet: an electronic lexical database. Cambridge, MIT Press, 1998.
Feng-Yang Kuo et al. (2004) An investigation of effort-accuracy trade-off and the impact of self-efficacy on Web searching behaviors. Decision Support Systems, v.37 n.3, pp.331-342.
Gamallo, P. et al. (2005) Using Syntax-based methods for extracting semantic information. Linguistica Computazionale, Pisa-Roma, v.XXII, n.IV, pp.201-229.
Gonzalez, M.A.I.; Strube de Lima, V.L. (2004) Redefining traditional lexical semantic relations with Qualia information. Palavra (PUCRJ), Rio de Janeiro - RJ, v.12, pp.25-36.
Gonzalez, M.A.I. (2005) Termos e relacionamentos em evidência na recuperação de informação. Doctoral thesis. UFRGS.
Gonzalez, M.A.I.; Strube de Lima, V.L.; Lima, J.V. (2006) Tools for nominalization: an alternative for lexical normalization. In: PROPOR Workshop for Processing of Portuguese Language, Itatiaia. Lecture Notes in Artificial Intelligence 3960. Berlin: SPRINGER. pp.100-109.
Grosz, B. and Sidner, C. (1986). Attention, Intentions, and the Structure of Discourse. Computational Linguistics, Vol. 12, N. 3.
Hirschman, L. et al. (2005). Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics 2005, 6 (Suppl 1):S1 DOI: 10.1186/1471-2105-6-S1-S1.
Kingsbury, P. and Palmer, M. (2002). From Treebank to PropBank. In: Proceedings of the 3rd Int. Conf. on Language Resources and Evaluation, Las Palmas.
Knight, K. and Marcu, D. (2005). Machine Translation in Year 2004. In: Proceedings of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp.18-23. Philadelphia, PA.
Koppel, M., Shtrimberg, I. (2004) Good News or Bad News? Let the Market Decide. In: Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text. Palo Alto: AAAI Press. pp. 86-88.
Leite, D.S. et al. (2007) Extractive Automatic Summarization: Does more linguistic knowledge make a difference? In: Proceedings of the Workshop on TextGraphs-2 Graph-Based Algorithms for Natural Language Processing (held in conjunction with HLT/NAACL 2007), Rochester, USA. v. 1. pp. 17-24.
Mani, I. (2001). Automatic Summarization. John Benjamins Pub. Co. Amsterdam.
Mann, W.C., Thompson, S.A. (1987). Rhetorical Structure Theory: A Theory of Text Organization. Technical Report ISI/RS-87-190.
Manning, C. and Schütze, H. (1999) Foundations of Statistical Natural Language Processing, Cambridge, MA: MIT Press.
Marcu, D., Carlson, L. and Watanabe, M. (2000). The Automatic Translation of Discourse Structures. In: Proceedings of the 1st Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL'2000), Seattle, Washington.
Marcus, M.; Santorini, B.; Marcinkiewicz, M.A. (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, Vol.19, N. 2, pp.313-330.
Martins, R. T.; Hasegawa, R.; Nunes, M.G.V. (2003) Curupira: a functional parser for Brazilian Portuguese. In: PROPOR - International Workshop on the Computational Processing of Portuguese, Faro. Lecture Notes in Computer Science 2721. Berlin: SPRINGER.
Nunes, M.G.V. et al. (1996) (In Portuguese) Development of a parser for Brazilian Portuguese. In: Proceedings of the 2nd Workshop on Computational Processing of Written and Spoken Portuguese. Curitiba: CEFET-PR. pp.71-80.
Och, F.J. et al. (2004). A Smorgasbord of Features for Statistical Machine Translation. In the Proceedings of HLT/NAACL.
Pardo, T.A.S.; Rino, L.H.M.; Nunes, M.G.V. (2003). GistSumm: A Summarization Tool Based on a New Extractive Method. In: PROPOR - International Workshop on the Computational Processing of Portuguese, Faro. Lecture Notes in Computer Science 2721. Berlin: SPRINGER. pp. 210-218.
Pardo, T.A.S.; Nunes, M.G.V. (2006). DiZer – an Automatic Discourse Analyzer for Brazilian Portuguese. In: Proceedings of the V Best MSc Dissertation/PhD Thesis Contest – CTDIA. Ribeirão Preto-SP, Brazil.
Pustejovsky, J. (1996) The Generative Lexicon, MIT Press. Cambridge, MA.
Rino, L.H.M. et al. (2004). A Comparison of Automatic Summarization Systems for Brazilian Portuguese Texts. In: Proceedings of the 17th Brazilian Symposium on Artificial Intelligence – SBIA. Lecture Notes in Artificial Intelligence 3171. Berlin: SPRINGER. pp.235-244.
Santos, C.N. (2005) Aprendizado de máquina na identificação de sintagmas nominais: o caso do português brasileiro. Master's dissertation. IME-RJ.
Silva, J. P. M. et al. (2006) Exploring molecular networks using MONETontology. Genetics and Molecular Research, v. 5, n. 1, pp. 182-192.
Singh, P. et al. (2002). Open Mind Common Sense: Knowledge acquisition from the general public. In: Proceedings of the First Int. Conf. on Ontologies, Databases, and Applications of Semantics for Large Scale Information Systems. Lecture Notes in Computer Science. Heidelberg: Springer-Verlag.
Staab, S., Studer, R. (Eds.) (2004) Handbook on Ontologies. International Handbooks on Information Systems, Springer Verlag.
Strube de Lima, V., Abrahão, P.R.C., Paraboni, I. (1997) Approaching the dictionary in the implementation of a natural language processing system: toward a distributed structure. International Informatics Series 8. Baeza-Yates, R. (Ed.) Fourth South American Workshop on String Processing, WSP’97, Valparaíso – Chile. Carleton University Press, Ottawa – Canada.
Tavares, O.L. et al. (2006) O Sistema Falibras-MT como Ferramenta de Apoio Pedagógico. In: Anais do IV Congresso Ibero-Americano Sobre Tecnologias de Apoio a Portadores de Deficiência, Vitória. v. II. pp. CO-109-CO-112.
Voorhees, E. (1999) Natural Language Processing and Information Retrieval. In: Pazienza, M.T. (Ed.) Information Extraction: Towards Scalable, Adaptable Systems, Lecture Notes in Artificial Intelligence 1714. Berlin: SPRINGER.
Vossen, P. (2004) Ontologies. In: Mitkov, R. (Ed.) The Oxford handbook of Computational Linguistics. Oxford: Oxford University Press. pp.464-82.
Published
30 June 2007
How to Cite
LIMA, Vera Lúcia Strube de; NUNES, Maria das Graças Volpe; VIEIRA, Renata. Desafios do Processamento de Línguas Naturais. In: SEMINÁRIO INTEGRADO DE SOFTWARE E HARDWARE (SEMISH), 34., 2007, Rio de Janeiro/RJ. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2007. p. 2202-2216. ISSN 2595-6205.
