WebFeatures: A Web Tool to Extract Features from Collaborative Content

Aline C. Pinto; Beatriz S. Silva; Priscilla R. M. Carmo; Raphael L. A. Lima; Larisse S. P. Amorim; Rubio T. C. Viana; Daniel H. Dalip; Poliana A. C. Oliveira

doi:10.5753/webmedia_estendido.2020.13071

Aline C. Pinto CEFET-MG
Beatriz S. Silva CEFET-MG
Priscilla R. M. Carmo CEFET-MG
Raphael L. A. Lima CEFET-MG
Larisse S. P. Amorim CEFET-MG
Rubio T. C. Viana CEFET-MG
Daniel H. Dalip CEFET-MG
Poliana A. C. Oliveira CEFET-MG

Abstract

The production of collaborative web content has grown in recent years, and assessing the quality of these data repositories has become correspondingly relevant. This work proposes a tool called WebFeatures. The system allows one to manage, extract, and share quality-related feature sets from text, graph, and article review. To accomplish this, different types of metrics were implemented based on the structure, style, and readability of the texts. To evaluate the applicability of WebFeatures, we present a scenario covering its main functionalities: creating a feature set, extracting features from a known dataset, and publishing the feature set. Our demonstration shows that this framework can be useful for extracting features automatically, supporting quality prediction of collaborative content, analyzing text characterization, and improving research reproducibility.
