ABSTRACT
The web has become a major source of knowledge for society, allowing people not only to consume information but also to produce it. Collaborative documents bring significant advantages and decentralization, but they also raise questions about their quality. In this work, we explore quality assessment of collaborative documents using the documents' topics. The proposed approach improved the accuracy of quality assessment of Wikipedia content by 3.2%. The main contribution of this paper is therefore an analysis of how topic modelling can be used to improve quality prediction performance.
Index Terms
- Quality assessment of Wikipedia content using topic models