DOI: 10.1145/3323503.3360628
short-paper

Quality assessment of Wikipedia content using topic models

Published: 29 October 2019

ABSTRACT

The web has become a major source of knowledge for society, allowing people not only to consume information but also to produce it. Collaborative documents bring significant advantages, such as decentralization, but they also raise questions about their quality. In this work, we explore quality assessment of collaborative documents using the documents' topics. The proposed approach improved the accuracy of quality assessment of Wikipedia content by 3.2%. The main contribution of this paper is therefore an analysis of how topic modelling can be used to improve quality prediction performance.
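
The abstract does not say which topic model, baseline features, or classifier the authors used, so the following is only a minimal sketch of the general idea: derive per-document topic proportions and append them to an existing quality feature vector. It assumes Latent Dirichlet Allocation fitted by collapsed Gibbs sampling and uses word count as a stand-in baseline feature; the function names `lda_topic_features` and `quality_features`, the hyperparameters, and the toy documents are all hypothetical.

```python
import random


def lda_topic_features(docs, n_topics=2, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns topic proportions per document.

    Illustrative only: the abstract does not state the paper's actual model.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, z = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed topic proportions; each row sums to 1.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def quality_features(docs, **lda_kwargs):
    """Baseline feature (word count, an assumption) followed by topic proportions."""
    topics = lda_topic_features(docs, **lda_kwargs)
    return [[float(len(doc))] + theta for doc, theta in zip(docs, topics)]


# Toy tokenized documents (hypothetical data, not from the paper).
docs = [
    "goal match team player goal league".split(),
    "team league match season player".split(),
    "protein cell gene dna cell".split(),
    "gene dna protein genome".split(),
]
features = quality_features(docs, n_topics=2)
```

Each row of `features` could then be passed to any supervised classifier trained on articles with known quality labels; comparing accuracy with and without the topic columns is one way to measure what the topics contribute.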


      • Published in

        WebMedia '19: Proceedings of the 25th Brazillian Symposium on Multimedia and the Web
        October 2019
        537 pages
        ISBN:9781450367639
        DOI:10.1145/3323503

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 270 of 873 submissions, 31%
