short-paper

Quality assessment of Wikipedia content using topic models

Authors:
Lauro C. J. Santos

Centro Federal de Educação Tecnológica de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

Taís Christofani

Centro Federal de Educação Tecnológica de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

Ismael S. Silva

Centro Federal de Educação Tecnológica de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

Daniel H. Dalip

Centro Federal de Educação Tecnológica de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil


WebMedia '19: Proceedings of the 25th Brazillian Symposium on Multimedia and the Web, October 2019, Pages 249–252, https://doi.org/10.1145/3323503.3360628

Published: 29 October 2019


ABSTRACT

The web has become a major source of knowledge for society, allowing people not only to consume information but also to produce it. Collaborative documents bring significant advantages, including decentralization, but they also raise questions about their quality. In this work, we explore quality assessment of collaborative documents using the topics of these documents. The proposed approach improved the accuracy of quality assessment of Wikipedia content by 3.2%. The main contribution of this paper is therefore an analysis of how topic modelling can be used to improve quality prediction performance.
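
The idea of using a document's topics as input to quality prediction can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Latent Dirichlet Allocation (listed among the author tags) fitted with a plain collapsed Gibbs sampler, and the function name `lda_topic_features`, the toy documents, the hyperparameters, and the single length feature are all hypothetical choices made for illustration. The classifier that would consume the combined vectors is left out.

```python
import random


def lda_topic_features(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return one topic
    distribution (a list of num_topics probabilities) per document.
    Hypothetical helper; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables used by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the current token from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed per-document topic proportions.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return features


# Toy, made-up documents (already tokenized).
docs = [
    ["history", "war", "empire", "history", "king"],
    ["protein", "cell", "gene", "cell", "enzyme"],
    ["war", "king", "battle", "empire"],
]
topic_features = lda_topic_features(docs, num_topics=2)

# Combine a simple text feature (token count) with the topic proportions;
# the resulting vectors could be passed to any classifier of quality classes.
combined = [[float(len(doc))] + feats for doc, feats in zip(docs, topic_features)]
```

Each row of `combined` holds one hand-picked feature followed by the document's topic proportions, which sum to 1. Comparing a classifier trained with and without the topic columns is one way to measure whether topics help, which is the kind of analysis the abstract describes.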


Index Terms

Quality assessment of Wikipedia content using topic models
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Document topic models


Published in
WebMedia '19: Proceedings of the 25th Brazillian Symposium on Multimedia and the Web
October 2019
537 pages
ISBN: 9781450367639
DOI: 10.1145/3323503
General Chairs:
Joel dos Santos (CEFET/RJ),
Débora Christina Muchaluat Saade (UFF),
Maria da Graça C. Pimentel (University of Sao Paulo, Brazil),
Alessandra Alaniz Macedo (University of Sao Paulo, Brazil)
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
automatic quality assessment
information quality
latent dirichlet allocation
machine learning
topic prediction

Acceptance Rates
Overall Acceptance Rate: 270 of 873 submissions, 31%
