DynWebStats - A Framework for Determining Dynamic and Up-to-date Web Indicators
Abstract
The growth and popularity of the Internet, and more specifically of the World Wide Web and its services and applications, have been widely discussed in recent years. Although this growth is common knowledge, obtaining indicators about it and about the characteristics of the whole Web, or even parts of it, is a major challenge, for three reasons: (1) the Web evolves constantly and dynamically along many dimensions, so any analysis becomes obsolete as soon as it is finished; (2) generating indicators requires a large volume of data, which is usually disruptive in terms of bandwidth and storage and also raises problems related to the ethics and network viability of the crawl; and (3) indicators, whether about domains or Web pages, must be generated with adequate coverage and freshness. This paper presents a new methodology for generating dynamic Web indicators that takes Web page changes into account, covering both modifications to pages and their creation or deletion. The methodology supports rational crawling and offers a measure of the quality of the indicators. To validate it, we ran a simulation on a dataset of 8,690 Web pages that were downloaded daily for 134 days. The results show that it is possible to crawl a larger universe of Web pages while keeping the indicators within acceptable confidence levels, making it possible to obtain a snapshot of this universe that is as close to reality as possible.
Keywords:
Web, Data Collection and Analysis, Dynamic Indicators, Data Characterization, Web Snapshot
Published
2016-11-08
How to Cite
GUERRA, Israel; MEIRA JR., Wagner; PEREIRA, Adriano César Machado; SANTA, Diogo Marques; DINIZ, Vagner; GANZELI, Heitor; PITTA, Marcelo; BARBOSA, Alexandre. DynWebStats - A Framework for Determining Dynamic and Up-to-date Web Indicators. In: BRAZILIAN SYMPOSIUM ON MULTIMEDIA AND THE WEB (WEBMEDIA), 22., 2016, Teresina. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 247-254.