Uma Investigação acerca da Conectividade da Web Brasileira

  • Cristina Murta (CEFET-MG)
  • Valter Lima Jr. (CEFET-MG)
  • Adriano Pereira (UFMG)

Abstract


This paper presents an analysis of the connectivity and topological structure of the Brazilian Web and of the Brazilian Government’s official Web, based on two datasets recently collected by Web crawlers. The samples cover about 7% of the Web domains officially registered in the country. The collected data were filtered and transformed into graphs, which were then analyzed using several metrics. The results indicate that the Brazilian Web contains a strongly connected component that includes only 11% of its vertices, and that the density of links within Web sites differs widely from the density of links between Web sites. Overall, the analysis shows that the connectivity of the Brazilian Web is low.
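The abstract summarizes two graph measurements: the share of vertices in the strongly connected component and the contrast between intra-site and inter-site link density. The sketch below illustrates both under stated assumptions; it is not the authors' code. It assumes the crawl has already been reduced to a list of (source URL, target URL) pairs, that a "Web site" is identified by its hostname, and that density means links present divided by ordered page pairs possible. The function names `largest_scc_fraction` and `site_link_densities` and the sample URLs are hypothetical. The component step uses Kosaraju's algorithm with iterative depth-first search, so large crawls do not hit Python's recursion limit.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def largest_scc_fraction(edges):
    """Fraction of vertices in the largest strongly connected component (Kosaraju)."""
    graph, reverse, vertices = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)
        vertices.update((u, v))

    # Pass 1: iterative DFS on the original graph, recording vertices by finish time.
    visited, order = set(), []
    for start in vertices:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:  # all neighbours explored: the node is finished
                stack.pop()
                order.append(node)

    # Pass 2: DFS on the reversed graph in decreasing finish time; each tree is one SCC.
    assigned, largest = set(), 0
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        size, stack = 0, [start]
        while stack:
            node = stack.pop()
            size += 1
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        largest = max(largest, size)
    return largest / len(vertices) if vertices else 0.0


def site_link_densities(edges):
    """(intra-site density, inter-site density), assuming site = URL hostname."""
    edges = {(u, v) for u, v in edges if u != v}  # drop duplicates and self-links
    pages, intra, inter = set(), 0, 0
    for u, v in edges:
        pages.update((u, v))
        if urlsplit(u).hostname == urlsplit(v).hostname:
            intra += 1
        else:
            inter += 1
    pages_per_site = defaultdict(int)
    for page in pages:
        pages_per_site[urlsplit(page).hostname] += 1
    n = len(pages)
    # Ordered page pairs that could be linked inside a site vs across sites.
    possible_intra = sum(k * (k - 1) for k in pages_per_site.values())
    possible_inter = n * (n - 1) - possible_intra
    return (intra / possible_intra if possible_intra else 0.0,
            inter / possible_inter if possible_inter else 0.0)


# Hypothetical toy crawl: two pages on a.br link to each other, then a chain to b.br and c.br.
sample_edges = [
    ("http://a.br/", "http://a.br/x"),
    ("http://a.br/x", "http://a.br/"),
    ("http://a.br/x", "http://b.br/"),
    ("http://b.br/", "http://c.br/"),
]
print(largest_scc_fraction(sample_edges))  # 0.5
print(site_link_densities(sample_edges))   # (1.0, 0.2)
```

In this toy graph only the two a.br pages reach each other, so the largest component holds half of the vertices, while intra-site pairs are fully linked and inter-site pairs only sparsely. That is the kind of disparity the abstract describes, though on real crawl data the exact figures depend on how the paper defines sites and density.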

Published: 2014-07-28

MURTA, Cristina; LIMA JR., Valter; PEREIRA, Adriano. Uma Investigação acerca da Conectividade da Web Brasileira. In: INTEGRATED SOFTWARE AND HARDWARE SEMINAR (SEMISH), 41., 2014, Brasília. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2014. p. 13-24. ISSN 2595-6205.