Siphoning Hidden-Web Data through Keyword-Based Interfaces: Retrospective

Luciano Barbosa; Juliana Freire

doi:10.5753/jidm.2010.951

Authors

Luciano Barbosa AT&T Labs - Research
Juliana Freire University of Utah

DOI:

https://doi.org/10.5753/jidm.2010.951

Abstract

In this paper, we proposed the first fully automatic approach to crawling the Hidden Web through
keyword-based interfaces. Our crawler uses an algorithm that automatically derives a series of
keyword-based queries designed to obtain high coverage at low cost; in other words, to retrieve as
much of the hidden content as possible while minimizing the number of queries issued. The intuition
behind the algorithm is that, by obtaining samples of the hidden contents of an online database or
document collection, we can discover keywords that occur with high frequency. Using these
high-frequency keywords, we can then construct queries that return a large number of answers.
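
The intuition above can be illustrated with a minimal sketch. It is not the authors' algorithm, only a simplified greedy loop under stated assumptions: the collection `DOCS`, the `search` stand-in for a keyword form, and the `siphon` function are all hypothetical names invented for illustration, and a real crawler would submit HTTP requests and parse result pages instead of reading an in-memory dictionary.

```python
from collections import Counter

# Hypothetical toy "hidden" collection. A real crawler could reach these
# documents only through the site's keyword search form.
DOCS = {
    1: "hidden web crawler keyword query",
    2: "keyword search interface database",
    3: "database query optimization",
    4: "web database crawler coverage",
    5: "protein sequence archive",
}


def search(keyword):
    """Stand-in for one keyword query: return ids of matching documents."""
    return {doc_id for doc_id, text in DOCS.items() if keyword in text.split()}


def siphon(seed, max_queries):
    """Greedy sketch: issue a query, count terms in the newly retrieved
    documents, then query the most frequent term not yet issued."""
    retrieved = set()
    issued = []
    term_counts = Counter()  # document frequency within the sample so far
    keyword = seed
    while keyword is not None and len(issued) < max_queries:
        issued.append(keyword)
        new_ids = search(keyword) - retrieved
        retrieved |= new_ids
        for doc_id in new_ids:
            term_counts.update(set(DOCS[doc_id].split()))
        # Ties on frequency are broken by the lexicographically largest term.
        candidates = [(count, term) for term, count in term_counts.items()
                      if term not in issued]
        keyword = max(candidates)[1] if candidates else None
    coverage = len(retrieved) / len(DOCS)
    return issued, retrieved, coverage


if __name__ == "__main__":
    issued, retrieved, coverage = siphon("web", max_queries=4)
    print(issued, sorted(retrieved), coverage)
```

In this toy run, document 5 shares no terms with the others, so no keyword discovered from the sample can reach it. That is one way a sampling-based strategy can fall short of full coverage, however many queries it issues.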

Published

2010-05-27

How to Cite

Barbosa, L., & Freire, J. (2010). Siphoning Hidden-Web Data through Keyword-Based Interfaces: Retrospective. Journal of Information and Data Management, 1(1), 145. https://doi.org/10.5753/jidm.2010.951

Issue

Vol. 1 No. 1: Inaugural Issue

Section

Regular Papers
