Siphoning Hidden-Web Data through Keyword-Based Interfaces

Authors

  • Luciano Barbosa AT&T Labs - Research
  • Juliana Freire University of Utah

DOI:

https://doi.org/10.5753/jidm.2010.950

Abstract

In this paper, we study the problem of automating the retrieval of data hidden behind simple search interfaces that accept keyword-based queries. Our goal is to automatically retrieve all available results (or as many as possible). We propose a new approach to siphoning hidden data that automatically generates a small set of representative keywords and builds queries that lead to high coverage. We evaluate our algorithms on several real Web sites. Preliminary results indicate that our approach is effective: coverage of over 90% is obtained for most of the sites considered.
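
To make the idea of keyword-driven siphoning concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes a hypothetical in-memory corpus standing in for a hidden-Web site, a hypothetical `search` function standing in for the keyword interface, and a simple greedy heuristic (issue the most frequent term seen so far in retrieved results) as one plausible way to pick representative keywords. The names `search`, `siphon`, and `coverage` are introduced here for illustration only.

```python
from collections import Counter


def search(corpus, keyword):
    """Hypothetical keyword interface: ids of documents containing the keyword."""
    return {doc_id for doc_id, text in corpus.items()
            if keyword in text.lower().split()}


def siphon(corpus, seed, max_queries=10):
    """Greedy sketch: repeatedly query with the most frequent unseen term.

    Assumption: term frequency in already-retrieved documents is used as a
    proxy for how many new documents a keyword is likely to return.
    """
    retrieved = set()   # ids of documents collected so far
    issued = []         # keywords already sent to the interface
    counts = Counter()  # document frequency of terms in retrieved results
    keyword = seed
    while keyword is not None and len(issued) < max_queries:
        issued.append(keyword)
        new_ids = search(corpus, keyword) - retrieved
        retrieved |= new_ids
        for doc_id in new_ids:
            counts.update(set(corpus[doc_id].lower().split()))
        # Next keyword: highest count, ties broken alphabetically.
        candidates = [t for t in counts if t not in issued]
        keyword = min(candidates, key=lambda t: (-counts[t], t), default=None)
    return issued, retrieved


def coverage(corpus, retrieved):
    """Fraction of the corpus that was retrieved."""
    return len(retrieved) / len(corpus)


# Toy corpus standing in for the hidden database (illustrative data only).
corpus = {
    1: "hidden web data",
    2: "web crawler data",
    3: "keyword query interface",
    4: "query coverage web",
}
issued, retrieved = siphon(corpus, seed="data")
print(issued, coverage(corpus, retrieved))
```

In a real setting the total number of documents is unknown, so coverage can only be estimated; the sketch computes it exactly only because the toy corpus is fully visible.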

Published

2010-05-27

How to Cite

Barbosa, L., & Freire, J. (2010). Siphoning Hidden-Web Data through Keyword-Based Interfaces. Journal of Information and Data Management, 1(1), 133. https://doi.org/10.5753/jidm.2010.950

Section

Regular Papers