TY  - JOUR
AU  - Barbosa, Luciano
AU  - Freire, Juliana
PY  - 2010/05/27
Y2  - 2024/03/29
TI  - Siphoning Hidden-Web Data through Keyword-Based Interfaces: Retrospective
JF  - Journal of Information and Data Management
JA  - JIDM
VL  - 1
IS  - 1
SE  - Regular Papers
DO  - 10.5753/jidm.2010.951
UR  - https://sol.sbc.org.br/journals/index.php/jidm/article/view/951
SP  - 145
AB  - In this paper, we proposed the first fully automatic approach to crawling the Hidden Web through keyword-based interfaces. Our crawler uses an algorithm that automatically derives a series of keyword-based queries whose goal is to obtain high coverage while minimizing cost. In other words, our goal is to retrieve as much of the hidden content as possible while minimizing the number of queries required. The intuition behind our algorithm is that, by obtaining samples of the hidden contents of an online database or document collection, we are able to discover keywords that have high frequency. Using these high-frequency keywords, we are then able to construct queries that return a large number of answers.
ER  - 