ENoW - News Data Extractor from the Web
Abstract
Data available on the Web plays a decisive role in decision-making, in both personal and corporate life. Collecting and storing this data in a structured model helps integrate it with other sources and then use the resulting dataset in various applications, such as event detection and sentiment monitoring. Online newspapers are essential sources of information, accessed daily by thousands of people. To facilitate the exploration of this data, this paper presents ENoW - News Data Extractor from the Web. ENoW receives search strings as input and stores data extracted from the matching news articles, together with their full content, in a relational database. The system was implemented in Python, using Web scraping techniques. The demonstration covers the three main functionalities of the tool: newspaper registration, project registration, and news extraction.
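The extract-and-store flow described above can be illustrated with a minimal sketch. It is not ENoW's actual implementation: the `news` table layout, the `ArticleParser` class, the `store_article` function, and the sample page are all hypothetical, and only the Python standard library is used, so no page is downloaded here.

```python
import sqlite3
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects the <title> text and the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag currently being captured, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def store_article(conn, search_string, url, html):
    """Parse one news page and insert it, tagged with the search string."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    content = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    conn.execute(
        "INSERT INTO news (search_string, url, title, content) "
        "VALUES (?, ?, ?, ?)",
        (search_string, url, parser.title.strip(), content),
    )
    conn.commit()


# Hypothetical schema: one row per extracted news article.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news ("
    "id INTEGER PRIMARY KEY, search_string TEXT, url TEXT UNIQUE, "
    "title TEXT, content TEXT)"
)

# Hypothetical page content standing in for a downloaded article.
sample_html = (
    "<html><head><title>Landslide closes road</title></head>"
    "<body><p>Heavy rain caused a landslide.</p>"
    "<p>The road remains closed.</p></body></html>"
)
store_article(conn, "landslide", "https://example.org/news/1", sample_html)
rows = conn.execute("SELECT search_string, title, content FROM news").fetchall()
```

Keeping the search string in the same row as the extracted fields and the full text is one simple way to let later queries group articles by the search that produced them.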
Keywords:
Sensor monitoring, Web Scraping, News extraction
References
Bansal, A., Chaudhury, S., Roy, S. D., and Srivastava, J. (2014). Newspaper article extraction using hierarchical fixed point model. In 2014 11th IAPR International Workshop on Document Analysis Systems, pages 257–261. IEEE.
Franceschini, R., Rosi, A., Catani, F., and Casagli, N. (2022). Exploring a landslide inventory created by automated web data mining: the case of Italy. Landslides, 19(4).
Johnson, J. A. (2014). From open data to information justice. Ethics and Information Technology, 16:263–274.
Krotov, V., Johnson, L., and Silva, L. (2020). Tutorial: Legality and ethics of web scraping. Communications of the Association for Information Systems.
Park, E., Park, J., and Hu, M. (2021). Tourism demand forecasting with online news data mining. Annals of Tourism Research, 90:103273.
Salem, H. and Mazzara, M. (2020). Pattern matching-based scraping of news websites. In Journal of Physics: Conference Series, page 012011. IOP Publishing.
Sarr, E. N., Ousmane, S., and Diallo, A. (2018). Factextract: automatic collection and aggregation of articles and journalistic factual claims from online newspaper. In 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), pages 336–341. IEEE.
Published
2023-09-25
How to Cite
REIPS, Lisiane; MUSICANTE, Martin; VARGAS-SOLAR, Genoveva; POZO, Aurora T. R.; HARA, Carmem S. ENoW - News Data Extractor from the Web. In: DEMOS AND APPLICATIONS - BRAZILIAN SYMPOSIUM ON DATABASES (SBBD), 38., 2023, Belo Horizonte/MG. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 78-83. DOI: https://doi.org/10.5753/sbbd_estendido.2023.232480.
