Using Active Mediators and Passive Extractors Inside Materialized Data Integration Systems

  • Paulo V. M. Cardoso UFSM
  • Flavio F. Franzin UFSM
  • Sergio L. S. Mergen UFSM

Abstract


Materialized data integration architectures are generally composed of active extractors and passive mediators, where the extractors forward to the mediator every object they identify as potentially of interest. This one-way communication makes it difficult to identify relevant objects (especially when the source is not well structured) and wastes effort analyzing, and possibly extracting, irrelevant objects. There is also a processing overhead in deduplicating objects that are already mapped. In this paper we propose an extension of the integration architecture that lets mediators play a more active role, guiding passive extractors as to which objects need to be extracted and how they can be identified. We present a case study that shows how passive and active extractors can coexist in the same data integration system.
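The contrast between the two communication styles can be illustrated with a minimal sketch. It is not the authors' implementation: the class names, the dictionary-based sources, and the use of a simple key to decide whether an object is already mapped are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Mediator:
    """Holds the materialized objects, indexed by a hypothetical key."""
    store: dict = field(default_factory=dict)
    duplicates_seen: int = 0

    def receive(self, key, obj):
        # Push path (active extractor): the mediator has to deduplicate
        # whatever arrives, including objects it has already mapped.
        if key in self.store:
            self.duplicates_seen += 1
            return
        self.store[key] = obj

    def request(self, extractor, wanted_keys):
        # Pull path (active mediator): ask only for objects not yet mapped.
        missing = [k for k in wanted_keys if k not in self.store]
        for key, obj in extractor.extract(missing).items():
            self.store[key] = obj
        return missing


class ActiveExtractor:
    """Forwards every object it finds, relevant or not (assumed behavior)."""

    def __init__(self, source):
        self.source = source

    def push_all(self, mediator):
        for key, obj in self.source.items():
            mediator.receive(key, obj)


class PassiveExtractor:
    """Extracts only what the mediator asks for (assumed behavior)."""

    def __init__(self, source):
        self.source = source

    def extract(self, keys):
        return {k: self.source[k] for k in keys if k in self.source}


# Both kinds of extractor feeding the same mediator.
mediator = Mediator()
ActiveExtractor({"a": 1, "b": 2}).push_all(mediator)
ActiveExtractor({"b": 2, "c": 3}).push_all(mediator)
asked = mediator.request(PassiveExtractor({"c": 3, "d": 4, "e": 5}), ["c", "d"])
```

In this sketch the second active extractor resends an object the mediator already holds, which costs a deduplication check, while the passive extractor is asked only for the one object still missing and never touches the irrelevant one.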

Published
04/07/2016
CARDOSO, Paulo V. M.; FRANZIN, Flavio F.; MERGEN, Sergio L. S. Using Active Mediators and Passive Extractors Inside Materialized Data Integration Systems. In: CONCURSO DE TRABALHOS DE INICIAÇÃO CIENTÍFICA DA SBC (CTIC-SBC), 35., 2016, Porto Alegre. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 51-60.