Scientific Workflows Supported by Real-Time Knowledge Bases

  • Victor S. Bursztyn (UFRJ / ECM Brazil Research & Development Center)
  • Jonas Dias (ECM Brazil Research & Development Center)
  • Marta Mattoso (UFRJ)

Abstract


A major challenge in large-scale experiments is the analytical capacity to contrast ongoing results with domain knowledge. We address this challenge by constructing a domain-specific knowledge base that is queried during workflow execution. We introduce K-Chiron, an integrated solution that combines a state-of-the-art automatic knowledge base construction (KBC) system with Chiron, a well-established workflow engine. We run experiments in the domain of Political Science to show how KBC can improve human-in-the-loop (HIL) support in scientific experiments. Whereas traditional HIL, in which a domain expert supervises the experiment, is done offline, in K-Chiron it is done online, i.e., at runtime. We achieve results in less laborious ways, to the point of enabling a class of experiments that could be infeasible with traditional HIL. Finally, we show how provenance data could be combined with KBC to enable further experimentation in more dynamic settings.
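The idea of contrasting ongoing results with domain knowledge at runtime can be sketched roughly as follows. This is only an illustration under stated assumptions, not K-Chiron's or Chiron's actual interface: the `KnowledgeBase` class, the `run_with_kb_checks` function, and the toy entities and group labels are all hypothetical, and a plain in-memory dictionary stands in for a knowledge base built by a KBC system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory stand-in for a domain-specific knowledge base."""
    facts: Dict[str, str] = field(default_factory=dict)

    def lookup(self, entity: str) -> Optional[str]:
        # Return the known value for an entity, or None if the KB has no fact.
        return self.facts.get(entity)


def run_with_kb_checks(
    items: Iterable[str],
    activity: Callable[[str], str],
    kb: KnowledgeBase,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Run `activity` per item and compare each result with the KB right away.

    Results that contradict a known fact are flagged while the loop is still
    running (online), instead of being reviewed after the whole run (offline).
    """
    accepted: List[Tuple[str, str]] = []
    flagged: List[Tuple[str, str, str]] = []
    for entity in items:
        result = activity(entity)
        expected = kb.lookup(entity)
        if expected is not None and expected != result:
            flagged.append((entity, result, expected))
        else:
            accepted.append((entity, result))
    return accepted, flagged


# Toy usage: all names and labels below are made up for illustration.
kb = KnowledgeBase({"a": "group-1", "b": "group-2"})
computed_labels = {"a": "group-1", "b": "group-1", "c": "group-3"}
accepted, flagged = run_with_kb_checks(["a", "b", "c"], computed_labels.get, kb)
```

In this toy run, the result for `b` disagrees with the knowledge base and is flagged as soon as it is produced, while `c` is accepted because the knowledge base holds no fact about it. How a real system would react to a flagged result (for example, by alerting a domain expert or adjusting the run) is left open here.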


 

Published
July 4, 2016
BURSZTYN, Victor S.; DIAS, Jonas; MATTOSO, Marta. Workflows Científicos com Apoio de Bases de Conhecimento em Tempo Real. In: BRAZILIAN E-SCIENCE WORKSHOP (BRESCI), 10., 2016, Porto Alegre. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 269-276. ISSN 2763-8774. DOI: https://doi.org/10.5753/bresci.2016.10009.