Watershed: A High Performance Distributed Stream Processing System

  • Thatyene Louise Alves de Souza Ramos UFMG
  • Rodrigo Silva Oliveira UFMG
  • Ana Paula de Carvalho UFMG
  • Renato Antonio Celso Ferreira UFMG
  • Wagner Meira Jr. UFMG

Abstract


The task of extracting information from datasets that grow larger on a daily basis, such as those collected from the web, is an increasing challenge, but it also enables more interesting insights and analyses. Current analyses have gone beyond content and now focus on tracking and understanding users' relationships and interactions. Such computation is intensive both in terms of the processing demand imposed by the algorithms and in terms of the sheer amount of data that has to be handled. In this paper we introduce Watershed, a distributed computing framework designed to support the online, real-time analysis of very large data streams. The system's processing components obtain data from streams, transform them, and direct the results to other streams, creating large flows of information. The processing components are decoupled from each other, and their connections are strictly data-driven. They can be dynamically inserted and removed, providing an environment in which different applications can share intermediate results or cooperate toward a global purpose. Our experiments demonstrate the flexibility in creating a set of data analysis algorithms and composing them into a powerful stream analysis environment.
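To make the idea of decoupled, data-driven components more concrete, here is a minimal single-process sketch in Python. It assumes that components are connected only through named streams: a component declares the stream it reads and the stream it writes, and it never refers to other components directly. All names (`StreamBus`, `Component`, `attach`, `detach`, `publish`) are hypothetical and are not Watershed's actual API; the real system is distributed, and this sketch deliberately ignores distribution, buffering, and fault tolerance.

```python
from collections import defaultdict
from typing import Callable, Iterable, Optional

# Hypothetical illustration only: these classes are NOT Watershed's API.

class Component:
    """A processing component that reads one named stream and optionally writes another."""
    def __init__(self, name: str, in_stream: str, out_stream: Optional[str],
                 fn: Callable[[object], Iterable[object]]):
        self.name = name
        self.in_stream = in_stream
        self.out_stream = out_stream
        self.fn = fn  # maps one input record to zero or more output records

class StreamBus:
    """Routes records by stream name; components never reference each other."""
    def __init__(self):
        self.subscribers = defaultdict(dict)  # stream name -> {component name: component}
        self.history = defaultdict(list)      # stream name -> records published so far

    def attach(self, comp: Component) -> None:
        # Dynamic insertion: the component starts receiving its input stream immediately.
        self.subscribers[comp.in_stream][comp.name] = comp

    def detach(self, comp_name: str, stream: str) -> None:
        # Dynamic removal: other consumers of the same stream are unaffected.
        self.subscribers[stream].pop(comp_name, None)

    def publish(self, stream: str, record: object) -> None:
        self.history[stream].append(record)
        # Iterate over a snapshot so attach/detach during delivery is safe.
        for comp in list(self.subscribers[stream].values()):
            for out in comp.fn(record):
                if comp.out_stream is not None:
                    self.publish(comp.out_stream, out)

# Usage: two consumers share the intermediate "words" stream.
bus = StreamBus()
counts = defaultdict(int)

def count(word):
    counts[word] += 1
    return ()  # terminal component: emits nothing

bus.attach(Component("tokenizer", "posts", "words", lambda text: text.lower().split()))
bus.attach(Component("counter", "words", None, count))
bus.attach(Component("hashtags", "words", "tags",
                     lambda w: [w] if w.startswith("#") else []))

bus.publish("posts", "Stream #data flows")
bus.detach("hashtags", "words")  # removed at runtime; the counter keeps running
bus.publish("posts", "more #data")
```

In this sketch the "counter" and "hashtags" components both consume the tokenizer's output without knowing it exists, which mirrors the sharing of intermediate results described above. Removing "hashtags" after the first post means the second `#data` is counted but not forwarded to `tags`.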
Keywords: Parallel processing, Libraries, XML, Distributed databases, Computer architecture, Data analysis, Distributed systems, Data-driven architectures, Stream processing, High-performance computing, Dynamic application topology
Published
26/10/2011
RAMOS, Thatyene Louise Alves de Souza; OLIVEIRA, Rodrigo Silva; CARVALHO, Ana Paula de; FERREIRA, Renato Antonio Celso; MEIRA JR., Wagner. Watershed: A High Performance Distributed Stream Processing System. In: INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 23., 2011, Vitória/ES. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2011. p. 191-198.