CARAMEL: Ecosystem for Big Social Data
Abstract
Context: A large volume of data produced on social media is analyzed from different perspectives. Much effort goes into retrieving and processing the data, maintaining the necessary infrastructure, and building and sharing datasets among actors with different roles. These challenges are characteristic of data ecosystems. Problem: The main systems that support the analysis of social network data have restrictions on data collection, sharing, reuse, and other aspects. Data collection and analysis require technical skills that some users lack, which affects the quality of inferences, accounting, and conclusions. Solution: We propose an architecture for “Big Social Data” ecosystems that supports the collaborative construction of data extraction and sharing mechanisms. IS Theory: This proposal is related to the “knowledge-based theory,” since much knowledge can be inferred from social data. It also supports the Externalization and Combination steps of the organizational knowledge creation model. Method: We examine aspects of data analysis, considering the reuse of the mechanisms created and the sharing of datasets that can be processed and stored in a distributed way to support even instantaneous analysis. Results: The architecture was implemented to run in a distributed way; it contains a collector and a filter and allows data sharing. A data collection test was conducted during the 2022 presidential elections in Brazil. Contributions: The main contribution is the architecture of a Big Social Data ecosystem, which focuses on the evolution of social data analysis and also addresses interoperability between distributed solutions. The technological contributions are an instance of this architecture for the cloud, social media data collectors, and datasets from the 2022 election in Brazil.