KoopaML, a Machine Learning platform for medical data analysis





Machine Learning, Data Analysis, Machine Learning Pipelines, Learning Platform, Health


Machine Learning makes it possible to tackle complex data analysis tasks on large datasets. This branch of Artificial Intelligence brings the benefits of data processing and analysis to non-technical domains. In medicine in particular, professionals are increasingly interested in Machine Learning to identify patterns in clinical cases and to make predictions about health issues. However, many of them lack the programming or technological skills needed to perform these tasks. Many tools support the development of Machine Learning pipelines, ranging from libraries for developers and data scientists to visual tools for experts and platforms for learning. Nevertheless, we have identified requirements in the medical context that call for a customized platform adapted to the end users found in this context. This work describes the design process and the first version of KoopaML, an ML platform that aims to bridge physicians’ data science gaps while automating Machine Learning pipelines. The platform focuses on enhanced interactivity to improve physicians’ engagement, while still providing the benefits of introducing Machine Learning pipelines in medical departments and integrating ongoing training into the use of the tool’s features.








How to Cite

GARCÍA-HOLGADO, A.; VÁZQUEZ-INGELMO, A.; ALONSO-SÁNCHEZ, J.; GARCÍA-PEÑALVO, F. J.; THERÓN, R.; SAMPEDRO-GÓMEZ, J.; SÁNCHEZ-PUENTE, A.; VICENTE-PALACIOS, V.; DORADO-DÍAZ, P. I.; SÁNCHEZ, P. L. KoopaML, a Machine Learning platform for medical data analysis. Journal on Interactive Systems, Porto Alegre, RS, v. 13, n. 1, p. 154–165, 2022. DOI: 10.5753/jis.2022.2574. Available at: https://sol.sbc.org.br/journals/index.php/jis/article/view/2574. Accessed: 14 Apr. 2024.



Regular Paper