A pipeline for tabular dataset formation from unstructured data provided by ACR Appropriateness Criteria guidelines

  • Anderson A. Eduardo HIAE
  • Rafael M. Loureiro HIAE
  • Adriano Tachibana HIAE
  • Pedro V. Netto HIAE
  • Tatiana F. de Almeida HIAE
  • André Pires HIAE

Abstract


Among current data-centric technologies, clinical decision support systems (CDSS) stand out as one of the most promising for healthcare. Despite technological advances that facilitate their implementation, the maintenance of knowledge bases for CDSS remains open to improvement. Here, we argue that the Appropriateness Criteria provided by the ACR guidelines can serve as an open data source that, combined with appropriate algorithms, can advance basic research and technological development on knowledge bases for CDSS. We therefore developed a pipeline that forms tabular datasets from the ACR guidelines, which are stored on a website as textual PDF files. We also demonstrate experimentally that the proposed pipeline successfully recovers the contents of interest, and we discuss its best composition in terms of component algorithms. Future research on the flexibility of the algorithms in the face of PDF template updates could improve on our work.

Published
2022-06-07
EDUARDO, Anderson A.; LOUREIRO, Rafael M.; TACHIBANA, Adriano; NETTO, Pedro V.; ALMEIDA, Tatiana F. de; PIRES, André. A pipeline for tabular dataset formation from unstructured data provided by ACR Appropriateness Criteria guidelines. In: BRAZILIAN SYMPOSIUM ON COMPUTING APPLIED TO HEALTH (SBCAS), 22., 2022, Teresina. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2022. p. 168-177. ISSN 2763-8952. DOI: https://doi.org/10.5753/sbcas.2022.222497.