A pipeline for tabular dataset formation from unstructured data provided by ACR Appropriateness Criteria guidelines
Abstract
Among current data-centric technologies, clinical decision support systems (CDSS) stand out as one of the most promising for healthcare. Despite the technological advances that facilitate their implementation, the maintenance of knowledge bases for CDSS still has room for improvement. Here, we argue that the Appropriateness Criteria provided by the ACR guidelines can serve as an open data source that, combined with appropriate algorithms, can advance basic research and technological development on knowledge bases for CDSS. We therefore developed a pipeline that forms tabular datasets from the ACR guidelines, which are published on a website as text-based PDF files. We also demonstrate experimentally that the proposed pipeline successfully recovers the contents of interest, and we discuss which combination of component algorithms performs best. Future research on making the algorithms robust to PDF template updates could improve on this work.
References
Barchard, K. A. and Pace, L. A. (2011). "Preventing human error: The impact of data entry methods on data accuracy and statistical results". In Computers in Human Behavior 27(2011), pages 1834-1839. doi: 10.1016/j.chb.2011.04.004.
Baviskar, D. et al. (2021). "Efficient automated processing of the unstructured documents using Artificial Intelligence: A systematic literature review and future directions". In IEEE Access 9(2021), pages 72894-72936. doi: 10.1109/ACCESS.2021.3072900.
Bellatreche, L., Valduriez, P. and Morzy, T. (2018). "Advances in Databases and Information Systems". In Information Systems Frontiers 20(2018), pages 1-6. doi: 10.1007/s10796-017-9819-2.
Doyle, D. et al. (2019). "Clinical decision support for high-cost imaging: A randomized clinical trial". In PLOS ONE 14(3-2019), e0213373. doi: 10.1371/journal.pone.0213373.
Harman, D. (2019). "Information Retrieval: The Early Years". In Foundations and Trends in Information Retrieval 13(5), pages 425-577. doi: 10.1561/1500000065.
Kim, G. et al. (2021). "Donut: Document Understanding Transformer without OCR". In ArXiv (Nov. 2021). doi: 10.48550/arxiv.2111.15664. url: http://arxiv.org/abs/2111.15664.
Greenes, R. A. et al. (2018). "Clinical decision support models and frameworks: Seeking to address research issues underlying implementation successes and failures". In Journal of Biomedical Informatics 78, pages 134-143. doi: 10.1016/J.JBI.2017.12.005.
Parsania, V. and Jani, N. (2015). "Reviewing and Modeling Clinical Decision Support System". In International Journal of Technology and Science 7 (Dec. 2015), pages 15-17.
Roh, Y. et al. (2021). "A Survey on Data Collection for Machine Learning: A Big Data-AI Integration Perspective". In IEEE Transactions on Knowledge and Data Engineering 33(4), pages 1328-1347. doi: 10.1109/TKDE.2019.2946162.
Shiffman, R. N. (1997). "Representation of Clinical Practice Guidelines in Conventional and Augmented Decision Tables". In Journal of the American Medical Informatics Association 4(5), pages 382-393. doi: 10.1136/jamia.1997.0040382.
Shiffman, R. N. and Greenes, R. A. (1994). "Improving Clinical Guidelines with Logic and Decision-table Techniques". In Medical Decision Making 14(3), pages 245-254. doi: 10.1177/0272989X940140030
Sutton, R. T. et al. (2020). "An overview of clinical decision support systems: benefits, risks, and strategies for success". In NPJ Digital Medicine 3(1), pages 17-29. doi: 10.1038/s41746-020-0221-y.
Towbin, A. J. (2019). "Collecting Data to Facilitate Change". In Journal of the American College of Radiology 16(2019), pages 1248-1253. doi: 10.1016/j.jacr.2019.05.032.
Zhang, Q. and Segall, R. S. (2008). "Web mining: a survey of current research, techniques, and software". In International Journal of Information Technology & Decision Making 7(2008), pages 683-720. doi: 10.1142/S0219622008003150.
Yin, A. L. et al. (2022). "Comparing automated vs. manual data collection for COVID-specific medications from electronic health records". In International Journal of Medical Informatics 157, page 104622. doi: 10.1016/j.ijmedinf.2021.104622.
Zhang, Y., Chen, M. and Liu, L. (2015). "A review on text mining". In 6th IEEE International Conference on Software Engineering and Service Science (ICSESS), pages 681-685. doi: 10.1109/ICSESS.2015.7339149.
