Knowledge Discovery for Patient Survival in a Clinical Discharge Dataset Using Causal Graph Ontological Framework
DOI:
https://doi.org/10.59461/ijdiic.v4i1.162
Keywords:
Clinical text analysis, Causality, Causal assumption, Causal graph, Causal graph validation
Abstract
Knowledge mining from clinical datasets is a critical task in healthcare and in other fields. Existing methods, such as randomized controlled trials (RCTs) and automated machine extraction, have been helpful, but they are increasingly unable to keep pace, and more robust models are required to support clinical decisions. In this paper, we present a new method that addresses this challenge using a causal graph ontological model. Our study used a semi-structured textual clinical discharge dataset from the Statewide Planning and Research Cooperative System (SPARCS) to design and validate assumptions about the patient survival rate. We extracted the clinical information and organized it into fields that are medically relevant for decision-making (diseases, confounders, treatment, and survival rate). The initial assumptions model was validated using conditional independence test (CIT) criteria. The outputs of the LocalTest validation showed that the conceptual assumptions of the causal graph hold: the Pearson correlation coefficients lay between -1 and 1, the p-values were greater than 0.05, and the confidence intervals of 95% and 25% were satisfied. Furthermore, we used Shapley values to perform a sensitivity analysis on the features. This analysis showed that two variables, gender and diseases, contributed little to the survival rate prediction. We conclude that combining the causal graph ontological framework with sensitivity analysis to discover knowledge from clinical text could help improve the quality of clinical decisions drawn from the text, remove bias from the assumptions used in medical applications, and serve as a premise for modelling causal data for natural language machine learning predictions.
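The conditional independence check described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it is a NumPy-only partial-correlation test, assuming linear relationships and roughly Gaussian noise, run on synthetic data. The variable names (age, severity, stay) and the toy graph (age affects both severity and stay) are hypothetical and are not taken from the SPARCS dataset. For each implied independence, the test reports a correlation estimate, a p-value, and an approximate 95% confidence interval, which are the kinds of quantities the abstract refers to.

```python
import math
import numpy as np


def partial_corr_test(x, y, z=None):
    """Test whether x is independent of y given z via partial correlation.

    Assumes linear dependence and approximately Gaussian noise.
    Returns the estimate, a two-sided p-value, and a 95% confidence interval.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]

    # Design matrix: an intercept plus any conditioning variables.
    cols = [np.ones(n)]
    k = 0
    if z is not None:
        z = np.asarray(z, dtype=float).reshape(n, -1)
        cols.append(z)
        k = z.shape[1]
    design = np.column_stack(cols)

    # Residuals of x and y after regressing out the conditioning set.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = float(np.corrcoef(rx, ry)[0, 1])

    # Fisher z-transform gives an approximately normal test statistic.
    se = 1.0 / math.sqrt(n - k - 3)
    z_r = math.atanh(r)
    p_value = math.erfc(abs(z_r) / se / math.sqrt(2.0))
    ci = (math.tanh(z_r - 1.96 * se), math.tanh(z_r + 1.96 * se))
    return {"estimate": r, "p_value": p_value, "ci_low": ci[0], "ci_high": ci[1]}


# Synthetic data for a hypothetical graph: age -> severity, age -> stay.
# The graph implies that severity and stay are independent given age.
rng = np.random.default_rng(0)
n_samples = 5000
age = rng.normal(size=n_samples)
severity = age + rng.normal(size=n_samples)
stay = age + rng.normal(size=n_samples)

conditional = partial_corr_test(severity, stay, z=age)  # implied independence
marginal = partial_corr_test(severity, stay)            # no conditioning

print("severity vs stay given age:", conditional)
print("severity vs stay, unconditioned:", marginal)
```

In this sketch, an estimate near zero with a p-value above 0.05 is read as "the data do not contradict the implied independence", while the unconditioned test shows what a violated assumption looks like. A non-significant result does not prove that the graph is correct; it only fails to refute it.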
License
Copyright (c) 2025 Omachi Okolo, B.Y Baha, M.D Philemon

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

