Detection of Health Insurance Fraud through the Application of Artificial Intelligence Algorithms
Author: Dr. Mojtaba Farokh
Coordinators: Sirous Sharifi, Dr. Nasrin Hezar-Moghadam, Dr. Abbas Rad, Dr. Mehdi Riyahi Far, Alireza Norouzi, Alireza Emami Fard, Zahra Teymoori
Abstract
In the era of digital transformation, fraud detection in financial institutions, particularly in the health insurance sector, has become increasingly critical. This study develops a smart, modular, and scalable framework for detecting health insurance fraud in structured claims data, built on a hybrid of two unsupervised learning algorithms: Isolation Forest and K-Means. The framework is designed to identify fraudulent and abusive behavior regardless of actor or service type, and to adapt to dynamic, complex environments in which fraud patterns evolve over time.
The proposed framework consists of four integrated modules. The first, a knowledge-driven module, defines the fraud framework and its associated features using insights from insurance and medical experts; it guides the identification of relevant fraud characteristics and informs feature extraction. The second module is a two-stage data warehouse built to handle the large volume of claims data and its high computational demands. In the first stage, an ETL (Extract–Transform–Load) process ingests claims data, resolves quality issues, removes inconsistencies, and prepares the data for feature extraction. In the second stage, the features relevant to fraud detection are extracted and selected in collaboration with domain experts. To bridge expert knowledge and algorithmic analysis, a simulation framework lets the medical-insurance team describe, analyze, and visualize abnormal behaviors; its output is a documented list of twenty key features covering actors, products/services, and fraud-specific attributes.
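The two warehouse stages can be illustrated with a minimal Python sketch. The record fields (`claim_id`, `provider`, `patient`, `service`, `amount`), the cleaning rules, and the per-provider features below are hypothetical placeholders: the study's twenty expert-defined features are not listed in this abstract, so the code only shows the general shape of an ETL cleaning pass followed by feature aggregation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical claim records; field names and values are illustrative only.
RAW_CLAIMS = [
    {"claim_id": "C1", "provider": "P1", "patient": "A", "service": "visit", "amount": 120.0},
    {"claim_id": "C1", "provider": "P1", "patient": "A", "service": "visit", "amount": 120.0},  # duplicate
    {"claim_id": "C2", "provider": "P1", "patient": "B", "service": "lab", "amount": 80.0},
    {"claim_id": "C3", "provider": "P2", "patient": "A", "service": "visit", "amount": None},  # missing amount
    {"claim_id": "C4", "provider": "P2", "patient": "C", "service": "mri", "amount": 900.0},
    {"claim_id": "C5", "provider": "P2", "patient": "C", "service": "mri", "amount": -5.0},  # invalid amount
]


def clean_claims(raw):
    """Stage 1 (ETL, assumed rules): drop repeated claim IDs and records whose
    amount is missing or non-positive."""
    seen, cleaned = set(), []
    for rec in raw:
        amount = rec.get("amount")
        if rec["claim_id"] in seen or amount is None or amount <= 0:
            continue
        seen.add(rec["claim_id"])
        cleaned.append(rec)
    return cleaned


def provider_features(claims):
    """Stage 2: aggregate hypothetical per-provider features from cleaned claims."""
    by_provider = defaultdict(list)
    for rec in claims:
        by_provider[rec["provider"]].append(rec)
    features = {}
    for provider, recs in by_provider.items():
        amounts = [r["amount"] for r in recs]
        patients = {r["patient"] for r in recs}
        features[provider] = {
            "n_claims": len(recs),
            "total_amount": sum(amounts),
            "mean_amount": mean(amounts),
            "n_patients": len(patients),
            "claims_per_patient": len(recs) / len(patients),
        }
    return features


cleaned = clean_claims(RAW_CLAIMS)
features = provider_features(cleaned)
print(features["P1"])
```

In a production warehouse these steps would typically run as SQL or dataframe transformations over the full claims table; plain dictionaries are used here only to keep the sketch self-contained.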
The third module, the fraud detection engine, first uses Isolation Forest to partition the data into normal and anomalous groups and then applies K-Means to identify fraudulent cases among the anomalies, yielding the hybrid algorithm K-IF. The combination pairs Isolation Forest's ability to separate anomalies from normal records with the finer grouping provided by K-Means, which improves detection accuracy. The fourth module provides visualization tools and a managerial dashboard that support dynamic analysis, interaction, and real-time updates for decision-makers.
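A minimal, dependency-free Python sketch of this two-stage idea follows. It is not the authors' K-IF implementation: the contamination threshold, the choice of k = 2, the farthest-first K-Means initialization, and the rule that flags the anomaly cluster farthest from the normal centroid as fraud are all assumptions made for illustration. The Isolation Forest part follows the standard formulation, scoring each record as s(x) = 2^(-E[h(x)]/c(ψ)), where h(x) is the path length of x in an isolation tree and c(ψ) is the average path length for a subsample of size ψ.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Isolation tree: split on a random non-constant feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    spans = [(d, min(p[d] for p in points), max(p[d] for p in points))
             for d in range(len(points[0]))]
    spans = [s for s in spans if s[1] < s[2]]
    if not spans:
        return {"size": len(points)}
    dim, lo, hi = rng.choice(spans)
    split = rng.uniform(lo, hi)
    return {"dim": dim, "split": split,
            "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
            "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng)}


def path_length(x, node):
    """Depth at which x reaches a leaf, plus the c(n) correction for unsplit leaves."""
    depth = 0
    while "dim" in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def isolation_scores(data, n_trees=100, sample_size=64, seed=0):
    """Anomaly score s(x) = 2^(-E[h(x)] / c(psi)); values near 1 indicate anomalies."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = max(1, math.ceil(math.log2(psi)))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi) or 1.0
    return [2 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm) for x in data]


def kmeans(points, k, n_iter=50):
    """Lloyd's K-Means with a deterministic farthest-first initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def k_if(data, contamination=0.1, k=2, seed=0):
    """Hypothetical K-IF sketch: Isolation Forest separates normal from anomalous
    records, then K-Means groups the anomalies and one cluster is flagged as fraud."""
    scores = isolation_scores(data, seed=seed)
    n_anom = min(len(data) - 1, max(k, round(contamination * len(data))))
    ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
    anomalous, normal = ranked[:n_anom], ranked[n_anom:]
    normal_center = [sum(col) / len(normal) for col in zip(*(data[i] for i in normal))]
    labels, centroids = kmeans([data[i] for i in anomalous], k)
    # Assumption: the anomaly cluster farthest from the normal centroid is labelled fraud.
    fraud = max(range(k), key=lambda j: math.dist(centroids[j], normal_center))
    return sorted(i for i, lab in zip(anomalous, labels) if lab == fraud)


# Toy data: a 10x10 grid of "normal" records plus five far-away suspicious ones.
normal_pts = [(i / 9, j / 9) for i in range(10) for j in range(10)]
fraud_pts = [(10 + 0.01 * i, 10 - 0.01 * i) for i in range(5)]
flagged = k_if(normal_pts + fraud_pts)
print(flagged)  # indices 100-104 are the injected outliers
```

In practice, library implementations such as scikit-learn's `IsolationForest` and `KMeans` would replace the hand-written versions; they are written out here only so the sketch runs without external dependencies.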
Extensive experiments on multiple labeled datasets show that K-IF outperforms conventional algorithms, including Isolation Forest, Local Outlier Factor (LOF), One-Class SVM (OCSVM), Elliptic Envelope (EE), DBSCAN, K-Means, and Autoencoder, in detection accuracy, robustness to varying contamination rates, identification of edge cases, and computational efficiency. Applying the framework to real-world data from a health insurance company confirms its strong anomaly-detection capability in practice.
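The abstract does not state which accuracy measure was used for these comparisons. As an assumption, the sketch below computes precision, recall, and F1, which are common choices for comparing detectors on labeled fraud data; the labels in the usage example are made up.

```python
def detection_metrics(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = fraud, 0 = legitimate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels: two frauds caught, one missed, one false alarm.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

Robustness to contamination could then be examined by repeating the evaluation while varying the fraction of injected fraud cases and comparing how these scores change across detectors.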
Additional contributions include expert-validated, scalable features that support not only the detection of individual fraud cases but also network-level risk analysis among actors. The framework has been implemented as a software package for private insurance firms, providing advanced analytical tools that significantly improve decision-making while minimizing manual intervention. Overall, this study offers a comprehensive, data-driven, and expert-informed solution for health insurance fraud detection that combines algorithmic rigor with practical usability and adaptability to complex, evolving fraud scenarios.
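One hypothetical way to move from individual cases to network-level risk is to link providers that bill for the same patients. The sketch below is an illustration rather than the study's method: it weights each provider pair by the number of shared patients and reports pairs at or above an assumed threshold (`min_shared`). The claim tuples are invented.

```python
from collections import defaultdict
from itertools import combinations


def shared_patient_network(claims):
    """Edge weight = number of distinct patients billed by both providers."""
    patients_by_provider = defaultdict(set)
    for provider, patient in claims:
        patients_by_provider[provider].add(patient)
    edges = {}
    for a, b in combinations(sorted(patients_by_provider), 2):
        shared = len(patients_by_provider[a] & patients_by_provider[b])
        if shared:
            edges[(a, b)] = shared
    return edges


def risky_links(edges, min_shared=2):
    """Provider pairs sharing at least `min_shared` patients (hypothetical threshold),
    strongest links first."""
    return sorted((pair for pair, w in edges.items() if w >= min_shared),
                  key=lambda pair: -edges[pair])


# Invented (provider, patient) claim pairs.
claims = [("P1", "A"), ("P1", "B"), ("P1", "C"),
          ("P2", "A"), ("P2", "B"),
          ("P3", "C"), ("P3", "D")]
edges = shared_patient_network(claims)
links = risky_links(edges)
print(links)
```

The resulting weighted graph could also be passed to a graph library for community detection or centrality analysis, which is where detection would move from single actors to groups of actors.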
Keywords: health insurance fraud; fraud detection; artificial intelligence; unsupervised learning; isolation forest; K-Means; hybrid algorithm; feature extraction; simulation framework; expert knowledge; dashboard; insurance analytics; Iran.