Department of Family Medicine and Population Health, Division of Epidemiology, Virginia Commonwealth University
Erin Britton, MPH
Department of Family Medicine and Population Health, Division of Epidemiology, Virginia Commonwealth University
Jacquelyn Ferrance, MPH
Department of Family Medicine and Population Health, Division of Epidemiology, Virginia Commonwealth University
Anton Kuzel, MD, MHPE
Alan Dow, MD, MSHA
Department of Internal Medicine, Virginia Commonwealth University
Background: Improving health and controlling healthcare costs require better tools for predicting future health needs across populations. We sought to identify factors associated with the transition of enrollees in an indigent care program from an intermediate cost segment to a high cost segment of this population.
Methods: We analyzed data from 9,624 enrollees of the Virginia Coordinated Care program between 2010 and 2013. Each fiscal year included all enrollees who were classified in an intermediate cost segment in the preceding year and were also enrolled in the program in the following year. Using information from the preceding year, we built logistic regression models to identify the individuals in the top 10% of expenditures in the following year. The effects of demographics, count of chronic conditions, presence of the prevalent chronic conditions, and utilization indicators were evaluated and compared. Models were compared via the Bayesian information criterion and c-statistic.
Results: The count of chronic conditions, diagnosis of congestive heart failure, and numbers of total hospital visits and prescriptions were significantly and independently associated with being in the future high cost segment. Overall, the model that included demographics and utilization indicators had reasonable discrimination (c=0.67).
Conclusions: A simple model including demographics and health utilization indicators predicted high future costs. The count of chronic conditions and certain medical diagnoses added additional predictive value. With further validation, the approach could be used to identify high-risk individuals and target interventions that decrease utilization and improve health.
Primary Care, Health Care Cost, Chronic Diseases, and Administrative Data Uses
To achieve the triple aim of improving health, decreasing costs, and enhancing patient experience, the healthcare system must move toward population-focused models of organizing care. A population-focused approach, supported by adequate information systems to help understand the population, allows policy makers and planners to define the needs of specific segments of the population and tailor care to the needs of specific individuals. In contrast, care in the current system is often poorly responsive to the needs of both individuals and communities and is configured largely based on the supply of providers and other resources rather than the demands of the population. In particular, because a small fraction of the population drives healthcare costs and presumably has the worst level of health, controlling the costs and improving the health of the highest cost group is a priority.
Understanding populations in order to configure care is a nascent field. The Bridges to Health model proposed dividing the population into eight groups based on disease burden with categories such as ‘good health’, ‘limited, acute illness’, ‘advanced organ system failure’ and ‘near death’. Each of the categories was linked to quality goals based on the Institute of Medicine’s six domains of quality. This theoretical approach to population segmentation has been further validated with analyses of populations of older adults [6,7].
For our community, our institution developed a care coordination program for indigent adults, the Virginia Coordinated Care program (VCC), in 2000. The goals of the program were to improve the health of this population and reduce utilization of higher cost services such as emergency department visits and hospitalizations. The program enrolled individuals who were uninsured and under 200% of the federal poverty level (FPL). Enrollees were provided free primary care through an assigned primary care provider, access to free specialist care and testing, and low-cost medications. Overall, the program saved costs and improved health. Within this population, several segments were defined based on care utilization, ranging from episodic care to high cost, frequent care. Yet how leaders and planners can use these data to structure population-centered care to meet the future needs of the community most effectively remains to be defined.
The goal should be not only to target the highest cost group but also to aim interventions at preventing individuals from entering this group in the first place. Predicting healthcare costs is one way to shape healthcare delivery programs and potentially improve health and control costs. Health cost prediction models often incorporate demographic information and information on clinical conditions based on data from medical records or claims databases.
Clinical conditions have been entered into prediction models as the diagnostic cost group (DCG), prevalent chronic conditions, or counts of chronic conditions. DCG models were originally developed to match HMO payments to the healthcare needs of enrollees. The system uses patients’ age, gender and medical diagnosis profiles to predict healthcare expenditures [11-17]. While these methods have been broadly used in Medicare databases, they are also applicable to private insurance and Medicaid databases. However, the system is limited by its requirement of special resources (e.g., commercial software and expertise), dependence on a common classification structure within ICD-9 codes, and accuracy of these codes.
As an alternative, the presence of certain chronic conditions can predict health expenditures in various settings [19-22]. A third approach to predicting future costs is to use the count of chronic conditions. Several studies suggested that a simple count of chronic conditions can predict the length of hospital stay and mortality [23] and the amount of health expenditures [19,22]. The study by Farley et al showed that the simple count of diagnosis clusters, the count of prescriptions, and the count of physician visits were all better predictors of future costs than the comorbidity measures (i.e., the Charlson and Elixhauser indexes), with the count of diagnosis clusters being the best predictor among all measurements examined. Similarly, Fleishman and Cohen, using data from the national Medical Expenses Survey, compared the ability of the DCG method, counts of chronic conditions, and the presence of ten prevalent chronic indicators to predict being in the top ten percent of medical expenditures. The results showed that the count of chronic conditions significantly predicted future high cost after controlling for DCG category, demographic characteristics and self-reported functional status.
To advance our understanding of how population management principles should guide the structure of care delivery, we sought to determine what factors correlate with transitioning to the highest cost segment of the population within the VCC program [10]. Using demographic data, diagnosis information, and medication and healthcare utilization data, we describe the factors that are associated with the transition of individuals from an intermediate cost segment to a high cost segment of the population.
Data sources and participants
Data from the VCC Program on enrollment, utilization and claims between 2010 and 2013 were used for the study, with an annual average of 26,974 adults enrolled in the program. Utilization data included all services with VCU Health System, including prescription data, and with affiliated primary care practices. Based on annual data, the program stratified enrollees into subgroups of episodic, chronic, complex and specialty care, based upon diagnosis, utilization and prescriptions [10]. For the purpose of developing a prediction model of future healthcare expenditures, we focused on the subgroup of 9,624 enrollees with stable chronic conditions and intermediate healthcare cost. The criteria for classifying this subgroup included 1) annual hospital spending between $7,001 and $19,999; 2) a minimum of six and a maximum of 12 emergency department (ED) visits; or 3) six or more prescriptions within a fiscal year (e.g., July 1, 2010 through June 30, 2011). Individuals who exceeded the upper limit for any category were placed in the complex (highest utilization) group. Likewise, individuals needed to reach only one lower limit of the criteria to be placed in the intermediate cost group.
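The subgroup criteria above can be read as a small decision rule. The following Python sketch is one possible reading, not the program's actual implementation: the function name, the argument names, the `"other"` label for enrollees below every lower limit, and the treatment of any spending above $19,999 as exceeding the upper limit are assumptions made for illustration.

```python
def classify_segment(hospital_spending, ed_visits, prescriptions):
    """Assign a cost segment from one fiscal year of utilization.

    Thresholds follow the criteria stated in the text; everything else
    (names, the "other" label, boundary handling) is an assumption.
    """
    # Exceeding the upper limit of any category places the enrollee
    # in the complex (highest utilization) group. The text gives no
    # upper limit for prescriptions, so none is applied here.
    if hospital_spending > 19_999 or ed_visits > 12:
        return "complex"
    # Reaching a single lower limit is enough for the intermediate group.
    if hospital_spending >= 7_001 or ed_visits >= 6 or prescriptions >= 6:
        return "intermediate"
    # Below every lower limit: not described in detail in the text.
    return "other"


print(classify_segment(hospital_spending=8_500, ed_visits=1, prescriptions=2))
print(classify_segment(hospital_spending=500, ed_visits=14, prescriptions=0))
```

Checking the upper limits first reflects the statement that exceeding any upper limit overrides the intermediate classification.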
Key chronic conditions and number of chronic conditions
Information on medical diagnoses was derived from the primary or secondary diagnoses for each encounter along with the ICD-9-CM codes. We created binary indicators for the ten most prevalent chronic conditions: mental health problems, hypertension, diabetes, chronic obstructive pulmonary disease (COPD), coronary artery disease, mild liver disease, cancer, heart disease, cerebrovascular disease and congestive heart failure. To increase the sample size, we further combined the diagnoses of asthma with COPD, moderate and severe liver disease with mild liver disease, and myocardial infarction (MI) with coronary artery disease. Further, we calculated the number of chronic conditions for each individual. Due to small numbers, individuals with six or more conditions were collapsed into a single group.
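As a hedged illustration of the variable construction described above, the sketch below builds the ten binary indicators and a capped condition count. The condition labels, the `condition_features` name, and the input format (a list of chronic-condition labels already derived from ICD-9-CM codes) are hypothetical. Counting conditions after merging the combined diagnoses is also an assumption, since the text does not say which order was used.

```python
# Hypothetical labels for the ten prevalent conditions named in the text.
PREVALENT_CONDITIONS = [
    "mental_health", "hypertension", "diabetes", "copd",
    "coronary_artery_disease", "liver_disease", "cancer",
    "heart_disease", "cerebrovascular_disease", "congestive_heart_failure",
]

# Diagnoses that the text says were combined into a broader category.
MERGED_INTO = {
    "asthma": "copd",
    "mild_liver_disease": "liver_disease",
    "moderate_severe_liver_disease": "liver_disease",
    "myocardial_infarction": "coronary_artery_disease",
}


def condition_features(chronic_diagnoses):
    """Return (binary indicators, capped count) for one enrollee."""
    # Apply the merges, then de-duplicate with a set.
    merged = {MERGED_INTO.get(d, d) for d in chronic_diagnoses}
    indicators = {c: int(c in merged) for c in PREVALENT_CONDITIONS}
    # Six or more conditions are collapsed into a single group.
    count = min(len(merged), 6)
    return indicators, count


indicators, count = condition_features(["asthma", "copd", "hypertension"])
print(indicators["copd"], indicators["diabetes"], count)
```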
Health utilization variables
We used hospital visits and number of prescription drugs as measures of health utilization. Hospital visits were an aggregate measure of inpatient, outpatient and ED visits. The number of hospital visits was treated as continuous, except that individuals with 15 or more visits were collapsed into one group. The number of prescription drugs was also collapsed into categories: 0, 1-2, 3-5, and 6 or more.
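The recoding of the two utilization measures can be sketched as follows. The function names and category labels are hypothetical; only the cut points (15 or more visits; 0, 1-2, 3-5, and 6 or more prescriptions) come from the text.

```python
def cap_hospital_visits(total_visits):
    """Treat visits as continuous, collapsing 15 or more into one value."""
    return min(total_visits, 15)


def prescription_category(n_prescriptions):
    """Map a prescription count to the four categories used in the text."""
    if n_prescriptions == 0:
        return "0"
    if n_prescriptions <= 2:
        return "1-2"
    if n_prescriptions <= 5:
        return "3-5"
    return "6+"


print(cap_hospital_visits(22), prescription_category(4))
```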
We analyzed aggregate datasets through three panels between 2010 and 2013. Each panel included all enrollees who were classified in the intermediate cost segment in the preceding year and remained enrolled in the program the following year. Enrollees who were classified as specialty care in the following program year (i.e., their care was driven by a single dominant disease state such as polytrauma or poisoning) were excluded.
For each panel, we identified the individuals in the top 10% of expenditures in the following year. Based on the information collected in the preceding year, we used logistic regression to estimate the association between the potential risk factors and entering the top 10% of expenditures in the following year. The baseline model included age, a quadratic term of age, gender, race/ethnicity categories, and fiscal year. Subsequently, we added the number of chronic conditions, the presence of the ten most prevalent chronic conditions, total hospital visits, and number of prescriptions. Combinations of these risk adjustor sets were further examined. As a sensitivity analysis, we repeated the analyses using the top 5% as the cutoff for high expenditure cases. We compared models using the Bayesian information criterion (BIC) [24], with lower values indicating better model fit.
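Two steps in this paragraph lend themselves to a short sketch: flagging the top 10% (or 5%) of expenditures within a panel, and computing the BIC from a fitted model's log-likelihood using the standard definition, BIC = k ln(n) - 2 ln(L). The function names are hypothetical, the example costs are invented, and the handling of ties and of panel sizes not divisible by ten (rounding the number of flagged enrollees up) is an assumption; the original analysis was run in SAS and may differ in these details.

```python
import math


def flag_top_percent(costs, percent=10):
    """Return a 0/1 list marking the highest-cost `percent` of enrollees."""
    # Integer ceiling of len(costs) * percent / 100, avoiding float error.
    n_top = -(-len(costs) * percent // 100)
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    top = set(order[:n_top])
    return [int(i in top) for i in range(len(costs))]


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate better fit."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Invented following-year costs for ten enrollees in one panel.
following_year_costs = [100, 5000, 200, 300, 50, 75, 20, 10, 900, 60]
print(flag_top_percent(following_year_costs, percent=10))
```

Because the BIC penalty grows with the number of parameters, a larger model is preferred only if its gain in log-likelihood outweighs the added k ln(n) term, which is the sense in which the comparison favors parsimonious models.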
The performance of the models was assessed with respect to calibration and discrimination. Calibration is the ability of a model to produce unbiased estimates of the outcome probabilities, while discrimination is the ability of a model to separate high from non-high expenditure cases. To evaluate calibration, we used the Hosmer-Lemeshow (H-L) goodness of fit test [25], which assesses agreement between the observed and predicted risks over the full range of predicted probabilities. The H-L test forms subgroups from the deciles of predicted risk. Models for which expected and observed event rates in the subgroups are similar are called well calibrated. We used the c-statistic to measure discrimination. The c-statistic ranges from 0.5 to 1, where 0.5 corresponds to random chance and 1 corresponds to perfect discrimination. SAS software (Version 9.4, SAS Institute Inc., Cary, NC, USA) was used for all analyses.
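To make the two performance measures concrete, the sketch below computes a c-statistic as the proportion of concordant case/non-case pairs (ties counted as one half) and a simplified H-L chi-square statistic over groups of sorted predicted risks. This is an illustrative reimplementation under stated assumptions, not the SAS procedure used in the study: the function names are hypothetical, the grouping rule is a plain equal-size split, and no p-value is computed (the statistic is conventionally compared with a chi-square distribution with the number of groups minus two degrees of freedom).

```python
def c_statistic(y_true, y_prob):
    """Share of (case, non-case) pairs where the case has the higher risk."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for p in cases:
        for q in non_cases:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5  # tied predictions count as half concordant
    return score / (len(cases) * len(non_cases))


def hosmer_lemeshow_statistic(y_true, y_prob, groups=10):
    """Simplified H-L chi-square over equal-size groups of sorted risks."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / n_g
        denom = n_g * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of enrollees, which is acceptable for a sketch; a rank-based formula would be the usual choice for large panels.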
We analyzed the data from 9,624 VCC enrollees between 2010 and 2013, including 2,410, 2,798 and 4,416 from panels 1, 2 and 3, respectively. The unadjusted mean expenditures (standard deviations) for panels 1, 2 and 3 were $1,062.18 ($1,692.22), $1,248.57 ($4,821.10) and $1,985 ($6,172.00), respectively.
Table 1a shows the characteristics of all panels in the preceding year and unadjusted total hospital costs in the following year. The majority of VCC enrollees were adults between 18 and 65 years old, and 55% of enrollees were female. Approximately 65% of enrollees were Black Non-Hispanic. About 16% of enrollees did not have any chronic condition, 17% did not have any hospital visits, and 28% did not use any prescription drugs.
Table 1b presents the ten most prevalent chronic conditions identified from all panels. Among the 9,624 enrollees, the conditions were mental health problems (38.32%), hypertension (30.91%), diabetes (28.53%), COPD (17.64%), coronary artery disease/MI (9.05%), liver disease (5.09%), cancer (4.53%), heart disease (3.77%), cerebrovascular disease (2.24%) and congestive heart failure (2.07%), respectively.
Older age and female gender were each significantly associated with transitioning to the highest cost segment, compared with younger age and male gender, respectively. Compared with White Non-Hispanic individuals, Black Non-Hispanic and other Non-Hispanic individuals were significantly less likely to transition to the highest cost group. Increased numbers of chronic conditions, total hospital visits and prescriptions were associated with transitioning to the highest cost group.
Table 2 displays the estimated associations between the potential predictors in the preceding year and being in the top 10% of total hospital expenditures in the following year from several logistic regression models.
Among the demographic characteristics examined, age was a consistent predictor of entering the highest utilization segment across all models. Compared with males, females were more likely to be in the top 10% of spenders. This finding was independent of age and race/ethnicity (Model 1), number of chronic conditions (Model 2), prevalent individual chronic conditions (Model 3) and the combination of Models 2 and 3 (Model 5). However, adding health utilization indicators in Models 4, 6, 7 and 8 eliminated any significant gender disparity. Compared with White Non-Hispanic individuals, Black Non-Hispanic and other Non-Hispanic individuals were less likely to be in the top 10% of spenders in all models.
The odds of being in the top 10% of expenditures in the following year significantly increased as the number of chronic conditions increased. The association was independent of age, gender, race/ethnicity and health utilization indicators. However, in models including prevalent chronic conditions (Models 5 and 8), the significant association weakened or disappeared.
Several individual chronic conditions predicted the odds of being in the top 10% of expenditure in the following year. In Model 3, hypertension, coronary artery disease/MI, liver disease, cancer, heart disease, and congestive heart failure each showed significant association with higher odds of being in the top 10% of expenditure in the following year. However, in models including the number of chronic conditions (Models 5, 7 and 8), the significant association between these individual conditions and the outcome disappeared, with the exception of congestive heart failure.
The odds of being in the top 10% of expenditures significantly increased as the numbers of hospital visits and prescription drugs increased. The association was independent of age, gender and race/ethnicity (Model 4), number of chronic conditions (Model 6), individual chronic conditions (Model 7) and all components combined (Model 8). Notably, the increase was non-linear, with a marked jump in the odds for the category of 6 or more prescriptions.
Table 3 describes the goodness of fit of the models. All models except Model 1 fit well, as indicated by a nonsignificant H-L goodness of fit test. The performance of the models was further evaluated using the BIC and c-statistic. Adding the number of chronic conditions (Model 2), individual chronic conditions (Model 3) or health utilization indicators (Model 4) each markedly improved on the baseline model, as indicated by lower BIC values and higher c-statistics. Among Models 2-4, Model 4 had the best model fit (lowest BIC) and the highest c-statistic (c=0.67). Comparing Models 5, 6, and 7 with their nested models indicated a preference for the more parsimonious models. Similarly, comparing Model 8 with its best-performing nested model, i.e., Model 6, indicated that the more complex model was not preferable. The final model (Model 4) thus included demographic information and health utilization indicators.
The top 5% of total hospital costs was used as an alternative cutpoint to test relative model performance. The results were consistent with those obtained using the top 10% of total hospital costs. However, the three most complex models had gross lack of fit, as indicated by significant H-L chi-square test results (Appendix Table 4), likely due to the limited data size for such complex models.
Using data from an indigent care program, we demonstrated that integrating information on medical diagnoses, the count of chronic conditions, utilization indicators in the preceding year and demographic data was associated with high healthcare expenditures in the following year. We further evaluated whether combinations of the risk sets would improve prediction. Overall, the model that included demographics and utilization indicators (hospital visits and number of prescriptions) had the greatest association with transitioning to the highest cost segment (c=0.67). Identifying these complex patients may help target clinical interventions that improve health and reduce costs. Further inclusion of the count of chronic conditions or presence of ten prevalent chronic conditions in the model only slightly improved the discrimination aspect (c=0.68).
The criteria most predictive in these models could also help structure future interventions. With regard to individual risk factors, the count of chronic conditions and having a diagnosis of congestive heart failure were significantly associated with high future costs, regardless of demographics and utilization indicators. Identifying patients with multiple chronic conditions or heart failure before they develop a pattern of high healthcare utilization may be opportunities to intervene for the benefit of the patient and system. In addition, because both utilization indicators (hospital visits and number of prescriptions) were significantly and independently associated with high cost in the future, the complex patients with a higher risk for progression to a pattern of high utilization can be defined several ways. Leaders have some flexibility in their approach to identifying high-risk patients based on the data available within their information systems.
Our study systematically compared the ability of the count of chronic conditions, the ten most prevalent chronic indicators, counts of prescriptions, and counts of total hospital visits to predict future high costs, and its results were consistent with other studies [19,22]. Our prediction models, however, combined clinical and administrative data without the complexity of DCG scores. Hence, the results derived from these models may be easier to implement. Moreover, our study focused on an uninsured adult population whose income was under 200% FPL. Our results may provide unique insights relevant to the newly insured populations under Medicaid expansion.
Notably, the model discrimination in this study (c=0.68) is lower than that reported by other studies (c ranged from 0.81 to 0.85) [22,26,27]. This difference may be a consequence of several factors. Our model included the counts of chronic conditions and presence of individual chronic conditions as the main predictors of high costs. These indicators are sensitive to information on severity of the conditions. For example, a study by Omachi and colleagues showed that adding COPD severity measures significantly improved predictions of medical costs in the following year among a cohort of patients with COPD. Adding further information regarding the severity of each condition may increase the predictive power of the model. Additionally, the overall prevalence of chronic conditions was relatively low in our dataset, likely due to the population being younger (mean age=43.3 years, standard deviation=12.6 years). Leaders and researchers should select the model that best fits the demographics of the population of interest.
This study has several limitations. Our dataset may not have contained all data on utilization or diagnoses for each patient. In addition, as with cost prediction results drawn from other administrative databases, we were unable to adjust for heterogeneity in disease conditions in the cost prediction model. Finally, the patient population included in this study is medically underserved and relatively young and healthy; thus, the results may not be generalizable to other patient population segments, such as those who are more affluent, have better access to care, or are older and sicker. Further model validation studies are needed to confirm our findings.
In conclusion, the results from this study show that a simple model including demographics and health utilization indicators is associated with high future costs. Specific medical diagnoses, such as congestive heart failure, and the count of chronic conditions were also associated with higher future utilization. Further validation of the models is recommended to confirm the predictive capacity of this approach. If confirmed, this approach could be used to identify high-risk individuals and target interventions that decrease utilization and improve health.
The study was approved by the institutional review board and supported by CTSA award No. UL1TR000058 from the National Center for Advancing Translational Sciences. The authors report no declarations of interest.