Received 11 June 2013; accepted 2 October 2013
Background: In most national health systems, especially those providing universal coverage, family physicians act as gatekeepers, because most healthcare services are delivered only with a formal prescription from a primary care physician. Although the consumption of healthcare resources is initiated by family physicians' prescriptions, studies evaluating their performance are limited in the literature, particularly studies using a consolidated methodology that covers dimensions such as quality and efficiency. The specific aim of this paper is to propose a method for assessing primary care performance.
Methods: The novelty of the proposed model is twofold. First, physician performance is assessed along a clinical pathway that focuses on a homogeneous group of patients, in this case patients with diabetes; performance is therefore assessed not for the physician practice as a whole, but for a single disease. Second, we argue that performance should not be limited to efficiency, but should also encompass clinical effectiveness. Data were collected from a sample of family physician practices in Italy, and data envelopment analysis (DEA) was used to evaluate their efficiency.
Results: Based on the standard DEA model, 35 of 96 practices were efficient. The number of efficient practices decreased under three restricted models that explored physicians' behavioural preferences regarding patient visits, medication and referrals to hospital.
Conclusion: The efficiency assessment is complemented by a post-hoc evaluation of effectiveness, defined in this study as the adherence of patient care to the prescribed guideline. The study identified best practices in terms of both efficiency and effectiveness. The methods used are generalisable and could be applied to many other chronic conditions, which may account for most of the activity of primary care.
Keywords: data envelopment analysis (DEA), performance assessment, primary care
Family physicians play an influential role in determining total healthcare expenditure, because in most health systems every service at the secondary and tertiary levels of care, such as diagnostics, drugs and hospital admissions, must be prescribed by them. Assessing how primary care physicians contribute to resource allocation is therefore particularly important. Nevertheless, the performance of primary care practices is generally either not assessed at all or assessed only against the total budget requirement, rather than against how well patient needs are met.[1–4]
The starting point for assessing performance is to identify the relevant production process, that is, the process that transforms inputs into outputs, together with the relevant inputs and outputs to be considered. Family physicians' activities include many different services, such as disease prevention, visits, drug prescription and chronic disease management. In this paper, we assess primary care practices in managing patients with one chronic disease: diabetes mellitus. Unlike typical hospital assessments, which consider the whole activity of the facility, the proposed method therefore focuses on a single disease. In particular, the study evaluates how different primary care practices deliver care to patients affected by the same chronic disease. Diabetes mellitus was chosen as a chronic illness whose prevalence increases rapidly as a population ages.
According to the International Diabetes Federation, diabetes mellitus affected approximately 246 million adults worldwide in 2008, and this number is expected to soar to 380 million adults by 2025. With the growing number of patients with this disease, concerns are being raised about how physicians will manage their care effectively. Diabetes and its associated complications account for a large number of hospital admissions each year, and hospitalisations due to diabetes are responsible for a significant proportion of hospital expenditure. In Italy, about 3 million people (4.9% of the population) have been diagnosed with diabetes and offered healthcare services. This number is expected to increase to 5 million by 2030, partly because a further 1 million people (1.6%) are thought to have diabetes that has not yet been diagnosed. Moreover, another 2.6 million people (4.3%) have difficulty maintaining good control of their diabetes, which puts them at greater risk of developing associated complications in the short term.
Diabetes mellitus is a chronic disease that should be managed mainly in the patient's home, through appropriate periodic monitoring of lifestyle, medication when necessary and, only in some cases, hospital admission. Managing patients with diabetes is the role of the primary care physician. Correctly managed patients may not only avoid costly hospital admissions but, much more importantly, also avoid complications that may lead to severe disability (such as retinopathy and blindness, nephropathy and diabetic foot). In addition, diabetes increases the risk of stroke and heart attack.
The specific aim of this paper is to evaluate the performance of primary care practices in treating their patients with diabetes. In the prior literature, the methodology most widely used to assess healthcare delivery performance is data envelopment analysis (DEA), which can also be used to evaluate the efficiency of physician practices. Giuffrida and Gravelle describe DEA as a useful tool for identifying efficient physician practices and for estimating the savings that would follow if inefficient physicians adopted ‘best’ practices. Very few disease-based DEA studies exist: examples include those of Chilingerian et al[1] and Ozcan et al, who investigated otitis media, sinusitis and asthma.[2–4,11] DEA measures the performance of each unit (in this case a primary care physician) as the ratio of a weighted sum of outputs to a weighted sum of inputs. The details of this technique are explained in the Methods.
In this paper, we follow the methods of Chilingerian and Sherman and of Ozcan.[1,2,10,11] In particular, Ozcan noted that physician practices can be evaluated with DEA for specific disease conditions such as diabetes; the inputs and outputs of a physician practice then follow the same logic as hospital production, where patient treatments are outputs and the resources used to produce these treatments are inputs. For family physicians, however, the DEA model should take a different point of view, namely a disease- or patient-centred perspective. This means that the delivered health services (visits, laboratory tests, hospital admissions and so on) are not outputs, as they would be for a hospital, but inputs to a production process whose outputs are ‘properly followed’ patients with diabetes. In addition to a standard DEA model for assessing the efficiency of physicians' practices in treating diabetes, concurrent examination of the input weights (also called multipliers) may allow researchers to delineate how physicians' preferences (e.g. for drugs rather than visits or hospitalisation) influence healthcare delivery.[2,11]
Assessing efficiency alone is not sufficient, because the final effect on health outcomes (that is, effectiveness) also matters in healthcare delivery. Both dimensions are key aspects of performance assessment: effectiveness is the ability to achieve the goals and objectives of the delivered service (doing the right job), whereas efficiency means producing the services with the minimum level of resources required (doing the job right).
Efficiency is easier to determine, because the literature on production functions provides consistent methods for comparing outputs with inputs. How effectiveness should be measured remains an open question. Doing the right job is a clinical issue, which may be difficult to quantify. This study maintains that the evaluation of family physician effectiveness should be kept separate from outcome evaluation. For instance, glycaemic level may measure health status, but not the effectiveness of the physician's actions, because it depends not only on the physician's decisions but also on patient behaviour, the severity of the disease and the patient's socioeconomic conditions.
This study proposes to measure the so-called ‘appropriateness’ of the services provided, which depends only on the physician's decisions, as a proxy for effectiveness. Appropriateness means that the physician ensures that the patient receives the right services at the right time. In particular, physicians attempt to monitor and manage their patients' glycaemic levels through activities such as more frequent consultations, which may in turn prevent or delay complications of the disease. The use of appropriate medications together with diabetes management algorithms could contribute to better primary care for patients with diabetes. Drugs such as metformin and insulin may be used to control the disease. Greater interaction with patients and/or the use of these drugs may also help physicians prevent costly and life-threatening admissions of their patients to acute care facilities such as hospitals.
In summary, the novelty of the proposed approach is twofold. First, physician performance is assessed from the patient's point of view, following a clinical pathway. Second, we adopt not only an economic point of view, based on the number of services delivered, but also a clinical one, based on whether the pathway is appropriate and evidence based. This second consideration is particularly important, because concerns have been raised about the excessive costs of variations in medical practice that result in physicians providing inappropriate care to patients.
Following a clinical pathway perspective requires data collection at the individual level, because a clinical pathway can be conceived as an algorithm, organised in sequential stages, that details all treatments to be performed for a patient with a given pathology. In this study, data were available thanks to the collaboration of GP-LIGUR.net, the Primary Care Observatory of Regione Liguria in Italy. The data comprised clinical and prescription information for about 200 000 citizens and more than 140 physicians. All physicians recorded data with the same software (www.millewin.it) between January 2010 and December 2011. Because some patients joined a physician's panel late in 2010, data were collected until the end of 2011 (two years in total) so that these patients could be followed up for one year.
Family physicians collected all clinical information about their patients. This was done on a voluntary basis and did not impose a new reporting obligation on them: they were not required to record data in order to be paid, because in Italy family physicians are paid by capitation rather than by activity.
This might be a drawback, because they may not have recorded all the information, but also an advantage, because they had no incentive to record prescriptions for opportunistic reasons. From the physician's point of view, the only aim of recording was to follow his or her patients better by collecting all the relevant information. The data were therefore suitable for describing the appropriateness of the clinical pathways of patients with chronic illness requiring continuous monitoring by their physicians.
Data reliability was sufficiently guaranteed, because the data were filtered and validated with the collaboration of the physicians, who agreed on a set of criteria identifying a ‘good compiler’ (e.g. the propensity to enter numerical values for the results of prescribed examinations, to link each problem to the corresponding prescription, and to record new problems using the internal ICD9 code rather than a description only). Only data from good compilers (96 of 140 physicians) were used in this study.
The final database included all the information needed to measure performance. Efficiency can be assessed using a large number of inputs and outputs. Among the different ways of evaluating primary care efficiency, we define inputs and outputs following Chilingerian et al, Ozcan and Amado et al,[1,2,11,15] considering visits, hospital admissions and medications as inputs, and patients classified by severity level as the outputs of the production process. With regard to effectiveness, for a chronic condition such as diabetes, once a person is diagnosed, the appropriate course of action is to control the disease and prevent serious complications such as cardiovascular disease, kidney damage, blindness and lower limb amputation. The available database made it possible to assess the extent to which the family physicians monitored each patient.
Variables used for performance evaluation
Following Chilingerian[1] and Ozcan,[2–4] the inputs used in the study were the annual number of patient consultations with the physician, the total annual number of hospital admissions of patients with diabetes, and whether the physician administered diabetes-related medication (metformin, insulin or other drugs) to a given patient during the year. Although data were collected over two years, all metrics were computed for each patient on a yearly basis for a given physician.
With regard to outputs, the database mainly contains elderly patients (aged 65 years or older). Patients with diabetes seen by the practices were classified into three levels of severity (low, medium and high), based on age and the presence of associated comorbidities (hypertension, heart failure). The three severity classes were entered in the model as three different outputs, defined as:
- low severity: younger than 65 years, with no comorbidities
- medium severity: 65 years or older, with no comorbidities
- high severity: 65 years or older, with comorbidities.
This approach allowed us to reconstruct the resources (hospitalisations, drugs) used by each patient in a year and to assign each patient to a severity category.
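The classification and aggregation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's own code: the `PatientYear` record, its field names and the `practice_profiles` helper are hypothetical, and patients younger than 65 with comorbidities, a case the three definitions do not cover, are left unclassified. Following the description of Table 2, inputs are summarised as mean yearly visits, total admissions and the share of patients on medication, and outputs as the shares of patients in each severity class.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PatientYear:
    """One patient's record for one calendar year (hypothetical schema)."""
    practice_id: str
    age: int
    has_comorbidity: bool  # e.g. hypertension or heart failure
    visits: int            # consultations with the family physician
    admissions: int        # hospital admissions during the year
    on_medication: bool    # metformin, insulin or other antidiabetic drugs


def severity(rec: PatientYear) -> Optional[str]:
    """Severity class following the three definitions in the text."""
    if rec.age < 65 and not rec.has_comorbidity:
        return "low"
    if rec.age >= 65 and not rec.has_comorbidity:
        return "medium"
    if rec.age >= 65 and rec.has_comorbidity:
        return "high"
    return None  # under 65 with comorbidities: not covered by the definitions


def practice_profiles(records: Iterable[PatientYear]) -> Dict[str, dict]:
    """Aggregate patient-years into per-practice DEA inputs and outputs."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.practice_id].append(rec)
    profiles = {}
    for pid, recs in groups.items():
        n = len(recs)
        classes = [severity(r) for r in recs]
        profiles[pid] = {
            "inputs": {
                "mean_visits": sum(r.visits for r in recs) / n,
                "admissions": sum(r.admissions for r in recs),
                "share_medicated": sum(r.on_medication for r in recs) / n,
            },
            # Shares use all patient-years, so unclassified ones lower the total.
            "outputs": {c: classes.count(c) / n for c in ("low", "medium", "high")},
        }
    return profiles


records = [
    PatientYear("P1", 60, False, 4, 0, False),
    PatientYear("P1", 70, False, 6, 0, True),
    PatientYear("P1", 72, True, 8, 1, True),
    PatientYear("P1", 80, True, 10, 1, True),
]
print(practice_profiles(records)["P1"])
```

Keeping the patient-year records separate from the practice-level summary means the same records can feed both the efficiency inputs and outputs and the appropriateness indicators.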
For primary care facilities, we defined effectiveness as the ability of the physician to follow an evidence-based programme along the clinical pathway for the care of patients with diabetes, rather than as a judgement about the final health status of the patients.
In the healthcare literature, this concept is usually called ‘appropriateness’. The underlying criterion is whether the physician has done everything that the clinical pathway prescribes for a particular patient. For instance, it is important that the physician checks whether the patient smokes; whether the patient eventually gives up smoking cannot be attributed solely to the physician's performance.
The appropriateness of each patient's clinical pathway was assessed through a set of indicators chosen by the general practitioners taking part in the study, in agreement with the Italian Standards for Diabetes Mellitus (www.aemmedi.it/files/Linee-guida_Raccomandazioni/2007/2007-cura-diabete-mellito.pdf). They defined the care pathway for diabetes as appropriate if at least the following parameters are checked:
- haemoglobin A1 (HbA1), once every 12 months
- creatinine, once every 15 months
- microalbuminuria, once every 15 months
- low-density lipoprotein (LDL), once every 15 months
- smoking status (as a proxy for lifestyle).
More specifically, the paper uses a weighted ‘appropriateness index score’, which rewards physicians who provide a higher proportion of their diabetic patients with: an HbA1 test (score of 1); HbA1, creatinine, microalbuminuria and LDL tests (score of 2.5); and HbA1, creatinine, microalbuminuria and LDL tests plus a smoking assessment (score of 3). These score values were agreed with the physicians taking part in the study and are open to different clinical judgements. The sum of these scores was then standardised to a 0–1 scale, with the practice having the highest weighted appropriateness index score serving as the reference for all the other practices.
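A minimal sketch of the appropriateness index follows. It rests on assumptions the text does not spell out: each patient is credited only with the highest testing level reached (so the levels are mutually exclusive), a practice's raw score is the weighted sum of the shares of its patients at each level, and standardisation divides every raw score by the highest one. The level labels and function names are hypothetical; only the scores 1, 2.5 and 3 come from the text.

```python
from typing import Dict, List, Optional, Set

# Scores agreed with the participating physicians (from the text).
LEVEL_SCORES = {"hba1": 1.0, "four_tests": 2.5, "four_tests_smoking": 3.0}
CORE_TESTS = {"hba1", "creatinine", "microalbuminuria", "ldl"}


def patient_level(checks: Set[str]) -> Optional[str]:
    """Highest testing level reached by one patient (assumed mutually exclusive)."""
    if CORE_TESTS <= checks and "smoking" in checks:
        return "four_tests_smoking"
    if CORE_TESTS <= checks:
        return "four_tests"
    if "hba1" in checks:
        return "hba1"
    return None  # no HbA1 check: contributes nothing to the score


def raw_score(patients: List[Set[str]]) -> float:
    """Weighted sum of the shares of a practice's patients at each level."""
    levels = [patient_level(p) for p in patients]
    n = len(patients)
    return sum(score * levels.count(level) / n
               for level, score in LEVEL_SCORES.items())


def appropriateness_index(practices: Dict[str, List[Set[str]]]) -> Dict[str, float]:
    """Standardise raw scores to 0-1 by dividing by the best practice's score (assumed)."""
    raw = {pid: raw_score(pts) for pid, pts in practices.items()}
    best = max(raw.values())
    if best == 0:
        raise ValueError("no practice reaches any testing level")
    return {pid: s / best for pid, s in raw.items()}


practices = {
    "A": [{"hba1", "creatinine", "microalbuminuria", "ldl", "smoking"},
          {"hba1", "creatinine", "microalbuminuria", "ldl"}],
    "B": [{"hba1"}, set()],
}
print(appropriateness_index(practices))  # A scores 1.0, B about 0.18
```

Dividing by the best raw score, rather than min-max scaling, keeps the worst practice above zero whenever it records any HbA1 test; if the study used a different standardisation, only the last function would change.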
Assessing the efficiency of a production process (in our case, a primary care practice using healthcare services to monitor patients with diabetes) consists of computing the ratio of outputs to inputs and comparing it with a benchmark or ideal value. Models differ in how this comparison value is determined. In the literature, the peculiarities of health services production and the difficulty of estimating this benchmark have led to a search for so-called ‘nonparametric’ methods of efficiency evaluation, as opposed to traditional econometric analysis or stochastic frontiers (parametric methods).[9] The usual choice is DEA, which is based on the economic concept of Pareto optimality[16] and was developed by Charnes and colleagues[17] as a linear programming problem.
Compared with parametric methods, DEA has several advantages: it does not require the specification of a functional form or an error distribution, it can be applied to technologies that use multiple inputs to produce more than one output, and the benchmark can be identified among the other family practices in the sample. Consequently, applications of DEA in health have increased over the years, especially in hospital settings.[8] In DEA models, the efficiency of a decision-making unit (DMU), which in our application is a family practice, is evaluated as the weighted ratio of its outputs to its inputs, relative to its peers in the evaluation. A physician is efficient, and is assigned a score of 1, if this ratio is greater than or equal to the corresponding ratio of every other physician in the evaluation set, computed with the same system of weights. Each physician is allowed to choose the weights that maximise his or her efficiency score, subject only to the constraints that the weights are nonnegative and that, with those weights, no physician in the set obtains a ratio greater than 1.
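In the simplest case of one input and one output, the weight-choosing problem described above has a closed-form answer. The constraints cap the ratio of the output weight to the input weight at the reciprocal of the best observed productivity, so each unit's score is its own output/input ratio divided by the best ratio in the sample. The sketch below illustrates this constant-returns case with hypothetical numbers; it does not generalise to several inputs or outputs, which require linear programming.

```python
from typing import List, Sequence


def ratio_efficiency(inputs: Sequence[float], outputs: Sequence[float]) -> List[float]:
    """CRS DEA scores for units with exactly one input and one output.

    Each unit's output/input ratio is divided by the best ratio observed,
    so the most productive unit(s) score 1 and all others score below 1.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical practices: yearly visits (input) and patients followed (output).
visits = [10.0, 20.0, 40.0]
patients = [5.0, 8.0, 20.0]
print(ratio_efficiency(visits, patients))  # [1.0, 0.8, 1.0]
```

The first and third practices share the best productivity (0.5 patients per visit) and define the frontier; the second is scored against them.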
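DEA methods and the calculations behind them have been described at length in the literature.[11,18] We provide a brief explanation of the calculation of DEA efficiency scores here, using mathematical notation adapted from Ozcan (pp. 24–56).[11] The efficiency score $h_o$ of each clinic in a group of peer clinics ($j = 1, \dots, n$) is computed for the selected outputs ($y_{rj}$, $r = 1, \dots, s$) and inputs ($x_{ij}$, $i = 1, \dots, m$) using the following fractional programming formulation:

$$\max_{u,v}\; h_o = \frac{\sum_{r=1}^{s} u_r y_{ro}}{\sum_{i=1}^{m} v_i x_{io}} \quad \text{subject to} \quad \frac{\sum_{r=1}^{s} u_r y_{rj}}{\sum_{i=1}^{m} v_i x_{ij}} \le 1, \; j = 1, \dots, n; \qquad u_r, v_i > 0 \text{ for all } r, i.$$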
In this formulation, $u_r$ and $v_i$ are the output and input weights, respectively, and o denotes the focal clinic (to obtain efficiency scores, each clinic in turn becomes the focal clinic, and its efficiency score is computed relative to the others). Note that the formulation assumes all input and output values, as well as all weights, to be > 0. The weights are determined entirely from the output and input data of the physicians in the peer group: for each focal physician, the programme selects the weights that maximise that physician's efficiency score, subject to no physician in the peer group obtaining a ratio greater than 1 with the same weights. To be solved, the fractional programme must be converted into a linear programme. Because the focus of this paper is not on the mathematical aspects of DEA, the interested reader is referred to Ozcan[11] for details of how the above equations are algebraically converted into a linear programming formulation; other technical DEA books and papers may also be consulted for in-depth treatment.[18,19] In summary, DEA identifies a group of optimally performing physicians, which are defined as efficient and assigned a score of 1. These efficient physicians are then used to construct an ‘efficiency frontier’ or ‘data envelope’ against which all other physicians are compared.
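For reference, the step that the text delegates to Ozcan is the standard Charnes-Cooper transformation. Rescaling all weights by $t = 1/\sum_i v_i x_{io}$, so that $\mu_r = t\,u_r$ and $\nu_i = t\,v_i$, fixes the denominator of the focal clinic's ratio at 1 and turns the fractional programme into the input-oriented, constant-returns multiplier linear programme below. The small constant $\varepsilon > 0$, which stands in for strictly positive weights, is a conventional device and an assumption of this sketch.

```latex
\begin{aligned}
\max_{\mu,\,\nu}\quad & h_o = \sum_{r=1}^{s} \mu_r\, y_{ro} \\
\text{subject to}\quad & \sum_{i=1}^{m} \nu_i\, x_{io} = 1, \\
& \sum_{r=1}^{s} \mu_r\, y_{rj} - \sum_{i=1}^{m} \nu_i\, x_{ij} \le 0, \qquad j = 1, \dots, n, \\
& \mu_r \ge \varepsilon, \quad \nu_i \ge \varepsilon \qquad \text{for all } r, i.
\end{aligned}
```

Each constraint row is the original ratio constraint multiplied through by its (positive) denominator, so the optimal value of this linear programme gives the same efficiency score as the fractional form.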
Note that the mathematical formulation assumes a so-called input-oriented DEA model, which focuses on the extent to which input quantities can be reduced without changing output quantities, whereas output-oriented models focus on maximising outputs without altering input quantities. In assessing physicians' performance, we preferred the input-oriented model, because we assume that physicians control their inputs but not their outputs (patient severity based on comorbidities).
DEA estimates can be obtained assuming either constant returns to scale (CRS) or variable returns to scale (VRS). The CRS model assumes that outputs change proportionally with inputs, whereas under VRS the returns depend on the scale of operation. Our model specification employs VRS because the practices differ in size and hence cannot be assumed to share the same economies of scale. More specifically, the VRS model allows different practices to achieve different proportional changes in output for a given change in input, depending on characteristics such as the number of physicians employed by the practice.
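For the VRS model, the fractional programme is modified by subtracting a free (sign-unrestricted) variable $u_o$ from the weighted sum of outputs, in both the objective and the constraints:

$$\max_{u,v,u_o}\; h_o = \frac{\sum_{r=1}^{s} u_r y_{ro} - u_o}{\sum_{i=1}^{m} v_i x_{io}} \quad \text{subject to} \quad \frac{\sum_{r=1}^{s} u_r y_{rj} - u_o}{\sum_{i=1}^{m} v_i x_{ij}} \le 1, \; j = 1, \dots, n; \qquad u_r, v_i > 0; \; u_o \text{ free}.$$

With $u_o$ entering with a negative sign, the sign of its optimal value indicates the focal practice's returns to scale: $u_o < 0$ indicates increasing returns, $u_o > 0$ decreasing (diminishing) returns and $u_o = 0$ constant returns.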
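The study computed its scores with the FEAR package in R. As a hedged illustration only, the Python sketch below solves the input-oriented envelopment programme (the linear-programming dual of the multiplier form) with SciPy's `linprog`: for each focal unit it minimises θ, the factor by which the unit's inputs could be scaled down while a nonnegative combination λ of its peers still produces at least its outputs. Setting `vrs=True` adds the convexity constraint Σλ = 1, which is what distinguishes VRS from CRS. The function name and toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def dea_input_oriented(X, Y, vrs=True):
    """Input-oriented DEA scores via the envelopment linear programme.

    X: (n units x m inputs), Y: (n units x s outputs), all positive.
    Returns an array of n scores in (0, 1]; 1 means on the frontier.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision variables: [theta, lambda_1, ..., lambda_n]; minimise theta.
        c = np.zeros(n + 1)
        c[0] = 1.0
        # Inputs:  sum_j lambda_j x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o].reshape(m, 1), X.T])
        # Outputs: -sum_j lambda_j y_rj <= -y_ro  (produce at least y_ro)
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        A_ub = np.vstack([A_in, A_out])
        b_ub = np.concatenate([np.zeros(m), -Y[o]])
        # VRS only: convexity constraint sum_j lambda_j = 1.
        A_eq = np.hstack([[0.0], np.ones(n)]).reshape(1, -1) if vrs else None
        b_eq = [1.0] if vrs else None
        bounds = [(None, None)] + [(0.0, None)] * n
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=bounds, method="highs")
        if not res.success:
            raise RuntimeError(f"LP failed for unit {o}: {res.message}")
        scores[o] = res.x[0]
    return scores


# Toy data (hypothetical): one input (e.g. visits) and one output per unit.
X = [[1.0], [2.0], [4.0], [3.0]]
Y = [[1.0], [3.0], [4.0], [2.0]]
print(dea_input_oriented(X, Y, vrs=True))   # approx. [1.0, 1.0, 1.0, 0.5]
print(dea_input_oriented(X, Y, vrs=False))  # approx. [0.667, 1.0, 0.667, 0.444]
```

On the toy data, the convexity constraint lets the small and large units lie on the VRS frontier even though only the second unit has the best output/input ratio, which is why more units score 1 under VRS than under CRS.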
In this study, two further specifications were used. First, because the DEA approach has no error term, it cannot separate inefficiency from measurement error or random deviation; a bootstrapping method was therefore used to analyse the sensitivity of the measured efficiency scores to sampling variation. Following the recommendations of Simar and Wilson,[20] bias-corrected efficiency scores were also computed to correct the bias inherent in the construction of the practices' DEA scores. Using the FEAR package for the R statistical software, the study applied the homogeneous bootstrap algorithm of Simar and Wilson with 1000 replications. FEAR was also used to generate the DEA scores for the standard model.
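The bias correction itself is a simple function of the bootstrap replicates. For a practice o with estimated score $\hat\theta_o$ and B bootstrap scores $\hat\theta^{*}_{o,b}$ (B = 1000 here), the standard bootstrap estimate of the bias and the bias-corrected score are shown below. Generating the replicates (the smoothing and reflection steps of the Simar and Wilson homogeneous bootstrap) is not shown.

```latex
\widehat{\mathrm{bias}}_B(\hat\theta_o)
  = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*}_{o,b} - \hat\theta_o,
\qquad
\tilde\theta_o
  = \hat\theta_o - \widehat{\mathrm{bias}}_B(\hat\theta_o)
  = 2\hat\theta_o - \frac{1}{B}\sum_{b=1}^{B}\hat\theta^{*}_{o,b}.
```

When the replicates average above the original estimate, as is typical for input-oriented DEA scores, the corrected score falls below it, so a practice with $\hat\theta_o = 1$ can end up with $\tilde\theta_o < 1$.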
Second, standard DEA models do not prioritise some inputs over others; that is, they place no constraint on how much weight a physician attaches to visits, drugs or hospital admissions. In healthcare delivery, however, substitution among inputs can have different meanings, reflecting differences in severity but sometimes also inappropriate actions. Clinical pathways are designed to constrain physicians to choose the right mix of inputs.
In the case of diabetes, particularly in the early stages of the disease, glycaemic control and a healthy lifestyle are sufficient; in the second stage, specific drugs (e.g. metformin) are introduced; and hospital admissions should in all cases be avoided and reserved for extreme cases, as they are not a substitute for other services. For this reason, to capture the impact of substitution among inputs, restricted models (RMs) were run alongside the standard DEA model, which may allow different practice styles to be delineated, that is, the relative contribution of visits, drugs and hospital admissions in the preferred pathways.[2,11] In particular, to obtain the efficiency scores of the restricted DEA models, upper (80th percentile) and lower (69th percentile) bounds on the weights were imposed to reflect a preference for administering medication over hospital admissions (RM1), for patient visits over hospital admissions (RM2), and for both medication and visits over hospital admissions (RM3). The literature suggests that, when there is no a priori information for setting weight restrictions that confine physician practices to a norm (or practice pattern), it is appropriate to use percentiles or quartiles of the weight distributions generated by unrestricted DEA models.[2,11] The decision to use the 69th percentile as the lower bound and the 80th percentile as the upper bound was made after examining the weight structure of these variables in the initial unrestricted model.
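The percentile-based restrictions described above can be sketched as follows. This is an assumption-laden illustration, not the study's implementation: it assumes each restriction bounds the ratio of two input weights (for RM1, drugs relative to hospital admissions) between the 69th and 80th percentiles of that ratio across practices in the unrestricted model, skips practices whose denominator weight is zero, and expresses the bounds as two linear inequalities that could be appended to the multiplier programme. The function names and example weights are hypothetical; `statistics.quantiles` with `method="inclusive"` interpolates linearly between order statistics.

```python
import statistics
from typing import List, Sequence, Tuple


def ratio_bounds(w_num: Sequence[float], w_den: Sequence[float],
                 lower_pct: int = 69, upper_pct: int = 80) -> Tuple[float, float]:
    """Percentile bounds for the weight ratio w_num / w_den across practices.

    Practices with a zero denominator weight are skipped (an assumption).
    """
    ratios = [a / b for a, b in zip(w_num, w_den) if b > 0]
    # quantiles(..., n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(ratios, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def ratio_constraints(i_num: int, i_den: int, n_inputs: int,
                      lower: float, upper: float) -> List[List[float]]:
    """Rows a, one per inequality a . v <= 0, encoding lower <= v_num / v_den <= upper."""
    upper_row = [0.0] * n_inputs
    upper_row[i_num], upper_row[i_den] = 1.0, -upper   # v_num - upper * v_den <= 0
    lower_row = [0.0] * n_inputs
    lower_row[i_num], lower_row[i_den] = -1.0, lower   # lower * v_den - v_num <= 0
    return [upper_row, lower_row]


# Hypothetical unrestricted weights for 11 practices: drugs vs admissions.
drug_w = [float(k) for k in range(11)]
admission_w = [1.0] * 11
lo, hi = ratio_bounds(drug_w, admission_w)
# Inputs ordered (visits, drugs, admissions): restrict drugs relative to admissions.
rows = ratio_constraints(1, 2, 3, lo, hi)
print(lo, hi, rows)
```

Because both bounds are written as homogeneous linear inequalities in the weights, they survive the Charnes-Cooper rescaling and can be added directly to the linear programme.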
In summary, the DEA models yielded a set of scores for each physician, depending on the model specification (standard, bootstrapped and restricted). This assessment concerned only efficiency and not appropriateness, which depends on the physicians' choices about following specific evidence-based programmes to manage the disease. Final performance was therefore assessed by taking into account both the efficiency and the appropriateness scores of each physician. The appropriateness score was obtained through a post-hoc evaluation of how effectively physicians delivered to their patients the appropriate clinical pathways set out in clinical guidelines.
A descriptive summary of the database used is given in Table 1. On average, in Italy, a primary care practice provides care for more than 120 450 patients older than 14 years, and about 8% are affected by diabetes mellitus. Most patients with diabetes are elderly (73%), and diabetes is associated with at least one other chronic pathology in 77% of patients.
A summary of the efficiency scores and of the input and output variables is presented in Table 2. Inputs are the average number of yearly contacts between the physician and his or her patients with diabetes, the number of hospital admissions in the previous five years, and the percentage of prescriptions for metformin, insulin and other antidiabetic drugs. Outputs are the percentages of patients classified as low, medium and high severity.
In the standard model (first row of Table 2), the mean input-oriented VRS efficiency score of the sample practices was 0.86, with 35 practices (fewer than half) being efficient. In the bias-corrected model, mean efficiency fell to 0.78 and none of the practices was efficient (see the Appendix for individual practice efficiency scores and confidence intervals). Although a t-test comparing the standard and bias-corrected scores indicated that their means were significantly different (P < 0.001), we focus on the standard efficiency scores for convenience.
The restricted models had lower mean efficiency scores than the standard model. RM2 had more efficient practices than RM1 and RM3, but still fewer than the standard model.
In the post-hoc evaluation, practice (DMU) efficiency scores were compared with standardised appropriateness scores. Detailed scores for the first few practices are presented in Table 3, to show that the efficiency score can be misleading without the appropriateness assessment.
With reference to the standard efficiency model, for example, Practice 3 is both efficient and appropriate, whereas Practice 9 is efficient (score of 1) but poor at following medical guidelines, with an appropriateness score of only 0.04. Similar considerations may apply to the other efficient practices reported in Table 2. However, Practice 3 is the only one that performs best from both the efficiency and the appropriateness points of view.
The results for the whole sample are presented in Figure 1, in which practices are dichotomised by their level of efficiency and effectiveness. Practices with an efficiency score of 1 and an effectiveness score ≥ 0.90 were regarded as the best performers, whereas those with an efficiency score < 1 and an effectiveness score < 0.90 were classified as poor performers. Practices that scored well in only one dimension (efficiency or effectiveness) were placed in quadrants indicating the dimension in which they needed to improve.
As illustrated, Practices 3 and 24 had the best performance in terms of efficiency and appropriateness. Thirty-three practices were efficient but needed to improve their appropriateness. The remaining 61 practices were poor performers on both efficiency and appropriateness. There were no practices in the lower right quadrant, where a practice would have high appropriateness and low efficiency.
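The quadrant rule used for Figure 1 can be written as a small function. The thresholds (an efficiency score of 1 and an appropriateness score ≥ 0.90) come from the text; the numerical tolerance for treating a computed score as 1, the function names and the quadrant labels are hypothetical. The two example calls use the scores reported above for Practices 3 and 9.

```python
from collections import Counter
from typing import Dict, Tuple


def quadrant(efficiency: float, appropriateness: float,
             app_threshold: float = 0.90, tol: float = 1e-6) -> str:
    """Place a practice in one of the four efficiency/appropriateness quadrants."""
    efficient = efficiency >= 1.0 - tol  # DEA score of 1, up to solver tolerance
    appropriate = appropriateness >= app_threshold
    if efficient and appropriate:
        return "best performer"
    if efficient:
        return "improve appropriateness"
    if appropriate:
        return "improve efficiency"
    return "poor performer"


def quadrant_counts(scores: Dict[str, Tuple[float, float]]) -> Counter:
    """Count practices per quadrant from {practice: (efficiency, appropriateness)}."""
    return Counter(quadrant(eff, app) for eff, app in scores.values())


print(quadrant(1.0, 1.0))   # Practice 3: "best performer"
print(quadrant(1.0, 0.04))  # Practice 9: "improve appropriateness"
```

A tolerance is used because LP solvers rarely return exactly 1.0 for frontier units; without it, an efficient practice could be misclassified by rounding noise.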
Table 3 also provides additional information on the restricted models. In general, the restrictions were more severe in RM1 than in RM2; that is, it was more difficult to avoid hospital admissions by prescribing drugs (RM1) than by visiting patients personally (RM2). As expected, when both restrictions were introduced (RM3), the average score was lower (0.52). It is also interesting to compare the scores across models. For instance, Practices 3 and 4 also had high efficiency scores in RM2 (1 and 0.98, respectively), indicating that they also used an appropriate mix of inputs (i.e. they did not resort to hospital admissions and instead cared for their patients at home by visiting them). Among the 22 efficient practices in RM2, only one (Practice 3) had an appropriateness score of 1; incidentally, this was also the only practice with an appropriateness score of 1 among all the practices evaluated. Among the efficient practices in RM1, none had an appropriateness index score higher than 0.70, and among the efficient practices in RM3, none had an appropriateness score higher than 0.52. Interestingly, these 11 practices also had efficiency scores of 1 in the standard efficiency model, RM1 and RM2. On average, these practices tended to invest less effort in appropriateness programmes and had more patients of low and medium severity.
The availability of individual clinical pathway data and collaboration with primary care physicians provided a more complete performance assessment of family practices than previously reported in the literature, which has generally been limited to efficiency perspectives.
From an epidemiological point of view, the detected prevalence of diabetes (7.9%) was higher than the 4.9% reported nationally by the Istituto Superiore di Sanità (www.iss.it). This may indicate how diabetes prevalence could evolve elsewhere in Italy, since population ageing in Liguria has already reached a level that other Italian regions are expected to reach within the next 20 years.
The second important result is the great variability among family practices in efficiency scores, and the even greater variability in input mix and appropriateness scores. The restricted models suggest that most physicians preferred to focus on medication rather than on visits, and that this led to more hospital admissions.
The most surprising result, however, comes from inspection of Table 2. It is apparent that most of the inefficient practices were also inappropriate, and that in no case was high appropriateness accompanied by low efficiency. This seems to contradict the conventional notion that increasing appropriateness (and therefore quality) in healthcare delivery requires greater use of resources.
The results also suggest that appropriateness and efficiency should be evaluated together, since healthcare delivery performance cannot be reduced to efficiency alone. This requires performance to be assessed in close collaboration between economists and clinicians, in order to establish both the appropriate input mix and the appropriate clinical pathway against which current practice can be compared. Furthermore, it would be interesting to also include notions of clinical appropriateness from the patients' perspective (e.g. quality of life measures). Such measures could help gauge the value of appropriate clinical pathways in terms of their impact on patients.
The assessment of family practices shows that improvement is still needed. Future work could include socioeconomic conditions as external variables, to make differences in the performance of primary care practices more apparent, since practices serving more disadvantaged populations may find it harder to achieve good performance because of external factors. Policy interventions could also be implemented to promote improvement. One of the first options to explore could be the promotion of so-called ‘initiative medicine’ (medicina d'iniziativa), which gives physicians an active role in intervention, especially where patient adherence to medicines is unsatisfactory.
DEA can also provide insight into performance assessment in primary care, for example by helping decision makers to detect problem practices and to plan appropriate improvement strategies fine-tuned case by case. This assessment is particularly timely given the current debate in Italy on the possible reform of primary care. It is recommended that reforms promote a shift of resources from hospitals to primary care. Such reforms would have primary care practices operating continuously, 24 hours a day, seven days a week, with physicians working in groups. However, differences in practice size do not seem to lead to greater efficiency.
The issue of how to achieve improvement remains unresolved. The most relevant lever here seems to be the incentives embodied in payment design. Italian family physicians are currently paid by capitation, but this could be replaced by a mixed system in which a component based on the appropriateness of prescriptions is added to the fixed compensation. To implement incentive schemes such as ‘pay for performance’, however, appropriate methods for assessing the efficiency and quality of physicians' activity must be agreed with the physicians themselves, which is what this paper has attempted to do.
Data were collected thanks to the collaboration with GP-LIGUR.net (Dr Pier Claudio Brasesco).