Professor of Primary and Prehospital Health Care, Community and Health Research Unit (CaHRU), University of Lincoln, UK
Steve Gillam MD FFPH FRCP FRCGP
Department of Public Health and Primary Care, Institute of Public Health, University of Cambridge, UK
Received date: 8 July 2013 Accepted date: 31 July 2013
This is the fourth in a series of articles about the science of quality improvement. We examine what to measure, how to measure and some important measurement techniques, such as run charts, control charts and funnel plots. These help us to understand healthcare processes, to assess whether they are stable or improving and to determine how they can be improved further.
control charts, funnel plots, general practice, primary care, quality improvement, run charts, statistical process control
This, the fourth in our series of articles on quality improvement tools and techniques, examines what to measure, how to measure it and techniques of measurement for improvement. Previous articles in the series have considered: frameworks for improvement, understanding processes and how to improve them, and most recently leadership and management for improvement.[2-4] Everything we do can be seen as part of a process. The structure and process of care - for example, how people work, their work methods, the equipment and materials used, the work environment - as well as the outcomes of care are important measures for evaluating quality. Measurement is itself a process which not only helps us to assess other processes, but which can also be used to drive improvement. The techniques we discuss will help us understand whether processes are stable, improving or deteriorating and the extent to which they can be improved further.
What to measure
What we measure depends on what outcome we wish to achieve and therefore which parts of the process we should improve to do this. There are different types of measure. We can select particular criteria (also called audit or review criteria), standards and indicators. The latter may include quality indicators, performance indicators or clinical performance indicators, depending on what is being measured and why.
A criterion is a measurable aspect of quality (structure, process or outcome) of care and has been defined as ‘a systematically developed statement that can be used to assess the appropriateness of specific healthcare decisions, services and outcomes’. An example of a criterion is that every patient diagnosed with hypertension should have had their blood pressure recorded within the previous six months. This is translated into a measure: the proportion of patients with hypertension who have a blood pressure recorded within the previous six months (usually expressed as a percentage). If the criterion is based on research evidence which directly links it to improved outcomes it is sometimes referred to as a ‘review criterion’. For example, patients with hypertension should have a latest blood pressure reading (measured in the preceding six months) of 150/90 mmHg or less.
The level achieved for the criterion is compared with a standard, i.e. what should be achieved. The standard is the threshold of expected compliance for the criterion. Standards are usually derived from consensus opinion (either local or from a wider group) or are based on a previous audit. Less commonly, a standard may be based on published evidence about levels of performance that lead to improved outcomes. An example here is the standard required to achieve herd immunity for measles, mumps and rubella vaccination of 95% of the population.
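The relationship between a criterion, its measure and a standard can be sketched in a few lines of Python. This is only an illustration: the function names and the patient counts are hypothetical, and the 90% standard is an assumed local figure rather than one taken from this article.

```python
def criterion_achievement(met: int, eligible: int) -> float:
    """Percentage of eligible patients for whom the criterion was met."""
    if eligible <= 0:
        raise ValueError("eligible must be a positive number of patients")
    return 100.0 * met / eligible


def meets_standard(achieved_percent: float, standard_percent: float) -> bool:
    """Compare the level achieved with the standard (the expected threshold)."""
    return achieved_percent >= standard_percent


# Hypothetical audit: 172 of 200 patients with hypertension had a blood
# pressure recorded in the previous six months; assumed standard of 90%.
achieved = criterion_achievement(met=172, eligible=200)
print(f"Achieved {achieved:.1f}%, standard met: {meets_standard(achieved, 90.0)}")
```

A single figure of this kind is a before-or-after snapshot; it says nothing about variation over time.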
How to measure
Different types of improvement project use different types of measurement, but not all are equally useful or informative. Measurements are generally of three types: before-and-after, continuous or comparative.
Clinical audits characteristically employ before-and-after measurements where standards are compared before and after an intervention. The advantage of this approach is that it provides the analyst with a target to aim for and it is usually simple to analyse and present data. A disadvantage is that the standard may be arbitrary. This may lead to gaming or unintended consequences. Furthermore, and most importantly, a change comparing a single measurement before and after an intervention may be an artefact of measurement rather than demonstrating a real improvement.
By contrast, quality-improvement projects tend to measure processes continuously, with data recorded either as counts or as proportions (percentages). Rather than two measurements, i.e. before-and-after, multiple measurements are taken before, during and after the intervention. Relatively simple statistical methods are then used to analyse whether the process is showing natural (or random) variation over time, to show the extent of this variation and to determine whether real improvement (over and above natural variation) is occurring.
The advantage of this approach is that it helps us understand whether real improvement has taken place and can demonstrate the extent of this improvement. It avoids interpreting natural variation as real change; and it enables us to see the effects of multiple interventions over time. The disadvantage of this method is that measurements need to be taken repeatedly during the process of change and some basic analytical concepts and techniques need to be learned (Table 1).
Measurement, whether using simple counts (e.g. numbers of referrals to hospital), rates (e.g. proportion of patients with a particular condition referred to hospital) or other more complex continuous variables, is also a process that can introduce variation. Therefore, it is important that data using samples or whole populations are gathered in a careful and consistent way.
Finally, we may wish to compare performance for different groups or organisations with the aim of comparing the best performers with the worst. The traditional method of doing this has been to represent the performance of each organisation on a bar chart that ranks the highest with the lowest or vice versa. Unfortunately, this method may prevent a clear differentiation being made between high- and low-performing organisations/groups. If they are aware of the method of presentation, it may lead them to aim for a middle rank in such a table, where they are less likely to be noticed. This pursuit of mediocrity can be prevented by using funnel plots to compare organisations. Funnel plots are a special type of control chart that compare different organisations rather than a single organisation over time (see below).
Every measure of a process, a combination of processes or an outcome will show variation over time. Variation is therefore part of any process. It is inevitable, and ubiquitous, but is amenable to measurement and control. If we want to demonstrate improvement it is essential that we select the key variables to measure quality in terms of outputs or outcomes that will signify improvement.
The natural variation in a stable process, one that is unaffected by external factors or by attempts to improve it, is called ‘common cause variation’. We see common cause variation in, for example, repeated measures of blood pressure. This variation may be due to changes in the physiological state of the individual, subtle differences in the technique of measurement or differences in the response of the measuring instrument.
Other measures, such as prescribing rates, may vary over time because of differences in patient case-mix, differences between prescribers and changes in their prescribing behaviour. Similarly, referral rates, vaccination rates, or indeed any other measure of health processes, organisations or systems will also vary over time due to variation in the process itself or in the process of measurement.
Variation which falls outside the ‘common cause variation’ is termed ‘special cause variation’. As its name implies, ‘special cause variation’ is caused by an ‘external’ factor, whether this is planned or unplanned, intended or unintended. Analysing variation over time involves using statistical techniques, but the simplest way of analysing and representing such variation involves a technique called statistical process control (SPC), developed by Walter Shewhart at Bell Laboratories in the 1920s and championed by WE Deming, Joseph Juran, Davis Balestracci and many others since.[4,8-10] Table 2 summarises the differences between common and special cause variation.
Any improvement in the healthcare process requires a change in a process to reduce the effect of ‘common cause variation’ and to trigger a ‘special cause variation’ which will represent a significant improvement. However, responding to common cause variation as though it is special cause variation has the opposite effect to that which is intended. It may actually increase variation in the system. This is called ‘tampering’. An example of tampering is when an organisation responds to a single reduction (or increase) in a measure without first checking whether the change is merely due to common cause variation.
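The way tampering can increase variation may be illustrated with a small simulation. The sketch below is not taken from this article: it assumes a stable process with a target of zero and purely random (common cause) noise, and a hypothetical manager who, after every measurement, shifts the process aim to cancel the latest deviation.

```python
import random
from statistics import pvariance


def simulate(n_points: int, tamper: bool, seed: int = 0) -> list[float]:
    """Simulate a stable process around a target of 0 with unit random noise.

    If `tamper` is True, the aim is adjusted after every point to compensate
    for that point's deviation, i.e. common cause variation is treated as if
    it were special cause variation.
    """
    rng = random.Random(seed)  # fixed seed so both runs see the same noise
    aim = 0.0
    results = []
    for _ in range(n_points):
        value = aim + rng.gauss(0.0, 1.0)
        results.append(value)
        if tamper:
            aim -= value  # over-react to the latest deviation from target
    return results


left_alone = pvariance(simulate(5000, tamper=False))
tampered = pvariance(simulate(5000, tamper=True))
print(f"variance left alone: {left_alone:.2f}, with tampering: {tampered:.2f}")
```

Under these assumptions each tampered value carries the current noise plus the reversed noise of the previous point, so the variance is roughly doubled rather than reduced.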
Table 3 summarises how we should and should not respond to the different types of variation. A special cause strategy calls for investigation and explanation, which will sometimes lead to specific responses depending on the special cause identified. Common cause variation requires a different approach. A common cause strategy first requires us to explore the variation more closely using stratification to reveal any special causes. Next, one should seek to understand variation through the processes and systems which cause a problem. Finally, we should redesign processes to reduce inappropriate and unintended variation in an agreed measure, in a way that is responsive to patients’ needs.
Statistical process control
Run charts are the simplest way of plotting data over time. Data for a particular indicator are plotted as dots (data points) on a simple graph with time plotted on the x-axis and the value of the indicator plotted on the y-axis. The time intervals should be ordered and sequential, but not necessarily equal. They are often regularly spaced but need not be. At least 16 dots are usually required to see if a process is stable. The dots are connected by lines and a median line is drawn. Figure 1 is a run chart showing hypnotic prescribing data for a single general practice.
A ‘run’ is a sequence of dots above or below the median. Common cause variation is represented in a run chart as runs randomly distributed about the median. One way of conceptualising this is that, in common cause variation, the chance of a dot falling above or below the median is the same as the chance of throwing heads or tails with a coin. Three simple statistical rules have been developed to show whether there has been a significant change in a measure over time, i.e. a special cause variation.
These rules are helpful because they prevent individuals or groups just ‘eyeballing’ a chart of measurements over time and misinterpreting them. Following certain rules leads to consistent interpretation of what constitutes a significant change over time. This also prevents an inappropriate response to common cause variation as if it were a special cause.
The three rules that identify the most important types of special cause variation are: shifts, trends and runs (shown in Box 1). A shift is a sequence or ‘run’ of seven dots above or below the median. A trend is a sequence of seven dots all going upwards or downwards (dots on the same level are excluded from the count).
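The shift and trend rules can be expressed as simple checks on a series of values. The following is a minimal sketch, assuming the seven-dot threshold given above; the function names are hypothetical, and the handling of dots lying exactly on the median (skipped, neither extending nor breaking a run) is an assumption rather than something stated in the text.

```python
from statistics import median


def find_shift(values, run_length=7):
    """True if `run_length` consecutive dots lie on the same side of the median."""
    if not values:
        return False
    centre = median(values)
    streak, side = 0, 0
    for v in values:
        if v == centre:
            continue  # assumption: dots on the median are ignored
        current = 1 if v > centre else -1
        streak = streak + 1 if current == side else 1
        side = current
        if streak >= run_length:
            return True
    return False


def find_trend(values, run_length=7):
    """True if `run_length` consecutive dots all rise or all fall."""
    if not values:
        return False
    direction, count, previous = 0, 1, values[0]
    for v in values[1:]:
        if v == previous:
            continue  # dots on the same level are excluded from the count
        current = 1 if v > previous else -1
        # A change of direction starts a new sequence of two dots.
        count = count + 1 if current == direction else 2
        direction, previous = current, v
        if count >= run_length:
            return True
    return False


# Hypothetical monthly values: eight high months followed by eight low months.
series = [9, 8] * 4 + [3, 2] * 4
print(find_shift(series), find_trend(series))
```

The `run_length` parameter is left adjustable because the number of dots required for a shift or trend differs between texts.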
An example of a shift shown in Figure 2 is the rate of hypnotic drug prescribing in another general practice. The run chart shows a sequence of 25 dots. There are 11 dots below the median from January 2008, indicating a shift.
The final rule refers to ‘runs’ which give the ‘run chart’ its name. Runs should be randomly distributed about the median when there is only common cause variation. Therefore, we can calculate whether there are the right numbers of runs (between upper and lower limits) depending on how many dots there are in total in the chart which gives a probability table for runs (see Table 4). In Figure 2 there are only four runs when we would expect between 10 and 16 according to Table 4.
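Counting runs can also be automated. The sketch below is illustrative only: the function names are hypothetical, dots lying exactly on the median are skipped (an assumption), and the lower and upper limits must be supplied from a probability table such as Table 4 rather than being calculated here.

```python
from statistics import median


def count_runs(values):
    """Number of runs, i.e. unbroken sequences of dots on one side of the median."""
    if not values:
        return 0
    centre = median(values)
    # True for a dot above the median, False for below; dots on it are dropped.
    sides = [v > centre for v in values if v != centre]
    if not sides:
        return 0
    crossings = sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    return crossings + 1


def runs_within_limits(n_runs, lower, upper):
    """True if the number of runs is consistent with common cause variation."""
    return lower <= n_runs <= upper


# Hypothetical series with only four runs about its median.
series = [10, 12, 11, 13, 2, 3, 1, 2, 12, 11, 3, 2]
print(count_runs(series))
```

For the chart in Figure 2, `runs_within_limits(4, 10, 16)` would return `False`, using the limits quoted above from Table 4, which signals special cause variation.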
A control chart is a more sophisticated form of run chart. The relationship between a run chart and control chart has been described as analogous to that between an X-ray and a magnetic resonance imaging (MRI) scan. The latter is more sensitive at detecting abnormalities, but also more complex and requires greater resources. The principles of its construction and interpretation are very similar.
Figure 3 is a control chart showing hypnotic prescribing data for a single general practice and corresponds to the run chart in Figure 1. Again, data for a particular indicator are plotted as dots on a simple graph with time plotted sequentially on the x-axis and the value of the indicator plotted on the y-axis. The time intervals should be sequential. They are often regularly spaced but need not be. The dots are connected by lines but this time a mean line is drawn (Figure 3).
In addition, the control chart has two further lines: the upper and lower control limits. They differ from confidence intervals and should not be confused with them. The control limits are lines representing three standard deviations above and below the mean. The only slight complication here is that the mean and standard deviations should be calculated according to the type of data, i.e. normal distribution for biological variables such as blood pressure, Poisson distribution for count data and binomial distribution for yes/no or percentage performance data.
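To illustrate how control limits depend on the type of data, the sketch below calculates three-standard-deviation limits for proportion (binomial) data and for count (Poisson) data. These are the standard textbook formulae for what are often called p-charts and c-charts, not calculations reproduced from this article; the function names are hypothetical, and limits are truncated at 0 and 1 where they would otherwise be impossible values.

```python
from math import sqrt


def p_chart_limits(events, denominators):
    """Mean proportion and per-point (lower, upper) limits for yes/no data.

    events[i] is the number of 'yes' cases out of denominators[i] in period i.
    """
    p_bar = sum(events) / sum(denominators)
    limits = []
    for n in denominators:
        sigma = sqrt(p_bar * (1 - p_bar) / n)  # binomial standard deviation
        limits.append((max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)))
    return p_bar, limits


def c_chart_limits(counts):
    """Mean count with lower and upper limits for count data."""
    c_bar = sum(counts) / len(counts)
    sigma = sqrt(c_bar)  # for a Poisson variable the variance equals the mean
    return c_bar, max(0.0, c_bar - 3 * sigma), c_bar + 3 * sigma


# Hypothetical data: monthly referral counts.
print(c_chart_limits([14, 18, 16, 15, 17, 16]))
```

Because the binomial limits depend on the denominator, periods with fewer cases have wider limits than periods with more.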
Confidence intervals are different from control limits. They represent a range of plausible values around an estimated effect size, such as an odds or risk ratio for a study. Confidence levels are usually set at 95%. Stated simply, this means that if a study were to be repeated 100 times and a 95% confidence interval calculated each time, about 95 of those intervals would be expected to contain the true effect size.
Common cause variation in a control chart is shown in Figure 3 (which shows the same data as the run chart in Figure 1). All the dots fall within the upper and lower control limits and are randomly distributed about the mean. Control charts are more sensitive than run charts in detecting significant change over time. There are more rules for determining significant changes over time in a control chart, but again three basic rules identify the most important types of special cause variation: points outside the control limits, shifts and trends.
Figure 4 is a control chart showing hypnotic prescribing data for a single general practice corresponding to the run chart in Figure 2. In Figure 4 the second dot in the sequence falls above the upper control limit. A shift is a sequence of seven dots above or below the mean (as shown in Figure 4). A trend is a sequence of seven dots all going upwards or downwards (dots on the same level are again excluded from the count). The number of dots for a shift or trend varies from six to eight depending on the text and the total number of data points. A number of other rules can help to provide further signals that a significant change may have occurred. These are additional, more sensitive, rules but they are also more likely to cause false positive signals and need to be interpreted with caution. This practice has significantly reduced its hypnotic drug prescribing since December 2007.
As well as looking at an indicator over time, control charts can also be used to compare organisational units at a single point or during a fixed period. In this type of control chart, organisational units are arranged on the x-axis with their performance as a count, rate or proportion (or percentage) on the y-axis. The mean is represented and control limits are calculated for each organisational unit based on all the data provided (Figure 5).
In Figure 5 we represent data on the performance of ambulance stations as an organisational unit. In each case, a team of paramedics delivered care for patients with acute myocardial infarction (AMI) during a single month. Each dot represents an ambulance station. Performance is measured as the delivery of a process ‘care bundle’ for AMI. A care bundle is an all-or-none measure where every eligible patient with AMI should receive aspirin, glyceryl trinitrate, pain assessment and analgesia, unless there is a valid exception. The delivery of the care bundle can vary from 0 to 1 (i.e. 0 to 100%). The samples provided by each station were small which led to wide control limits (0 and 1) for most stations. The mean performance was 44.4% which meant that the care bundle was delivered to just over two in every five patients.
The control limits, denoted again by the dashed line, vary for each station. If data are arranged according to the size of the sample denominator provided by each organisational unit, this produces a funnel plot (Figure 5). In Figure 6 we see AMI care bundle performance for 12 larger regional ambulance services in England. Each service is represented by a dot labelled 1 to 12. The sample denominator is now greater but varies from a few cases (service 12) to over 200 cases (service 7) in the month that performance is measured. The mean performance across all the services is again around 45%. The control limits, which are joined by a smooth line, are wider for services with small samples of AMI and become narrower as the sample size increases. This produces the characteristic funnel shape of the control limits. Because this looks like the bell of a trombone, funnel plots are sometimes referred to as ‘trombonograms’.
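The funnel shape arises because the control limits for a proportion narrow as the denominator grows. The sketch below shows this using a normal approximation to the binomial distribution, which is an assumption (published funnel plots may use exact binomial limits); the function names and the unit data are hypothetical.

```python
from math import sqrt


def funnel_limits(p_bar, n, z=3.0):
    """Lower and upper control limits for a unit with denominator n."""
    sigma = sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - z * sigma), min(1.0, p_bar + z * sigma)


def outlying_units(units, z=3.0):
    """Flag units whose proportion lies outside the funnel.

    `units` maps a unit name to (events, denominator). Returns a dict of
    unit name to 'above' or 'below' for units outside the control limits.
    """
    total_events = sum(events for events, _ in units.values())
    total_n = sum(n for _, n in units.values())
    p_bar = total_events / total_n  # overall mean performance
    flagged = {}
    for name, (events, n) in units.items():
        lower, upper = funnel_limits(p_bar, n, z)
        proportion = events / n
        if proportion > upper:
            flagged[name] = "above"
        elif proportion < lower:
            flagged[name] = "below"
    return flagged


# Hypothetical services: (patients receiving the care bundle, eligible patients).
services = {"A": (2, 5), "B": (45, 100), "C": (150, 200), "D": (20, 100)}
print(outlying_units(services))
```

With a mean of 44.4% and a hypothetical denominator of only three patients, `funnel_limits(0.444, 3)` gives limits of 0 and 1, consistent with the wide limits described for the small ambulance stations.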
In the chart one can immediately see that most trusts are contained within the control limits. The processes in these trusts are delivering ‘average’ care defined by these control limits. Four trusts show performance either above or below the control limits. These trusts show significantly different performance, either higher or lower, than other trusts and this special cause requires further investigation to understand why this might be the case. The investigation may reveal a difference in the system of care providing this outcome.
This article has tried to provide an introduction to measurement for improvement. In it, we have explained how we can measure processes and why repeated measurement over time is the key to understanding variation in a process and how to improve quality. We have also introduced readers to the principles of constructing and interpreting run and control charts and how to respond to common or special cause variation. More information on SPC is available from a number of excellent articles and books.[4,12-17]
Commissioned; not externally peer reviewed.