A grey zone for quantitative diagnostic and screening tests (2024)

Abstract

Background Most quantitative tests do not perfectly discriminate between subjects with and without a given disease and their results do not always allow certainty about disease status for diagnostic or screening purposes. We propose a method to construct a three-zone partition for quantitative tests to avoid the binary constraint of a ‘black or white’ decision that often does not fit the reality of clinical or screening practice. This partition intentionally includes a grey zone between positive and negative conclusions.

Methods and Results We show that the width of this grey zone depends on the difference between the means of test results for subjects with and without the disease, the variability of the test results and its components (biological, measurement), and the level of the misclassification risks (false positive, false negative) required by the context of use. We illustrate the method by application to the tuberculin skin test and iron deficiency markers in children.

Conclusion This method can be used both to display the discriminatory performance of a quantitative test in a variety of contexts and to scrutinize its components of variability. Due to the simplicity of the graphical representations, the grey zone approach may be useful during the development of quantitative tests and the publication of their performance.

Keywords: Discrimination, diagnostic, screening, quantitative test, reliability, measurement, graphical method

Diagnostic and screening discrimination problems require a rule that enables a new subject to be assigned to the correct population, e.g. with or without a given disease, with the lowest rate (or cost) of misclassification. In the case of a quantitative test and a binary decision, the discrimination rule classifies as diseased a subject whose value is above (or below) an optimal cutpoint, determined under given constraints (e.g. rate or cost of false positives/negatives).1,2 However, since most tests do not discriminate perfectly between subjects with and without a given disease, certainty about disease status cannot be obtained for results within a given range of (intermediate) values. To deal with this problem, the construction of a three-zone partition, including a middle inconclusive zone of intermediate values has been proposed3 and applied to categorical and ordinal tests.4,5

In this paper, we extend the grey zone approach to quantitative diagnostic and screening tests, and illustrate it by application to the tuberculin skin test and iron deficiency markers in children. The graphical representations allowed by this approach are intended to help in the development of quantitative tests, and the evaluation and reporting of their measurement properties.

Construction of the grey zone for diagnostic or screening discrimination

Definition and construction of the grey zone

We define the grey zone of a quantitative test as an area of values where the discriminatory performance is ‘insufficient’, in the sense that a value in the grey zone does not allow the target disease to be scored as either present or absent. It is thus the range of values that do not eliminate uncertainty about the disease status.

In situations where perfect discrimination between subjects with and without a given disease is possible, e.g. some IgM-antibody serological tests, the construction of a grey zone is irrelevant. However, there is often a significant overlap between distributions of test results for subjects with and without the disease, and the grey zone may be wide. Its width obviously depends on the level of the overlap of distributions due to ‘true’ overlap but also on the measurement error. It also depends on the requirement of the clinical or screening context in terms of likelihood ratios (LR). Indeed, to confirm or exclude the presence of a target disease, high positive LR (LR+) and low negative LR (LR−) values are necessary to ensure post-test probabilities close to 1 or 0 for a range of pre-test probability values. The required degree of closeness of post-test probabilities to 1 or 0 depends on the context. Sometimes, post-test probabilities over 0.99 or even 0.999 (or under 0.01 or 0.001) are required to confirm or exclude the presence of a target disease. Examples include confirming the diagnosis of human immunodeficiency virus (HIV) infection to initiate antiretroviral therapy or to exclude Down’s syndrome by a prenatal screening test. Otherwise, clinicians or Public Health professionals may find slightly lower probabilities e.g. 0.95 (0.05) sufficient to decide whether a target disease is present.

Once the analysis of the context has provided suitable values of LR, the identification of the two cut-off points delimiting the grey zone is straightforward: one, gup, associated with the minimal desirable value of LR+; the other, glow, associated with the maximal desirable value of LR−. These cut-off points define an area of inconclusive values: the grey zone.
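The identification of the two cut-off points can be sketched in code. The following is a minimal illustration, not the authors' procedure: it assumes normally distributed test results with hypothetical parameters, higher values in diseased subjects, and LR curves that are monotonic in the cut-off (which holds for equal-variance normal distributions). The function names `likelihood_ratios` and `grey_zone_from_lr`, the grid of candidate cut-offs and the LR requisites (10 and 0.1) are all illustrative assumptions.

```python
from statistics import NormalDist

def likelihood_ratios(c, healthy, diseased):
    """Return (LR+, LR-) at cut-off c, 'positive' meaning a value above c."""
    se = 1.0 - diseased.cdf(c)  # sensitivity at c
    sp = healthy.cdf(c)         # specificity at c
    return se / (1.0 - sp), (1.0 - se) / sp

def grey_zone_from_lr(healthy, diseased, lr_pos_min, lr_neg_max, grid):
    """Scan candidate cut-offs and return (g_low, g_up).

    Raises ValueError if no cut-off on the grid meets a requisite.
    """
    # g_up: smallest cut-off at which LR+ reaches the required minimum
    g_up = min(c for c in grid
               if likelihood_ratios(c, healthy, diseased)[0] >= lr_pos_min)
    # g_low: largest cut-off at which LR- stays under the required maximum
    g_low = max(c for c in grid
                if likelihood_ratios(c, healthy, diseased)[1] <= lr_neg_max)
    return g_low, g_up

# Hypothetical distributions and requisites, for illustration only
healthy = NormalDist(mu=10.0, sigma=3.0)
diseased = NormalDist(mu=16.0, sigma=3.0)
grid = [5.0 + 0.1 * i for i in range(161)]  # candidate cut-offs, 5.0 to 21.0
g_low, g_up = grey_zone_from_lr(healthy, diseased, 10.0, 0.1, grid)
print(f"grey zone: {g_low:.1f} to {g_up:.1f}")
```

With empirical data, the same scan could be run over the observed cut-off points using the empirical sensitivity and specificity instead of the normal distribution functions.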

The tuberculin skin test

Results of the tuberculin skin test are important for care management decisions, and in particular whether to initiate antituberculous therapy in HIV-seropositive patients.6–9 Let us suppose that clinicians in a specialized AIDS unit want to use the tuberculin skin test to rule in or rule out a diagnosis of tuberculosis in an HIV-1-infected patient with signs of ‘probable’ tuberculosis, the pre-test probability being estimated to be between 0.30 and 0.70. Clinicians require post-test probabilities of (1) about 0.95 after a positive result (the positive predictive value), which requires LR+ to be over 44, to accept the hypothesis and treat the patient without further exploration; and (2) about 0.05 after a negative result (i.e. a negative predictive value of about 0.95), which requires LR− to be under 0.022, to reject the diagnosis and seek another (these values are subjective probabilities provided by clinicians working in such AIDS units). The construction of a grey zone (Figure 1, panels A and B) for these LR values (using reference data on the distribution of tuberculin skin test results in healthy and tuberculosis-infected subjects10) gives the interval 7.8–16.6 mm. An immediate inspection of the width of the grey zone shows the poor discrimination ability of this test in the context considered: the grey zone corresponds to one-third of the range of possible values. Also, the usual cut-off point of 10 mm, shown in bold in Figure 1, is clearly located well within the grey zone.
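The LR requisites quoted for this example follow from the odds form of Bayes' theorem (post-test odds = pre-test odds × LR). A minimal sketch is given below; the function name `required_lr` is an illustrative assumption, and the pre-test bounds used are 0.30 for ruling in and 0.70 for ruling out (0.70 being the upper bound of p(D) quoted for this example in the section on the proportion of results in the grey zone), i.e. the least favourable value in each case.

```python
def required_lr(pre_test, post_test):
    """Likelihood ratio needed to move a pre-test probability to a post-test one."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = post_test / (1.0 - post_test)
    return post_odds / pre_odds

# Ruling in: lowest pre-test probability, target post-test probability 0.95
lr_pos_min = required_lr(0.30, 0.95)
# Ruling out: highest pre-test probability, target post-test probability 0.05
lr_neg_max = required_lr(0.70, 0.05)
print(f"LR+ >= {lr_pos_min:.1f}, LR- <= {lr_neg_max:.4f}")
```

These evaluate to roughly 44.3 and 0.0226, in line with the requisites stated above.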

The reticulocyte haemoglobin content test

Early identification of iron deficiency in children is important for prevention of several systemic complications including the impairment of mental and motor development. A recent study11 suggested that reticulocyte haemoglobin content (CHr) could be used for screening children for iron deficiency. We used the data reported to derive the distributions of CHr in iron-deficient children and children with satisfactory iron status: these distributions were largely overlapping (Figure 2, panel A). We constructed a grey zone for the main use of the test discussed by the authors: to detect iron deficiency, in order to treat it, in a population where the frequency of the disease is about 10%. To construct the grey zone, we considered that the post-test probability of disease after a negative result should be below 0.1% (due to the severe developmental consequences of failure to detect), and this implies LR− being under 0.01. We also considered that the treatment, which has few side effects, could usefully be given if the post-test probability was over 0.70, which implies LR+ being over 20 (as in the first example, these values are subjective probabilities obtained by interviewing paediatricians). The grey zone constructed with such values of LR is the interval 22.0–28.2 pg (Figure 2, panel B), i.e. more than one-third of the range of observed values. Note that the ‘optimal cut-off value’ suggested by the authors, 26 pg (based on receiver operating characteristic [ROC] curve analysis), is clearly inside the grey zone. Also note that the grey zone would have approximately the same limits for every value of LR+ ⩾ 20 and LR− ⩽ 0.1 (Figure 2, panel B). In this example, the grey zone width appears relatively insensitive to LR requisites.

The grey zone and diagnostic or screening discrimination

Clinicians need conclusive tests that allow the diagnostic process to be terminated as quickly but as confidently as possible.12 For quantitative tests, this means that cutpoints associated with a very high LR+ and a very low LR− should be identified to rule in or to rule out a diagnostic hypothesis.

Conclusive tests are also required for screening.13 There should be as few false negative cases as possible, and false positives are also unwelcome because they are often further investigated by more invasive or costly methods. The grey zone approach would allow a differentiated attitude towards results: definitely negative (no further action), most certainly positive (requiring verification) and ‘grey’ (requiring another test or a follow-up).

For a given clinical or screening context, and a range of estimated pre-test probability values, our method can therefore be used with a candidate quantitative test to construct a three-zone partition including the grey zone according to LR requisites. Similarly, it could help evaluation of the discriminatory performance of a test in various clinical or screening contexts with different LR requisites, and help to choose between several tests or thresholds in a given context.

Identifying the proportion of results that will fall within the grey zone will also help assess the usefulness of a test in practice (see below).

Entering the grey zone and analysing its determinants

The limits of the grey zone are determined by an analysis of the context and its requirements in terms of LR (according to the usual range of pre-test probabilities and targeted post-test probabilities). However, the mathematical management of LR is relatively complex, and dialogue with non-specialists about LR can be difficult.14 The presentation and discussion of the grey zone concept therefore needs to be as simple as possible. Fortunately, once the limits of the grey zone (gup and glow) have been determined by an analysis of LR, the reasoning and the interpretation of the grey zone may be pursued with concepts and values of sensitivity and specificity that are much more understandable by non-specialists.

Risks of misclassification and expected proportion of test values in the grey zone

The construction of a grey zone for a test therefore implies three possible responses: ‘positive’, ‘inconclusive’ or ‘grey’, and ‘negative’. A subject with the disease should ideally be classified as ‘positive’ and a subject without the disease as ‘negative’, and consequently there are four risks of misclassification (Figure 3):

  1. the risk of a subject with the disease being classified as ‘negative’, called λ (lambda) and determined by the value of the lower limit of the grey zone; λ is 1 minus the sensitivity of the test at glow;

  2. the risk of a subject without the disease being classified as ‘positive’, called υ (upsilon) and determined by the value of the upper limit of the grey zone; υ is 1 minus the specificity of the test at gup;

  3. the risk of a subject with the disease being classified as ‘grey’, called υ′ (as it mainly depends on the value of the upper limit of the grey zone); υ′ is 1 minus the sensitivity of the test at gup minus λ;

  4. the risk of a subject without the disease being classified as ‘grey’, called λ′ (as it mainly depends on the value of the lower limit of the grey zone); λ′ is 1 minus the specificity of the test at glow minus υ.

Estimating these risks is straightforward using a plot of sensitivity and specificity (Figure 4).
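The four risks can also be computed directly from sensitivity and specificity at the two limits. The sketch below assumes normally distributed results with hypothetical parameters and higher values in diseased subjects; the function name `misclassification_risks` and the limits used are illustrative. Here υ′ is the probability that a diseased subject falls between the two limits, i.e. 1 minus the sensitivity at the upper limit minus λ.

```python
from statistics import NormalDist

def misclassification_risks(g_low, g_up, healthy, diseased):
    """Return the four risks for a grey zone [g_low, g_up]."""
    se_low = 1.0 - diseased.cdf(g_low)  # sensitivity at the lower limit
    se_up = 1.0 - diseased.cdf(g_up)    # sensitivity at the upper limit
    sp_low = healthy.cdf(g_low)         # specificity at the lower limit
    sp_up = healthy.cdf(g_up)           # specificity at the upper limit
    lam = 1.0 - se_low                  # diseased classified 'negative'
    ups = 1.0 - sp_up                   # non-diseased classified 'positive'
    return {
        "lambda": lam,
        "upsilon": ups,
        "upsilon_prime": 1.0 - se_up - lam,  # diseased classified 'grey'
        "lambda_prime": 1.0 - sp_low - ups,  # non-diseased classified 'grey'
    }

# Hypothetical distributions and limits, for illustration only
healthy = NormalDist(mu=10.0, sigma=3.0)
diseased = NormalDist(mu=16.0, sigma=3.0)
risks = misclassification_risks(11.5, 14.5, healthy, diseased)
for name, value in risks.items():
    print(f"{name}: {value:.3f}")
```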

Proportion of results in the grey zone

The proportion of results in the grey zone is of utmost importance when considering the usefulness of a diagnostic or screening test. This proportion can be easily computed using collected (real) data. It can also be evaluated a priori. Indeed, the probability of a value being in the grey zone p(g), depends on the risks λ′ and υ′ and the probability of having the disease, p(D):

\[p(g)\ =\ {\lambda}{^\prime}[1\ {-}\ p(D)]\ +\ {\upsilon}{^\prime}p(D)\]

The tuberculin skin test

The grey zone determined above in the context of a probable diagnosis of tuberculosis (p(D) between 0.30 and 0.70) is 7.8–16.6 mm. These limits correspond (Figure 1, panel C) to υ = λ = 0.025, υ′ = 0.435 and λ′ = 0.295, and the expected proportion of values inside the grey zone would therefore be between 0.34 and 0.39.

The reticulocyte haemoglobin content test

The grey zone determined above in the context of screening in a population where p(D) is 0.10 is 22.0–28.2 pg. These limits give λ = 0.04, υ = 0.02, λ′ = 0.46 and υ′ = 0.83 and the expected proportion of values inside the grey zone would be 0.50 (Figure 2, panel C).
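The expected proportions quoted for the two examples can be checked with a few lines of code applying the formula for p(g). The function name `grey_proportion` is an illustrative assumption; the risk values are those reported above.

```python
def grey_proportion(lam_prime, ups_prime, p_disease):
    """p(g) = lambda' * [1 - p(D)] + upsilon' * p(D)."""
    return lam_prime * (1.0 - p_disease) + ups_prime * p_disease

# Tuberculin skin test: lambda' = 0.295, upsilon' = 0.435, p(D) from 0.30 to 0.70
tst_low = grey_proportion(0.295, 0.435, 0.30)
tst_high = grey_proportion(0.295, 0.435, 0.70)

# Reticulocyte haemoglobin content: lambda' = 0.46, upsilon' = 0.83, p(D) = 0.10
chr_grey = grey_proportion(0.46, 0.83, 0.10)

print(f"{tst_low:.2f} {tst_high:.2f} {chr_grey:.2f}")
```

Because p(g) is linear in p(D), evaluating it at the two ends of the pre-test range is enough to bound the expected proportion of grey results.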

Width of the grey zone and its determinants

Once the requisites in terms of LR values have been determined by the analysis of the clinical or screening context, the grey zone can be constructed for the test under consideration. The width of this grey zone depends on the overlap of the distributions of test values for subjects with and without the disease, and in turn, on the difference of location and level of dispersion of these distributions.

Where normal distributions of test values can be obtained, possibly after transformation, the limits of the grey zone gup and glow can be expressed in a relatively simple analytical form (Appendix) and computed with the simplest parameters of the distributions of the test result:

\[g_{up}\ =\ \mathit{{\bar{X}}_{H}}\ +\ \mathit{z}_{{\upsilon}}\mathit{s}_{\mathit{H}}\ and\ g_{low}\ =\ \mathit{{\bar{X}}_{D}}\ {-}\ \mathit{z}_{{\lambda}}\mathit{s}_{\mathit{D}}\ =\ \mathit{{\bar{X}}_{D}}\ {-}\ \mathit{kz}_{{\lambda}}\mathit{s}_{\mathit{H}}\]

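where \(\mathit{{\bar{X}}_{H}}\) (\(\mathit{s}_{\mathit{H}}\)) and \(\mathit{{\bar{X}}_{D}}\) (\(\mathit{s}_{\mathit{D}}\ =\ \mathit{ks}_{\mathit{H}}\)) are the sample means (standard deviations) of the test results for subjects without and with the disease, respectively (we suppose the test gives higher values in subjects with the disease); and \(\mathit{z}_{{\upsilon}}\) and \(\mathit{z}_{{\lambda}}\) are the upper (1 − υ)th and (1 − λ)th quantiles of the standard normal distribution.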

The width of the grey zone is directly and positively dependent on the overall variability of the test results, and negatively dependent on the true difference between the means of the distributions and on the tolerated levels of the risks υ (risk of false positive) and λ (risk of false negative): the smaller these risks, the larger zυ and zλ, and the wider the grey zone (Appendix).
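The analytical limits under the normal assumption can be sketched as follows. The means, standard deviations and risk levels are hypothetical, and the function name `grey_zone_normal` is an illustrative assumption; the standard normal quantile is taken from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

def grey_zone_normal(mean_h, sd_h, mean_d, sd_d, ups, lam):
    """Return (g_low, g_up, width) for normally distributed test results."""
    z = NormalDist().inv_cdf              # standard normal quantile function
    g_up = mean_h + z(1.0 - ups) * sd_h   # upper limit, set by the non-diseased
    g_low = mean_d - z(1.0 - lam) * sd_d  # lower limit, set by the diseased
    return g_low, g_up, g_up - g_low

# Hypothetical parameters, for illustration only
base = grey_zone_normal(10.0, 3.0, 16.0, 3.0, ups=0.025, lam=0.025)
stricter = grey_zone_normal(10.0, 3.0, 16.0, 3.0, ups=0.01, lam=0.01)
separated = grey_zone_normal(10.0, 3.0, 20.0, 3.0, ups=0.025, lam=0.025)
print(f"width: {base[2]:.2f}, stricter risks: {stricter[2]:.2f}, "
      f"larger difference of means: {separated[2]:.2f}")
```

In this sketch, tightening υ and λ from 0.025 to 0.01 widens the zone, and a larger difference between the means narrows it. A negative width would indicate that the two limits cross, i.e. that no grey zone is needed for the requisites considered.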

The grey zone for two or several tests

The grey zone approach can be extended to two or several tests. When the tests are performed sequentially, the pre-test probability of the second (or (n + 1)th) test is replaced by the post-test probability of the first (or nth) test to construct the partition. When the results of the tests are obtained simultaneously, a multidimensional graphical display of the grey zones can be constructed. We illustrate this case with two tests for screening for iron deficiency in children.

The authors of the study detailed above11 reported data for several markers of iron deficiency in children. We used these data to establish distributions in iron-deficient and healthy children, and further to construct the grey zones according to the screening context (pre- and post-test probabilities and LR requisites as detailed above). Figure 5 shows the graphical representation of the grey zones for both CHr ([22.0–28.2 pg], see above) and mean corpuscular haemoglobin (MCH, [20.9–28.8 pg]): the CHr grey zone is narrower than the MCH grey zone, for which the expected proportion of grey values is as high as 0.92. Plotting the data for individuals (not available in this report) onto this figure would have shown that the proportion of subjects ‘grey’ for both tests (in the central grey intersection zone) is smaller than that observed with each test individually.

The same approach can be used to evaluate the effect of the repetition of the same test. However, a more complete approach to repetition is presented in the next section.

The grey zone and the evaluation and minimization of the measurement error

Quantitative tests are subject to measurement imprecision due to the limitations of the observer or the method or both. Moreover, within-subject biological variation (e.g. postprandial or circadian) may be significant.15 Our method can also be used to display the reliability of the test and scrutinize the components of variance. Assessment of reliability is based on the analysis of variance and the computation of intraclass correlation coefficients (ICC).16–18 The normality of distributions and similarity of variances, which are generally obtained after a suitable transformation (usually log-transformation), allow the use of this approach.

The components of variance and the grey zone

The total variance of a test, \({\sigma}^{2}_{\mathit{TOT}}\), has two main components: a between-subject variance, \({\sigma}^{2}_{\mathit{B}}\), i.e. the ‘true variability’ component; and a within-subject variance, \({\sigma}^{2}_{\mathit{W}}\), i.e. the ‘measurement variability’ component \(({\sigma}^{2}_{\mathit{TOT}}\ =\ {\sigma}^{2}_{\mathit{B}}\ +\ {\sigma}^{2}_{\mathit{W}})\).

Therefore, we can construct and delimit two sub-zones of uncertainty inside the grey zone, which reflect these two components:

(1) The first subzone, associated with \({\sigma}^{2}_{\mathit{B}}\), reflects the ‘true’ uncertainty due to the overlap of the distributions of biological nature. This dark grey zone is incompressible and inherent to the particular test. Its limits, \(g_{low,DARK}\) and \(g_{up,DARK}\), are determined (Appendix) using the estimate \(\mathit{s}^{2}_{\mathit{B}}\) of \({\sigma}^{2}_{\mathit{B}}\) instead of \({\sigma}^{2}_{\mathit{TOT}}\) (or \({\sigma}^{2}_{\mathit{H}}\)):

\[g_{up,DARK}\ =\ \mathit{{\bar{X}}_{H}}\ +\ \mathit{z}_{{\upsilon}}\mathit{s}_{\mathit{B}}\ and\ g_{low,DARK}\ =\ \mathit{{\bar{X}}_{D}}\ {-}\ \mathit{kz}_{{\lambda}}\mathit{s}_{\mathit{B}}.\]

(2) The second subzone, associated with \({\sigma}^{2}_{\mathit{W}}\), reflects the ‘measurement error’ including observer, instrumental and possibly biological components of variance (see below). Unlike the dark grey zone, this light grey zone may be limited by measurement organization (standardization, repetition, etc.). The width of this light grey zone depends on the absolute value of the between-subject variability (\(\mathit{s}^{2}_{\mathit{B}}\)), but also and especially on the value of the ICC (Appendix).

The graphical representation of the grey zone in reliability studies

As graphical representations of the grey zone are easily understandable, it would be convenient to couple its representation with the graphical method to evaluate reliability described by Bland and Altman (Appendix).19,20

Indeed simultaneous representation of the differences between assessments and the grey zone and its sub-zones would allow the reliability of a quantitative test to be analysed, and in particular, visualization of the proportion of subjects inside the grey zone.

The tuberculin skin test

A recent study21 evaluated the reliability of two techniques of tuberculin skin test measurement. The diameter of skin induration was measured along the long axis of the forearm both by the customary palpation method (P) and by the ballpoint-pen technique (BP).

The differences between the measures recorded by the two observers for both techniques in 69 patients with non-null values are shown in Figure 6 (panels P1 and BP1). There were relationships between the differences and the means, so that log-transformations were needed. The mean (SD) of differences on the log-scale was 0.01 (0.29) for palpation, and −0.04 (0.25) for BP, giving the limits of agreement shown in Figure 6 (panels P2 and BP2). The values of the ICC were 0.84 for palpation, and 0.88 for BP.

We estimated \({\sigma}^{2}_{\mathit{B}}\), the between-subject variance. It was similar for the two tests (\(\mathit{s}_{\mathit{B}}\) = 0.47 on the log-scale). We used this estimate of \({\sigma}^{2}_{\mathit{B}}\) and the values υ = λ = 0.025 determined above, which are associated with minimal LR+ and maximal LR− values for use in an AIDS unit of an internal medicine department, to compute the limits \(g_{low,DARK}\) and \(g_{up,DARK}\). The dark grey zone for both tests was therefore: 2.10–2.59 on the log-scale (Figure 6, panels P3 and BP3), and 8.2–13.3 mm on the natural scale (Figure 6, panels P4 and BP4). The light grey zone width was 0.15 (1.6 mm) for palpation, and 0.10 (1.1 mm) for BP, on the log-scale (natural scale).

Further uses of the grey zone: to scrutinize and minimize the components of variance

As the within-subject variance may include inter-observer, intra-observer, instrumental, and possibly biological components of variance, different Bland and Altman analyses of the differences in measurement are therefore possible: difference between observers, between evaluations for a single observer, between times for a single subject, etc. These analyses can be coupled to the construction of subzones of uncertainty, reflecting each component of the variability of the test.

If we consider a test with inter- and intra-observer (or residual) components of variability (as for example, in the tuberculin skin test analysis presented above) the within-subject variance \({\sigma}^{2}_{\mathit{W}}\) may be decomposed into two components: \({\sigma}^{2}_{\mathit{W}}\ =\ {\sigma}^{2}_{\mathit{INTER}}\ +\ {\sigma}^{2}_{\mathit{INTRA}}\). It can be shown (Appendix) that the ratio of the width of the sub-zone associated with inter-observer variability, \(\mathit{w}_{\mathit{LIGHT/INTER}}\), to the width of the light grey zone is equal to the ratio of the standard deviations: \(\frac{\mathit{w}_{\mathit{LIGHT/INTER}}}{\mathit{w}_{\mathit{LIGHT}}}\ =\ \frac{\mathit{s}_{\mathit{INTER}}}{\mathit{s}_{\mathit{W}}}\). The width of the subzone associated with intra-observer variability is calculated by subtraction.

The magnitude of the components of variance as displayed by the width of their associated sub-zones may help optimize strategies to limit measurement error.

(1) The mean of measures of a repeated test can be used, instead of individual values, to decrease the intra-observer component of the within-subject variability, and therefore to shrink the compressible light grey zone.

(2) Using a sole observer allows the ‘measurement component’ of variability to be decreased to the intra-observer variability. The intra-observer reliability value is considered as the ‘asymptotic value’ when there are several observers who are similarly trained and experienced with the measurement method.

The tuberculin skin test

For the ballpoint technique, a two-way random effect analysis of variance allowed \(\frac{\mathit{s}_{\mathit{INTER}}}{\mathit{s}_{\mathit{W}}}\) to be estimated to be 0.97, suggesting that the main variability in the test is the inter-observer variability. The sub-zones associated with the inter- and intra-observer variability components of the test (still for use in an HIV unit) are shown in Figure 7 (panel 1): 97% of the light grey zone width corresponds to inter-observer variability.

The consequences for the light grey zone of using the mean of two measures were tested. The shrinkage of the grey zone was minor (Figure 7, panel 2), a result which was expected, because the intra-observer variability constituted only a small proportion of the within-subject variability. In contrast, using a single observer allowed the light grey zone to be reduced substantially, and the width of the grey zone to be limited to close to the dark incompressible component (Figure 7, panel 3).
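The relative effect of these two strategies can be explored numerically with equation (3) of the Appendix. The sketch below is an illustration under stated assumptions, not a reproduction of Figure 7: it takes the values reported for the ballpoint technique (s_B = 0.47, ICC = 0.88, s_INTER/s_W = 0.97), assumes k = 1 and υ = λ = 0.025, and assumes that averaging two measures halves the intra-observer variance only, and that a single observer removes the inter-observer variance entirely. The function name `light_width` is hypothetical, and the absolute widths need not match those reported; only the relative shrinkage is of interest.

```python
from math import sqrt
from statistics import NormalDist

def light_width(s_b, s_w, ups=0.025, lam=0.025, k=1.0):
    """Light grey zone width, equation (3), with rho = s_b^2 / (s_b^2 + s_w^2)."""
    z = NormalDist().inv_cdf
    rho = s_b**2 / (s_b**2 + s_w**2)
    return (z(1.0 - ups) + k * z(1.0 - lam)) * (1.0 / sqrt(rho) - 1.0) * s_b

s_b, icc, inter_ratio = 0.47, 0.88, 0.97  # reported for the ballpoint technique
var_w = s_b**2 * (1.0 - icc) / icc        # within-subject variance implied by the ICC
var_inter = inter_ratio**2 * var_w        # inter-observer component
var_intra = var_w - var_inter             # intra-observer (residual) component

two_observers = light_width(s_b, sqrt(var_w))
# Assumption: averaging two measures halves the intra-observer variance only
mean_of_two = light_width(s_b, sqrt(var_inter + var_intra / 2.0))
# Assumption: a single observer leaves only the intra-observer variance
single_observer = light_width(s_b, sqrt(var_intra))

print(f"{two_observers:.3f} {mean_of_two:.3f} {single_observer:.3f}")
```

Under these assumptions, averaging two measures shrinks the light grey zone by only a few percent, whereas restricting measurement to a single observer removes most of it, which agrees qualitatively with the results described above.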

Discussion

We propose a method to construct a three-zone partition for quantitative test results. This partition intentionally includes a grey zone between positive and negative conclusions about the condition tested; this grey zone is defined according to the requirements of the clinical or screening context in terms of likelihood ratios (LR). Its width depends on the difference between the means of test results for subjects with and without the disease, the variability of the test results, and the level of the misclassification risks (false positive, false negative) required by the context of use. The visual aspects of this method are useful: (1) for discrimination, as they help in the choice of the limits of the zones according to the context; and (2) to assess the components of variability of a quantitative diagnostic test. Due to its simplicity and its graphical representations, we hope it will be useful during the development of diagnostic and screening tests.

Above all, our approach allows the binary constraint of a ‘black or white’ decision to be avoided, as this is often inappropriate to clinical or screening practice. A test result falling in the grey zone is not uninformative as it could lead one to seek further evidence, thereby transforming the test result from a decisive to contributory role. Several controversies concerning suitable thresholds for quantitative tests would have probably been avoided if such an approach had been used. A good example is the recent debate concerning the change in the criteria for the diagnosis of type 2 diabetes, and the shift in the threshold from 7.8 mmol/l to 7.0 mmol/l of fasting plasma glucose.22

Our approach also provides a complementary or alternative representation to effect scores23 and especially to ROC curves for the evaluation of the discriminatory performance of a quantitative test and the choice of thresholds. The conventional ROC curves give symmetrical parts to sensitivity and specificity, and only recent refinements of the ROC curve methodology have dealt with unequal costs of misclassification; however, these refinements are complex.24

Another advantage of our method is that it gives a visual representation of both the relationship between the width of the grey zone and the range of possible values, and the proportion of observations within this zone. This can be done by coupling the grey zone construction to the Bland and Altman method to assess reliability, a method now familiar to many clinicians and biologists. A quantitative test whose grey zone width contains one-third or a half of observed values (as was the case for the two examples) is obviously of little value in practice.

In assessing reliability by this method, the light grey zone reflects the ‘measurement component’ of variability in a given design. Thus, the subzones give a simple representation of the components of variance of a measurement method. In the absence of transformation, the width of the compressible light grey zone is proportional to the within-subject standard deviation for the design. A simultaneous representation of the light grey zone and the limits of agreement provided by the Bland-Altman method exploits this proportionality.

The main difficulty in implementing the grey zone approach is determining appropriate values of LR. This involves analysis of the clinical or screening context (expressed in terms of pre-test probabilities) and requirements (expressed in terms of post-test probabilities) and may be difficult. In particular, pre-test probabilities may vary according to the epidemiological context, the care facility, information already gathered about diagnostic or risk factors, and other factors; furthermore ‘subjective probabilities’ produced by clinicians or experts may be unreliable. (Post-test probability requisites may also vary, albeit to a lesser extent.) The rule of thumb proposed by the Evidence-Based Medicine group, i.e. to consider LR+ over 10 and LR− below 0.1 as indicating conclusive tests,25 may be used as a first approximation, although much higher/lower values of LR+/LR− (however seldom attained by current screening or diagnostic tests) would be required in many contexts. Another approach would be to consider the sensitivity of the LR values and the resulting limits of the grey zone associated with various scenarios or hypotheses. A two-way sensitivity analysis, varying pre- and post-test probabilities simultaneously and studying the effect on LR, should be performed. The location of the resulting interval of values on the LR curves would further indicate the stability of the grey zone limits: their location in (or near) the straight vertical parts of the LR curves would be reassuring (as in our second example, see above). Sensitivity analyses would also allow the stability of the grey zone limits to be tested when empirical data concerning the test are limited and cannot provide reliable estimates of LR (i.e. when the confidence intervals for LR are wide) and/or do not include many cut-off points.

Another limitation of this method is its reliance on several assumptions for evaluation and minimization of the measurement error. The use of analysis of variance and ICC requires: the distributions of the test results to be normal in both healthy and diseased subjects; and the measurement error to be constant across the range of test values. Logarithmic transformations may in general allow these requirements to be satisfied, but render the computation more complex and assessment of the graphical representations less immediate. Further investigation with non-parametric ICC is needed before the grey zone can be adapted for the evaluation and minimization of the measurement error, when distributions cannot be normalized or measurement cannot be made constant across the range of test values. For a simple application to evaluation of diagnostic or screening discrimination, no assumption is necessary: the grey zone construction only requires plotting both LR+ and LR− against the values of the test. Otherwise, the methodology is non-specific and the recommendations of Reid et al.26 must be followed to avoid the various biases (spectrum bias, verification bias, review bias) affecting the evaluation of the performance of screening and diagnostic tests.

In conclusion, our method allows simple graphical representation of both the discriminatory performance and the components of variability of quantitative diagnostic and screening tests. These representations may be useful supports during the development, evaluation and publication of the performances of such tests.

Appendix

Limits and width of the grey zone

Let X and Y be the interval-scaled results of a candidate diagnostic or screening test in subjects without and with the disease, \(X\ {\sim}\ N({\mu}_{\mathit{H}},\ {\sigma}^{2}_{\mathit{H}})\) and \(Y\ {\sim}\ N({\mu}_{\mathit{D}},\ {\sigma}^{2}_{\mathit{D}})\). If we suppose \({\mu}_{\mathit{D}}\ >\ {\mu}_{\mathit{H}}\), the values of \(g_{low}\) and \(g_{up}\), the lower and upper limits of the grey zone, determined by previous LR analysis, may be written as follows:

\[g_{up}\ =\ {\mu}_{\mathit{H}}\ +\ \mathit{z}_{{\upsilon}}{\sigma}_{\mathit{H}}\ and\ g_{low}\ =\ {\mu}_{\mathit{D}}\ {-}\ \mathit{z}_{{\lambda}}{\sigma}_{\mathit{D}}\]

where zυ and zλ are the (1 − υ)th and (1 − λ)th quantiles of the standard normal distribution (Figure 4). Replacing population values of means and standard deviations by their sample estimates, we obtain:

\[g_{up}\ =\ \mathit{{\bar{X}}_{H}}\ +\ \mathit{z}_{{\upsilon}}\mathit{s}_{\mathit{H}}\ and\ g_{low}\ =\ \mathit{{\bar{X}}_{D}}\ {-}\ \mathit{z}_{{\lambda}}\mathit{s}_{\mathit{D}}\]

If we let Δ = μD − μH, and σD = kσH, the width, w, of the grey zone is:

\[\mathit{w}\ =\ g_{up}\ {-}\ g_{low}\ =\ (\mathit{z}_{{\upsilon}}\ +\ \mathit{kz}_{{\lambda}}){\sigma}_{\mathit{H}}\ {-}\ {\Delta}\]

(1)

Components of variance and the light and dark grey zones

The variance of a test, \({\sigma}^{2}_{\mathit{TOT}}\), is decomposed into a between-subject variance, \({\sigma}^{2}_{\mathit{B}}\), and a within-subject variance, \({\sigma}^{2}_{\mathit{W}}\): \({\sigma}^{2}_{\mathit{TOT}}\ =\ {\sigma}^{2}_{\mathit{B}}\ +\ {\sigma}^{2}_{\mathit{W}}\).

Let \({\rho}_{\mathit{I}}\) be the one-way random effect intraclass correlation coefficient (ICC),16 \({\rho}_{\mathit{I}}\ =\ \frac{{\sigma}^{2}_{\mathit{B}}}{{\sigma}^{2}_{\mathit{TOT}}}\ =\ \frac{{\sigma}^{2}_{\mathit{B}}}{{\sigma}^{2}_{\mathit{B}}\ +\ {\sigma}^{2}_{\mathit{W}}}\), which we suppose is identical in subjects with and without the disease (note that a log transformation leading to a measurement error independent of the magnitude of the measurement may, in general, allow this assumption to be satisfied). Replacing \({\sigma}_{\mathit{H}}\) of equation (1) by \({\sigma}_{\mathit{TOT}}\ =\ \frac{1}{\sqrt{{\rho}_{\mathit{I}}}}{\sigma}_{\mathit{B}}\), the width of the grey zone becomes:

\[\mathit{w}\ =\ \left[(\mathit{z}_{{\upsilon}}\ +\ \mathit{kz}_{{\lambda}})\frac{1}{\sqrt{{\rho}_{\mathit{I}}}}{\sigma}_{\mathit{B}}\right]\ {-}\ {\Delta}\]

(2)

When ρI → 1 or σW → 0, w → wDARK = (zυ + kzλ)σB − Δ, which is the width of the incompressible dark grey zone, the limits of which are:

\[g_{up,DARK}\ =\ {\mu}_{\mathit{H}}\ +\ \mathit{z}_{{\upsilon}}{\sigma}_{\mathit{B}}\ and\ g_{low,DARK}\ =\ {\mu}_{\mathit{D}}\ {-}\ \mathit{kz}_{{\lambda}}{\sigma}_{\mathit{B}}\]

The width of the compressible light grey zone is therefore:

\[\mathit{w}_{\mathit{LIGHT}}\ =\ \mathit{w}\ {-}\ \mathit{w}_{\mathit{DARK}}\ =\ (\mathit{z}_{{\upsilon}}\ +\ \mathit{kz}_{{\lambda}})\ (\frac{1}{\sqrt{{\rho}_{\mathit{I}}}}\ {-}\ 1){\sigma}_{\mathit{B}}\]

(3)

Note that since

\({\sigma}_{\mathit{B}}\ =\ \frac{\sqrt{{\rho}_{\mathit{I}}}}{\sqrt{1\ {-}\ {\rho}_{\mathit{I}}}}{\sigma}_{\mathit{W}}\)

we can also express the light grey zone width as a function of σW:

\[\mathit{w}_{\mathit{LIGHT}}\ =\ (\mathit{z}_{{\upsilon}}\ +\ \mathit{kz}_{{\lambda}})\frac{1\ {-}\sqrt{{\rho}_{\mathit{I}}}}{\sqrt{1\ {-}\ {\rho}_{\mathit{I}}}}{\sigma}_{\mathit{W}}\]

(4)
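
Equations (2) to (4) can be checked numerically. The sketch below is illustrative only: the function name `grey_zone_widths` and the inputs are hypothetical, and σW is obtained from the ICC definition, σ²W = (1 − ρI)σ²TOT with σ²TOT = σ²B/ρI.

```python
from math import sqrt
from statistics import NormalDist


def grey_zone_widths(delta, sigma_b, rho, k=1.0, upsilon=0.025, lam=0.025):
    """Total, dark and light grey zone widths for an ICC rho in (0, 1)."""
    nd = NormalDist()
    c = nd.inv_cdf(1 - upsilon) + k * nd.inv_cdf(1 - lam)  # z_upsilon + k * z_lambda
    w_total = c * sigma_b / sqrt(rho) - delta              # equation (2)
    w_dark = c * sigma_b - delta                           # limit of (2) as rho -> 1
    w_light = c * (1 / sqrt(rho) - 1) * sigma_b            # equation (3)
    # Within-subject SD implied by the ICC definition.
    sigma_w = sigma_b * sqrt(1 - rho) / sqrt(rho)
    w_light_w = c * (1 - sqrt(rho)) / sqrt(1 - rho) * sigma_w  # equation (4)
    return {
        "total": w_total,
        "dark": w_dark,
        "light": w_light,
        "light_from_sigma_w": w_light_w,
    }


# Made-up inputs for illustration only.
widths = grey_zone_widths(delta=1.0, sigma_b=2.0, rho=0.64)
print({name: round(value, 3) for name, value in widths.items()})
```

In this example the light grey width equals the total width minus the dark grey width, and equations (3) and (4) return the same value, as the derivation requires.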

The estimations of gup,DARK, glow,DARK, wDARK and wLIGHT require prior computation of the estimated component of variance \({\sigma}^{2}_{\mathit{B}}\) (or \({\sigma}^{2}_{\mathit{W}}\)) and the ICC \({\hat{{\rho}}}_{\mathit{I}}\). This can be done by a one-way random effect analysis of variance.16,17
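
One way such estimates might be obtained is sketched below in Python. It assumes a balanced design (each subject measured the same number m ≥ 2 of times) and uses the usual moment estimators from the between- and within-subject mean squares. The function name `one_way_components` and the data are hypothetical, and truncating a negative between-subject estimate at zero is a choice of this sketch, not something prescribed by the article.

```python
from statistics import mean


def one_way_components(data):
    """Estimate (var_b, var_w, icc) from a one-way random effect ANOVA.

    data: list of per-subject lists, all of the same length m >= 2.
    """
    n = len(data)
    m = len(data[0])
    if m < 2 or any(len(row) != m for row in data):
        raise ValueError("balanced data with at least 2 measures per subject required")
    grand_mean = mean(x for row in data for x in row)
    subject_means = [mean(row) for row in data]
    # Between-subject and within-subject mean squares.
    ms_between = m * sum((s - grand_mean) ** 2 for s in subject_means) / (n - 1)
    ms_within = sum(
        (x - s) ** 2 for row, s in zip(data, subject_means) for x in row
    ) / (n * (m - 1))
    var_w = ms_within
    var_b = max((ms_between - ms_within) / m, 0.0)  # truncated at zero (assumption)
    icc = var_b / (var_b + var_w)  # undefined if all values are identical
    return var_b, var_w, icc


# Made-up data: three subjects, two repeated measures each.
var_b, var_w, icc = one_way_components([[1, 3], [5, 7], [9, 11]])
print(var_b, var_w, round(icc, 3))
```

For these made-up data the mean squares are 32 and 2, giving a between-subject variance of 15, a within-subject variance of 2 and an ICC of 15/17.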

The Bland and Altman method and the grey zone

The Bland and Altman method is based on the construction of a residual-like plot of the difference between the results of two measures against their mean. The mean difference and the standard deviation sd of the differences between pairs of repeated measurements are combined to define the limits of agreement, mean difference ± 2sd, which correspond to a 95% range for the difference between two repeated measurements. The method assumes that sd is constant across the range of measurements. In the frequent case of the measurement error being proportional to the mean, it requires a log-transformation: the limits of agreement antilogged back into the natural scale then give a range of proportional agreement between repeated measurements.

Since \(\mathit{s}_{\mathit{d}}\ =\ \sqrt{2}\ \mathit{s}_{\mathit{W}}\),19 there is a proportional relationship between the interval between the limits of agreement, 4sd, and the estimated width of the ‘light’ compressible grey zone.
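
The proportionality can be made explicit by combining sd = √2 sW with equation (4). The Python sketch below is a hedged illustration: the function names and the paired data are hypothetical, and the ratio it returns simply restates equation (4) divided by 4sd.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def limits_of_agreement(first, second):
    """Return (lower, upper, s_d) from paired repeated measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation of the differences
    return d_bar - 2 * s_d, d_bar + 2 * s_d, s_d


def light_width_over_loa_interval(rho, k=1.0, upsilon=0.025, lam=0.025):
    """Ratio w_LIGHT / (4 s_d), using s_d = sqrt(2) s_W and equation (4)."""
    nd = NormalDist()
    c = nd.inv_cdf(1 - upsilon) + k * nd.inv_cdf(1 - lam)
    return c * (1 - sqrt(rho)) / sqrt(1 - rho) / (4 * sqrt(2))


# Made-up paired measurements for illustration only.
low, high, s_d = limits_of_agreement([10, 12, 14, 16], [9, 13, 13, 17])
ratio = light_width_over_loa_interval(rho=0.75)
print(round(low, 3), round(high, 3), round(ratio, 3))
```

For a given ICC and given risks υ and λ, the ratio is a constant, so narrowing the limits of agreement narrows the light grey zone by the same factor.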

Components of variance and the inter- and intra-observer light grey zones

The within-subject variance \({\sigma}^{2}_{\mathit{W}}\) is decomposed into an inter-observer component, \({\sigma}^{2}_{\mathit{INTER}}\), and an intra-observer component, \({\sigma}^{2}_{\mathit{INTRA}}\): \({\sigma}^{2}_{\mathit{W}}\ =\ {\sigma}^{2}_{\mathit{INTER}}\ +\ {\sigma}^{2}_{\mathit{INTRA}}\).

If we let \({\tau}\ =\ \frac{{\sigma}^{2}_{\mathit{INTER}}}{{\sigma}^{2}_{\mathit{INTER}}\ +\ {\sigma}^{2}_{\mathit{INTRA}}}\), which we will assume is identical in healthy and diseased subjects, the width of the light grey zone becomes:

\[\mathit{w}_{\mathit{LIGHT}}\ =\ (\mathit{z}_{{\upsilon}}\ +\ \mathit{kz}_{{\lambda}})\left[\frac{1\ {-}\ \sqrt{{\rho}_{\mathit{I}}}}{\sqrt{1\ {-}\ {\rho}_{\mathit{I}}}}\right]\left(\frac{1}{\sqrt{{\tau}}}\right){\sigma}_{\mathit{INTER}}\]

(5)

When σINTRA → 0 or τ → 1,

\[\mathit{w}_{\mathit{LIGHT}}\ {\rightarrow}\mathit{w}_{\mathit{LIGHT/INTER}}\ =\ (\mathit{z}_{{\upsilon}}\ +\ \mathit{kz}_{{\lambda}})\left[\frac{1\ {-}\ \sqrt{{\rho}_{\mathit{I}}}}{\sqrt{1\ {-}\ {\rho}_{\mathit{I}}}}\right]{\sigma}_{\mathit{INTER}}\]

(6)

Thus \(\frac{\mathit{w}_{\mathit{LIGHT/INTER}}}{\mathit{w}_{\mathit{LIGHT}}}\ =\ \frac{{\sigma}_{\mathit{INTER}}}{{\sigma}_{\mathit{W}}}\ =\ \sqrt{{\tau}}\): the ratio of the width of the subzone associated with inter-observer variability to the width of the light grey zone is equal to the ratio of the corresponding standard deviations. The width of the subzone associated with intra-observer variability can be easily calculated by subtraction.
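
The split of the light grey zone into its two subzones can be sketched as follows. This is a minimal Python illustration with a hypothetical function name and made-up inputs; it takes the light grey width and the two variance components as given.

```python
from math import sqrt


def light_zone_subzones(w_light, var_inter, var_intra):
    """Return (tau, w_inter, w_intra) for a given light grey zone width.

    w_inter follows from w_LIGHT/INTER = sqrt(tau) * w_LIGHT; the
    intra-observer subzone is obtained by subtraction.
    """
    tau = var_inter / (var_inter + var_intra)  # inter-observer share of within-subject variance
    w_inter = sqrt(tau) * w_light
    w_intra = w_light - w_inter
    return tau, w_inter, w_intra


# Made-up inputs for illustration only.
tau, w_inter, w_intra = light_zone_subzones(w_light=2.0, var_inter=1.0, var_intra=3.0)
print(tau, w_inter, w_intra)
```

Here τ = 0.25, so the inter-observer subzone takes √τ = 0.5 of the light grey width, although the inter-observer component accounts for only a quarter of the within-subject variance.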

The estimation of wLIGHT/INTER requires prior computation of both the estimated components of variance \({\sigma}^{2}_{\mathit{INTER}}\) and \({\sigma}^{2}_{\mathit{INTRA}}\) and the values of \({\hat{{\sigma}}}_{\mathit{B}}\), \({\hat{{\sigma}}}_{\mathit{W}}\) and \({\hat{{\rho}}}_{\mathit{I}}\). A two-way random effect analysis of variance is therefore necessary.16,17

KEY MESSAGES

  • Most quantitative tests do not perfectly discriminate between subjects with and without a given disease.

  • We propose a method to construct a three-zone partition for quantitative tests which intentionally includes a grey zone between positive and negative conclusions.

  • This method allows the binary constraint of a ‘black or white’ decision to be avoided, as this is often inappropriate to clinical or screening practice.

  • This method can be used both to display the discriminatory performance of a quantitative test in a variety of contexts and to scrutinize its components of variability.

Figure 1

Panel A: Histograms of tuberculin skin test results (non-null values) in subjects with (n = 3826) and without tuberculosis (n = 643 694) according to Rose et al.10 Panel B: Construction of the grey zone for the tuberculin test for LR+ = 44 and LR− = 0.022, using a plot of both LR− and LR+ for different values of the test. Panel C: Determination of the risks of misclassification associated with the grey zone for the tuberculin test using a plot of both sensitivity and specificity; υ = 0.025, λ = 0.025, υ′ = 0.435, λ′ = 0.295

Figure 2

Panel A: Histograms of reticulocyte haemoglobin content (CHr) results in children with (n = 43) and without iron deficiency (n = 167) drawn from data reported by Brugnara et al.11 Panel B: Construction of the grey zone for the CHr test for LR+ = 20 and LR− = 0.01, using a plot of both LR− and LR+ for different values of the test. Panel C: Determination of the risks of misclassification associated with the grey zone for the CHr test using a plot of both sensitivity and specificity; λ = 0.04, υ = 0.02, λ′ = 0.46, υ′ = 0.83. (In this example, where test values are lower in diseased subjects, the risks λ and λ′ depend on the value at the upper limit of the grey zone and the risks υ and υ′ on the value at the lower limit.)

Figure 3

Interpretation of the grey zone for a quantitative test. The area under the curve of probability density for subjects without the disease over gup, the upper limit of the grey zone, represents the risk υ; the area under the curve of probability density for subjects with the disease under glow, the lower limit of the grey zone, represents the risk λ. The risk of a subject with the disease being classified as ‘grey’ is υ′ and the risk of a subject without the disease being classified as ‘grey’ is λ′

Figure 4

Determination of the risks of misclassification υ, λ, υ′, λ′ associated with the grey zone for a quantitative test using a plot of both sensitivity and specificity

Figure 5

Grey zones for reticulocyte haemoglobin content (CHr) and mean corpuscular haemoglobin (MCH) results in children with (n = 43) and without iron deficiency (n = 167) drawn from data reported by Brugnara et al.11 In this example, the values of both tests are lower in diseased subjects. Due to the strong positive correlation of the tests, two of the nine possible combinations of results, (+/−) and (−/+), are very unlikely.

Figure 6

Plots of the difference between measures of two observers against the average of measures, n = 69 pairs of non-null measures by palpation (P) and the ballpoint-pen method (BP). TST means tuberculin skin test21

Upper panels (P1 and BP1) use the original scales (mm). Upper middle panels (P2 and BP2) use log-transformed values (base e). The horizontal lines indicate the mean difference and mean difference ± 2 standard deviations of differences. Lower middle panels (P3 and BP3) show the grey zones and subzones superimposed (dark grey for the dark grey zone, and light grey for the light grey zone) for υ = λ = 0.025 in log-scales.

Lower panels (P4 and BP4) show the grey zones and subzones superimposed (as above) for υ = λ = 0.025 in the original scale.

Figure 7

Dark grey zones and subzones of the light grey zones superimposed on plots of difference between measures against the average of measures, for the ballpoint-pen method. TST means tuberculin skin test. Computations were conducted on log-transformed values (not shown) and results were antilogged back to natural scales

Upper panel (1) is the same as panel BP4 of Figure 6, but shows the relative parts of the light grey zone that are due to the intra-observer and inter-observer components of variability. Lower left panel (2) shows the influence on the grey zone of using means of two repeated measures for each observer to minimize intra-observer variability. Lower right panel (3) shows the influence on the grey zone of using a single observer to avoid inter-observer variability.

References

1 Begg CB. Advances in statistical methodology for diagnostic medicine in the 1980’s. Stat Med 1991;10:1887–95.

2 Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561–77.

3 Feinstein AR. The inadequacy of binary models for the clinical reality of three-zone diagnostic decisions. J Clin Epidemiol 1990;43:109–13.

4 Simel DL, Samsa GP, Matchar DB. Likelihood ratios for continuous test results, making the clinicians’ job easier or harder? J Clin Epidemiol 1993;46:85–93.

5 Jamart J. Chance-corrected sensitivity and specificity for three-zone diagnostic tests. J Clin Epidemiol 1992;45:1035–39.

6 Subcommittee of the Joint Tuberculosis Committee of the British Thoracic Society. Guidelines on the management of tuberculosis and HIV infection in the United Kingdom. BMJ 1992;304:1231–33.

7 Pape JW, Jean SS, Ho JL, Hafner A, Johnson WD Jr. Effect of isoniazid prophylaxis on incidence of active tuberculosis and progression of HIV infection. Lancet 1993;342:268–72.

8 Bass JB Jr, Farer LS, Hopewell PC et al. Treatment of tuberculosis and tuberculosis infection in adults and children. Am J Respir Crit Care Med 1994;149:1359–74.

9 De Cock KM, Grant A, Porter JD. Preventive therapy for tuberculosis in HIV-infected persons: international recommendations, research, and practice. Lancet 1995;345:833–36.

10 Rose DN, Schechter CB, Adler JJ. Interpretation of the tuberculin skin test. J Gen Intern Med 1995;10:635–42.

11 Brugnara C, Zurakowski D, DiCanzio J, Boyd T, Platt O. Reticulocyte hemoglobin content to diagnose iron deficiency in children. JAMA 1999;281:2225–30.

12 Kassirer JP, Kopelman RI. Learning Clinical Reasoning. Baltimore: Williams & Wilkins, 1991.

13 Morrison AS. Screening. In: Rothman KJ, Greenland S (eds). Modern Epidemiology, 2nd Edn. Philadelphia: Lippincott, Williams & Wilkins, 1998.

14 Reid MC, Lane DA, Feinstein AR. Academic calculations versus clinical judgments: practicing physicians’ use of quantitative measures of test accuracy. Am J Med 1998;104:374–80.

15 Healy MJ. Measuring measuring errors. Stat Med 1989;8:893–906.

16 Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420–28.

17 Müller R, Büttner P. A critical discussion of intraclass correlation coefficients. Stat Med 1994;13:2465–76.

18 Bland JM, Altman DG. A note on the use of the intraclass correlation coefficient in the evaluation of agreement between two methods of measurement. Comput Biol Med 1990;20:337–40.

19 Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. The Statistician 1983;32:307–17.

20 Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;i:307–10.

21 Pouchot J, Grasland A, Collet C, Coste J, Esdaile JM, Vinceneux P. The reliability of tuberculin skin test measurement. Ann Intern Med 1997;126:210–14.

22 Davidson MB, Schriger DL, Peters AL, Lorber B. Relationship between fasting plasma glucose and glycosylated hemoglobin: potential for false-positive diagnoses of type 2 diabetes using new diagnostic criteria. JAMA 1999;281:1203–10.

23 Blakeley DD, Oddone EZ, Hasselblad V, Simel DL, Matchar DB. Noninvasive carotid artery testing. A meta-analytic review. Ann Intern Med 1995;122:360–67.

24 Hilden J, Glasziou P. Regret graphs, diagnostic uncertainty and Youden’s Index. Stat Med 1996;15:969–86.

25 Jaeschke R, Guyatt G, Sackett DL and the Evidence-Based Medicine Working Group. Users’ guides to the medical literature. III. How to use an article about a diagnostic test. Are the results of the study valid? JAMA 1994;271:389–91.

26 Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA 1995;274:645–51.

© International Epidemiological Association 2003
