**Purpose**:
To develop a Bayesian model (BM) for visual field (VF) progression accounting for the hierarchical, censored and heteroskedastic nature of the data.

**Methods**:
Three versions of a hierarchical BM were developed: a simple linear model (Hi-linear); a model censored at 0 dB (Hi-censored); and a heteroskedastic censored model (Hi-HSK). For the latter, we modeled test variability as a function of VF sensitivity using a large test-retest cohort (1396 VFs, 146 eyes with glaucoma). We then analyzed a large cohort of 44,371 VF tests from 3352 eyes from five glaucoma clinics. We quantified the bias in the estimated rate of progression, the detection of progression (hit rate [HR]), the median time to progression, and the prediction error for future observations (mean absolute error [MAE]). HR and time to progression were compared at matched false-positive rate (FPR), quantified using permutations of a separate test-retest cohort (360 tests, 30 eyes with glaucoma). BMs were compared with simple linear regression and Permutation Analyses of Pointwise Linear Regression (PoPLR). Differences in time to progression were tested using survival analysis.

**Results**:
Censored models showed the smallest bias in the estimated rate of progression. The three BMs performed very similarly in terms of HR and time to progression, and always better than the other methods. The average reduction in time to progression with the BMs was 37% (*P* < 0.001) at 5% FPR. MAE for prediction was very similar among methods.

**Conclusions**:
Bayesian hierarchical models improved the detection of VF progression. Accounting for censoring improves the precision of the estimates, whereas accounting for heteroskedasticity adds minimal additional benefit.

**Translational Relevance**:
These results are relevant for quantification of VF progression in practice and for clinical trials.

Moreover, VF sensitivity is measured over a limited range. On one of the most commonly used devices, the Humphrey Field Analyzer (HFA; Zeiss Meditec, Dublin, CA, USA), the scale ranges from 50 dB to 0 dB. This scale is inversely related to the brightness of the stimulus, with 0 dB being the brightest. Such a lower bound on the measurement is completely arbitrary, and different cutoff values have been proposed.^{6,7} Nevertheless, a limited measurement scale is bound to produce censored data. However, values censored at 0 dB are often treated as actual 0 dB measurements for the scope of the analysis. This can introduce positive biases in the measured progression rate, especially in VFs with more advanced damage, in which a trail of 0 dB values can arise in locations progressing beyond the measurement limits.^{8,9}

Finally, it is often difficult to efficiently combine pointwise measurements into a single, easily interpretable progression score for the whole VF without losing the rich information from individual locations.^{10,11} Pointwise data have been combined using both scoring systems^{10,12,13} and multilevel models,^{14,15} almost invariably improving modeling performance. Data censoring has been dealt with only occasionally. Other workarounds have been proposed to deal with the 0 dB limit, such as asymptotic modeling through exponential decay,^{13,16,17} which, however, does not reflect the underlying nature of the data.

As in previous hierarchical approaches,^{10,15} our method also accounts for clustering of locations within the VF according to the anatomy of retinal ganglion cell (RGC) axon bundles.^{1} We measure the incremental improvement provided by censoring and heteroskedasticity by comparing different implementations of the model in terms of bias in the estimated progression rate, prediction error, and clinical detection of progression using a large longitudinal cohort of 3352 eyes (44,371 VF tests). All these models are then benchmarked against other commonly available methods to assess VF progression.

Data were extracted from the electronic medical records (EMRs) of five glaucoma clinics.^{18–20} In short, all patient data were anonymized at the point of data extraction and subsequently transferred to a single secure database held at City, University of London. Subsequent analyses of the data were approved by a research ethics committee of City, University of London. The study adhered to the Declaration of Helsinki and the General Data Protection Regulation of the European Union. All VFs were recorded on the HFA using a Goldmann size III stimulus with a 24-2 test pattern and the Swedish Interactive Testing Algorithms (SITA Standard or SITA Fast). The aggregated database contained 576,615 VFs from 71,361 people recorded between April 2000 and March 2015. We excluded any eye for which the EMR recorded ocular surgery other than cataract removal during the follow-up period. We also excluded VFs with a percentage of false-positive errors ≥15%. No exclusion criteria were applied based on fixation losses or false-negative errors. For this study, we selected all patients with at least 10 VFs recorded over at least four years in one or both eyes and a mean deviation worse than −2 dB in at least two (not necessarily consecutive) VFs^{21–23} in the same eye. It seems likely that subjects with this level of damage and frequency of VF testing were either strong glaucoma suspects or persons with glaucomatous optic neuropathy. Finally, only one eye from each patient was selected, at random if both were eligible. The final selection included 44,371 VFs from 3352 eyes. Demographic details are reported in Table 1.

Two test-retest datasets were used. The first was the RAPID dataset.^{24,25} This dataset was used to obtain a model linking pointwise response variability with sensitivity (described in the next paragraph). Data were collected from stable eyes with primary open-angle glaucoma at Moorfields Eye Hospital (reference 13/NS/0132) upon written informed consent. The data collection was in accordance with the Declaration of Helsinki. The final dataset used for this analysis was composed of 1396 test repeats performed in 146 eyes of 75 subjects. The number of test repeats per eye was 10 [7, 10] (median [95% quantiles]), with a minimum of 3 for inclusion. All tests were performed with an HFA 24-2 grid, SITA Standard strategy, and a G-III stimulus size over an average time period of nine (maximum 13) weeks.

The second was the HALIFAX dataset,^{26} as provided in the *visualFields* package^{27} for R (R Foundation for Statistical Computing, Vienna, Austria). This dataset is composed of 12 VF test repeats from 30 eyes of patients with glaucoma and was used to create stable series through permutations to quantify the false-positive rate (FPR = 1 − specificity; see section on model testing). The tests were performed with an HFA 24-2 grid, SITA Standard strategy, and a G-III stimulus size over a time period of 12 weeks. Demographic characteristics for both datasets are reported in Table 1.
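The permutation scheme used to quantify the FPR can be sketched as follows. This is an illustrative Python sketch (the analyses in the paper were performed in R), and `naive_detector` is a hypothetical stand-in for any of the compared progression detectors: because the test-retest series comes from a stable period, any positive call on a randomly reordered series is a false positive.

```python
import numpy as np
from scipy import stats

def fpr_from_permutations(series, detect, n_perm=1000, seed=0):
    """Estimate the false-positive rate (FPR) of a progression detector.

    series : (n_tests, n_locations) array of sensitivities (dB) from a
             stable eye (test-retest data, so there is no true change).
    detect : callable (times, series) -> bool, True when progression
             is flagged.
    """
    rng = np.random.default_rng(seed)
    n_tests = series.shape[0]
    times = np.linspace(0.0, 1.0, n_tests)  # arbitrary follow-up times
    hits = sum(
        bool(detect(times, series[rng.permutation(n_tests)]))
        for _ in range(n_perm)
    )
    return hits / n_perm

def naive_detector(times, series, alpha=0.05):
    """Toy detector: one-sided test for a negative slope of the mean
    sensitivity over time (stand-in for any of the compared methods)."""
    res = stats.linregress(times, series.mean(axis=1))
    one_sided_p = res.pvalue / 2 if res.slope < 0 else 1 - res.pvalue / 2
    return one_sided_p < alpha
```

For a well-calibrated detector, the estimated FPR should match the nominal significance level of the cutoff.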

Then, for each location, we defined the median of the test-retest values as the best available estimate (BAE) of the true sensitivity for a given location. This gives an estimate of the central value of the test-retest distribution that is not affected by censoring (as opposed to the mean). When the median was 0 dB, we assumed that the BAE for the underlying threshold was not available, and the location was removed from the analysis (5.1% of the tested locations).
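The BAE computation described above can be sketched in a few lines. This is an illustrative Python sketch (the original analysis was performed in R); the median is robust to censoring at 0 dB as long as fewer than half of the repeats are censored, which is why locations with a median of 0 dB are excluded.

```python
import numpy as np

def best_available_estimate(retest):
    """Best available estimate (BAE) of true sensitivity per location.

    retest : (n_repeats, n_locations) array of test-retest
             sensitivities (dB) for one eye.
    Returns the per-location median, with locations whose median is
    0 dB set to NaN (no reliable estimate: the true value may lie
    anywhere at or below the measurement floor).
    """
    bae = np.median(retest, axis=0)
    bae[bae == 0] = np.nan  # censored beyond recovery; excluded
    return bae
```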

The model of variability was fitted with a censored heteroskedastic regression using the package *crch* for R.^{29,30} The model allows one regression equation for the mean and one for the log(standard deviation [SD]). The mean was not modeled, because the fitting was performed by including an offset term for each observation equal to the corresponding BAE of the true sensitivity. The log(SD) was modeled as a function of sensitivity. A good fit was obtained with a third-degree polynomial. The coefficients for the polynomial model are reported in the Appendix. However, for simplicity and speed of calculation in the implementation of our Bayesian model, we adopted an approximation in which the relationship between log(SD) and sensitivity was linear, with a maximum capping value (bilinear, with one slope fixed at zero). This strategy has previously been adopted when implementing similar models of variability for simulation of perimetric responses.^{31}

To estimate the capping value, the model for the log(SD) was a broken-stick linear relationship in which the break point was varied until the slope of the relationship for sensitivities lower than the break point was null (i.e., no change). The coefficients for the linear approximation are reported in Figure 1. This relationship is in good agreement with the linear relationship described by Henson et al.,^{4} obtained through direct measurements of the psychometric function. Figure 1A allows a direct comparison of the polynomial and approximated models of variability. For reference, we performed the calculations using the same methodology but modeling each (rounded) sensitivity value as a discrete factor level, so that a value of SD was calculated independently for each sensitivity value. This can then be compared with the naïve calculation of the SD, which shows a clear bias at lower sensitivities, where the naïve calculation underestimates variability because of the censored values (Fig. 1B).
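The capped bilinear approximation can be written compactly. The sketch below is illustrative Python, and the coefficient values are hypothetical placeholders; the fitted values are those reported in the Appendix and Figure 1.

```python
import numpy as np

# Hypothetical placeholder coefficients -- the fitted values are
# reported in the paper's Appendix and Figure 1, not reproduced here.
INTERCEPT = 3.0     # log(SD) at the measurement floor (illustrative)
SLOPE = -0.08       # change in log(SD) per dB of sensitivity (illustrative)
BREAK_POINT = 15.0  # capping break point in dB (illustrative)

def log_sd(sensitivity):
    """Capped bilinear model of test variability: log(SD) decreases
    linearly with sensitivity above the break point and is constant
    (slope fixed at zero) below it, capping the SD at its maximum."""
    s_eff = np.maximum(np.asarray(sensitivity, dtype=float), BREAK_POINT)
    return INTERCEPT + SLOPE * s_eff
```

Fixing the lower slope at zero means every sensitivity below the break point maps to the same maximum variability, which is what makes the approximation cheap to evaluate inside the MCMC sampler.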

The Bayesian models were fitted with JAGS (Just Another Gibbs Sampler^{32}) using the *rjags* package for R.^{33} A separate model was fitted for each eye. We developed three versions of the hierarchical model. In its simplest form, the model was a linear regression of pointwise values over time (Hi-linear). The fixed effects were the global intercept and slope for the change of sensitivity over time. Hierarchical random effects on both intercepts and slopes were then added to model the change over time for different VF clusters (according to the map described by Garway-Heath et al.^{1}) and for each location within a cluster. The prior distributions for the fixed effects were noninformative normal. The means of these prior distributions and the starting points for the Markov chain Monte Carlo (MCMC) algorithm were obtained from the fixed-effect estimates of the same model fitted through maximum likelihood (ML, using the package *lme4* for R^{34}). The random effects for the intercepts and slopes were modeled as bivariate noninformative normal distributions with zero mean. JAGS was then used to run the MCMC algorithm until convergence was achieved (Gelman-Rubin^{35} diagnostic ≤ 1.2 on two parallel chains) to numerically estimate the posterior distribution of the parameters. More details on the fitting procedure can be found in the Appendix.

As in previous work,^{15} the posterior distribution for the global slope (fixed effect) was used to assess the global progression of the VF in each eye. The progression score (P-score) was obtained by taking the value of the empirical cumulative distribution function of the posterior distribution of the global slope at 0 dB/year (no progression). In frequentist terms, the P-score is similar in interpretation to a one-sided *P* value for the slope: the closer to 1, the stronger the evidence for progression; a perfectly stable series would yield a P-score of 0.5; a P-score < 0.5 indicates a positive slope. An example is provided in Figure 2. One advantage of Bayesian inference over ML is that a posterior distribution can also be estimated for the random effects, making it possible to assess progression for individual clusters and locations using the same methodology. In other words, a P-score can be calculated in the same fashion using the posterior distributions of the random effects for the slope at the cluster and location level (see supplementary material for an example). Different cutoffs for the P-score were explored (see paragraph on model testing – clinical detection of progression).

A second version of the model (Hi-censored) was identical to the one described, but the error distribution was a normal censored at 0 dB. This is similar in concept to a Tobit regression.^{8} However, the hierarchical mixed-model approach largely overcomes the major limitation of unstable fitting results when only few uncensored values are available for specific locations. A third version of the model (Hi-HSK) also used a censored normal error distribution for sensitivity, but the SD of the error distribution was heteroskedastic and linked to the estimated sensitivity using the linear approximation reported in the Appendix and previously described. Variability in this model was therefore imposed as a fixed relationship and not estimated, similarly to previous work.^{10}
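Computed from MCMC output, the P-score described above reduces to the fraction of posterior slope samples below 0 dB/year. A minimal sketch (illustrative Python; the models themselves were fitted in JAGS/R):

```python
import numpy as np

def p_score(slope_samples):
    """P-score: empirical CDF of the posterior slope samples evaluated
    at 0 dB/year. Near 1 = strong evidence of worsening; 0.5 = a
    perfectly stable series; below 0.5 = a positive (improving) slope."""
    return float(np.mean(np.asarray(slope_samples) < 0.0))
```

The same function applies unchanged to the posterior samples of the cluster-level and location-level random-effect slopes.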

The models were benchmarked against the Permutation of Pointwise Linear Regression (PoPLR) method.^{12} The method combines the *P* values for the slopes of pointwise linear regression (PLR) equations fitted to each location into a statistic S, using the Truncated Product Method. A customized null-hypothesis distribution is then generated by calculating the S statistic on random permutations of the VF tests in the series, and a *P* value for the S statistic is calculated. For our analysis, we used both the calculated S statistic and its *P* value. PoPLR only provides statistics for global progression. We used the implementation of the method provided by the *visualFields* package.^{27}

Simple linear regression equations were also fitted at the global, cluster, and location levels; the *P* value of the slopes from these equations was used to assess progression. These can be directly compared with the P-scores for the global slopes, the cluster slopes, and the location slopes from the Bayesian hierarchical models.
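The combination-and-permutation logic at the core of PoPLR can be sketched as follows. This is an illustrative Python sketch, not the *visualFields* implementation; the −Σ log p form of the truncated product statistic and the τ = 0.05 truncation threshold are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def pointwise_p_values(times, series):
    """One-sided p values for negative slopes from pointwise linear
    regression (PLR), one per VF location."""
    pvals = []
    for loc in range(series.shape[1]):
        res = stats.linregress(times, series[:, loc])
        p = res.pvalue / 2 if res.slope < 0 else 1 - res.pvalue / 2
        pvals.append(p)
    return np.array(pvals)

def s_statistic(pvals, tau=0.05):
    """Truncated product statistic: combine only p values <= tau
    (written in -sum(log p) form; larger S = stronger evidence)."""
    selected = pvals[pvals <= tau]
    return 0.0 if selected.size == 0 else -np.sum(np.log(selected))

def poplr_p_value(times, series, n_perm=500, seed=0):
    """Permutation p value for S: recompute S on random reorderings of
    the tests in the series to build a series-specific null."""
    rng = np.random.default_rng(seed)
    s_obs = s_statistic(pointwise_p_values(times, series))
    s_null = np.array([
        s_statistic(pointwise_p_values(
            times, series[rng.permutation(len(times))]))
        for _ in range(n_perm)
    ])
    return (1 + np.sum(s_null >= s_obs)) / (n_perm + 1)
```

Permuting the test order destroys any real time trend while preserving the eye's own variability, which is what makes the null distribution series-specific.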

Unlike the original method,^{36} the expected variability for different levels of sensitivity was not based on the empirical cumulative distribution function of the pointwise regression residuals but rather on a polynomial equation fitted to the HALIFAX test-retest database, using the same methodology used to model variability in the RAPID dataset. This equation is also reported in the Appendix. We finally compared, for each Bayesian model, the slopes estimated from the shifted series with the slopes obtained from the original series. Ideally, they should be equivalent; that is, the model least affected by censoring should give central estimates for the shifted series similar to those for the original series.
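The positive bias induced by treating censored values as true 0 dB measurements, and its removal by a censored (Tobit-style) likelihood, can be illustrated with a small simulation. This is an illustrative Python sketch with arbitrary parameter values, not the paper's JAGS implementation.

```python
import numpy as np
from scipy import stats, optimize

def tobit_slope(times, y, floor=0.0):
    """ML estimate of a linear trend with normal errors left-censored
    at `floor` (Tobit-style). Returns the estimated slope."""
    cens = y <= floor

    def nll(theta):
        intercept, slope, log_sd = theta
        sd = np.exp(log_sd)
        mu = intercept + slope * times
        ll = np.where(
            cens,
            stats.norm.logcdf((floor - mu) / sd),       # P(Y <= floor)
            stats.norm.logpdf((y - mu) / sd) - log_sd,  # density of Y
        )
        return -np.sum(ll)

    ols = stats.linregress(times, y)
    start = np.array([ols.intercept, ols.slope, np.log(2.0)])
    return optimize.minimize(nll, start, method="Nelder-Mead").x[1]

# A location declining past the floor: the naive OLS slope is pulled
# toward zero by the trail of 0 dB values; the censored fit is not.
rng = np.random.default_rng(0)
t = np.arange(12, dtype=float)  # years of follow-up
y = np.maximum(8.0 - 1.5 * t + rng.normal(0, 1.0, t.size), 0.0)
ols_slope = stats.linregress(t, y).slope
tobit = tobit_slope(t, y)
```

The censored likelihood treats each 0 dB value only as evidence that the true sensitivity lies at or below the floor, which removes the bias at the cost of wider uncertainty.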

HR curves as a function of specificity were computed using the package *pROC* for R,^{37} up to a minimum specificity of 90%. Confidence intervals for the HR and the partial area under the HR curve (pAUC) were calculated with the bootstrap procedure implemented in the pROC package. Time to detect progression was quantified using survival curves for different specificity levels (97.5%, 95%, and 90%) using the package *survival* for R.^{38} Progression was assessed with a minimum of 4 VF tests on series truncated at a progressively increasing number of tests, up to a maximum of 10 tests. For each step in the series, all models were applied, and progression was detected on the basis of the cutoff chosen for a given specificity level. The time to progression was then recorded as the earliest time from baseline at which progression was detected. Formal comparisons between models were performed using a Cox proportional hazards model with a cluster term to account for the fact that progression was assessed on the same eye (VF series) with multiple methods. Note that the survival analyses also account for the different number of actually observed progression events, because eyes that did not progress within the observation window are treated as censored data.
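The truncation logic for the time-to-progression analysis can be sketched as follows (illustrative Python; `detect` stands for any of the compared methods at a fixed specificity cutoff):

```python
import numpy as np

def time_to_progression(times, series, detect, min_tests=4, max_tests=10):
    """Earliest follow-up time at which progression is flagged.

    The series is truncated at a progressively increasing number of
    tests (from `min_tests` up to `max_tests`); the first truncation
    at which `detect` fires determines the time to progression.
    Returns (time, True) on detection, or (last observed time, False)
    if the eye never progresses within the window (a censored
    observation in the survival analysis).
    """
    n = min(len(times), max_tests)
    for k in range(min_tests, n + 1):
        if detect(times[:k], series[:k]):
            return times[k - 1], True
    return times[n - 1], False
```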

*P* values were corrected for multiple comparisons using the Bonferroni-Holm method.^{39} No statistical test was performed on the pAUCs, because specificity for the HR curves was estimated using resampled permuted series that do not constitute an actual experimental sample.

The pAUC for PoPLR based on its *P* value was significantly different from simple linear regression only with 8 and 10 VF tests (*P* < 0.001). Complete tables with partial areas under the curve and corresponding CIs are available as supplementary material.

All hierarchical models detected progression significantly earlier than simple linear regression (*P* < 0.001), and PoPLR performed better than simple linear regression (*P* < 0.001). All hierarchical models performed very similarly, and the only significant difference was detected at 97.5% specificity for the Hi-linear model when compared with the Hi-censored (*P* < 0.001) and Hi-HSK (*P* = 0.0261) models. Table 2 reports the results of the survival analysis on global progression and the HR at three specificity levels.

Zhu et al.^{10} also used a Bayesian approach to model VF progression. This method, called ANSWERS, was more focused on modeling heteroskedasticity (discussed later), but the parameter used to detect progression was a combination of the results obtained at different locations to calculate an estimated "probability of no progression." In the evaluation of the method, the differential effect of modeling heteroskedasticity and combining locations was not quantified. Interestingly, when compared with simple linear regression, we obtained an improvement in performance very similar to that of ANSWERS for all the hierarchical models tested in our analysis, reinforcing the idea that the strength of these methods comes from the efficient combination of pointwise information. Importantly, we used the same dataset as Zhu et al.^{10} to obtain stable VF series to determine specificity, so our results can be easily compared. Of note, ANSWERS did not use a multilevel approach to obtain an estimate of the global progression slope but rather relied on a combination of pointwise regression models fitted individually (although linked through spatial correlations). This is similar in principle to the S statistic used by PoPLR, which also improved the clinical performance over simple linear regression in our dataset.

Hierarchical models have previously been applied to VF progression.^{14,15} Betz-Stablein et al.^{15} in particular used the estimated posterior distribution of the global slope to assess progression. In their analysis, the hierarchical model did not show a large improvement over pointwise methods. However, they relied on clinical judgment to assess specificity in their calculations instead of permutations of stable series. Their results are therefore difficult to compare with ours. Nevertheless, modeling the global progression rate has additional advantages besides practical progression detection, because it has a meaningful and direct interpretation for clinicians, and this is an additional strength of our approach.

Perimetric response variability is known to increase as sensitivity decreases.^{4,5} This relationship is often explained by a change in the psychometric function^{4,5} at more damaged locations. Although this is certainly an incomplete characterization,^{5} sensitivity has been shown to be the best predictor of variability in glaucoma.^{4,5} Henson et al.^{4} modeled the psychometric function with a cumulative Gaussian function and reported the change of log(SD) with sensitivity, down to 10 dB. We adopted a similar modeling approach using a large test-retest perimetric dataset. The coefficients of our linear approximation were in good agreement with the results from Henson et al.^{4} (Appendix). Importantly, our approach was meant to model the expected variability around the predicted sensitivity, allowing us to rely on previous knowledge of the psychophysics of perimetric responses. In contrast, ANSWERS used test-retest data to model the variability of the observed response,^{10} effectively using the estimated variability as a weighting method for the observations. However, in our case, modeling heteroskedasticity only improved the results in the detection of progression for individual clusters and locations (Fig. 5) but did not improve the global performance.

Several strategies have been proposed to deal with the floor of the perimetric measurement scale.^{9,13,16,17} However, few have addressed the actual censored nature of the data. This is important, because it implies that the actual sensitivity could extend beyond the measurement limits. This aspect is deliberately neglected by methods adopting an asymptotic modeling of the floor effect,^{13,16,17} creating problems in the estimation and interpretation of the rate of progression. These methods are particularly problematic when considering that the censoring level is completely arbitrary and can be changed without any bearing on the measurements recorded above the chosen limit.^{6} This is not captured by asymptotic models, such as the exponential decay, in which the estimated rate of loss is tightly linked to the relative distance of the observations from the measurement limit. On the other hand, considering censored values as actual measurements of 0 dB sensitivity can introduce a positive bias in the estimated rate of loss with simple linear models. This is demonstrated by our first experiment on artificially shifted series (Fig. 3). It is important to note that, at least in a dataset drawn from real clinics, such as the one used in this analysis, such a bias had little bearing on the ability of the models to predict future sensitivity, as demonstrated by our quantification of the MAE for predictions (Fig. 6), because a significant effect is only obtained when a large number of trailing 0 dB observations accumulate in a series (see example in Fig. 2). A similar result was reported by Bryan et al.^{9}

Betz-Stablein et al.^{15} also accounted for the censored nature of the data. In contrast, ANSWERS used a transformation of 0 dB sensitivity, because such a value would be undefined in a Weibull error distribution (the one used in their model).^{10} This approach allowed modeling of the observed test-retest variability but would still be affected by the bias from trailing 0 dB values. However, despite reducing the estimation bias, accounting for censoring did not greatly improve the clinical performance (Fig. 4 and Fig. 5). This result could have multiple explanations, including the fact that our clinical dataset was mostly composed of people with early loss (see Table 1). This is representative of many glaucoma clinics, but in our case the number of people with early damage could have been inflated by our exclusion of eyes undergoing glaucoma surgery before 10 VFs could be collected. However, the most likely explanation is that models accounting for censoring correctly interpret censored observations as being less informative. This increases the uncertainty around the final estimate of the rate of progression, despite reducing the bias. Instead, noncensored models assume that complete information can be extracted from 0 dB measurements, leading to less uncertainty in the estimates. This is clearly illustrated by the example in Figure 2: both censored models provide more negative estimates of global progression, but the P-score is almost identical to that obtained with the Hi-linear model on account of the greater uncertainty. The comparison between the Hi-linear and the Hi-censored models is particularly useful, because they constitute two implementations of the same model that differ only in their handling of censored data. Other practical solutions could be applied to increase the dynamic range of VF testing itself at the lower end, for example, by using larger stimulus sizes for more advanced stages of damage.^{40}

Our model accounts for spatial relationships among VF locations through an anatomically derived clustering.^{25,41} One drawback is that clusters are modeled as hard-edged groups instead of "smooth" correlations. Therefore, proximity of locations within the same cluster does not affect the correlation structure, and adjacent locations in different clusters are modeled as completely independent. However, this also allowed us to avoid complex correlation structures in the model, greatly reducing the number of parameters and therefore improving efficiency, as opposed to previous similar attempts.^{10,15} Moreover, when compared with results from ANSWERS, our discrimination performance seems very close to that obtained with spatial enhancement in their model. For example, with five VF tests, they reported a 2.6-fold improvement in the HR compared with simple linear regression at 95% specificity.^{10} In our analysis, the improvement was 2.8-fold with the Hi-HSK model and 2.6-fold with the Hi-censored model. Note that results with 5 VFs are only reported here for comparison and are not part of our main analysis. Finally, we showed that our hierarchical structure with random effects retained enough flexibility so as not to compromise its predictive ability (Fig. 6). These observations, together with the improved interpretability of the model, make our approach reasonable and, to some extent, preferable. Further flexibility could be introduced with customized structure-function clustering techniques.^{3,42}

Murata et al.^{43,44} proposed a different Bayesian approach for pointwise VF data. Their evaluation mostly focused on the improvement in prediction accuracy and did not explore the effect of their model on progression detection. Their results are difficult to compare with ours, because they based their modeling on total deviation values. Interestingly, their MAE for prediction was similar to, but generally lower than, ours. However, this was also the case for the simple pointwise linear regression, possibly indicative of a difference in the composition of the datasets used for validation, such as a larger proportion of stable patients. One notable difference is that we calculated the prediction MAE for all subsequent VFs and not just the last one in the series, as in Murata et al.^{43,44}

Censored regression models can also be fitted with frequentist methods,^{45} but they still rely on time-consuming numerical computations that offer little advantage over Bayesian implementations. Moreover, they do not allow for potential integration of prior knowledge, for example from structural data.^{41}

Previous work^{46} showed how hierarchical models were more powerful than event-based analyses, although it only analyzed the progression of mean deviation over time. A more complex hierarchical structure, making full use of pointwise data, has been used by our group in a previous analysis of trial data.^{47} That application was based on an ML procedure and did not account for censoring. Although this might be irrelevant when only people with early or no VF loss are recruited, the floor effect could bias the results when the focus is on more advanced VF loss in people with later-stage glaucoma.

**G. Montesano**, CenterVue (C);

**D.F. Garway-Heath**, Carl Zeiss Meditec (C), CenterVue (C), Heidelberg Engineering (F), Moorfields MDT (P), ANSWERS (P), T4 (P);

**G. Ometto**, (N);

**D.P. Crabb**, CenterVue (C), ANSWERS (P), T4 (P)

*Ophthalmology*. 2000; 107: 1809–1815. [CrossRef] [PubMed]

*Vision Res*. 2009; 49: 2157–2163. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2012; 53: 6981–6990. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2000; 41: 417–421. [PubMed]

*Invest Ophthalmol Vis Sci*. 1993; 34: 3534–3540. [PubMed]

*Invest Ophthalmol Vis Sci*. 2016; 57: 288–294. [CrossRef] [PubMed]

*Transl Vis Sci Technol*. 2018; 7: 35. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2011; 52: 9539–9540. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2013; 54: 6694–6700. [CrossRef] [PubMed]

*PLoS One*. 2014; 9: e85654. [CrossRef] [PubMed]

*Transl Vis Sci Technol*. 2019; 8: 25. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2012; 53: 6776–6784. [CrossRef] [PubMed]

*Transl Vis Sci Technol*. 2018; 7: 14. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2015; 56: 4283–4289. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2013; 54: 1544–1553. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2011; 52: 4765–4773. [CrossRef] [PubMed]

*JAMA Ophthalmol*. 2016; 134: 496–502. [CrossRef] [PubMed]

*BMJ Open Ophthalmol*. 2019; 4: e000352. [CrossRef] [PubMed]

*Am J Ophthalmol*. 2019; 207: 144–150. [CrossRef] [PubMed]

*Br J Ophthalmol*. 2020; 104: 1406–1411. [CrossRef] [PubMed]

*Ophthalmic Physiol Opt*. 2015; 35: 225–230. [CrossRef] [PubMed]

*Ophthalmic Physiol Opt*. 2017; 37: 82–87. [CrossRef] [PubMed]

*Br J Ophthalmol*. 2012; 96: 1185–1189. [CrossRef] [PubMed]

*Trans Am Ophthalmol Soc*. 2017; 115: T4. [PubMed]

*Health Technol Assess*. 2018; 22: 1–106. [CrossRef] [PubMed]

*Ophthalmology*. 2014; 121: 2023–2027. [CrossRef] [PubMed]

*J Vis*. 2013; 13(4): 10. [CrossRef] [PubMed]

*Ger J Ophthalmol*. 1992; 1: 79–85. [PubMed]

*Wind Energy*. 2014; 17: 1753–1766. [CrossRef]

*The R Journal*. 2016; 8: 8. [CrossRef]

*J Vis*. 2012; 12(11): 22. [CrossRef] [PubMed]

*Proc 3rd Int Workshop on Distributed Statistical Computing*. 2003; 124(125.10): 1–10.

*J Stat Software*. 2015; 1(1): 2015.

*Stat Sci*. 1992; 7: 457–472.

*Transl Vis Sci Technol*. 2018; 7: 22. [CrossRef] [PubMed]

*BMC Bioinformatics*. 2011; 12: 77. [CrossRef] [PubMed]

*A Package for Survival Analysis in R*, version 3.2-9, 2021. Available at https://cran.r-project.org/web/packages/survival/vignettes/survival.pdf.

*Scand J Stat*. 1979; 6: 65–70.

*Arch Ophthalmol*. 2010; 128: 570–576. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2012; 53: 2760–2769. [CrossRef] [PubMed]

*Transl Vis Sci Technol*. 2021; 10: 19. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2014; 55: 8386–8392. [CrossRef] [PubMed]

*Invest Ophthalmol Vis Sci*. 2018; 59: 1897–1904. [CrossRef] [PubMed]

*censReg: Censored Regression (Tobit) Models*, version 0.5-32, 2020. Available at https://rdrr.io/cran/censReg/

*Ophthalmol Glaucoma*. 2019; 2: 72–77. [CrossRef] [PubMed]

*Ophthalmology*. 2020; 127: 1313–1321. [CrossRef] [PubMed]

The coefficients from Henson et al.^{4} are reported for reference. The polynomial fit used on the HALIFAX dataset was used to model the change in variability in the artificially shifted series to assess bias in the estimates of the slopes. Confidence intervals are reported only for the linear approximation, for comparison with Henson's results. The linear approximation was also recalculated with a break point at 10 dB, the minimum sensitivity in Henson et al.^{4} Despite the small coefficient value, the cubic term in the polynomial fit significantly improved the goodness of fit (*P* < 0.001).

Convergence was assessed with the Gelman-Rubin^{35} diagnostics < 1.2. We used 1000 samples for adaptation, 5000 burn-in samples, and a minimum of 5000 samples to effectively sample the posterior distribution (more were added until convergence was achieved). Priors on the fixed effects (global intercept and slope) were noninformative normal distributions with a precision of 0.01 (variance = 100) and the mean determined by the results of a simple ML fit on the same data (using the package *lme4*^{34}). These were also the starting values for the MCMC.