Multi-state modelling

Saturday 30 March 2013

Simulation Based Confidence Intervals for Functions with Complicated Derivatives

Micha Mandel has a new paper in the American Statistician. This concerns the simulation based Delta method approach to obtaining asymptotic confidence intervals which has been used quite extensively for obtaining confidence intervals of transition probabilities in multi-state models. This essentially involves assuming $\inline \hat\theta \sim N(\theta, I(\theta)^{-1})$ and constructing confidence intervals for $\inline g(\hat\theta)$ by simulating $\inline \hat\theta^{*}_1 , \ldots, \hat\theta^{*}_B$ and then considering the empirical distribution of $\inline g(\hat\theta^{*}_1) , \ldots, g(\hat\theta^{*}_B)$

A formal justification for the approach is laid out and some simulations of its performance compared to the standard Delta method in some important cases is given. It is stated that the utility of the simulation method is not in situations when the Delta method fails, but in situations where the calculating the derivatives needed for the Delta method is difficult. In particular, it still requires the functional g to be differentiable. This seems to down play the simulation method slightly. One advantage of the simulation method is that it is not necessary to linearize g around the MLE. To take a pathological example consider data that are $\inline N(\mu, 1)$ . Suppose we are interested in estimating $\inline \mu^3$ . When $\inline \mu = 0$ the coverage of the Delta method confidence interval will be anti-conservative because $\inline \mu^3$ has a point of inflection at $\inline \mu = 0$ . The simulation based will work fine in this situation (as obviously would just constructing a confidence interval for $\inline \mu$ and just cubing the upper and lower limits!).

Tuesday 5 February 2013

The gradient function as an exploratory goodness-of-fit assessment of the random-effects distribution in mixed models

Geert Verbeke and Geert Molenberghs have a new paper in Biostatistics. The paper proposes the use of the gradient function (or equivalently the directional derivatives) of the marginal likelihood with respect to the random effects distribution, as a way of assessing goodness-of-fit in a mixed model. They concentrate on cases related to standard longitudinal data analysis using linear (or generalized linear) mixed models, however the method can be extended to other mixed models, such as clustered multi-state models with multivariate (log)-normal random effects.

If we consider data from units i with observations x_i, given a mixing distribution G, we can say the marginal density is given by $f(\mathbf{x}_i , G) = \int f(\mathbf{x}_i, \mathbf{u}) dG(\mathbf{u})$ .

The gradient function is then taken as $\Delta(G,\mathbf{u}) = \frac{1}{N} \frac{f(\mathbf{x}_i, \mathbf{u})}{f(\mathbf{x}_i, G)}$ where N is the total number of independent clusters

The use of the gradient function stems from finite mixture models and in particular the problem of finding the non-parametric maximum likelihood estimate of the mixing distribution. At the NPMLE the gradient function has a supremum of 1. If instead we assume there is a parametric mixing distribution, under correct specification the gradient function should be close to 1 across all values of u. Verbeke and Molenberghs use this property to construct an informal graphical diagnostic of the appropriateness of the proposed random effects distribution.

An advantage of the approach is that essentially no additional calculations are required to compute the measure, above and beyond those already needed for estimation of the parametric mixture model itself. A current limitation of the approach is that there is no formal test to assess whether the observed deviation is statistically significant. It is stated that this is ongoing work. It seems reasonably straightforward to show that the gradient function will tend to a Gaussian process with mean 1 but with a quite complicated covariance structure. Obtaining some nice asymptotics for a statistic based either on the maximum deviation from 1 or some weighted integral of the distance from 1 therefore seems unlikely. However, it may be possible to obtain a simulation based p-value by simulating from the limiting Gaussian process.

Wednesday 9 January 2013

Book Review of: Competing Risks and Multistate Models with R.

Ross Maller has written a book review of Beyersmann, Schumacher and Allignol's recent Springer book on Competing Risks and Multistate Models with R, published in Australian & NZ Journal of Statistics. This is primarily a rant against the cause-specific hazard approach to modelling competing risks. For instance cause specific hazards "do not explicitly take into account the obvious mixture of distributions inherent in the data." Moreover, the fact that assuming proportionality in cause-specific-hazards (CSHs) can lead to non-proportional, even crossing, relationships for cumulative incidence functions (CIFs), is painted as a terminal weakness to the approach.

Maller's main contribution to survival analysis is through models for cure fractions (see e.g. Maller and Zhou 1995) an approach that he is evidently very taken with. Apparently the correct approach to take in modelling competing risks data is to assume a finite mixture model, such that individuals in a particular class are only at risk of one particular failure rate. Moreover, the problem of covariates is claimed to be entirely solved by allowing proportional hazards within failure types, which Maller says is the approach taken by Fine and Gray (1999).

The entire nature of survival and event history analysis is in modelling the dynamics of the process. In most circumstances it is much more useful to be able to describe the process at time t given no event has occurred by time t than to describe the process conditional on a latent class membership. Moreover, in the vast majority of competing risks data, at least in medical contexts, all patients are at risk of all event types until experiencing an event. A mixture model could therefore only ever be viewed as a mathematical convenience. The fact that in practice a CSH method is actually substantially more convenient, particularly if a non- or semi-parametric approach is to be adopted, hardly aids the case for mixture models.

Maller is also misrepresenting the Fine-Gray approach which does not assume proportional hazards within failure types. The Larson-Dinse (1985) paper that Maller also cites does involves that approach. But that can lead to the same crossing cumulative incidence curves Maller takes issue with in the context of CSH. Fine-Gray assumes proportionality of the sub-distribution hazard for a particular cause. This does allow proportionality for that cause's corresponding CIF but, as a consequence, is unable to provide a covariate model for other CIFs that is guaranteed to lead to a feasible set of CIFs for all covariate values (ie. we can fit a Fine-Gray model to each cause of failure but the resulting models will be contradictory).

Fundamentally, whatever model that is postulated, we can find the implied cause-specific hazards. Assuming proportionality of the cause-specific hazards is obviously only a modelling assumption but in nearly all cases it will be a better starting point than assuming the existence of cure fractions.

Friday 23 November 2012

Ties between event times and jump times in the Cox model

Xin, Horrocks and Darlington have a new paper in Statistics in Medicine. This considers approaches for dealing with ties in Cox proportional hazard models, not between event times but between event times and changes to time dependent covariates.

If a change in a time-dependent covariate coincides with a failure time there is ambiguity over which value of the time dependent covariate, z(t+) or z(t-), should be taken for the risk set at time t. By convention, it is usually assumed that z(t-) should be taken, i.e. that the change in the covariate occurs after the failure time. The authors demonstrate that for small sample sizes and/or a large proportion of ties, the estimates can be sensitive to the convention chosen. The authors also only consider cases where z(t) is a binary indicator that jumps from 0 to 1 at some point and cannot make the reverse jump. Obviously this will magnify the potential for bias because the "change after" convention will always underestimate the true risk whereas the "change before" will always overestimate the true risk.

The authors consider some simple adjustments for the problem: compute the "change before" and "change after" estimates and take their average or use random jittering. A problem with the averaging approach is estimating the standard error of the resulting estimator. An upper bound can be obtained by assuming the two estimators have perfect correlation. The jittering estimator obviously has the problem that different random jitters will give different results, though in principle the jittering could be repeated multiple times and combined in a fashion akin to multiple imputation.

It is surprising that the further option of adopting an method akin to the Efron method for ties. Essentially at each failure time there is an associated risk set. It could be argued that every tied covariate jump time had a 50% chance of occurring before or after the failure time. The expected contribution from a particular risk set could then be $\sum_{r \in \mathcal{R}} 0.5 \exp\{\beta z_r(t+)\} + 0.5\exp\{\beta z_r(t-)\}$
It should also be possible to apply this approach using standard software, e.g. coxph() in R. It is simply necessary to replace any (start,stop) interval that ends with a tied "stop" with two intervals (start, stop - 0.00001) and (start, stop + 0.00001) each of which are associated with a weight of 0.5.

Sunday 28 October 2012

Survival analysis with time varying covariates measured at random times by design

Stephen Rathbun, Xiao Song, Benjamin Neustiftler and Saul Shiffman have a new paper in Applied Statistics (JRSS C). This considers estimation of a proportional hazards survival model in the presence of time dependent covariates which are only intermittently sampled at random time points. Specifically they are interested in examples relating to ecological momentary assessment where data collection may be via electronic devices like smart phones and the decision on sampling times can be automated. They consider a self-correcting point-process sampling design, where the intensity of the sampling process depends on the past history of sampling times, which allows an individual to have random sampling times that are more regular than would be achieved from a Poisson process.

The proposed method of estimation is to use inverse intensity weighting to obtain an estimate of an individual's integrated hazard up to the event time. Specifically the estimator is $\hat{H}_i(T_i) = \sum_{j=1}^{N_i} h_i(u_{ij}) \pi^{-1}_i(u_{ij})$ for a individual with $\inline N_i$ sampling times at times $\inline 0 \leq u_{ij} \leq T_i$ and point process sampling intensity $\inline \pi(t)$ . This then replaces the integrated hazard in an approximate log-likelihood.

In part of the simulation study and in an application the exact point process intensity is unknown and taken from empirical estimates from the sample. Estimating the sampling intensity didn't seem to have major consequences on the integrity of the model estimates. This seems to suggest the approach might be applicable in other survival models where the covariates are sampled longitudinally in a subject specific manner, provided a reasonable model for sampling can be devised.

Drawbacks of the method seem to be that covariates measured at baseline (which is not a random time point) cannot be incorporated in the estimate and that it seems that the covariates must be measured at the event time which may not be the case in medical contexts. The underlying hazard also needs to be specified parametrically, but as stated flexible spline modelled can be used.

Friday 26 October 2012

Estimating parametric semi-Markov models from panel data using phase-type approximations

Andrew Titman has a new paper in Statistics and Computing. This extends previous work on fitting semi-Markov models to panel data using phase-type distributions. Here, rather than assume a model in which each state is assumed to have a Coxian phase-type sojourn distribution, the model assumes a standard parametric sojourn distribution (e.g. Weibull or Gamma). The computational tractability of phase-type distributions is exploited by approximating the parametric semi-Markov model by a model in which each state has a 5-phase phase-type distribution. In order to achieve this, a family of approximations to the parametric distribution, with scale parameter 1, is developed by solving a relatively large one-off optimization assuming the optimal phase-type parameters for given shape parameters evolve as B-spline functions. The resulting phase-type approximations can then be scaled to give an approximation for any combination of scale and shape parameter and then embedded into the overall semi-Markov process. The resulting approximate likelihood appears to be very close to the exact likelihood, both in terms of shape and magnitude.

In general a 2-phase Coxian phase-type model is likely to give similar results to a Weibull or Gamma model. The only advantages of the Weibull or Gamma model are that the parameters are identifiable under the null (Markov) model so standard likelihood ratio tests can be used to compare models (unlike for the 2-phase model). Also the Weibull or Gamma model requires fewer parameters so may be useful for smaller sample sizes.

Constrained parametric model for simultaneous inference of two cumulative incidence functions

Haiwen Shi, Yu Cheng and Jong-Hyeon Jeong have a new paper in Biometrical Journal. This paper is somewhat similar in aims to the pre-print by Hudgens, Li and Fine in that it is concerned with parametric estimation in competing risks models. In particular, the focus is on building models for the cumulative incidence functions (CIFs) but ensuring that the CIFs sum to less than 1 at the asymptote as time tends to infinity. Hudgens, Li and Fine dealt with interval censored data but without covariates. Here, the data are assumed to be observed up to right-censoring but the emphasis is on simultaneously obtaining regression models directly for each CIF in a model with two risks.

The approach taken in the current paper is to assume that the CIFs will sum to 1 at the asymptote, to model the cause 1 CIF using a modified three-parameter logistic function with covariates via an appropriate link function. The CIF for the second competing risk is assumed to also have a three-parameter logistic form, but covariates only affect this CIF through the probability of this risk ever occurring.

When a particular risk in a competing risks model is of primary interest, the Fine-Gray model is attractive because it makes interpretation of the covariate effects straightforward. The model of Shi et al seems to be for cases where both risks are considered important, but still seems to require that one risk be considered more important. The main danger of the approach seems to be that the model for the effect of covariates on the second risk may be unrealistic, but will have an effect on the estimates for the first risk. If we only care about the first risk the Fine-Gray model would be a safer bet. If we care about both risks it might be wiser to choose a model based on the cause-specific hazards, which are guaranteed to induce a model with well behaved CIFs albeit at the expense of some interpretability of the resulting CIFs.

Obtaining a model with a direct CIF effect for each cause seems an almost impossible task because, if we allow a covariate to effect the CIF in such a way that a sufficiently extreme covariate leads to a CIF arbitrarily close to 1, it must be having a knock-on effect on the other CIF. The only way around this would be to have a model that assigns maximal asymptote probabilities to the CIFs at infinity that are independent of any covariates e.g. $F_k(t;z) = p_k [1 - \{1-F^{*}_{0k}(t)\}^{\exp(\beta_k z)}]$ where $\inline F^{*}_{0k}(t)$ are increasing functions taking values in [0,1] and $\inline \sum p_k \leq 1$ . The need to restrict the $\inline p_k$ to be independent of covariates would make the model quite inflexible however.