Contents
Survival analysis
Kaplan-Meier survival estimates
Survival plot
Follow-up life table
Abridged life table
Log-Rank & Wilcoxon
Wei-Lachin test
Cox regression
· Kaplan-Meier
· Follow-up life table
· Abridged life table
· Log-rank and Wilcoxon
· Wei-Lachin
· Cox regression
Menu location: Analysis_Survival.
This section provides methods for the description and comparison of survival experience in different groups.
Note that StatsDirect survival analysis functions do not use separate variables for different groups. The groups are indicated by a group identifier variable that contains group numbers or text strings, i.e. for 2 groups you might have a column of 1 and 2. Each value in the group identifier column identifies its row with respect to time, death and censorship data in adjacent columns.
Menu location: Analysis_Survival_Kaplan-Meier.
This function estimates survival rates and hazard from data that may be incomplete.
The survival rate is expressed as the survivor function (S):
- where t is a time period known as the survival time, time to failure or time to event (such as death); e.g. 5 years in the context of 5 year survival rates. Some texts present S as the estimated probability of surviving to time t for those alive just before t multiplied by the proportion of subjects surviving to t.
The product limit (PL) method of Kaplan and Meier (1958) is used to estimate S:
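The estimator itself appears only as an equation image in the original help file; reconstructed in standard notation (a hat denoting an estimate) it is usually written as:

\hat{S}(t) = \prod_{i:\ t_i \le t} \left( \frac{n_i - d_i}{n_i} \right)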
- where ti is the duration of study at point i, di is the number of deaths at point i and ni is the number of individuals at risk just prior to ti. S is based upon the probability that an individual survives at the end of a time interval, on the condition that the individual was present at the start of the time interval. S is the product (P) of these conditional probabilities.
If a subject is last followed up at time ti and then leaves the study for any reason (e.g. lost to follow up), ti is counted as their censorship time.
Assumptions:
· Censored individuals have the same prospect of survival as those who continue to be followed. This cannot be tested for and can lead to a bias that artificially reduces S.
· Survival prospects are the same for early as for late recruits to the study (can be tested for).
· The event studied (e.g. death) happens at the specified time. Late recording of the event studied will cause artificial inflation of S.
The cumulative hazard function (H) is the risk of event (e.g. death) at time t; it is estimated by the method of Peterson (1977) as:
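The estimate is shown as an equation image in the original; as noted under technical validation below, the Peterson estimate is simply minus the natural logarithm of the product limit estimate of the survivor function:

\hat{H}(t) = -\ln \hat{S}(t)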
S and H with their standard errors and confidence intervals can be saved to a workbook for further analysis (see below).
Median and mean survival time
The median survival time is calculated as the smallest survival time for which the survivor function is less than or equal to 0.5. Some data sets may not get this far, in which case their median survival time is not calculated. A confidence interval for the median survival time is constructed using a robust non-parametric method due to Brookmeyer and Crowley (1982). Another confidence interval for the median survival time is constructed using a large sample estimate of the density function of the survival estimate (Andersen, 1993). If there are many tied survival times then the Brookmeyer-Crowley limits should not be used.
Mean survival time is estimated as the area under the survival curve. The estimator is based upon the entire range of data. Note that some software uses only the data up to the last observed event; Hosmer and Lemeshow (1999) point out that this biases the estimate of the mean downwards, and they recommend that the entire range of data is used. A large sample method is used to estimate the variance of the mean survival time and thus to construct a confidence interval (Andersen, 1993).
Samples of survival times are frequently highly skewed; therefore, in survival analysis, the median is generally a better measure of central location than the mean.
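The product limit calculation is straightforward to reproduce outside StatsDirect. The sketch below is a minimal pure-Python illustration (the function names are hypothetical and not part of StatsDirect); it steps through the distinct times, updates S by the conditional survival proportion, and reports the median as defined above. Run on the group 1 rats of the worked example below it should return a median of 216.

# Minimal Kaplan-Meier sketch (hypothetical helper, not StatsDirect code).
# times: observed survival times; events: 1 = death/event, 0 = censored.
def kaplan_meier(times, events):
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    steps = []                      # (time, number at risk, deaths, S)
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(e for tt, e in data if tt == t)
        ties = sum(1 for tt, _ in data if tt == t)
        if deaths > 0:
            s *= (n_at_risk - deaths) / n_at_risk   # conditional survival at t
            steps.append((t, n_at_risk, deaths, s))
        n_at_risk -= ties
        i += ties
    return steps

def median_survival(steps):
    # smallest event time at which S falls to 0.5 or below, if reached
    for t, _, _, s in steps:
        if s <= 0.5:
            return t
    return None

# Group 1 rats from the worked example below (censored times 216 and 244)
group1_times = [143, 165, 188, 188, 190, 192, 206, 208, 212, 216, 220,
                227, 230, 235, 246, 265, 303, 216, 244]
group1_events = [1] * 17 + [0, 0]
print(median_survival(kaplan_meier(group1_times, group1_events)))   # expect 216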
Plots
StatsDirect can calculate S and H for more than one group at a time and plot the survival and hazard curves for the different groups together. Four different plots are given and certain distributions are indicated if these plots form a straight line pattern (Lawless, 1982; Kalbfleisch and Prentice, 1980). The plots and their associated distributions are:
Plot | Distribution indicated by a straight line pattern |
H vs. t | Exponential, through the origin with slope λ |
ln(H) vs. ln(t) | Weibull, intercept ln(λ) and slope β |
z(S) vs. ln(t) | Log-normal |
H/t vs. t | Linear hazard rate |
- where t is time, ln is the natural (base e) logarithm, z(p) is the p quantile from the standard normal distribution, λ (lambda) is the hazard (event/death rate) at time t, and β (beta) is the Weibull shape parameter.
For survival plots that display confidence intervals, save the results of this function to a workbook and use the Survival function of the graphics menu.
Note that censored times are marked with a small vertical tick on the survival curve; you have the option to turn this off. If you want to use markers for observed event/death/failure times then please check the box when prompted.
Technical validation
The variance of S is estimated using the method of Greenwood (1926):
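The formula itself appears as an image in the original; Greenwood's variance estimate is usually quoted, in the same notation as above, as:

\operatorname{Var}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{i:\ t_i \le t} \frac{d_i}{n_i (n_i - d_i)}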
- The confidence interval for the survivor function is not calculated directly using Greenwood's variance estimate as this would give impossible results (< 0 or > 1) at extremes of S. The confidence interval for S uses an asymptotic maximum likelihood solution by log transformation as recommended by Kalbfleisch and Prentice (1980).
The cumulative hazard function is estimated as minus the natural logarithm of the product limit estimate of the survivor function as above (Peterson, 1977). Note that some statistical software calculates the simpler Nelson-Aalen estimate (Nelson, 1972; Aalen, 1978):
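The Nelson-Aalen estimate referred to here is, in the same notation:

\tilde{H}(t) = \sum_{i:\ t_i \le t} \frac{d_i}{n_i}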
A Nelson-Aalen hazard estimate will always be less than an equivalent Peterson estimate and there is no substantial case for using one in favour of the other.
The variance of H hat is estimated as:
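The formula is shown as an image in the original; applying the delta method to H hat = -ln(S hat) gives the usual approximation

\operatorname{Var}[\hat{H}(t)] \approx \frac{\operatorname{Var}[\hat{S}(t)]}{\hat{S}(t)^2}

i.e. SE(H) = SE(S)/S, which is consistent with the example output below (at time 142 in the first output table, 0.044409/0.954545 = 0.046524).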
Further analysis
S and H do not assume specific distributions for survival or hazard curves. If survival plots indicate specific distributions then more powerful estimates of S and H might be achieved by modelling. The commonest model is exponential but Weibull, log-normal, log-logistic and Gamma often appear. An expert Statistician and specialist software (e.g. GLIM, MLP and some of the SAS modules) should be employed to pursue this sort of work. In most situations, however, you should consider improving the estimates of S and H by using Cox regression rather than parametric models.
If the hazard is constant over time then a plot of H vs. time will resemble a straight line through the origin with slope λ. If this is true then:
Probability of survival beyond t = exp(-λ * t)
- this eases the calculation of relative risk from the ratio of hazard functions at time t on two survival curves. When the hazard function depends on time then you can usually calculate relative risk after fitting Cox's proportional hazards model. This model assumes that for each group the hazard functions are proportional at each time; it does not assume any particular distribution function for the hazard function. Proportional hazards modelling can be very useful; however, most researchers should seek statistical guidance with this.
Example
Test workbook (Survival worksheet: Group Surv, Time Surv, Censor Surv).
In a hypothetical example, death from a cancer after exposure to a particular carcinogen was measured in two groups of rats. Group 1 had a different pre-treatment régime to group 2. The time from pre-treatment to death is recorded. If a rat was still living at the end of the experiment or it had died from a different cause then that time is considered "censored". A censored observation is given the value 0 in the death/censorship variable to indicate a "non-event".
Group 1: 143, 165, 188, 188, 190, 192, 206, 208, 212, 216, 220, 227, 230, 235, 246, 265, 303, 216*, 244*
Group 2: 142, 157, 163, 198, 205, 232, 232, 232, 233, 233, 233, 233, 239, 240, 261, 280, 280, 295, 295, 323, 204*, 344*
* = censored data
To analyse these data in StatsDirect you must first prepare them in three workbook columns appropriately labelled:
Group Surv | Time Surv | Censor Surv |
2 | 142 | 1 |
1 | 143 | 1 |
2 | 157 | 1 |
2 | 163 | 1 |
1 | 165 | 1 |
1 | 188 | 1 |
1 | 188 | 1 |
1 | 190 | 1 |
1 | 192 | 1 |
2 | 198 | 1 |
2 | 204 | 0 |
2 | 205 | 1 |
1 | 206 | 1 |
1 | 208 | 1 |
1 | 212 | 1 |
1 | 216 | 0 |
1 | 216 | 1 |
1 | 220 | 1 |
1 | 227 | 1 |
1 | 230 | 1 |
2 | 232 | 1 |
2 | 232 | 1 |
2 | 232 | 1 |
2 | 233 | 1 |
2 | 233 | 1 |
2 | 233 | 1 |
2 | 233 | 1 |
1 | 235 | 1 |
2 | 239 | 1 |
2 | 240 | 1 |
1 | 244 | 0 |
1 | 246 | 1 |
2 | 261 | 1 |
1 | 265 | 1 |
2 | 280 | 1 |
2 | 280 | 1 |
2 | 295 | 1 |
2 | 295 | 1 |
1 | 303 | 1 |
2 | 323 | 1 |
2 | 344 | 0 |
Alternatively, open the test workbook using the file open function of the file menu. Then select Kaplan-Meier from the Survival Analysis section of the analysis menu. Select the column marked "Group Surv" when asked for the group identifier, select "Time Surv" when asked for times and "Censor Surv" when asked for deaths/events. Click on No when you are asked whether or not you want to save various statistics to the workbook. Click on Yes when you are prompted about plotting PL estimates.
For this example:
Kaplan-Meier survival estimates
Group: 1 (Group Surv = 2)
Time | At risk | Dead | Censored | S | SE(S) | H | SE(H) |
142 | 22 | 1 | 0 | 0.954545 | 0.044409 | 0.04652 | 0.046524 |
157 | 21 | 1 | 0 | 0.909091 | 0.061291 | 0.09531 | 0.06742 |
163 | 20 | 1 | 0 | 0.863636 | 0.073165 | 0.146603 | 0.084717 |
198 | 19 | 1 | 0 | 0.818182 | 0.08223 | 0.200671 | 0.100504 |
204 | 18 | 0 | 1 | 0.818182 | 0.08223 | 0.200671 | 0.100504 |
205 | 17 | 1 | 0 | 0.770053 | 0.090387 | 0.261295 | 0.117378 |
232 | 16 | 3 | 0 | 0.625668 | 0.105069 | 0.468935 | 0.16793 |
233 | 13 | 4 | 0 | 0.433155 | 0.108192 | 0.836659 | 0.249777 |
239 | 9 | 1 | 0 | 0.385027 | 0.106338 | 0.954442 | 0.276184 |
240 | 8 | 1 | 0 | 0.336898 | 0.103365 | 1.087974 | 0.306814 |
261 | 7 | 1 | 0 | 0.28877 | 0.099172 | 1.242125 | 0.34343 |
280 | 6 | 2 | 0 | 0.192513 | 0.086369 | 1.64759 | 0.44864 |
295 | 4 | 2 | 0 | 0.096257 | 0.064663 | 2.340737 | 0.671772 |
323 | 2 | 1 | 0 | 0.048128 | 0.046941 | 3.033884 | 0.975335 |
344 | 1 | 0 | 1 | 0.048128 | 0.046941 | 3.033884 | 0.975335 |
Median survival time = 233
Andersen 95% CI for median survival time = 231.898503 to 234.101497
Brookmeyer-Crowley 95% CI for median survival time = 232 to 240
Mean survival time (95% CI) [limit: 344 on 323] = 241.283422 (219.591463 to 262.975382)
Group: 2 (Group Surv = 1)
Time | At risk | Dead | Censored | S | SE(S) | H | SE(H) |
143 | 19 | 1 | 0 | 0.947368 | 0.051228 | 0.054067 | 0.054074 |
165 | 18 | 1 | 0 | 0.894737 | 0.070406 | 0.111226 | 0.078689 |
188 | 17 | 2 | 0 | 0.789474 | 0.093529 | 0.236389 | 0.11847 |
190 | 15 | 1 | 0 | 0.736842 | 0.101023 | 0.305382 | 0.137102 |
192 | 14 | 1 | 0 | 0.684211 | 0.106639 | 0.37949 | 0.155857 |
206 | 13 | 1 | 0 | 0.631579 | 0.110665 | 0.459532 | 0.175219 |
208 | 12 | 1 | 0 | 0.578947 | 0.113269 | 0.546544 | 0.195646 |
212 | 11 | 1 | 0 | 0.526316 | 0.114549 | 0.641854 | 0.217643 |
216 | 10 | 1 | 1 | 0.473684 | 0.114549 | 0.747214 | 0.241825 |
220 | 8 | 1 | 0 | 0.414474 | 0.114515 | 0.880746 | 0.276291 |
227 | 7 | 1 | 0 | 0.355263 | 0.112426 | 1.034896 | 0.316459 |
230 | 6 | 1 | 0 | 0.296053 | 0.108162 | 1.217218 | 0.365349 |
235 | 5 | 1 | 0 | 0.236842 | 0.10145 | 1.440362 | 0.428345 |
244 | 4 | 0 | 1 | 0.236842 | 0.10145 | 1.440362 | 0.428345 |
246 | 3 | 1 | 0 | 0.157895 | 0.093431 | 1.845827 | 0.591732 |
265 | 2 | 1 | 0 | 0.078947 | 0.072792 | 2.538974 | 0.922034 |
303 | 1 | 1 | 0 | 0 | * | infinity | * |
Median survival time = 216
Andersen 95% CI for median survival time = 199.619628 to 232.380372
Brookmeyer-Crowley 95% CI for median survival time = 192 to 230
Mean survival time (95% CI) = 218.684211 (200.363485 to 237.004936)
Below is the classical "survival plot" showing how survival declines with time. The approximate linearity of the log hazard vs. log time plot below indicates a Weibull distribution of survival.
At this point you might want to run a formal hypothesis test to see if there is any statistical evidence for two or more survival curves being different. This can be achieved using sensitive parametric methods if you have fitted a particular distribution curve to your data. More often you would use the Log-rank and Wilcoxon tests which do not assume any particular distribution of the survivor function.
Menu location: Graphics_Survival.
This provides a step plot for displaying survival curves. It is intended for use with variables for Time on the x (horizontal) axis and S (the Kaplan-Meier product limit estimate of the survival / survivor function) on the y (vertical) axis. You can display multiple series, each with a different marker style. Confidence intervals for S can be displayed.
Note that censored times are marked with a small vertical tick on the survival curve.
This is a good accompaniment to a presentation of survival analysis that compares survival (or time to event) data in different groups. See Kaplan-Meier for more information on generating S.
Menu location: Analysis_Survival_Follow-Up Life Table.
This function provides a follow-up life table that displays the survival experience of a cohort.
The table is constructed by the following definitions:
Interval | For a Berkson and Gage survival table this is the survival times in intervals. For an abridged life table this is ages in groups. |
Deaths | Number of individuals who die in the interval. [dx] |
Withdrawn | Number of individuals withdrawn or lost to follow up in the interval. [wx] |
At risk | Number of individuals alive at the start of the interval. [nx] |
Adj. at risk | Adjusted number at risk (half of withdrawals of the current interval subtracted). [n'x] |
P(death) | Probability that an individual who survived the last interval will die in the current interval. [qx] |
P(survival) | Probability that an individual who survived the last interval will survive the current interval. [px] |
% Survivors (lx) | Probability of an individual surviving beyond the current interval; the proportion of survivors after the current interval; the life table survival rate. |
Var(lx%) | Estimated variance of lx. |
*% CI for lx% | *% confidence interval for lx%. |
- where lx is the product of all px before x.
Technical validation
The Berkson and Gage method is used to construct the basic table (Berkson and Gage, 1950; Armitage and Berry, 1994; Altman, 1991; Lawless, 1982; Kalbfleisch and Prentice, 1980; Le, 1997). The confidence interval for lx is not a simple application of the estimated variance for lx; instead it uses a maximum likelihood solution from an asymptotic distribution by the transformation of lx suggested by Kalbfleisch and Prentice (1980). This treatment of lx avoids impossible values (i.e. >1 or <0).
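The defining formulae appear as images in the original help file; using the column symbols above, the standard Berkson and Gage quantities are (a reconstruction consistent with the example output below):

n'_x = n_x - \tfrac{1}{2} w_x, \qquad q_x = \frac{d_x}{n'_x}, \qquad p_x = 1 - q_x, \qquad l_x = \prod_{j < x} p_j

with l for the first interval set to 100%.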
Example
From Armitage and Berry (1994, p. 473).
Test workbook (Survival worksheet: Year, Died, Withdrawn).
The following data represent the survival of 374 patients who had one type of surgery for a particular malignancy.
Years since operation | Died in this interval | Lost to follow up |
1 | 90 | 0 |
2 | 76 | 0 |
3 | 51 | 0 |
4 | 25 | 12 |
5 | 20 | 5 |
6 | 7 | 9 |
7 | 4 | 9 |
8 | 1 | 3 |
9 | 3 | 5 |
10 | 2 | 5 |
To analyse these data in StatsDirect you must first prepare them in three workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Follow-up Life Table from the survival analysis section of the analysis menu. Select the column marked "Year" when asked for the times, select "Died" when asked for deaths and "Withdrawn" when asked for withdrawals. Select 374 (total deaths and withdrawals) as the number alive at the start.
For this example:
Follow-up life table
Interval | Deaths | Withdrawn | At risk | Adj. at risk | P(death) |
0 to 1 | 90 | 0 | 374 | 374 | 0.240642 |
1 to 2 | 76 | 0 | 284 | 284 | 0.267606 |
2 to 3 | 51 | 0 | 208 | 208 | 0.245192 |
3 to 4 | 25 | 12 | 157 | 151 | 0.165563 |
4 to 5 | 20 | 5 | 120 | 117.5 | 0.170213 |
5 to 6 | 7 | 9 | 95 | 90.5 | 0.077348 |
6 to 7 | 4 | 9 | 79 | 74.5 | 0.053691 |
7 to 8 | 1 | 3 | 66 | 64.5 | 0.015504 |
8 to 9 | 3 | 5 | 62 | 59.5 | 0.05042 |
9 to 10 | 2 | 5 | 54 | 51.5 | 0.038835 |
10 up | 21 | 26 | 47 | * | * |
Interval | P(survival) | Survivors (lx%) | SD of lx% | 95% CI for lx% |
0 to 1 | 0.759358 | 100 | * | * to * |
1 to 2 | 0.732394 | 75.935829 | 10.57424 | 71.271289 to 79.951252 |
2 to 3 | 0.754808 | 55.614973 | 7.87331 | 50.428392 to 60.482341 |
3 to 4 | 0.834437 | 41.97861 | 7.003571 | 36.945565 to 46.922332 |
4 to 5 | 0.829787 | 35.028509 | 6.747202 | 30.200182 to 39.889161 |
5 to 6 | 0.922652 | 29.066209 | 6.651959 | 24.47156 to 33.805 |
6 to 7 | 0.946309 | 26.817994 | 6.659494 | 22.322081 to 31.504059 |
7 to 8 | 0.984496 | 25.378102 | 6.700832 | 20.935141 to 30.043836 |
8 to 9 | 0.94958 | 24.984643 | 6.720449 | 20.552912 to 29.648834 |
9 to 10 | 0.961165 | 23.724913 | 6.803396 | 19.323326 to 28.39237 |
10 up | * | 22.803557 | 6.886886 | 18.417247 to 27.483099 |
We conclude with 95% confidence that the true population survival rate 5 years after the surgical operation studied is between 24.5% and 33.8% for people diagnosed as having this cancer.
Menu location: Analysis_Survival_Abridged life table
This function provides a current life table (actuarial table) that displays the survival experience of a given population in abridged form.
The table is constructed by the following definitions of Greenwood (1922) and Chiang (1984):
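The defining equation appears as an image in the original; the standard Chiang formula for the conditional probability of death, using the symbols explained below, is:

\hat{q}_i = \frac{n_i M_i}{1 + (1 - a_i)\, n_i M_i}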
- where qi hat is the probability that an individual will die in the ith interval, ni is the length of the interval, Mi is the death rate in the interval (i.e. the number of individuals dying in the interval [Di] divided by the mid-year population [Pi], which is the number of years lived in the interval by those alive at the start of the interval, i.e. it is the person-time denominator for the rate), and ai is the fraction of the last age interval of life.
To explain ai: when a person dies at a certain age they have lived only a fraction of the interval in which their age at death sits; the average of all of these fractions of the interval for all people dying in the interval is called the fraction of the last age interval of life, ai. Infant deaths tend to occur early in the first year of life (which is the usual first age interval for abridged life tables). The ai value for this interval is around 0.1 in developed countries and higher where infant mortality rates are higher. The values for young childhood intervals are around 0.4 and for adult intervals are around 0.5. The proper values for ai can be calculated from the full death records. If the full records are not available then the WHO guidelines are to use the following ai values for the first interval given the following infant mortality rates:
Infant mortality rate per 1000 | ai |
< 20 | 0.09 |
20 - 40 | 0.15 |
40 - 60 | 0.23 |
> 60 | 0.30 |
The rest of the calculations proceed using the following formulae on a theoretical standard starting population of 100,000 (the radix value) living at the start. In other words, we are constructing an artificial cohort of 100,000 and overlaying current mortality experience on them in order to work out life expectancies.
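The formulae appear as images in the original; reconstructed in the usual actuarial notation (consistent with the example output below) they are:

d_i = l_i \hat{q}_i, \qquad l_{i+1} = l_i - d_i \quad (l_1 = 100{,}000, \text{ the radix})

L_i = n_i (l_i - d_i) + a_i n_i d_i, \qquad L_w = \frac{l_w}{M_w} \ \ (\text{open final interval})

T_i = \sum_{j \ge i} L_j, \qquad \hat{e}_i = \frac{T_i}{l_i}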
- where w is the number of intervals, di is the number out of the artificial cohort dying in the ith interval, li is the number out of the artificial cohort alive at the start of the interval, Li is the number of years lived in the interval by the artificial cohort, Ti is the total number of years lived by those individuals from the artificial cohort attaining the age that starts the interval, and ei is the observed expectation of life at the age that starts the interval.
Note that the value for the last interval length is not important, since this is calculated as an open interval as above. When preparing your data you will therefore have one less row in the interval column than in the columns for mid-year population in the interval and the deaths in the interval. The conventional interval pattern is:
Interval length | Interval |
1 | 0 to 1 |
4 | 1 to 4 |
5 | 5 to 9 |
5 | 10 to 14 |
5 | 15 to 19 |
5 | 20 to 24 |
5 | 25 to 29 |
5 | 30 to 34 |
5 | 35 to 39 |
5 | 40 to 44 |
5 | 45 to 49 |
5 | 50 to 54 |
5 | 55 to 59 |
5 | 60 to 64 |
5 | 65 to 69 |
5 | 70 to 74 |
5 | 75 to 79 |
5 | 80 to 84 |
* | 85 up |
- which is extended to 90 nowadays.
Standard errors and confidence intervals for q and e are calculated using the formulae given by Chiang (1984):
- where S squared e hat alpha is the variance of the expectation of life at the age of the start of the interval alpha, and S squared q hat i is the variance of the probability of death for the ith interval.
If you want to test whether or not the probability of death in one age interval is statistically significantly different from another interval, or compare the probability of death in a given age interval from two different populations (e.g. male vs. female), then you can use the following formulae:
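The formulae appear as images in the original; for the ith versus jth comparison they take the familiar form:

Z = \frac{\hat{q}_i - \hat{q}_j}{SE}, \qquad SE = \sqrt{S^2_{\hat{q}_i} + S^2_{\hat{q}_j}}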
- where Z is a standard normal test statistic and SE is the standard error of the difference between the two (ith vs. jth) probabilities of death that you are comparing.
Comparison of two expectation of life statistics can be made in a similar way to the above, but the standard error for the difference between two e statistics is simply the square root of the sum of the squared standard errors of the e statistics being compared.
Adjusting life expectancy for a given utility
You can specify a weighting variable for utility to be applied to each interval. This is used, for example, in the calculation of health adjusted life expectancy (HALE) by assuming that there is more health utility (sometimes defined by absence of disability) in some periods of life than in others. Wolfson (1996) describes the principles of health adjusted life expectancy.
StatsDirect simply multiplies Ti (the total number of years lived by those individuals from the artificial cohort attaining the age that starts the interval) by the given ith utility weight, then divides as usual by li (the number out of the artificial cohort alive at the start of the interval) in order to compute adjusted life expectancy.
Data preparation
Prepare your data in four columns (plus an optional fifth) as follows:
1. Length of age interval (w-1 rows corresponding to w intervals as described above)
2. Mid-year population, or number of years lived in the interval by those alive at its start (w rows)
3. Deaths in interval (w rows)
4. Fraction (a) of last age interval of life (w-1 rows)
5. (Utility weight [optional], e.g. proportion of the interval of life spent without disability in a given population)
If the fraction 'a' is not provided then it is assumed to be 0.1 for the infant interval, 0.4 for the early childhood interval and 0.5 for all other intervals. You should endeavour to supply the best estimate of 'a' possible.
Example
From Chiang (1984, p. 141): the total population of California in 1970.
Test workbook (Survival worksheet: Interval, Population, Deaths, Fraction a).
Abridged life table
Interval | Population | Deaths | Death rate |
0 to 1 | 340483 | 6234 | 0.018309 |
1 to 4 | 1302198 | 1049 | 0.000806 |
5 to 9 | 1918117 | 723 | 0.000377 |
10 to 14 | 1963681 | 735 | 0.000374 |
15 to 19 | 1817379 | 2054 | 0.00113 |
20 to 24 | 1740966 | 2702 | 0.001552 |
25 to 29 | 1457614 | 2071 | 0.001421 |
30 to 34 | 1219389 | 1964 | 0.001611 |
35 to 39 | 1149999 | 2588 | 0.00225 |
40 to 44 | 1208550 | 4114 | 0.003404 |
45 to 49 | 1245903 | 6722 | 0.005395 |
50 to 54 | 1083852 | 8948 | 0.008256 |
55 to 59 | 933244 | 11942 | 0.012796 |
60 to 64 | 770770 | 14309 | 0.018565 |
65 to 69 | 620805 | 17088 | 0.027526 |
70 to 74 | 484431 | 19149 | 0.039529 |
75 to 79 | 342097 | 21325 | 0.062336 |
80 to 84 | 210953 | 20129 | 0.095419 |
85 up | 142691 | 22483 | 0.157564 |
Interval | Probability of dying [qx] | SE of qx | 95% CI for qx |
0 to 1 | 0.018009 | 0.000226 | 0.017566 to 0.018452 |
1 to 4 | 0.003216 | 0.000099 | 0.003022 to 0.00341 |
5 to 9 | 0.001883 | 0.00007 | 0.001746 to 0.00202 |
10 to 14 | 0.00187 | 0.000069 | 0.001735 to 0.002005 |
15 to 19 | 0.005638 | 0.000124 | 0.005395 to 0.005881 |
20 to 24 | 0.007729 | 0.000148 | 0.007439 to 0.00802 |
25 to 29 | 0.007079 | 0.000155 | 0.006776 to 0.007383 |
30 to 34 | 0.008022 | 0.00018 | 0.007669 to 0.008376 |
35 to 39 | 0.011193 | 0.000219 | 0.010764 to 0.011622 |
40 to 44 | 0.016888 | 0.000261 | 0.016376 to 0.0174 |
45 to 49 | 0.026639 | 0.000321 | 0.02601 to 0.027267 |
50 to 54 | 0.040493 | 0.000419 | 0.039671 to 0.041315 |
55 to 59 | 0.062075 | 0.00055 | 0.060997 to 0.063153 |
60 to 64 | 0.088863 | 0.000709 | 0.087474 to 0.090253 |
65 to 69 | 0.128933 | 0.000921 | 0.127129 to 0.130737 |
70 to 74 | 0.180519 | 0.001181 | 0.178204 to 0.182833 |
75 to 79 | 0.270386 | 0.001582 | 0.267286 to 0.273486 |
80 to 84 | 0.385206 | 0.002129 | 0.381034 to 0.389379 |
85 up | 1 | * | * to * |
Interval | Living at start [lx] | Dying [dx] | Fraction of last interval of life [ax] |
0 to 1 | 100000 | 1801 | 0.09 |
1 to 4 | 98199 | 316 | 0.41 |
5 to 9 | 97883 | 184 | 0.44 |
10 to 14 | 97699 | 183 | 0.54 |
15 to 19 | 97516 | 550 | 0.59 |
20 to 24 | 96966 | 749 | 0.49 |
25 to 29 | 96217 | 681 | 0.51 |
30 to 34 | 95536 | 766 | 0.52 |
35 to 39 | 94769 | 1061 | 0.53 |
40 to 44 | 93709 | 1583 | 0.54 |
45 to 49 | 92126 | 2454 | 0.53 |
50 to 54 | 89672 | 3631 | 0.53 |
55 to 59 | 86041 | 5341 | 0.52 |
60 to 64 | 80700 | 7171 | 0.52 |
65 to 69 | 73529 | 9480 | 0.51 |
70 to 74 | 64048 | 11562 | 0.52 |
75 to 79 | 52486 | 14192 | 0.51 |
80 to 84 | 38295 | 14751 | 0.5 |
85 up | 23543 | 23543 | * |
Interval | Years in interval [Lx] | Years beyond start of interval [Tx] |
0 to 1 | 98361 | 7195231 |
1 to 4 | 392051 | 7096870 |
5 to 9 | 488900 | 6704819 |
10 to 14 | 488075 | 6215919 |
15 to 19 | 486454 | 5727844 |
20 to 24 | 482921 | 5241390 |
25 to 29 | 479416 | 4758468 |
30 to 34 | 475840 | 4279052 |
35 to 39 | 471354 | 3803213 |
40 to 44 | 464903 | 3331858 |
45 to 49 | 454863 | 2866955 |
50 to 54 | 439827 | 2412091 |
55 to 59 | 417386 | 1972264 |
60 to 64 | 386289 | 1554878 |
65 to 69 | 344417 | 1168590 |
70 to 74 | 292493 | 824173 |
75 to 79 | 227663 | 531680 |
80 to 84 | 154596 | 304017 |
85 up | 149421 | 149421 |
Interval | Expectation of life [ex] | SE of ex | 95% CI for ex |
0 to 1 | 71.952313 | 0.037362 | 71.879085 to 72.025541 |
1 to 4 | 72.270232 | 0.034115 | 72.203367 to 72.337097 |
5 to 9 | 68.498121 | 0.033492 | 68.432478 to 68.563764 |
10 to 14 | 63.623174 | 0.033231 | 63.558043 to 63.688305 |
15 to 19 | 58.737306 | 0.033025 | 58.672578 to 58.802034 |
20 to 24 | 54.053615 | 0.032466 | 53.989981 to 54.117248 |
25 to 29 | 49.45559 | 0.031785 | 49.393293 to 49.517888 |
30 to 34 | 44.790023 | 0.031151 | 44.728969 to 44.851077 |
35 to 39 | 40.131217 | 0.030436 | 40.071563 to 40.190871 |
40 to 44 | 35.555493 | 0.029616 | 35.497446 to 35.613539 |
45 to 49 | 31.119893 | 0.028788 | 31.06347 to 31.176317 |
50 to 54 | 26.899049 | 0.027963 | 26.844242 to 26.953856 |
55 to 59 | 22.922407 | 0.02697 | 22.869548 to 22.975266 |
60 to 64 | 19.267406 | 0.025794 | 19.216851 to 19.31796 |
65 to 69 | 15.892984 | 0.024469 | 15.845026 to 15.940942 |
70 to 74 | 12.867973 | 0.022957 | 12.822978 to 12.912969 |
75 to 79 | 10.129843 | 0.021419 | 10.087862 to 10.171824 |
80 to 84 | 7.938844 | 0.018833 | 7.901931 to 7.975756 |
85 up | 6.346617 | * | * to * |
Median expectation of life (age at which half of original cohort survives) = 75.876035
Menu location: Analysis_Survival_Log-Rank & Wilcoxon.
This function provides methods for comparing two or more survival curves where some of the observations may be censored and where the overall grouping may be stratified. The methods are nonparametric in that they do not make assumptions about the distributions of survival estimates.
In the absence of censorship (e.g. loss to follow up, alive at end of study) the methods presented here reduce to a Mann-Whitney (two sample Wilcoxon) test for two groups of survival times and a Kruskal-Wallis test for more than two groups of survival times. StatsDirect gives a comprehensive set of tests for the comparison of survival data that may be censored (Tarone and Ware, 1977; Kalbfleisch and Prentice, 1980; Cox and Oakes, 1984; Le, 1997).
The null hypothesis tested here is that the risk of death/event is the same in all groups.
Peto's log-rank test is generally the most appropriate method but the Prentice modified Wilcoxon test is more sensitive when the ratio of hazards is higher at early survival times than at late ones (Peto and Peto, 1972; Kalbfleisch and Prentice, 1980). The log-rank test is similar to the Mantel-Haenszel test and some authors refer to it as the Cox-Mantel test (Mantel and Haenszel, 1959; Cox, 1972).
Strata
An optional variable, strata, allows you to sub-classify the groups specified in the group identifier variable and to test the significance of this sub-classification (Armitage and Berry, 1994; Lawless, 1982; Kalbfleisch and Prentice, 1980).
Wilcoxon weights
StatsDirect gives you a choice of three different weighting methods for the generalised Wilcoxon test: Peto-Prentice, Gehan-Breslow and Tarone-Ware. The Peto-Prentice method is generally more robust than the others but the Gehan statistic is calculated routinely by many statistical software packages (Breslow, 1974; Tarone and Ware, 1977; Kalbfleisch and Prentice, 1980; Miller, 1981; Hosmer and Lemeshow 1999). You should seek statistical guidance if you plan to use any weighting method other than Peto-Prentice.
Hazard-ratios
An approximate confidence interval for the log hazard-ratio is calculated using the following estimate of standard error (se):
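The estimate appears as an image in the original; writing Oi and ei for the observed deaths and extent of exposure in group i, the approximation (consistent with the interval in the worked example below) is:

se[\ln(\widehat{HR})] \approx \sqrt{\frac{1}{e_1} + \frac{1}{e_2}}, \qquad \widehat{HR} = \frac{O_1 / e_1}{O_2 / e_2}

and an approximate 95% interval is exp[ln(HR) ± 1.96 se].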
- where ei is the extent of exposure to risk of death (sometimes called expected deaths) for group i of k (Armitage and Berry, 1994).
An exact conditional maximum likelihood estimate of the hazard ratio is optionally given. The exact estimate and its confidence interval (Fisher or mid-P) should be routinely used in preference to the above approximation. The exponents of Cox regression parameters are also exact estimators of the hazard ratio, but please note that they are not exact if Breslow's method has been used to correct for ties in the regression. Please consult with a statistician if you are considering using Cox regression.
Trend test
If you have more than two groups then StatsDirect will calculate a variant of the log-rank test for trend. If you choose not to enter group scores then they are allocated as 1,2,3 ... n in group order (Armitage and Berry, 1994; Lawless, 1982; Kalbfleisch and Prentice, 1980).
Technical validation
The general test statistic is calculated around a hypergeometric distribution of the number of events at distinct event times:
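The statistic appears as an image in the original; reconstructed in the usual notation, with nj and dj the total number at risk and the total events/deaths at the jth distinct time, the weighted score for group i and its hypergeometric ('exact') variance are:

U_i = \sum_j w_j \left( d_{ij} - e_{ij} \right), \qquad e_{ij} = d_j \frac{n_{ij}}{n_j}

\operatorname{Var}(U_i) = \sum_j w_j^2\, \frac{d_j (n_j - d_j)\, n_{ij} (n_j - n_{ij})}{n_j^2 (n_j - 1)}

The chi-square statistics quoted below are quadratic forms in these scores and their variance-covariance matrix.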
- where the weight wj for the log-rank test is equal to 1, and wj for the generalised Wilcoxon test is nj (Gehan-Breslow method); for the Tarone-Ware method wj is the square root of nj; and for the Peto-Prentice method wj is the Kaplan-Meier survivor function multiplied by (nj divided by nj + 1). eij is the expectation of death in group i at the jth distinct observed time where dj events/deaths occurred. nij is the number at risk in group i just before the jth distinct observed time. The test statistic for equality of survival across the k groups (populations sampled) is approximately chi-square distributed on k-1 degrees of freedom. The test statistic for monotone trend is approximately chi-square distributed on 1 degree of freedom. c is a vector of scores that are either defined by the user or allocated as 1 to k.
Variance is estimated by the method that Peto (1977) refers to as "exact".
The stratified test statistic is expressed as (Kalbfleisch and Prentice, 1980):
- where the statistics defined above are calculated within strata then summed across strata prior to the generalised inverse and transpose matrix operations.
Example
From Armitage and Berry (1994, p. 479).
Test workbook (Survival worksheet: Stage Group, Time, Censor).
The following data represent the survival in days since entry to the trial of patients with diffuse histiocytic lymphoma. Two different groups of patients, those with stage III and those with stage IV disease, are compared.
Stage 3: 6, 19, 32, 42, 42, 43*, 94, 126*, 169*, 207, 211*, 227*, 253, 255*, 270*, 310*, 316*, 335*, 346*
Stage 4: 4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30, 31, 33, 34, 35, 39, 40, 41*, 43*, 45, 46, 50, 56, 61*, 61*, 63, 68, 82, 85, 88, 89, 90, 93, 104, 110, 134, 137, 160*, 169, 171, 173, 175, 184, 201, 222, 235*, 247*, 260*, 284*, 290*, 291*, 302*, 304*, 341*, 345*
* = censored data (patient stillalive or died from an unrelated cause)
To analyse these data in StatsDirect you must first prepare them in three workbook columns as shown below:
Stage group | Time | Censor |
1 | 6 | 1 |
1 | 19 | 1 |
1 | 32 | 1 |
1 | 42 | 1 |
1 | 42 | 1 |
1 | 43 | 0 |
1 | 94 | 1 |
1 | 126 | 0 |
1 | 169 | 0 |
1 | 207 | 1 |
1 | 211 | 0 |
1 | 227 | 0 |
1 | 253 | 1 |
1 | 255 | 0 |
1 | 270 | 0 |
1 | 310 | 0 |
1 | 316 | 0 |
1 | 335 | 0 |
1 | 346 | 0 |
2 | 4 | 1 |
2 | 6 | 1 |
2 | 10 | 1 |
2 | 11 | 1 |
2 | 11 | 1 |
2 | 11 | 1 |
2 | 13 | 1 |
2 | 17 | 1 |
2 | 20 | 1 |
2 | 20 | 1 |
2 | 21 | 1 |
2 | 22 | 1 |
2 | 24 | 1 |
2 | 24 | 1 |
2 | 29 | 1 |
2 | 30 | 1 |
2 | 30 | 1 |
2 | 31 | 1 |
2 | 33 | 1 |
2 | 34 | 1 |
2 | 35 | 1 |
2 | 39 | 1 |
2 | 40 | 1 |
2 | 41 | 0 |
2 | 43 | 0 |
2 | 45 | 1 |
2 | 46 | 1 |
2 | 50 | 1 |
2 | 56 | 1 |
2 | 61 | 0 |
2 | 61 | 0 |
2 | 63 | 1 |
2 | 68 | 1 |
2 | 82 | 1 |
2 | 85 | 1 |
2 | 88 | 1 |
2 | 89 | 1 |
2 | 90 | 1 |
2 | 93 | 1 |
2 | 104 | 1 |
2 | 110 | 1 |
2 | 134 | 1 |
2 | 137 | 1 |
2 | 160 | 0 |
2 | 169 | 1 |
2 | 171 | 1 |
2 | 173 | 1 |
2 | 175 | 1 |
2 | 184 | 1 |
2 | 201 | 1 |
2 | 222 | 1 |
2 | 235 | 0 |
2 | 247 | 0 |
2 | 260 | 0 |
2 | 284 | 0 |
2 | 290 | 0 |
2 | 291 | 0 |
2 | 302 | 0 |
2 | 304 | 0 |
2 | 341 | 0 |
2 | 345 | 0 |
Alternatively, open the test workbook using the file open function of the file menu. Then select Log-rank & Wilcoxon from the Survival Analysis section of the analysis menu. Select the column marked "Stage group" when asked for the group identifier, select "Time" when asked for times and "Censor" for censorship. Click on the cancel button when asked about strata.
For this example:
Logrank and Wilcoxon tests
Log Rank (Peto):
For group 1 (Stage group = 1)
Observed deaths = 8
Extent of exposure to risk of death = 16.687031
Relative rate = 0.479414
For group 2 (Stage group = 2)
Observed deaths = 46
Extent of exposure to risk of death = 37.312969
Relative rate = 1.232815
test statistics:
-8.687031, 8.687031
variance-covariance matrix:
11.24706 | -11.24706 |
-11.24706 | 11.24706 |
Chi-square for equivalence of death rates = 6.70971 P = 0.0096
Hazard Ratio, (approximate 95% confidence interval)
Group 1 vs. Group 2 = 0.388878, (0.218343 to 0.692607)
Conditional maximum likelihood estimates:
Hazard Ratio = 0.381485
Exact Fisher 95% confidence interval = 0.154582 to 0.822411
Exact Fisher one sided P = 0.0051, two sided P = 0.0104
Exact mid-P 95% confidence interval = 0.167398 to 0.783785
Exact mid-P one sided P = 0.0034,two sided P = 0.0068
Generalised Wilcoxon (Peto-Prentice):
test statistics:
-5.19836, 5.19836
variance-covariance matrix:
4.962627 | -4.962627 |
-4.962627 | 4.962627 |
Chi-square for equivalence of death rates = 5.44529 P = 0.0196
Both log-rank and Wilcoxon tests demonstrated a statistically significant difference in survival experience between stage 3 and stage 4 patients in this study.
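For readers who want to check the arithmetic, the sketch below is a minimal pure-Python illustration for two groups (the function name is hypothetical, not part of StatsDirect). It accumulates the observed deaths, the extent of exposure and the hypergeometric ('exact') variance over the distinct death times; given the (group, time, censor) columns above it should reproduce a chi-square close to the 6.71 reported for the log-rank test.

# Minimal two-group log-rank sketch (hypothetical helper, not StatsDirect code).
# groups: 1 or 2; times: observed times; events: 1 = death, 0 = censored.
def logrank_two_groups(groups, times, events):
    data = list(zip(groups, times, events))
    death_times = sorted({t for g, t, e in data if e == 1})
    observed1 = sum(e for g, t, e in data if g == 1 and e == 1)
    expected1 = 0.0   # extent of exposure to risk of death for group 1
    variance = 0.0    # hypergeometric ("exact") variance
    for t in death_times:
        n1 = sum(1 for g, tt, e in data if g == 1 and tt >= t)   # at risk, group 1
        n2 = sum(1 for g, tt, e in data if g == 2 and tt >= t)   # at risk, group 2
        n = n1 + n2
        d = sum(e for g, tt, e in data if tt == t and e == 1)    # deaths at time t
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n - d) * n1 * n2 / (n * n * (n - 1))
    score = observed1 - expected1
    return observed1, expected1, score * score / variance   # O1, E1, chi-square (1 df)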
Stratified example
From Peto et al. (1977):
Group | Trial Time | Censorship | Strat |
1 | 8 | 1 | 1 |
1 | 8 | 1 | 2 |
2 | 13 | 1 | 1 |
2 | 18 | 1 | 1 |
2 | 23 | 1 | 1 |
1 | 52 | 1 | 1 |
1 | 63 | 1 | 1 |
1 | 63 | 1 | 1 |
2 | 70 | 1 | 2 |
2 | 70 | 1 | 2 |
2 | 180 | 1 | 2 |
2 | 195 | 1 | 2 |
2 | 210 | 1 | 2 |
1 | 220 | 1 | 2 |
1 | 365 | 0 | 2 |
2 | 632 | 1 | 2 |
2 | 700 | 1 | 2 |
1 | 852 | 0 | 2 |
2 | 1296 | 1 | 2 |
1 | 1296 | 0 | 2 |
1 | 1328 | 0 | 2 |
1 | 1460 | 0 | 2 |
1 | 1976 | 0 | 2 |
2 | 1990 | 0 | 2 |
2 | 2240 | 0 | 2 |
Censorship 1 = death event
Censorship 0 = lost to follow-up
Stratum 1 = renal impairment
Stratum 2 = no renal impairment
The table above shows you how to prepare data for a stratified log-rank test in StatsDirect. This example is worked through in the second of two classic papers by Richard Peto and colleagues (Peto et al., 1977, 1976). Please note that StatsDirect uses the more accurate variance formulae mentioned in the statistical notes section at the end of Peto et al. (1977).
Menu location: Analysis_Survival_Wei-Lachin.
This function gives a two sample distribution free method for the comparison of two multivariate distributions of survival (time-to-event) data that may be censored (incomplete, e.g. alive at end of study or lost to follow up). Multivariate methods such as this should be used only with expert statistical guidance.
Wei and Lachin generalise the log-rank and Gehan generalised Wilcoxon tests (using a random censorship model) for multivariate survival data with two main groups (Makuch and Escobar, 1991; Wei and Lachin, 1984).
Data preparation
StatsDirect asks you for a group identifier; this could be a column of 1 and 2 representing the two groups. You then select k pairs of survival time (time-to-event) and censorship columns for k repeat times. Censored data are coded as 0 and uncensored data are coded as 1.
Repeat times may represent separate factors or the observation of the same factor repeated on k occasions. For example, time to develop symptoms could be analysed for k different symptoms in a group of patients treated with drug x and compared with a group of patients not treated with drug x.
Missing data can be coded either by entering a missing data symbol * as the time, or by setting censored equal to 0 and time less than the minimum uncensored time in your data set.
For further details please refer to Makuch and Escobar (1991) and Wei and Lachin (1984).
Technical Validation
Wei and Lachin's multivariate tests are calculated for the case of two multivariate distributions, and the intermediate univariate statistics are given. The algorithm used for the method is that given by Makuch and Escobar (1991).
The general univariate statistic for comparing the time to event (of component type k out of m multivariate components) of the two groups is calculated as:
- where n1 is the number of event times per component in group 1; n2 is the number of event times per component in group 2; n is the total number of event times per component; rik is the number at risk at time t(i) in the kth component; D is equal to 0 if an observation is censored or 1 otherwise; eik is the expected proportion of events in group i for the kth component; and wj is equal to 1 for the log-rank method or (r1k+r2k)/n for the Gehan-Breslow generalised Wilcoxon method.
The univariate statistic for the kth component of the multivariate survival data is calculated as:
- where skk caret is the kth diagonal element of the estimated variance-covariance matrix that is calculated as described by Makuch and Escobar (1991).
An omnibus test that the two multivariate distributions are equal is calculated as:
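The statistic appears as an image in the original; as described by Makuch and Escobar (1991) it is the quadratic form

\chi^2 = T'\, \hat{S}^{-1}\, T

referred to a chi-square distribution with degrees of freedom equal to the number of repeat times (4 in the example below).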
- where T' is the transpose of the vector of univariate test statistics and S-1 is the generalised inverse of the estimated variance-covariance matrix.
A stochastic ordering test statistic is calculated as:
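The statistic appears as an image in the original; it is the equally weighted linear combination of the univariate statistics standardised by its variance, which can be written as

z = \frac{\mathbf{1}'\, T}{\sqrt{\mathbf{1}'\, \hat{S}\, \mathbf{1}}}

where 1 is a vector of ones.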
Note that the P value given with the stochastic ordering (linear combination) statistic is two sided; some authors prefer one sided inference (Davis, 1994). If you make a one sided inference then you are considering only ascending or only descending ordering, and you are assuming that observing an order in the opposite direction to that expected would be unimportant to your conclusions.
The test statistics are all asymptotically normally distributed.
Example
From Makuch and Escobar (1991).
Test workbook (Survival worksheet: Treatment Gp, Time m1, Censor m1, Time m2, Censor m2, Time m3, Censor m3, Time m4, Censor m4).
The following data represent the times in days it took in vitro cultures of lymphocytes to reach a level of p24 antigen expression. The cultures were taken from patients infected with HIV-1 who had advanced AIDS or AIDS related complex. The idea was that patients whose cultures took a short time to express p24 antigen had a greater load of HIV-1. The two groups represented patients on two different treatments. The culture was run for 30 days and specimens which remained negative or which became contaminated were called censored (=0). The tests were run over four 30 day periods.
Treatment Gp | time m1 | censor m1 | time m2 | censor m2 | time m3 | censor m3 | time m4 | censor m4 |
1 | 8 | 1 | 0 | 0 | 25 | 0 | 21 | 1 |
1 | 6 | 1 | 4 | 1 | 5 | 1 | 5 | 1 |
1 | 6 | 1 | 5 | 1 | 28 | 0 | 18 | 1 |
1 | 14 | 0 | 35 | 0 | 23 | 1 | 19 | 0 |
1 | 7 | 1 | 0 | 0 | 13 | 1 | 0 | 0 |
1 | 5 | 1 | 4 | 1 | 27 | 1 | 8 | 1 |
1 | 5 | 1 | 21 | 0 | 6 | 1 | 14 | 1 |
1 | 6 | 1 | 10 | 1 | 14 | 1 | 18 | 1 |
1 | 7 | 1 | 4 | 1 | 15 | 1 | 8 | 1 |
1 | 6 | 1 | 5 | 1 | 5 | 1 | 5 | 1 |
1 | 4 | 1 | 5 | 1 | 6 | 1 | 3 | 1 |
1 | 5 | 1 | 4 | 1 | 7 | 1 | 5 | 1 |
1 | 21 | 0 | 5 | 1 | 0 | 0 | 6 | 1 |
1 | 13 | 1 | 27 | 0 | 21 | 0 | 8 | 1 |
1 | 4 | 1 | 27 | 0 | 7 | 1 | 6 | 1 |
1 | 6 | 1 | 3 | 1 | 7 | 1 | 8 | 1 |
1 | 6 | 1 | 0 | 0 | 5 | 1 | 5 | 1 |
1 | 6 | 1 | 0 | 0 | 4 | 1 | 6 | 1 |
1 | 7 | 1 | 9 | 1 | 6 | 1 | 7 | 1 |
1 | 8 | 1 | 15 | 1 | 8 | 1 | 0 | 0 |
1 | 18 | 0 | 27 | 0 | 18 | 0 | 9 | 1 |
1 | 16 | 1 | 14 | 1 | 14 | 1 | 6 | 1 |
1 | 15 | 1 | 9 | 1 | 12 | 1 | 12 | 1 |
2 | 4 | 1 | 5 | 1 | 4 | 1 | 3 | 1 |
2 | 8 | 1 | 22 | 1 | 25 | 0 | 0 | 0 |
2 | 6 | 1 | 6 | 1 | 8 | 1 | 5 | 1 |
2 | 7 | 1 | 10 | 1 | 10 | 1 | 18 | 1 |
2 | 5 | 1 | 14 | 1 | 17 | 0 | 6 | 1 |
2 | 3 | 1 | 5 | 1 | 8 | 1 | 6 | 1 |
2 | 6 | 1 | 11 | 1 | 6 | 1 | 13 | 1 |
2 | 6 | 1 | 0 | 0 | 15 | 1 | 7 | 1 |
2 | 6 | 1 | 12 | 1 | 19 | 1 | 8 | 1 |
2 | 6 | 1 | 25 | 0 | 0 | 0 | 22 | 0 |
2 | 4 | 1 | 7 | 1 | 5 | 1 | 7 | 1 |
2 | 5 | 1 | 7 | 1 | 4 | 1 | 6 | 1 |
2 | 3 | 1 | 9 | 1 | 7 | 1 | 6 | 1 |
2 | 9 | 1 | 17 | 1 | 0 | 0 | 21 | 0 |
2 | 6 | 1 | 4 | 1 | 8 | 1 | 14 | 1 |
2 | 5 | 1 | 5 | 1 | 7 | 1 | 16 | 0 |
2 | 12 | 1 | 18 | 0 | 14 | 1 | 0 | 0 |
2 | 9 | 1 | 11 | 1 | 15 | 1 | 18 | 0 |
2 | 6 | 1 | 5 | 1 | 9 | 1 | 0 | 0 |
2 | 18 | 0 | 8 | 1 | 10 | 1 | 13 | 1 |
2 | 4 | 1 | 4 | 1 | 5 | 1 | 10 | 1 |
2 | 3 | 1 | 10 | 1 | 0 | 0 | 21 | 0 |
2 | 8 | 1 | 7 | 1 | 10 | 1 | 12 | 1 |
2 | 3 | 1 | 6 | 1 | 7 | 1 | 9 | 1 |
To analyse these data in StatsDirect you must first prepare them in 9 workbook columns as shown above. Alternatively, open the test workbook using the file open function of the file menu. Then select Wei-Lachin from the Survival Analysis section of the analysis menu. Select the column marked "Treatment GP" when asked for the group identifier. Next, enter the number of repeat times as four. Select "time m1" and "censor m1" for time and censorship for repeat time one. Repeat this selection process for the other three repeat times.
For this example:
Wei-Lachin Analysis
Univariate Generalised Wilcoxon (Gehan)
total cases = 47 by group = 23 24
Observed failures by group = 20 23
repeat time = 1
Wei-Lachin t = -0.527597
Wei-Lachin variance = 0.077575
z = -1.89427
chi-square = 3.588261, P = .0582
Observed failures by group = 14 21
repeat time = 2
Wei-Lachin t = 0.077588
Wei-Lachin variance = 0.056161
z = 0.327397
chi-square = 0.107189, P = .7434
Observed failures by group = 18 19
repeat time = 3
Wei-Lachin t = -0.11483
Wei-Lachin variance = 0.060918
z = -0.465244
chi-square = 0.216452, P = .6418
Observed failures by group = 20 16
repeat time = 4
Wei-Lachin t = 0.335179
Wei-Lachin variance = 0.056281
z = 1.412849
chi-square = 1.996143, P = .1577
Multivariate Generalised Wilcoxon (Gehan)
Covariance matrix:
0.077575 | |||
0.026009 | 0.056161 | ||
0.035568 | 0.020484 | 0.060918 | |
0.023525 | 0.016862 | 0.026842 | 0.056281 |
Inverse of covariance matrix:
19.204259 | |||
-5.078483 | 22.22316 | ||
-8.40436 | -3.176864 | 25.857118 | |
-2.497583 | -3.020025 | -7.867237 | 23.468861 |
repeat times = 4
chi squared omnibus statistic = 9.242916 P = .0553
stochastic ordering z = -0.30981 one sided P = 0.3784, two sided P = 0.7567
Univariate Log-Rank
total cases = 47 by group = 23 24
Observed failures by group = 20 23
repeat time = 1
Wei-Lachin t = -0.716191
Wei-Lachin variance = 0.153385
z = -1.828676
chi-square = 3.344058, P = .0674
Observed failures by group = 14 21
repeat time = 2
Wei-Lachin t = -0.277786
Wei-Lachin variance = 0.144359
z = -0.731119
chi-square = 0.534536, P = .4647
Observed failures by group = 18 19
repeat time = 3
Wei-Lachin t = -0.372015
Wei-Lachin variance = 0.150764
z = -0.9581
chi-square = 0.917956, P = .338
Observed failures by group = 20 16
repeat time = 4
Wei-Lachin t = 0.619506
Wei-Lachin variance = 0.143437
z = 1.635743
chi-square = 2.675657, P = .1019
Multivariate Log-Rank
Covariance matrix:
0.153385 | |||
0.049439 | 0.144359 | ||
0.052895 | 0.050305 | 0.150764 | |
0.039073 | 0.047118 | 0.052531 | 0.143437 |
Inverse of covariance matrix:
7.973385 | |||
-1.779359 | 8.69056 | ||
-1.892007 | -1.661697 | 8.575636 | |
-0.894576 | -1.761494 | -2.079402 | 8.555558 |
repeat times = 4
chi squared omnibus statistic = 9.52966, P = .0491
stochastic ordering z = -0.688754, one sided P = 0.2455, two sided P = 0.491
Here the multivariate log-rank test has revealed a statistically significant difference between the treatment groups which was not revealed by any of the individual univariate tests. For more detailed discussion of each result parameter see Wei and Lachin (1984).
Menu location: Analysis_Survival_Cox Regression.
This function fits Cox's proportional hazards model for survival-time (time-to-event) outcomes on one or more predictors.
Cox regression (or proportional hazards regression) is a method for investigating the effect of several variables upon the time a specified event takes to happen. In the context of an outcome such as death this is known as Cox regression for survival analysis. The method does not assume any particular "survival model" but it is not truly non-parametric because it does assume that the effects of the predictor variables upon survival are constant over time and are additive in one scale. You should not use Cox regression without the guidance of a Statistician.
Provided that the assumptions of Cox regression are met, this function will provide better estimates of survival probabilities and cumulative hazard than those provided by the Kaplan-Meier function.
Hazard and hazard-ratios
Cumulative hazard at a time t is the risk of dying between time 0 and time t, and the survivor function at time t is the probability of surviving to time t (see also Kaplan-Meier estimates).
The coefficients in a Cox regression relate to hazard; a positive coefficient indicates a worse prognosis and a negative coefficient indicates a protective effect of the variable with which it is associated.
The hazards ratio associated with a predictor variable is given by the exponent of its coefficient; this is given with a confidence interval under the "coefficient details" option in StatsDirect. The hazards ratio may also be thought of as the relative death rate, see Armitage and Berry (1994). The interpretation of the hazards ratio depends upon the measurement scale of the predictor variable in question, see Sahai and Kurshid (1996) for further information on relative risk of hazards.
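For reference, the proportional hazards model itself can be written as follows, where h0(t) is an unspecified baseline hazard, x1 ... xp are the predictors and b1 ... bp are the coefficients estimated by this function:

h(t \mid x) = h_0(t)\, \exp(b_1 x_1 + b_2 x_2 + \dots + b_p x_p)

so that the hazard ratio for a one unit increase in xj, holding the other predictors fixed, is exp(bj).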
Time-dependent and fixedcovariates
In prospective studies, when individuals are followed over time, the values of covariates may change with time. Covariates can thus be divided into fixed and time-dependent. A covariate is time dependent if the difference between its values for two different subjects changes with time; e.g. serum cholesterol. A covariate is fixed if its values cannot change with time, e.g. sex or race. Lifestyle factors and physiological measurements such as blood pressure are usually time-dependent. Cumulative exposures such as smoking are also time-dependent but are often forced into an imprecise dichotomy, i.e. "exposed" vs. "not-exposed" instead of the more meaningful "time of exposure". There are no hard and fast rules about the handling of time dependent covariates. If you are considering using Cox regression you should seek the help of a Statistician, preferably at the design stage of the investigation.
Model analysis and deviance
A test of the overall statistical significance of the model is given under the "model analysis" option. Here the likelihood chi-square statistic is calculated by comparing the deviance (-2 * log likelihood) of your model, with all of the covariates you have specified, against the model with all covariates dropped. The individual contribution of covariates to the model can be assessed from the significance test given with each coefficient in the main output; this assumes a reasonably large sample size.
Deviance is minus twice the log of the likelihood ratio for models fitted by maximum likelihood (Hosmer and Lemeshow, 1989 and 1999; Cox and Snell, 1989; Pregibon, 1981). The value of adding a parameter to a Cox model is tested by subtracting the deviance of the model with the new parameter from the deviance of the model without the new parameter; the difference is then tested against a chi-square distribution with degrees of freedom equal to the difference between the degrees of freedom of the old and new models. The model analysis option tests the model you specify against a model with no covariates; this tests the combined value of the specified predictors/covariates in the model.
Some statistical packages offer stepwise Cox regression that performs systematic tests for different combinations of predictors/covariates. Automatic model building procedures such as these can be misleading as they do not consider the real-world importance of each predictor; for this reason StatsDirect does not include stepwise selection.
Survival and cumulative hazardrates
The survival/survivorship function and the cumulative hazard function (as discussed under Kaplan-Meier) are calculated relative to the baseline (lowest value of covariates) at each time point. Cox regression provides a better estimate of these functions than the Kaplan-Meier method when the assumptions of the Cox model are met and the fit of the model is strong.
You are given the option to "centre continuous covariates"; this makes survival and hazard functions relative to the mean of continuous variables (usually the more meaningful comparison) rather than relative to the minimum.
If you have binary/dichotomous predictors in your model you are given the option to calculate survival and cumulative hazards for each variable separately.
Data preparation
· Time-to-event, e.g. time a subject in a trial survived.
· Event / censor code - this must be ≥ 1 (event(s) happened) or 0 (no event at the end of the study, i.e. "right censored").
· Strata - e.g. centre code for a multi-centre trial. Be careful with your choice of strata; seek the advice of a Statistician.
· Predictors - these are also referred to as covariates, which can be a number of variables that are thought to be related to the event under study. If a predictor is a classifier variable with more than two classes (i.e. ordinal or nominal) then you must first use the dummy variable function to convert it to a series of binary classes.
Technical validation
StatsDirect optimises the log likelihood associated with a Cox regression model until the change in log likelihood with iterations is less than the accuracy that you specify in the dialog box that is displayed just before the calculation takes place (Lawless, 1982; Kalbfleisch and Prentice, 1980; Harris, 1991; Cox and Oakes, 1984; Le, 1997; Hosmer and Lemeshow, 1999).
The calculation options dialog box sets a value (default is 10000) for "SPLITTING RATIO"; this is the ratio in proportionality constant at a time t above which StatsDirect will split your data into more strata and calculate an extended likelihood solution, see Bryson and Johnson (1981).
Ties are handled by Breslow's approximation (Breslow, 1974).
Cox-Snell residuals are calculated as specified by Cox and Oakes (1984). Cox-Snell, Martingale and deviance residuals are calculated as specified by Collett (1994).
Baseline survival and cumulative hazard rates are calculated at each time. Maximum likelihood methods are used, which are iterative when there is more than one death/event at an observed time (Kalbfleisch and Prentice, 1973). Other software may use the less precise Breslow estimates for these functions.
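A model such as the one in the example below can be cross-checked outside StatsDirect. The sketch below is one possible approach; it assumes the three example columns have been exported to a hypothetical file called cox_example.csv and uses the third-party lifelines package. Note that lifelines applies Efron's correction for ties by default rather than Breslow's approximation, so the coefficient may differ slightly from the StatsDirect output.

# Minimal cross-check sketch (hypothetical file name; requires pandas and lifelines).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("cox_example.csv")        # columns: stage_group, time, censor
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="censor")   # remaining columns are covariates
cph.print_summary()                        # coefficient, hazard ratio and 95% CI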
Example
From Armitage and Berry (1994, p. 479).
Test workbook (Survival worksheet: Stage Group, Time, Censor).
The following data represent the survival in days since entry to the trial of patients with diffuse histiocytic lymphoma. Two different groups of patients, those with stage III and those with stage IV disease, are compared.
Stage 3: 6, 19, 32, 42, 42, 43*, 94, 126*, 169*, 207, 211*, 227*, 253, 255*, 270*, 310*, 316*, 335*, 346*
Stage 4: 4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30, 31, 33, 34, 35, 39, 40, 41*, 43*, 45, 46, 50, 56, 61*, 61*, 63, 68, 82, 85, 88, 89, 90, 93, 104, 110, 134, 137, 160*, 169, 171, 173, 175, 184, 201, 222, 235*, 247*, 260*, 284*, 290*, 291*, 302*, 304*, 341*, 345*
* = censored data (patient stillalive or died from an unrelated cause)
To analyse these data in StatsDirect you must first prepare them in three workbook columns as shown below:
Stage group | Time | Censor |
1 | 6 | 1 |
1 | 19 | 1 |
1 | 32 | 1 |
1 | 42 | 1 |
1 | 42 | 1 |
1 | 43 | 0 |
1 | 94 | 1 |
1 | 126 | 0 |
1 | 169 | 0 |
1 | 207 | 1 |
1 | 211 | 0 |
1 | 227 | 0 |
1 | 253 | 1 |
1 | 255 | 0 |
1 | 270 | 0 |
1 | 310 | 0 |
1 | 316 | 0 |
1 | 335 | 0 |
1 | 346 | 0 |
2 | 4 | 1 |
2 | 6 | 1 |
2 | 10 | 1 |
2 | 11 | 1 |
2 | 11 | 1 |
2 | 11 | 1 |
2 | 13 | 1 |
2 | 17 | 1 |
2 | 20 | 1 |
2 | 20 | 1 |
2 | 21 | 1 |
2 | 22 | 1 |
2 | 24 | 1 |
2 | 24 | 1 |
2 | 29 | 1 |
2 | 30 | 1 |
2 | 30 | 1 |
2 | 31 | 1 |
2 | 33 | 1 |
2 | 34 | 1 |
2 | 35 | 1 |
2 | 39 | 1 |
2 | 40 | 1 |
2 | 41 | 0 |
2 | 43 | 0 |
2 | 45 | 1 |
2 | 46 | 1 |
2 | 50 | 1 |
2 | 56 | 1 |
2 | 61 | 0 |
2 | 61 | 0 |
2 | 63 | 1 |
2 | 68 | 1 |
2 | 82 | 1 |
2 | 85 | 1 |
2 | 88 | 1 |
2 | 89 | 1 |
2 | 90 | 1 |
2 | 93 | 1 |
2 | 104 | 1 |
2 | 110 | 1 |
2 | 134 | 1 |
2 | 137 | 1 |
2 | 160 | 0 |
2 | 169 | 1 |
2 | 171 | 1 |
2 | 173 | 1 |
2 | 175 | 1 |
2 | 184 | 1 |
2 | 201 | 1 |
2 | 222 | 1 |
2 | 235 | 0 |
2 | 247 | 0 |
2 | 260 | 0 |
2 | 284 | 0 |
2 | 290 | 0 |
2 | 291 | 0 |
2 | 302 | 0 |
2 | 304 | 0 |
2 | 341 | 0 |
2 | 345 | 0 |
Alternatively, open the test workbook using the file open function of the file menu. Then select Cox regression from the survival analysis section of the analysis menu. Select the column marked "Time" when asked for the times, select "Censor" when asked for death/censorship, click on the cancel button when asked about strata, and when asked about predictors select the column marked "Stage group".
For this example:
Cox (proportional hazards) regression
80 subjects with 54 events
Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057
Stage group b1 = 0.96102 z = 2.492043 P = 0.0127
Cox regression - hazard ratios
Parameter | Hazard ratio | 95% CI |
Stage group | 2.614362 | 1.227756 to 5.566976 |
Parameter | Coefficient | Standard Error |
Stage group | 0.96102 | 0.385636 |
Cox regression - model analysis
Log likelihood with no covariates = -207.554801
Log likelihood with all model covariates = -203.737609
Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057
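As a check on the model analysis above, the deviance (likelihood ratio) chi-square is simply minus twice the difference between the two log likelihoods:

-2 \times [\,(-207.554801) - (-203.737609)\,] \approx 7.6344

which reproduces the chi-square of 7.634383 on 1 degree of freedom reported above.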
The significance test for the coefficient b1 tests the null hypothesis that it equals zero and thus that its exponent equals one. The confidence interval for exp(b1) is therefore the confidence interval for the relative death rate or hazard ratio; we may therefore infer with 95% confidence that the death rate from stage 4 cancers is approximately 2.6 times, and at least 1.2 times, the risk from stage 3 cancers.