Contents
Survival analysis
Kaplan-Meier survival estimates
Survival plot
Follow-up life table
Abridged life table
Log-Rank & Wilcoxon
Wei-Lachin test
Cox regression
· Kaplan-Meier
· Follow-up life table
· Abridged life table
· Log-rank and Wilcoxon
· Wei-Lachin
· Cox regression
Menu location: Analysis_Survival.
This section provides methods for the description and comparison of survival experience in different groups.
Note that StatsDirect survival analysis functions do not use separate variables for different groups. The groups are indicated by a group identifier variable that contains group numbers or text strings, i.e. for 2 groups you might have a column of 1 and 2. Each value in the group identifier column identifies its row with respect to time, death and censorship data in adjacent columns.
Menu location: Analysis_Survival_Kaplan-Meier.
This function estimates survival rates and hazard from data that may be incomplete.
The survival rate is expressed as the survivor function (S):
- where t is a time period known as the survival time, time to failure or time to event (such as death); e.g. 5 years in the context of 5 year survival rates. Some texts present S as the estimated probability of surviving to time t for those alive just before t multiplied by the proportion of subjects surviving to t.
The product limit (PL) method of Kaplan and Meier (1958) is used to estimate S:
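The estimator itself appears only as an equation image in the original help file; reconstructed in standard notation (a hat denoting an estimate) it is usually written as:

\hat{S}(t) = \prod_{i:\ t_i \le t} \left( \frac{n_i - d_i}{n_i} \right)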
- where ti is the duration of study at point i, di is the number of deaths at point i and ni is the number of individuals at risk just prior to ti. S is based upon the probability that an individual survives at the end of a time interval, on the condition that the individual was present at the start of the time interval. S is the product (P) of these conditional probabilities.
If a subject is last followed up at time ti and then leaves the study for any reason (e.g. lost to follow up), ti is counted as their censorship time.
Assumptions:
· Censored individuals have the same prospect of survival as those who continue to be followed. This cannot be tested for and can lead to a bias that artificially reduces S.
· Survival prospects are the same for early as for late recruits to the study (can be tested for).
· The event studied (e.g. death) happens at the specified time. Late recording of the event studied will cause artificial inflation of S.
The cumulative hazard function (H) is the risk of event (e.g. death) at time t; it is estimated by the method of Peterson (1977) as:
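The estimate is shown as an equation image in the original; as noted under technical validation below, the Peterson estimate is simply minus the natural logarithm of the product limit estimate of the survivor function:

\hat{H}(t) = -\ln \hat{S}(t)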
S and H with their standard errors and confidence intervals can be saved to a workbook for further analysis (see below).
Median and mean survival time
The median survival time is calculated as the smallest survival time for which the survivor function is less than or equal to 0.5. Some data sets may not get this far, in which case their median survival time is not calculated. A confidence interval for the median survival time is constructed using a robust non-parametric method due to Brookmeyer and Crowley (1982). Another confidence interval for the median survival time is constructed using a large sample estimate of the density function of the survival estimate (Andersen, 1993). If there are many tied survival times then the Brookmeyer-Crowley limits should not be used.
Mean survival time is estimated as the area under the survival curve. The estimator is based upon the entire range of data. Note that some software uses only the data up to the last observed event; Hosmer and Lemeshow (1999) point out that this biases the estimate of the mean downwards, and they recommend that the entire range of data is used. A large sample method is used to estimate the variance of the mean survival time and thus to construct a confidence interval (Andersen, 1993).
Samples of survival times are frequently highly skewed; therefore, in survival analysis, the median is generally a better measure of central location than the mean.
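The product limit calculation is straightforward to reproduce outside StatsDirect. The sketch below is a minimal pure-Python illustration (the function names are hypothetical and not part of StatsDirect); it steps through the distinct times, updates S by the conditional survival proportion, and reports the median as defined above. Run on the group 1 rats of the worked example below it should return a median of 216.

# Minimal Kaplan-Meier sketch (hypothetical helper, not StatsDirect code).
# times: observed survival times; events: 1 = death/event, 0 = censored.
def kaplan_meier(times, events):
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    s = 1.0
    steps = []                      # (time, number at risk, deaths, S)
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(e for tt, e in data if tt == t)
        ties = sum(1 for tt, _ in data if tt == t)
        if deaths > 0:
            s *= (n_at_risk - deaths) / n_at_risk   # conditional survival at t
            steps.append((t, n_at_risk, deaths, s))
        n_at_risk -= ties
        i += ties
    return steps

def median_survival(steps):
    # smallest event time at which S falls to 0.5 or below, if reached
    for t, _, _, s in steps:
        if s <= 0.5:
            return t
    return None

# Group 1 rats from the worked example below (censored times 216 and 244)
group1_times = [143, 165, 188, 188, 190, 192, 206, 208, 212, 216, 220,
                227, 230, 235, 246, 265, 303, 216, 244]
group1_events = [1] * 17 + [0, 0]
print(median_survival(kaplan_meier(group1_times, group1_events)))   # expect 216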
Plots
StatsDirect can calculate S and H for more than one group at a time and plot the survival and hazard curves for the different groups together. Four different plots are given and certain distributions are indicated if these plots form a straight line pattern (Lawless, 1982; Kalbfleisch and Prentice, 1980). The plots and their associated distributions are:
Plot | Distribution indicated by a straight line pattern |
H vs. t | Exponential, through the origin with slope λ |
ln(H) vs. ln(t) | Weibull, intercept ln(λ) and slope β |
z(S) vs. ln(t) | Log-normal |
H/t vs. t | Linear hazard rate |
- where t is time, ln is the natural (base e) logarithm, z(p) is the p quantile from the standard normal distribution, λ (lambda) is the hazard (event/death rate) at time t, and β (beta) is the Weibull shape parameter.
For survival plots that display confidence intervals, save the results of this function to a workbook and use the Survival function of the graphics menu.
Note that censored times are marked with a small vertical tick on the survival curve; you have the option to turn this off. If you want to use markers for observed event/death/failure times then please check the box when prompted.
Technical validation
The variance of S is estimated using the method of Greenwood (1926):
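The formula itself appears as an image in the original; Greenwood's variance estimate is usually quoted, in the same notation as above, as:

\operatorname{Var}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{i:\ t_i \le t} \frac{d_i}{n_i (n_i - d_i)}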
- The confidence interval for the survivor function is not calculated directly using Greenwood's variance estimate as this would give impossible results (< 0 or > 1) at extremes of S. The confidence interval for S uses an asymptotic maximum likelihood solution by log transformation as recommended by Kalbfleisch and Prentice (1980).
The cumulative hazard function is estimated as minus the natural logarithm of the product limit estimate of the survivor function as above (Peterson, 1977). Note that some statistical software calculates the simpler Nelson-Aalen estimate (Nelson, 1972; Aalen, 1978):
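The Nelson-Aalen estimate referred to here is, in the same notation:

\tilde{H}(t) = \sum_{i:\ t_i \le t} \frac{d_i}{n_i}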
A Nelson-Aalen hazard estimate will always be less than an equivalent Peterson estimate and there is no substantial case for using one in favour of the other.
The variance of H hat is estimated as:
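The formula is shown as an image in the original; applying the delta method to H hat = -ln(S hat) gives the usual approximation

\operatorname{Var}[\hat{H}(t)] \approx \frac{\operatorname{Var}[\hat{S}(t)]}{\hat{S}(t)^2}

i.e. SE(H) = SE(S)/S, which is consistent with the example output below (at time 142 in the first output table, 0.044409/0.954545 = 0.046524).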
Further analysis
S and H do not assume specific distributions for survival or hazard curves. If survival plots indicate specific distributions then more powerful estimates of S and H might be achieved by modelling. The commonest model is exponential but Weibull, log-normal, log-logistic and Gamma often appear. An expert Statistician and specialist software (e.g. GLIM, MLP and some of the SAS modules) should be employed to pursue this sort of work. In most situations, however, you should consider improving the estimates of S and H by using Cox regression rather than parametric models.
If the hazard is constant over time then a plot of H vs. time will resemble a straight line through the origin with slope λ. If this is true then:
Probability of survival beyond t = exp(-λ * t)
- this eases the calculation of relative risk from the ratio of hazard functions at time t on two survival curves. When the hazard function depends on time then you can usually calculate relative risk after fitting Cox's proportional hazards model. This model assumes that for each group the hazard functions are proportional at each time; it does not assume any particular distribution function for the hazard function. Proportional hazards modelling can be very useful; however, most researchers should seek statistical guidance with this.
Example
Test workbook (Survival worksheet: Group Surv, Time Surv, Censor Surv).
In a hypothetical example, death from a cancer after exposure to a particular carcinogen was measured in two groups of rats. Group 1 had a different pre-treatment régime to group 2. The time from pre-treatment to death is recorded. If a rat was still living at the end of the experiment or it had died from a different cause then that time is considered "censored". A censored observation is given the value 0 in the death/censorship variable to indicate a "non-event".
Group 1: 143, 165, 188, 188, 190, 192, 206, 208, 212, 216, 220, 227, 230, 235, 246, 265, 303, 216*, 244*
Group 2: 142, 157, 163, 198, 205, 232, 232, 232, 233, 233, 233, 233, 239, 240, 261, 280, 280, 295, 295, 323, 204*, 344*
* = censored data
To analyse these data in StatsDirect you must first prepare them in three workbook columns appropriately labelled:
Group Surv | Time Surv | Censor Surv |
2 | 142 | 1 |
1 | 143 | 1 |
2 | 157 | 1 |
2 | 163 | 1 |
1 | 165 | 1 |
1 | 188 | 1 |
1 | 188 | 1 |
1 | 190 | 1 |
1 | 192 | 1 |
2 | 198 | 1 |
2 | 204 | 0 |
2 | 205 | 1 |
1 | 206 | 1 |
1 | 208 | 1 |
1 | 212 | 1 |
1 | 216 | 0 |
1 | 216 | 1 |
1 | 220 | 1 |
1 | 227 | 1 |
1 | 230 | 1 |
2 | 232 | 1 |
2 | 232 | 1 |
2 | 232 | 1 |
2 | 233 | 1 |
2 | 233 | 1 |
2 | 233 | 1 |
2 | 233 | 1 |
1 | 235 | 1 |
2 | 239 | 1 |
2 | 240 | 1 |
1 | 244 | 0 |
1 | 246 | 1 |
2 | 261 | 1 |
1 | 265 | 1 |
2 | 280 | 1 |
2 | 280 | 1 |
2 | 295 | 1 |
2 | 295 | 1 |
1 | 303 | 1 |
2 | 323 | 1 |
2 | 344 | 0 |
Alternatively, open the test workbook using the file open function of the file menu. Then select Kaplan-Meier from the Survival Analysis section of the analysis menu. Select the column marked "Group Surv" when asked for the group identifier, select "Time Surv" when asked for times and "Censor Surv" when asked for deaths/events. Click on No when you are asked whether or not you want to save various statistics to the workbook. Click on Yes when you are prompted about plotting PL estimates.
For this example:
Kaplan-Meier survival estimates
Group: 1 (Group Surv = 2)
Time | At risk | Dead | Censored | S | SE(S) | H | SE(H) |
142 | 22 | 1 | 0 | 0.954545 | 0.044409 | 0.04652 | 0.046524 |
157 | 21 | 1 | 0 | 0.909091 | 0.061291 | 0.09531 | 0.06742 |
163 | 20 | 1 | 0 | 0.863636 | 0.073165 | 0.146603 | 0.084717 |
198 | 19 | 1 | 0 | 0.818182 | 0.08223 | 0.200671 | 0.100504 |
204 | 18 | 0 | 1 | 0.818182 | 0.08223 | 0.200671 | 0.100504 |
205 | 17 | 1 | 0 | 0.770053 | 0.090387 | 0.261295 | 0.117378 |
232 | 16 | 3 | 0 | 0.625668 | 0.105069 | 0.468935 | 0.16793 |
233 | 13 | 4 | 0 | 0.433155 | 0.108192 | 0.836659 | 0.249777 |
239 | 9 | 1 | 0 | 0.385027 | 0.106338 | 0.954442 | 0.276184 |
240 | 8 | 1 | 0 | 0.336898 | 0.103365 | 1.087974 | 0.306814 |
261 | 7 | 1 | 0 | 0.28877 | 0.099172 | 1.242125 | 0.34343 |
280 | 6 | 2 | 0 | 0.192513 | 0.086369 | 1.64759 | 0.44864 |
295 | 4 | 2 | 0 | 0.096257 | 0.064663 | 2.340737 | 0.671772 |
323 | 2 | 1 | 0 | 0.048128 | 0.046941 | 3.033884 | 0.975335 |
344 | 1 | 0 | 1 | 0.048128 | 0.046941 | 3.033884 | 0.975335 |
Median survival time = 233
Andersen 95% CI for median survival time = 231.898503 to 234.101497
Brookmeyer-Crowley 95% CI for median survival time = 232 to 240
Mean survival time (95% CI) [limit: 344 on 323] = 241.283422 (219.591463 to 262.975382)
Group: 2 (Group Surv = 1)
Time | At risk | Dead | Censored | S | SE(S) | H | SE(H) |
143 | 19 | 1 | 0 | 0.947368 | 0.051228 | 0.054067 | 0.054074 |
165 | 18 | 1 | 0 | 0.894737 | 0.070406 | 0.111226 | 0.078689 |
188 | 17 | 2 | 0 | 0.789474 | 0.093529 | 0.236389 | 0.11847 |
190 | 15 | 1 | 0 | 0.736842 | 0.101023 | 0.305382 | 0.137102 |
192 | 14 | 1 | 0 | 0.684211 | 0.106639 | 0.37949 | 0.155857 |
206 | 13 | 1 | 0 | 0.631579 | 0.110665 | 0.459532 | 0.175219 |
208 | 12 | 1 | 0 | 0.578947 | 0.113269 | 0.546544 | 0.195646 |
212 | 11 | 1 | 0 | 0.526316 | 0.114549 | 0.641854 | 0.217643 |
216 | 10 | 1 | 1 | 0.473684 | 0.114549 | 0.747214 | 0.241825 |
220 | 8 | 1 | 0 | 0.414474 | 0.114515 | 0.880746 | 0.276291 |
227 | 7 | 1 | 0 | 0.355263 | 0.112426 | 1.034896 | 0.316459 |
230 | 6 | 1 | 0 | 0.296053 | 0.108162 | 1.217218 | 0.365349 |
235 | 5 | 1 | 0 | 0.236842 | 0.10145 | 1.440362 | 0.428345 |
244 | 4 | 0 | 1 | 0.236842 | 0.10145 | 1.440362 | 0.428345 |
246 | 3 | 1 | 0 | 0.157895 | 0.093431 | 1.845827 | 0.591732 |
265 | 2 | 1 | 0 | 0.078947 | 0.072792 | 2.538974 | 0.922034 |
303 | 1 | 1 | 0 | 0 | * | infinity | * |
Median survival time = 216
Andersen 95% CI for median survival time = 199.619628 to 232.380372
Brookmeyer-Crowley 95% CI for median survival time = 192 to 230
Mean survival time (95% CI) = 218.684211 (200.363485 to 237.004936)
Below is the classical "survival plot" showing how survival declines with time. The approximate linearity of the log hazard vs. log time plot below indicates a Weibull distribution of survival.
At this point you might want to run a formal hypothesis test to see if there is any statistical evidence for two or more survival curves being different. This can be achieved using sensitive parametric methods if you have fitted a particular distribution curve to your data. More often you would use the Log-rank and Wilcoxon tests which do not assume any particular distribution of the survivor function.
Menu location: Graphics_Survival.
This provides a step plot for displaying survival curves. It is intended for use with variables for Time on the x (horizontal) axis and S (the Kaplan-Meier product limit estimate of the survival / survivor function) on the y (vertical) axis. You can display multiple series, each with a different marker style. Confidence intervals for S can be displayed.
Note that censored times are marked with a small vertical tick on the survival curve.
This is a good accompaniment to a presentation of survival analysis that compares survival (or time to event) data in different groups. See Kaplan-Meier for more information on generating S.
Menu location: Analysis_Survival_Follow-Up Life Table.
This function provides a follow-up life table that displays the survival experience of a cohort.
The table is constructed by the following definitions:
Interval | For a Berkson and Gage survival table this is the survival times in intervals. For an abridged life table this is ages in groups. |
Deaths | Number of individuals who die in the interval. [dx] |
Withdrawn | Number of individuals withdrawn or lost to follow up in the interval. [wx] |
At risk | Number of individuals alive at the start of the interval. [nx] |
Adj. at risk | Adjusted number at risk (half of withdrawals of the current interval subtracted). [n'x] |
P(death) | Probability that an individual who survived the last interval will die in the current interval. [qx] |
P(survival) | Probability that an individual who survived the last interval will survive the current interval. [px] |
% Survivors (lx) | Probability of an individual surviving beyond the current interval; the proportion of survivors after the current interval; the life table survival rate. |
Var(lx%) | Estimated variance of lx. |
*% CI for lx% | *% confidence interval for lx%. |
- where lx is the product of all px before x.
Technical validation
The Berkson and Gage method is used to construct the basic table (Berkson and Gage, 1950; Armitage and Berry, 1994; Altman, 1991; Lawless, 1982; Kalbfleisch and Prentice, 1980; Le, 1997). The confidence interval for lx is not a simple application of the estimated variance for lx; instead it uses a maximum likelihood solution from an asymptotic distribution by the transformation of lx suggested by Kalbfleisch and Prentice (1980). This treatment of lx avoids impossible values (i.e. >1 or <0).
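The defining formulae appear as images in the original help file; using the column symbols above, the standard Berkson and Gage quantities are (a reconstruction consistent with the example output below):

n'_x = n_x - \tfrac{1}{2} w_x, \qquad q_x = \frac{d_x}{n'_x}, \qquad p_x = 1 - q_x, \qquad l_x = \prod_{j < x} p_j

with l for the first interval set to 100%.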
Example
From Armitage and Berry (1994, p. 473).
Test workbook (Survival worksheet: Year, Died, Withdrawn).
The following data represent the survival of 374 patients who had one type of surgery for a particular malignancy.
Years since operation | Died in this interval | Lost to follow up |
1 | 90 | 0 |
2 | 76 | 0 |
3 | 51 | 0 |
4 | 25 | 12 |
5 | 20 | 5 |
6 | 7 | 9 |
7 | 4 | 9 |
8 | 1 | 3 |
9 | 3 | 5 |
10 | 2 | 5 |
To analyse these data in StatsDirect you must first prepare them in three workbook columns appropriately labelled. Alternatively, open the test workbook using the file open function of the file menu. Then select Follow-up Life Table from the survival analysis section of the analysis menu. Select the column marked "Year" when asked for the times, select "Died" when asked for deaths and "Withdrawn" when asked for withdrawals. Select 374 (total deaths and withdrawals) as the number alive at the start.
For this example:
Follow-up life table
Interval | Deaths | Withdrawn | At risk | Adj. at risk | P(death) |
0 to 1 | 90 | 0 | 374 | 374 | 0.240642 |
1 to 2 | 76 | 0 | 284 | 284 | 0.267606 |
2 to 3 | 51 | 0 | 208 | 208 | 0.245192 |
3 to 4 | 25 | 12 | 157 | 151 | 0.165563 |
4 to 5 | 20 | 5 | 120 | 117.5 | 0.170213 |
5 to 6 | 7 | 9 | 95 | 90.5 | 0.077348 |
6 to 7 | 4 | 9 | 79 | 74.5 | 0.053691 |
7 to 8 | 1 | 3 | 66 | 64.5 | 0.015504 |
8 to 9 | 3 | 5 | 62 | 59.5 | 0.05042 |
9 to 10 | 2 | 5 | 54 | 51.5 | 0.038835 |
10 up | 21 | 26 | 47 | * | * |
Interval | P(survival) | Survivors (lx%) | SD of lx% | 95% CI for lx% |
0 to 1 | 0.759358 | 100 | * | * to * |
1 to 2 | 0.732394 | 75.935829 | 10.57424 | 71.271289 to 79.951252 |
2 to 3 | 0.754808 | 55.614973 | 7.87331 | 50.428392 to 60.482341 |
3 to 4 | 0.834437 | 41.97861 | 7.003571 | 36.945565 to 46.922332 |
4 to 5 | 0.829787 | 35.028509 | 6.747202 | 30.200182 to 39.889161 |
5 to 6 | 0.922652 | 29.066209 | 6.651959 | 24.47156 to 33.805 |
6 to 7 | 0.946309 | 26.817994 | 6.659494 | 22.322081 to 31.504059 |
7 to 8 | 0.984496 | 25.378102 | 6.700832 | 20.935141 to 30.043836 |
8 to 9 | 0.94958 | 24.984643 | 6.720449 | 20.552912 to 29.648834 |
9 to 10 | 0.961165 | 23.724913 | 6.803396 | 19.323326 to 28.39237 |
10 up | * | 22.803557 | 6.886886 | 18.417247 to 27.483099 |
We conclude with 95% confidence that the true population survival rate 5 years after the surgical operation studied is between 24.5% and 33.8% for people diagnosed as having this cancer.
Menu location: Analysis_Survival_Abridged life table
This function provides a current life table (actuarial table) that displays the survival experience of a given population in abridged form.
The table is constructed by the following definitions of Greenwood (1922) and Chiang (1984):
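The defining equation appears as an image in the original; the standard Chiang formula for the conditional probability of death, using the symbols explained below, is:

\hat{q}_i = \frac{n_i M_i}{1 + (1 - a_i)\, n_i M_i}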
- where qi hat is the probability that an individual will die in the ith interval, ni is the length of the interval, Mi is the death rate in the interval (i.e. the number of individuals dying in the interval [Di] divided by the mid-year population [Pi], which is the number of years lived in the interval by those alive at the start of the interval, i.e. it is the person-time denominator for the rate), and ai is the fraction of the last age interval of life.
To explain ai: when a person dies at a certain age they have lived only a fraction of the interval in which their age at death sits; the average of all of these fractions of the interval for all people dying in the interval is called the fraction of the last age interval of life, ai. Infant deaths tend to occur early in the first year of life (which is the usual first age interval for abridged life tables). The ai value for this interval is around 0.1 in developed countries and higher where infant mortality rates are higher. The values for young childhood intervals are around 0.4 and for adult intervals are around 0.5. The proper values for ai can be calculated from the full death records. If the full records are not available then the WHO guidelines are to use the following ai values for the first interval given the following infant mortality rates:
Infant mortality rate per 1000 | ai |
< 20 | 0.09 |
20 - 40 | 0.15 |
40 - 60 | 0.23 |
> 60 | 0.30 |
The rest of the calculations proceed using the following formulae on a theoretical standard starting population of 100,000 (the radix value) living at the start. In other words, we are constructing an artificial cohort of 100,000 and overlaying current mortality experience on them in order to work out life expectancies.
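The formulae appear as images in the original; reconstructed in the usual actuarial notation (consistent with the example output below) they are:

d_i = l_i \hat{q}_i, \qquad l_{i+1} = l_i - d_i \quad (l_1 = 100{,}000, \text{ the radix})

L_i = n_i (l_i - d_i) + a_i n_i d_i, \qquad L_w = \frac{l_w}{M_w} \ \ (\text{open final interval})

T_i = \sum_{j \ge i} L_j, \qquad \hat{e}_i = \frac{T_i}{l_i}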
- where w is the number of intervals, di is the number out of the artificial cohort dying in the ith interval, li is the number out of the artificial cohort alive at the start of the interval, Li is the number of years lived in the interval by the artificial cohort, Ti is the total number of years lived by those individuals from the artificial cohort attaining the age that starts the interval, and ei is the observed expectation of life at the age that starts the interval.
Note that the value for the last interval length is not important, since this is calculated as an open interval as above. When preparing your data you will therefore have one less row in the interval column than in the columns for mid-year population in the interval and the deaths in the interval. The conventional interval pattern is:
Interval length | Interval |
1 | 0 to 1 |
4 | 1 to 4 |
5 | 5 to 9 |
5 | 10 to 14 |
5 | 15 to 19 |
5 | 20 to 24 |
5 | 25 to 29 |
5 | 30 to 34 |
5 | 35 to 39 |
5 | 40 to 44 |
5 | 45 to 49 |
5 | 50 to 54 |
5 | 55 to 59 |
5 | 60 to 64 |
5 | 65 to 69 |
5 | 70 to 74 |
5 | 75 to 79 |
5 | 80 to 84 |
* | 85 up |
- which is extended to 90 nowadays.
Standard errors and confidence intervals for q and e are calculated using the formulae given by Chiang (1984):
- where S squared e hat alpha is the variance of the expectation of life at the age of the start of the interval alpha, and S squared q hat i is the variance of the probability of death for the ith interval.
If you want to test whether or not the probability of death in one age interval is statistically significantly different from another interval, or compare the probability of death in a given age interval from two different populations (e.g. male vs. female), then you can use the following formulae:
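The formulae appear as images in the original; for the ith versus jth comparison they take the familiar form:

Z = \frac{\hat{q}_i - \hat{q}_j}{SE}, \qquad SE = \sqrt{S^2_{\hat{q}_i} + S^2_{\hat{q}_j}}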
- where Z is a standard normal test statistic and SE is the standard error of the difference between the two (ith vs. jth) probabilities of death that you are comparing.
Comparison of two expectation of life statistics can be made in a similar way to the above, but the standard error for the difference between two e statistics is simply the square root of the sum of the squared standard errors of the e statistics being compared.
Adjusting life expectancy for a given utility
You can specify a weighting variable for utility to be applied to each interval. This is used, for example, in the calculation of health adjusted life expectancy (HALE) by assuming that there is more health utility (sometimes defined by absence of disability) in some periods of life than in others. Wolfson (1996) describes the principles of health adjusted life expectancy.
StatsDirect simply multiplies Ti (the total number of years lived by those individuals from the artificial cohort attaining the age that starts the interval) by the given ith utility weight, then divides as usual by li (the number out of the artificial cohort alive at the start of the interval) in order to compute adjusted life expectancy.
Data preparation
Prepare your data in four columns (plus an optional fifth) as follows:
1. Length of age interval (w-1 rows corresponding to w intervals as described above)
2. Mid-year population, or number of years lived in the interval by those alive at its start (w rows)
3. Deaths in interval (w rows)
4. Fraction (a) of last age interval of life (w-1 rows)
5. (Utility weight [optional], e.g. proportion of the interval of life spent without disability in a given population)
If the fraction 'a' is not provided then it is assumed to be 0.1 for the infant interval, 0.4 for the early childhood interval and 0.5 for all other intervals. You should endeavour to supply the best estimate of 'a' possible.
Example
From Chiang (1984, p. 141): the total population of California in 1970.
Test workbook (Survival worksheet: Interval, Population, Deaths, Fraction a).
Abridged life table
Interval | Population | Deaths | Death rate |
0 to 1 | 340483 | 6234 | 0.018309 |
1 to 4 | 1302198 | 1049 | 0.000806 |
5 to 9 | 1918117 | 723 | 0.000377 |
10 to 14 | 1963681 | 735 | 0.000374 |
15 to 19 | 1817379 | 2054 | 0.00113 |
20 to 24 | 1740966 | 2702 | 0.001552 |
25 to 29 | 1457614 | 2071 | 0.001421 |
30 to 34 | 1219389 | 1964 | 0.001611 |
35 to 39 | 1149999 | 2588 | 0.00225 |
40 to 44 | 1208550 | 4114 | 0.003404 |
45 to 49 | 1245903 | 6722 | 0.005395 |
50 to 54 | 1083852 | 8948 | 0.008256 |
55 to 59 | 933244 | 11942 | 0.012796 |
60 to 64 | 770770 | 14309 | 0.018565 |
65 to 69 | 620805 | 17088 | 0.027526 |
70 to 74 | 484431 | 19149 | 0.039529 |
75 to 79 | 342097 | 21325 | 0.062336 |
80 to 84 | 210953 | 20129 | 0.095419 |
85 up | 142691 | 22483 | 0.157564 |
Interval | Probability of dying [qx] | SE of qx | 95% CI for qx |
0 to 1 | 0.018009 | 0.000226 | 0.017566 to 0.018452 |
1 to 4 | 0.003216 | 0.000099 | 0.003022 to 0.00341 |
5 to 9 | 0.001883 | 0.00007 | 0.001746 to 0.00202 |
10 to 14 | 0.00187 | 0.000069 | 0.001735 to 0.002005 |
15 to 19 | 0.005638 | 0.000124 | 0.005395 to 0.005881 |
20 to 24 | 0.007729 | 0.000148 | 0.007439 to 0.00802 |
25 to 29 | 0.007079 | 0.000155 | 0.006776 to 0.007383 |
30 to 34 | 0.008022 | 0.00018 | 0.007669 to 0.008376 |
35 to 39 | 0.011193 | 0.000219 | 0.010764 to 0.011622 |
40 to 44 | 0.016888 | 0.000261 | 0.016376 to 0.0174 |
45 to 49 | 0.026639 | 0.000321 | 0.02601 to 0.027267 |
50 to 54 | 0.040493 | 0.000419 | 0.039671 to 0.041315 |
55 to 59 | 0.062075 | 0.00055 | 0.060997 to 0.063153 |
60 to 64 | 0.088863 | 0.000709 | 0.087474 to 0.090253 |
65 to 69 | 0.128933 | 0.000921 | 0.127129 to 0.130737 |
70 to 74 | 0.180519 | 0.001181 | 0.178204 to 0.182833 |
75 to 79 | 0.270386 | 0.001582 | 0.267286 to 0.273486 |
80 to 84 | 0.385206 | 0.002129 | 0.381034 to 0.389379 |
85 up | 1 | * | * to * |
Interval | Living at start [lx] | Dying [dx] | Fraction of last interval of life [ax] |
0 to 1 | 100000 | 1801 | 0.09 |
1 to 4 | 98199 | 316 | 0.41 |
5 to 9 | 97883 | 184 | 0.44 |
10 to 14 | 97699 | 183 | 0.54 |
15 to 19 | 97516 | 550 | 0.59 |
20 to 24 | 96966 | 749 | 0.49 |
25 to 29 | 96217 | 681 | 0.51 |
30 to 34 | 95536 | 766 | 0.52 |
35 to 39 | 94769 | 1061 | 0.53 |
40 to 44 | 93709 | 1583 | 0.54 |
45 to 49 | 92126 | 2454 | 0.53 |
50 to 54 | 89672 | 3631 | 0.53 |
55 to 59 | 86041 | 5341 | 0.52 |
60 to 64 | 80700 | 7171 | 0.52 |
65 to 69 | 73529 | 9480 | 0.51 |
70 to 74 | 64048 | 11562 | 0.52 |
75 to 79 | 52486 | 14192 | 0.51 |
80 to 84 | 38295 | 14751 | 0.5 |
85 up | 23543 | 23543 | * |
Interval | Years in interval [Lx] | Years beyond start of interval [Tx] |
0 to 1 | 98361 | 7195231 |
1 to 4 | 392051 | 7096870 |
5 to 9 | 488900 | 6704819 |
10 to 14 | 488075 | 6215919 |
15 to 19 | 486454 | 5727844 |
20 to 24 | 482921 | 5241390 |
25 to 29 | 479416 | 4758468 |
30 to 34 | 475840 | 4279052 |
35 to 39 | 471354 | 3803213 |
40 to 44 | 464903 | 3331858 |
45 to 49 | 454863 | 2866955 |
50 to 54 | 439827 | 2412091 |
55 to 59 | 417386 | 1972264 |
60 to 64 | 386289 | 1554878 |
65 to 69 | 344417 | 1168590 |
70 to 74 | 292493 | 824173 |
75 to 79 | 227663 | 531680 |
80 to 84 | 154596 | 304017 |
85 up | 149421 | 149421 |
Interval | Expectation of life [ex] | SE of ex | 95% CI for ex |
0 to 1 | 71.952313 | 0.037362 | 71.879085 to 72.025541 |
1 to 4 | 72.270232 | 0.034115 | 72.203367 to 72.337097 |
5 to 9 | 68.498121 | 0.033492 | 68.432478 to 68.563764 |
10 to 14 | 63.623174 | 0.033231 | 63.558043 to 63.688305 |
15 to 19 | 58.737306 | 0.033025 | 58.672578 to 58.802034 |
20 to 24 | 54.053615 | 0.032466 | 53.989981 to 54.117248 |
25 to 29 | 49.45559 | 0.031785 | 49.393293 to 49.517888 |
30 to 34 | 44.790023 | 0.031151 | 44.728969 to 44.851077 |
35 to 39 | 40.131217 | 0.030436 | 40.071563 to 40.190871 |
40 to 44 | 35.555493 | 0.029616 | 35.497446 to 35.613539 |
45 to 49 | 31.119893 | 0.028788 | 31.06347 to 31.176317 |
50 to 54 | 26.899049 | 0.027963 | 26.844242 to 26.953856 |
55 to 59 | 22.922407 | 0.02697 | 22.869548 to 22.975266 |
60 to 64 | 19.267406 | 0.025794 | 19.216851 to 19.31796 |
65 to 69 | 15.892984 | 0.024469 | 15.845026 to 15.940942 |
70 to 74 | 12.867973 | 0.022957 | 12.822978 to 12.912969 |
75 to 79 | 10.129843 | 0.021419 | 10.087862 to 10.171824 |
80 to 84 | 7.938844 | 0.018833 | 7.901931 to 7.975756 |
85 up | 6.346617 | * | * to * |
Median expectation of life (age at which half of original cohort survives) = 75.876035
Menu location: Analysis_Survival_Log-Rank & Wilcoxon.
This function provides methods for comparing two or more survival curves where some of the observations may be censored and where the overall grouping may be stratified. The methods are nonparametric in that they do not make assumptions about the distributions of survival estimates.
In the absence of censorship (e.g. loss to follow up, alive at end of study) the methods presented here reduce to a Mann-Whitney (two sample Wilcoxon) test for two groups of survival times and a Kruskal-Wallis test for more than two groups of survival times. StatsDirect gives a comprehensive set of tests for the comparison of survival data that may be censored (Tarone and Ware, 1977; Kalbfleisch and Prentice, 1980; Cox and Oakes, 1984; Le, 1997).
The null hypothesis tested here is that the risk of death/event is the same in all groups.
Peto's log-rank test is generally the most appropriate method but the Prentice modified Wilcoxon test is more sensitive when the ratio of hazards is higher at early survival times than at late ones (Peto and Peto, 1972; Kalbfleisch and Prentice, 1980). The log-rank test is similar to the Mantel-Haenszel test and some authors refer to it as the Cox-Mantel test (Mantel and Haenszel, 1959; Cox, 1972).
Strata
An optional variable, strata, allows you to sub-classify the groups specified in the group identifier variable and to test the significance of this sub-classification (Armitage and Berry, 1994; Lawless, 1982; Kalbfleisch and Prentice, 1980).
Wilcoxon weights
StatsDirect gives you a choice of three different weighting methods for the generalised Wilcoxon test: Peto-Prentice, Gehan-Breslow and Tarone-Ware. The Peto-Prentice method is generally more robust than the others but the Gehan statistic is calculated routinely by many statistical software packages (Breslow, 1974; Tarone and Ware, 1977; Kalbfleisch and Prentice, 1980; Miller, 1981; Hosmer and Lemeshow 1999). You should seek statistical guidance if you plan to use any weighting method other than Peto-Prentice.
Hazard-ratios
An approximate confidence interval for the log hazard-ratio is calculated using the following estimate of standard error (se):
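The estimate appears as an image in the original; writing Oi and ei for the observed deaths and extent of exposure in group i, the approximation (consistent with the interval in the worked example below) is:

se[\ln(\widehat{HR})] \approx \sqrt{\frac{1}{e_1} + \frac{1}{e_2}}, \qquad \widehat{HR} = \frac{O_1 / e_1}{O_2 / e_2}

and an approximate 95% interval is exp[ln(HR) ± 1.96 se].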
- where ei is the extent of exposure to risk of death (sometimes called expected deaths) for group i of k (Armitage and Berry, 1994).
An exact conditional maximum likelihood estimate of the hazard ratio is optionally given. The exact estimate and its confidence interval (Fisher or mid-P) should be routinely used in preference to the above approximation. The exponents of Cox regression parameters are also exact estimators of the hazard ratio, but please note that they are not exact if Breslow's method has been used to correct for ties in the regression. Please consult with a statistician if you are considering using Cox regression.
Trend test
If you have more than two groups then StatsDirect will calculate a variant of the log-rank test for trend. If you choose not to enter group scores then they are allocated as 1,2,3 ... n in group order (Armitage and Berry, 1994; Lawless, 1982; Kalbfleisch and Prentice, 1980).
Technical validation
The general test statistic is calculated around a hypergeometric distribution of the number of events at distinct event times:
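The statistic appears as an image in the original; reconstructed in the usual notation, with nj and dj the total number at risk and the total events/deaths at the jth distinct time, the weighted score for group i and its hypergeometric ('exact') variance are:

U_i = \sum_j w_j \left( d_{ij} - e_{ij} \right), \qquad e_{ij} = d_j \frac{n_{ij}}{n_j}

\operatorname{Var}(U_i) = \sum_j w_j^2\, \frac{d_j (n_j - d_j)\, n_{ij} (n_j - n_{ij})}{n_j^2 (n_j - 1)}

The chi-square statistics quoted below are quadratic forms in these scores and their variance-covariance matrix.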
- where the weight wj for the log-rank test is equal to 1, and wj for the generalised Wilcoxon test is nj (Gehan-Breslow method); for the Tarone-Ware method wj is the square root of nj; and for the Peto-Prentice method wj is the Kaplan-Meier survivor function multiplied by (nj divided by nj + 1). eij is the expectation of death in group i at the jth distinct observed time where dj events/deaths occurred. nij is the number at risk in group i just before the jth distinct observed time. The test statistic for equality of survival across the k groups (populations sampled) is approximately chi-square distributed on k-1 degrees of freedom. The test statistic for monotone trend is approximately chi-square distributed on 1 degree of freedom. c is a vector of scores that are either defined by the user or allocated as 1 to k.
Variance is estimated by the method that Peto (1977) refers to as "exact".
The stratified test statistic is expressed as (Kalbfleisch and Prentice, 1980):
- where the statistics defined above are calculated within strata then summed across strata prior to the generalised inverse and transpose matrix operations.
Example
From Armitage and Berry (1994, p. 479).
Test workbook (Survival worksheet: Stage Group, Time, Censor).
The following data represent the survival in days since entry to the trial of patients with diffuse histiocytic lymphoma. Two different groups of patients, those with stage III and those with stage IV disease, are compared.
Stage 3: 6, 19, 32, 42, 42, 43*, 94, 126*, 169*, 207, 211*, 227*, 253, 255*, 270*, 310*, 316*, 335*, 346*
Stage 4: 4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30, 31, 33, 34, 35, 39, 40, 41*, 43*, 45, 46, 50, 56, 61*, 61*, 63, 68, 82, 85, 88, 89, 90, 93, 104, 110, 134, 137, 160*, 169, 171, 173, 175, 184, 201, 222, 235*, 247*, 260*, 284*, 290*, 291*, 302*, 304*, 341*, 345*
* = censored data (patient stillalive or died from an unrelated cause)
To analyse these data in StatsDirect you must first prepare them in three workbook columns as shown below:
Stage group | Time | Censor |
1 | 6 | 1 |
1 | 19 | 1 |
1 | 32 | 1 |
1 | 42 | 1 |
1 | 42 | 1 |
1 | 43 | 0 |
1 | 94 | 1 |
1 | 126 | 0 |
1 | 169 | 0 |
1 | 207 | 1 |
1 | 211 | 0 |
1 | 227 | 0 |
1 | 253 | 1 |
1 | 255 | 0 |
1 | 270 | 0 |
1 | 310 | 0 |
1 | 316 | 0 |
1 | 335 | 0 |
1 | 346 | 0 |
2 | 4 | 1 |
2 | 6 | 1 |
2 | 10 | 1 |
2 | 11 | 1 |
2 | 11 | 1 |
2 | 11 | 1 |
2 | 13 | 1 |
2 | 17 | 1 |
2 | 20 | 1 |
2 | 20 | 1 |
2 | 21 | 1 |
2 | 22 | 1 |
2 | 24 | 1 |
2 | 24 | 1 |
2 | 29 | 1 |
2 | 30 | 1 |
2 | 30 | 1 |
2 | 31 | 1 |
2 | 33 | 1 |
2 | 34 | 1 |
2 | 35 | 1 |
2 | 39 | 1 |
2 | 40 | 1 |
2 | 41 | 0 |
2 | 43 | 0 |
2 | 45 | 1 |
2 | 46 | 1 |
2 | 50 | 1 |
2 | 56 | 1 |
2 | 61 | 0 |
2 | 61 | 0 |
2 | 63 | 1 |
2 | 68 | 1 |
2 | 82 | 1 |
2 | 85 | 1 |
2 | 88 | 1 |
2 | 89 | 1 |
2 | 90 | 1 |
2 | 93 | 1 |
2 | 104 | 1 |
2 | 110 | 1 |
2 | 134 | 1 |
2 | 137 | 1 |
2 | 160 | 0 |
2 | 169 | 1 |
2 | 171 | 1 |
2 | 173 | 1 |
2 | 175 | 1 |
2 | 184 | 1 |
2 | 201 | 1 |
2 | 222 | 1 |
2 | 235 | 0 |
2 | 247 | 0 |
2 | 260 | 0 |
2 | 284 | 0 |
2 | 290 | 0 |
2 | 291 | 0 |
2 | 302 | 0 |
2 | 304 | 0 |
2 | 341 | 0 |
2 | 345 | 0 |
Alternatively, open the test workbook using the file open function of the file menu. Then select Log-rank & Wilcoxon from the Survival Analysis section of the analysis menu. Select the column marked "Stage group" when asked for the group identifier, select "Time" when asked for times and "Censor" for censorship. Click on the cancel button when asked about strata.
For this example:
Logrank and Wilcoxon tests
Log Rank (Peto):
For group 1 (Stage group = 1)
Observed deaths = 8
Extent of exposure to risk of death = 16.687031
Relative rate = 0.479414
For group 2 (Stage group = 2)
Observed deaths = 46
Extent of exposure to risk of death = 37.312969
Relative rate = 1.232815
test statistics:
-8.687031, 8.687031
variance-covariance matrix:
11.24706 | -11.24706 |
-11.24706 | 11.24706 |
Chi-square for equivalence of death rates = 6.70971 P = 0.0096
Hazard Ratio, (approximate 95% confidence interval)
Group 1 vs. Group 2 = 0.388878, (0.218343 to 0.692607)
Conditional maximum likelihood estimates:
Hazard Ratio = 0.381485
Exact Fisher 95% confidence interval = 0.154582 to 0.822411
Exact Fisher one sided P = 0.0051, two sided P = 0.0104
Exact mid-P 95% confidence interval = 0.167398 to 0.783785
Exact mid-P one sided P = 0.0034,two sided P = 0.0068
Generalised Wilcoxon (Peto-Prentice):
test statistics:
-5.19836, 5.19836
variance-covariance matrix:
4.962627 | -4.962627 |
-4.962627 | 4.962627 |
Chi-square for equivalence of death rates = 5.44529 P = 0.0196
Both log-rank and Wilcoxon tests demonstrated a statistically significant difference in survival experience between stage 3 and stage 4 patients in this study.
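For readers who want to check the arithmetic, the sketch below is a minimal pure-Python illustration for two groups (the function name is hypothetical, not part of StatsDirect). It accumulates the observed deaths, the extent of exposure and the hypergeometric ('exact') variance over the distinct death times; given the (group, time, censor) columns above it should reproduce a chi-square close to the 6.71 reported for the log-rank test.

# Minimal two-group log-rank sketch (hypothetical helper, not StatsDirect code).
# groups: 1 or 2; times: observed times; events: 1 = death, 0 = censored.
def logrank_two_groups(groups, times, events):
    data = list(zip(groups, times, events))
    death_times = sorted({t for g, t, e in data if e == 1})
    observed1 = sum(e for g, t, e in data if g == 1 and e == 1)
    expected1 = 0.0   # extent of exposure to risk of death for group 1
    variance = 0.0    # hypergeometric ("exact") variance
    for t in death_times:
        n1 = sum(1 for g, tt, e in data if g == 1 and tt >= t)   # at risk, group 1
        n2 = sum(1 for g, tt, e in data if g == 2 and tt >= t)   # at risk, group 2
        n = n1 + n2
        d = sum(e for g, tt, e in data if tt == t and e == 1)    # deaths at time t
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n - d) * n1 * n2 / (n * n * (n - 1))
    score = observed1 - expected1
    return observed1, expected1, score * score / variance   # O1, E1, chi-square (1 df)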
Stratified example
From Peto et al. (1977):
Group | Trial Time | Censorship | Strat |
1 | 8 | 1 | 1 |
1 | 8 | 1 | 2 |
2 | 13 | 1 | 1 |
2 | 18 | 1 | 1 |
2 | 23 | 1 | 1 |
1 | 52 | 1 | 1 |
1 | 63 | 1 | 1 |
1 | 63 | 1 | 1 |
2 | 70 | 1 | 2 |
2 | 70 | 1 | 2 |
2 | 180 | 1 | 2 |
2 | 195 | 1 | 2 |
2 | 210 | 1 | 2 |
1 | 220 | 1 | 2 |
1 | 365 | 0 | 2 |
2 | 632 | 1 | 2 |
2 | 700 | 1 | 2 |
1 | 852 | 0 | 2 |
2 | 1296 | 1 | 2 |
1 | 1296 | 0 | 2 |
1 | 1328 | 0 | 2 |
1 | 1460 | 0 | 2 |
1 | 1976 | 0 | 2 |
2 | 1990 | 0 | 2 |
2 | 2240 | 0 | 2 |
Censorship 1 = death event
Censorship 0 = lost to follow-up
Stratum 1 = renal impairment
Stratum 2 = no renal impairment
The table above shows you how to prepare data for a stratified log-rank test in StatsDirect. This example is worked through in the second of two classic papers by Richard Peto and colleagues (Peto et al., 1977, 1976). Please note that StatsDirect uses the more accurate variance formulae mentioned in the statistical notes section at the end of Peto et al. (1977).
Menu location: Analysis_Survival_Wei-Lachin.
This function gives a two sample distribution free method for the comparison of two multivariate distributions of survival (time-to-event) data that may be censored (incomplete, e.g. alive at end of study or lost to follow up). Multivariate methods such as this should be used only with expert statistical guidance.
Wei and Lachin generalise the log-rank and Gehan generalised Wilcoxon tests (using a random censorship model) for multivariate survival data with two main groups (Makuch and Escobar, 1991; Wei and Lachin, 1984).
Data preparation
StatsDirect asks you for a group identifier; this could be a column of 1 and 2 representing the two groups. You then select k pairs of survival time (time-to-event) and censorship columns for k repeat times. Censored data are coded as 0 and uncensored data are coded as 1.
Repeat times may represent separate factors or the observation of the same factor repeated on k occasions. For example, time to develop symptoms could be analysed for k different symptoms in a group of patients treated with drug x and compared with a group of patients not treated with drug x.
Missing data can be coded either by entering a missing data symbol * as the time, or by setting censored equal to 0 and time less than the minimum uncensored time in your data set.
For further details please refer to Makuch and Escobar (1991) and Wei and Lachin (1984).
Technical Validation
Wei and Lachin's multivariate tests are calculated for the case of two multivariate distributions, and the intermediate univariate statistics are given. The algorithm used for the method is that given by Makuch and Escobar (1991).
The general univariate statistic for comparing the time to event (of component type k out of m multivariate components) of the two groups is calculated as:
- where n1 is the number of event times per component in group 1; n2 is the number of event times per component in group 2; n is the total number of event times per component; rik is the number at risk at time t(i) in the kth component; D is equal to 0 if an observation is censored or 1 otherwise; eik is the expected proportion of events in group i for the kth component; and wj is equal to 1 for the log-rank method or (r1k+r2k)/n for the Gehan-Breslow generalised Wilcoxon method.
The univariate statistic for the kth component of the multivariate survival data is calculated as:
- where skk caret is the kth diagonal element of the estimated variance-covariance matrix that is calculated as described by Makuch and Escobar (1991).
An omnibus test that the two multivariate distributions are equal is calculated as:
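The statistic appears as an image in the original; as described by Makuch and Escobar (1991) it is the quadratic form

\chi^2 = T'\, \hat{S}^{-1}\, T

referred to a chi-square distribution with degrees of freedom equal to the number of repeat times (4 in the example below).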
- where T' is the transpose of the vector of univariate test statistics and S-1 is the generalised inverse of the estimated variance-covariance matrix.
A stochastic ordering test statistic is calculated as:
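The statistic appears as an image in the original; it is the equally weighted linear combination of the univariate statistics standardised by its variance, which can be written as

z = \frac{\mathbf{1}'\, T}{\sqrt{\mathbf{1}'\, \hat{S}\, \mathbf{1}}}

where 1 is a vector of ones.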
Note that the P value given with the stochastic ordering (linear combination) statistic is two sided; some authors prefer one sided inference (Davis, 1994). If you make a one sided inference then you are considering only ascending or only descending ordering, and you are assuming that observing an order in the opposite direction to that expected would be unimportant to your conclusions.
The test statistics are all asymptotically normally distributed.
Example
From Makuch and Escobar (1991).
Test workbook (Survival worksheet: Treatment Gp, Time m1, Censor m1, Time m2, Censor m2, Time m3, Censor m3, Time m4, Censor m4).
The following data represent the times in days it took in vitro cultures of lymphocytes to reach a level of p24 antigen expression. The cultures were taken from patients infected with HIV-1 who had advanced AIDS or AIDS related complex. The idea was that patients whose cultures took a short time to express p24 antigen had a greater load of HIV-1. The two groups represented patients on two different treatments. The culture was run for 30 days and specimens which remained negative or which became contaminated were called censored (=0). The tests were run over four 30 day periods.
Treatment Gp | time m1 | censor m1 | time m2 | censor m2 | time m3 | censor m3 | time m4 | censor m4 |
1 | 8 | 1 | 0 | 0 | 25 | 0 | 21 | 1 |
1 | 6 | 1 | 4 | 1 | 5 | 1 | 5 | 1 |
1 | 6 | 1 | 5 | 1 | 28 | 0 | 18 | 1 |
1 | 14 | 0 | 35 | 0 | 23 | 1 | 19 | 0 |
1 | 7 | 1 | 0 | 0 | 13 | 1 | 0 | 0 |
1 | 5 | 1 | 4 | 1 | 27 | 1 | 8 | 1 |
1 | 5 | 1 | 21 | 0 | 6 | 1 | 14 | 1 |
1 | 6 | 1 | 10 | 1 | 14 | 1 | 18 | 1 |
1 | 7 | 1 | 4 | 1 | 15 | 1 | 8 | 1 |
1 | 6 | 1 | 5 | 1 | 5 | 1 | 5 | 1 |
1 | 4 | 1 | 5 | 1 | 6 | 1 | 3 | 1 |
1 | 5 | 1 | 4 | 1 | 7 | 1 | 5 | 1 |
1 | 21 | 0 | 5 | 1 | 0 | 0 | 6 | 1 |
1 | 13 | 1 | 27 | 0 | 21 | 0 | 8 | 1 |
1 | 4 | 1 | 27 | 0 | 7 | 1 | 6 | 1 |
1 | 6 | 1 | 3 | 1 | 7 | 1 | 8 | 1 |
1 | 6 | 1 | 0 | 0 | 5 | 1 | 5 | 1 |
1 | 6 | 1 | 0 | 0 | 4 | 1 | 6 | 1 |
1 | 7 | 1 | 9 | 1 | 6 | 1 | 7 | 1 |
1 | 8 | 1 | 15 | 1 | 8 | 1 | 0 | 0 |
1 | 18 | 0 | 27 | 0 | 18 | 0 | 9 | 1 |
1 | 16 | 1 | 14 | 1 | 14 | 1 | 6 | 1 |
1 | 15 | 1 | 9 | 1 | 12 | 1 | 12 | 1 |
2 | 4 | 1 | 5 | 1 | 4 | 1 | 3 | 1 |
2 | 8 | 1 | 22 | 1 | 25 | 0 | 0 | 0 |
2 | 6 | 1 | 6 | 1 | 8 | 1 | 5 | 1 |
2 | 7 | 1 | 10 | 1 | 10 | 1 | 18 | 1 |
2 | 5 | 1 | 14 | 1 | 17 | 0 | 6 | 1 |
2 | 3 | 1 | 5 | 1 | 8 | 1 | 6 | 1 |
2 | 6 | 1 | 11 | 1 | 6 | 1 | 13 | 1 |
2 | 6 | 1 | 0 | 0 | 15 | 1 | 7 | 1 |
2 | 6 | 1 | 12 | 1 | 19 | 1 | 8 | 1 |
2 | 6 | 1 | 25 | 0 | 0 | 0 | 22 | 0 |
2 | 4 | 1 | 7 | 1 | 5 | 1 | 7 | 1 |
2 | 5 | 1 | 7 | 1 | 4 | 1 | 6 | 1 |
2 | 3 | 1 | 9 | 1 | 7 | 1 | 6 | 1 |
2 | 9 | 1 | 17 | 1 | 0 | 0 | 21 | 0 |
2 | 6 | 1 | 4 | 1 | 8 | 1 | 14 | 1 |
2 | 5 | 1 | 5 | 1 | 7 | 1 | 16 | 0 |
2 | 12 | 1 | 18 | 0 | 14 | 1 | 0 | 0 |
2 | 9 | 1 | 11 | 1 | 15 | 1 | 18 | 0 |
2 | 6 | 1 | 5 | 1 | 9 | 1 | 0 | 0 |
2 | 18 | 0 | 8 | 1 | 10 | 1 | 13 | 1 |
2 | 4 | 1 | 4 | 1 | 5 | 1 | 10 | 1 |
2 | 3 | 1 | 10 | 1 | 0 | 0 | 21 | 0 |
2 | 8 | 1 | 7 | 1 | 10 | 1 | 12 | 1 |
2 | 3 | 1 | 6 | 1 | 7 | 1 | 9 | 1 |
To analyse these data in StatsDirect you must first prepare them in 9 workbook columns as shown above. Alternatively, open the test workbook using the file open function of the file menu. Then select Wei-Lachin from the Survival Analysis section of the analysis menu. Select the column marked "Treatment GP" when asked for the group identifier. Next, enter the number of repeat times as four. Select "time m1" and "censor m1" for time and censorship for repeat time one. Repeat this selection process for the other three repeat times.
For this example:
Wei-Lachin Analysis
Univariate Generalised Wilcoxon (Gehan)
total cases = 47 by group = 23 24
Observed failures by group = 20 23
repeat time = 1
Wei-Lachin t = -0.527597
Wei-Lachin variance = 0.077575
z = -1.89427
chi-square = 3.588261, P = .0582
Observed failures by group = 14 21
repeat time = 2
Wei-Lachin t = 0.077588
Wei-Lachin variance = 0.056161
z = 0.327397
chi-square = 0.107189, P = .7434
Observed failures by group = 18 19
repeat time = 3
Wei-Lachin t = -0.11483
Wei-Lachin variance = 0.060918
z = -0.465244
chi-square = 0.216452, P = .6418
Observed failures by group = 20 16
repeat time = 4
Wei-Lachin t = 0.335179
Wei-Lachin variance = 0.056281
z = 1.412849
chi-square = 1.996143, P = .1577
Multivariate Generalised Wilcoxon (Gehan)
Covariance matrix:
0.077575 | |||
0.026009 | 0.056161 | ||
0.035568 | 0.020484 | 0.060918 | |
0.023525 | 0.016862 | 0.026842 | 0.056281 |
Inverse of covariance matrix:
19.204259 | |||
-5.078483 | 22.22316 | ||
-8.40436 | -3.176864 | 25.857118 | |
-2.497583 | -3.020025 | -7.867237 | 23.468861 |
repeat times = 4
chi squared omnibus statistic = 9.242916 P = .0553
stochastic ordering z = -0.30981 one sided P = 0.3784, two sided P = 0.7567
Univariate Log-Rank
total cases = 47 by group = 23 24
Observed failures by group = 20 23
repeat time = 1
Wei-Lachin t = -0.716191
Wei-Lachin variance = 0.153385
z = -1.828676
chi-square = 3.344058, P = .0674
Observed failures by group = 14 21
repeat time = 2
Wei-Lachin t = -0.277786
Wei-Lachin variance = 0.144359
z = -0.731119
chi-square = 0.534536, P = .4647
Observed failures by group = 18 19
repeat time = 3
Wei-Lachin t = -0.372015
Wei-Lachin variance = 0.150764
z = -0.9581
chi-square = 0.917956, P = .338
Observed failures by group = 20 16
repeat time = 4
Wei-Lachin t = 0.619506
Wei-Lachin variance = 0.143437
z = 1.635743
chi-square = 2.675657, P = .1019
Multivariate Log-Rank
Covariance matrix:
0.153385 | |||
0.049439 | 0.144359 | ||
0.052895 | 0.050305 | 0.150764 | |
0.039073 | 0.047118 | 0.052531 | 0.143437 |
Inverse of covariance matrix:
7.973385 | |||
-1.779359 | 8.69056 | ||
-1.892007 | -1.661697 | 8.575636 | |
-0.894576 | -1.761494 | -2.079402 | 8.555558 |
repeat times = 4
chi squared omnibus statistic = 9.52966, P = .0491
stochastic ordering z = -0.688754, one sided P = 0.2455, two sided P = 0.491
Here the multivariate log-rank test has revealed a statistically significant difference between the treatment groups which was not revealed by any of the individual univariate tests. For more detailed discussion of each result parameter see Wei and Lachin (1984).
Menu location: Analysis_Survival_Cox Regression.
This function fits Cox's proportional hazards model for survival-time (time-to-event) outcomes on one or more predictors.
Cox regression (or proportional hazards regression) is a method for investigating the effect of several variables upon the time a specified event takes to happen. In the context of an outcome such as death this is known as Cox regression for survival analysis. The method does not assume any particular "survival model" but it is not truly non-parametric because it does assume that the effects of the predictor variables upon survival are constant over time and are additive in one scale. You should not use Cox regression without the guidance of a Statistician.
Provided that the assumptions of Cox regression are met, this function will provide better estimates of survival probabilities and cumulative hazard than those provided by the Kaplan-Meier function.
Hazard and hazard-ratios
Cumulative hazard at a time t is the risk of dying between time 0 and time t, and the survivor function at time t is the probability of surviving to time t (see also Kaplan-Meier estimates).
The coefficients in a Cox regression relate to hazard; a positive coefficient indicates a worse prognosis and a negative coefficient indicates a protective effect of the variable with which it is associated.
The hazards ratio associated with a predictor variable is given by the exponent of its coefficient; this is given with a confidence interval under the "coefficient details" option in StatsDirect. The hazards ratio may also be thought of as the relative death rate, see Armitage and Berry (1994). The interpretation of the hazards ratio depends upon the measurement scale of the predictor variable in question, see Sahai and Kurshid (1996) for further information on relative risk of hazards.
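For reference, the proportional hazards model itself can be written as follows, where h0(t) is an unspecified baseline hazard, x1 ... xp are the predictors and b1 ... bp are the coefficients estimated by this function:

h(t \mid x) = h_0(t)\, \exp(b_1 x_1 + b_2 x_2 + \dots + b_p x_p)

so that the hazard ratio for a one unit increase in xj, holding the other predictors fixed, is exp(bj).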
Time-dependent and fixedcovariates
In prospective studies, when individuals are followed over time, the values of covariates may change with time. Covariates can thus be divided into fixed and time-dependent. A covariate is time dependent if the difference between its values for two different subjects changes with time; e.g. serum cholesterol. A covariate is fixed if its values cannot change with time, e.g. sex or race. Lifestyle factors and physiological measurements such as blood pressure are usually time-dependent. Cumulative exposures such as smoking are also time-dependent but are often forced into an imprecise dichotomy, i.e. "exposed" vs. "not-exposed" instead of the more meaningful "time of exposure". There are no hard and fast rules about the handling of time dependent covariates. If you are considering using Cox regression you should seek the help of a Statistician, preferably at the design stage of the investigation.
Model analysis and deviance
A test of the overall statistical significance of the model is given under the "model analysis" option. Here the likelihood chi-square statistic is calculated by comparing the deviance (-2 * log likelihood) of your model, with all of the covariates you have specified, against the model with all covariates dropped. The individual contribution of covariates to the model can be assessed from the significance test given with each coefficient in the main output; this assumes a reasonably large sample size.
Deviance is minus twice the log of the likelihood ratio for models fitted by maximum likelihood (Hosmer and Lemeshow, 1989 and 1999; Cox and Snell, 1989; Pregibon, 1981). The value of adding a parameter to a Cox model is tested by subtracting the deviance of the model with the new parameter from the deviance of the model without the new parameter; the difference is then tested against a chi-square distribution with degrees of freedom equal to the difference between the degrees of freedom of the old and new models. The model analysis option tests the model you specify against a model with no covariates; this tests the combined value of the specified predictors/covariates in the model.
Some statistical packages offer stepwise Cox regression that performs systematic tests for different combinations of predictors/covariates. Automatic model building procedures such as these can be misleading as they do not consider the real-world importance of each predictor; for this reason StatsDirect does not include stepwise selection.
Survival and cumulative hazardrates
The survival/survivorship function and the cumulative hazard function (as discussed under Kaplan-Meier) are calculated relative to the baseline (lowest value of covariates) at each time point. Cox regression provides a better estimate of these functions than the Kaplan-Meier method when the assumptions of the Cox model are met and the fit of the model is strong.
You are given the option to "centre continuous covariates"; this makes survival and hazard functions relative to the mean of continuous variables (usually the more meaningful comparison) rather than relative to the minimum.
If you have binary/dichotomous predictors in your model you are given the option to calculate survival and cumulative hazards for each variable separately.
Data preparation
· Time-to-event, e.g. time a subject in a trial survived.
· Event / censor code - this must be ≥ 1 (event(s) happened) or 0 (no event at the end of the study, i.e. "right censored").
· Strata - e.g. centre code for a multi-centre trial. Be careful with your choice of strata; seek the advice of a Statistician.
· Predictors - these are also referred to as covariates, which can be a number of variables that are thought to be related to the event under study. If a predictor is a classifier variable with more than two classes (i.e. ordinal or nominal) then you must first use the dummy variable function to convert it to a series of binary classes.
Technical validation
StatsDirect optimises the log likelihood associated with a Cox regression model until the change in log likelihood with iterations is less than the accuracy that you specify in the dialog box that is displayed just before the calculation takes place (Lawless, 1982; Kalbfleisch and Prentice, 1980; Harris, 1991; Cox and Oakes, 1984; Le, 1997; Hosmer and Lemeshow, 1999).
The calculation options dialog box sets a value (default is 10000) for "SPLITTING RATIO"; this is the ratio in proportionality constant at a time t above which StatsDirect will split your data into more strata and calculate an extended likelihood solution, see Bryson and Johnson (1981).
Ties are handled by Breslow's approximation (Breslow, 1974).
Cox-Snell residuals are calculated as specified by Cox and Oakes (1984). Cox-Snell, Martingale and deviance residuals are calculated as specified by Collett (1994).
Baseline survival and cumulative hazard rates are calculated at each time. Maximum likelihood methods are used, which are iterative when there is more than one death/event at an observed time (Kalbfleisch and Prentice, 1973). Other software may use the less precise Breslow estimates for these functions.
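A model such as the one in the example below can be cross-checked outside StatsDirect. The sketch below is one possible approach; it assumes the three example columns have been exported to a hypothetical file called cox_example.csv and uses the third-party lifelines package. Note that lifelines applies Efron's correction for ties by default rather than Breslow's approximation, so the coefficient may differ slightly from the StatsDirect output.

# Minimal cross-check sketch (hypothetical file name; requires pandas and lifelines).
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("cox_example.csv")        # columns: stage_group, time, censor
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="censor")   # remaining columns are covariates
cph.print_summary()                        # coefficient, hazard ratio and 95% CI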
Example
From Armitage and Berry (1994, p. 479).
Test workbook (Survival worksheet: Stage Group, Time, Censor).
The following data represent the survival in days since entry to the trial of patients with diffuse histiocytic lymphoma. Two different groups of patients, those with stage III and those with stage IV disease, are compared.
Stage 3: 6, 19, 32, 42, 42, 43*, 94, 126*, 169*, 207, 211*, 227*, 253, 255*, 270*, 310*, 316*, 335*, 346*
Stage 4: 4, 6, 10, 11, 11, 11, 13, 17, 20, 20, 21, 22, 24, 24, 29, 30, 30, 31, 33, 34, 35, 39, 40, 41*, 43*, 45, 46, 50, 56, 61*, 61*, 63, 68, 82, 85, 88, 89, 90, 93, 104, 110, 134, 137, 160*, 169, 171, 173, 175, 184, 201, 222, 235*, 247*, 260*, 284*, 290*, 291*, 302*, 304*, 341*, 345*
* = censored data (patient stillalive or died from an unrelated cause)
To analyse these data in StatsDirect you must first prepare them in three workbook columns as shown below:
Stage group | Time | Censor |
1 | 6 | 1 |
1 | 19 | 1 |
1 | 32 | 1 |
1 | 42 | 1 |
1 | 42 | 1 |
1 | 43 | 0 |
1 | 94 | 1 |
1 | 126 | 0 |
1 | 169 | 0 |
1 | 207 | 1 |
1 | 211 | 0 |
1 | 227 | 0 |
1 | 253 | 1 |
1 | 255 | 0 |
1 | 270 | 0 |
1 | 310 | 0 |
1 | 316 | 0 |
1 | 335 | 0 |
1 | 346 | 0 |
2 | 4 | 1 |
2 | 6 | 1 |
2 | 10 | 1 |
2 | 11 | 1 |
2 | 11 | 1 |
2 | 11 | 1 |
2 | 13 | 1 |
2 | 17 | 1 |
2 | 20 | 1 |
2 | 20 | 1 |
2 | 21 | 1 |
2 | 22 | 1 |
2 | 24 | 1 |
2 | 24 | 1 |
2 | 29 | 1 |
2 | 30 | 1 |
2 | 30 | 1 |
2 | 31 | 1 |
2 | 33 | 1 |
2 | 34 | 1 |
2 | 35 | 1 |
2 | 39 | 1 |
2 | 40 | 1 |
2 | 41 | 0 |
2 | 43 | 0 |
2 | 45 | 1 |
2 | 46 | 1 |
2 | 50 | 1 |
2 | 56 | 1 |
2 | 61 | 0 |
2 | 61 | 0 |
2 | 63 | 1 |
2 | 68 | 1 |
2 | 82 | 1 |
2 | 85 | 1 |
2 | 88 | 1 |
2 | 89 | 1 |
2 | 90 | 1 |
2 | 93 | 1 |
2 | 104 | 1 |
2 | 110 | 1 |
2 | 134 | 1 |
2 | 137 | 1 |
2 | 160 | 0 |
2 | 169 | 1 |
2 | 171 | 1 |
2 | 173 | 1 |
2 | 175 | 1 |
2 | 184 | 1 |
2 | 201 | 1 |
2 | 222 | 1 |
2 | 235 | 0 |
2 | 247 | 0 |
2 | 260 | 0 |
2 | 284 | 0 |
2 | 290 | 0 |
2 | 291 | 0 |
2 | 302 | 0 |
2 | 304 | 0 |
2 | 341 | 0 |
2 | 345 | 0 |
Alternatively, open the test workbook using the file open function of the file menu. Then select Cox regression from the survival analysis section of the analysis menu. Select the column marked "Time" when asked for the times, select "Censor" when asked for death/censorship, click on the cancel button when asked about strata, and when asked about predictors select the column marked "Stage group".
For this example:
Cox (proportional hazards) regression
80 subjects with 54 events
Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057
Stage group b1 = 0.96102 z = 2.492043 P = 0.0127
Cox regression - hazard ratios
Parameter | Hazard ratio | 95% CI |
Stage group | 2.614362 | 1.227756 to 5.566976 |
Parameter | Coefficient | Standard Error |
Stage group | 0.96102 | 0.385636 |
Cox regression - model analysis
Log likelihood with no covariates = -207.554801
Log likelihood with all model covariates = -203.737609
Deviance (likelihood ratio) chi-square = 7.634383 df = 1 P = 0.0057
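As a check on the model analysis above, the deviance (likelihood ratio) chi-square is simply minus twice the difference between the two log likelihoods:

-2 \times [\,(-207.554801) - (-203.737609)\,] \approx 7.6344

which reproduces the chi-square of 7.634383 on 1 degree of freedom reported above.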
The significance test for the coefficient b1 tests the null hypothesis that it equals zero and thus that its exponent equals one. The confidence interval for exp(b1) is therefore the confidence interval for the relative death rate or hazard ratio; we may therefore infer with 95% confidence that the death rate from stage 4 cancers is approximately 2.6 times, and at least 1.2 times, the risk from stage 3 cancers.